This is an archive of the discontinued LLVM Phabricator instance.

Differential D124195

[AMDGPU] Separate out SGPR spills to VGPR lanes during PEI
Closed · Public

Authored by cdevadas on Apr 21 2022, 12:16 PM.

Details

Reviewers

arsenm
rampitec
sebastian-ne
nhaehnle

Commits

rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI

Summary

SILowerSGPRSpills pass handles the lowering of SGPR spills
into VGPR lanes. Some SGPR spills are handled later during
PEI. There is a common function used in both places to find
the free VGPR lane. This patch eliminates that dependency to
find the free VGPR by handling it separately for PEI. It is a
prerequisite patch for a future work to allow SGPR spills to
virtual VGPR lanes during SILowerSGPRSpills.
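
As a rough illustration of the bookkeeping described above, here is a minimal sketch (not the LLVM implementation) of how a running lane counter maps to a (VGPR, lane) slot, and how giving each spill site its own counter removes the shared "find the free lane" step. `LaneSlot`, `LaneAllocator`, and the wave-size constructor parameter are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: one 32-bit SGPR occupies one lane of a spill VGPR.
struct LaneSlot {
  unsigned VGPRIndex; // which spill VGPR (0 = first one allocated)
  unsigned Lane;      // lane within that VGPR
};

// Each spill site (for example SILowerSGPRSpills or PEI) owns its own
// allocator, so neither has to ask the other where the next free lane is.
class LaneAllocator {
  unsigned WaveSize;
  unsigned NumLanesUsed = 0;

public:
  explicit LaneAllocator(unsigned WaveSize) : WaveSize(WaveSize) {
    assert(WaveSize > 0 && "wave size must be positive");
  }

  // Allocate NumSubRegs consecutive lanes, one per 32-bit subregister.
  // A multi-dword spill may straddle two VGPRs.
  std::vector<LaneSlot> allocate(unsigned NumSubRegs) {
    std::vector<LaneSlot> Slots;
    for (unsigned I = 0; I < NumSubRegs; ++I, ++NumLanesUsed)
      Slots.push_back({NumLanesUsed / WaveSize, NumLanesUsed % WaveSize});
    return Slots;
  }

  // Number of spill VGPRs needed so far (rounded up).
  unsigned numVGPRs() const {
    return (NumLanesUsed + WaveSize - 1) / WaveSize;
  }
};
```

In this model a second allocator always starts filling its own VGPR from lane 0, even if the first allocator's last VGPR still has free lanes. That is the cost side of the separation: simpler bookkeeping, possibly one extra VGPR.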

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
80 ms	x64 debian > LLVM.CodeGen/AMDGPU::csr-sgpr-spill-live-ins.mir
60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test

Event Timeline

cdevadas created this revision. Apr 21 2022, 12:16 PM

Herald added a project: Restricted Project. Apr 21 2022, 12:16 PM

Herald added subscribers: hsmhsm, foad, kerbowa and 10 others.

cdevadas requested review of this revision. Apr 21 2022, 12:16 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B160699: Diff 424262. Apr 21 2022, 12:17 PM

cdevadas added a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog. Apr 21 2022, 12:18 PM

cdevadas added a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs.

foad added inline comments. Apr 22 2022, 3:05 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	Seems like a regression. Does this get fixed by a later patch?

cdevadas added inline comments. Apr 25 2022, 4:20 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	Yes, it is. With spilling SGPRs into virtual VGPR lanes, it won't directly be possible to track the unused lanes of the physical VGPR allocated for the last virtual register created during the `SILowerSGPRSpills` pass. I'm going to insert a custom pass in the VGPR regalloc pipeline to map the physReg from virtRegMap. That way, we can reuse the VGPR for any custom SGPR spills during PEI if free lanes are available. However, this regression can only be avoided at higher optimization levels. `regallocfast` doesn't provide a way to correctly map a virtual register to a PhysReg, so we can't avoid this extra VGPR usage when compiling at -O0.

arsenm added inline comments. Apr 25 2022, 1:44 PM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented.

cdevadas added inline comments. Apr 26 2022, 3:45 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll
12	It's worth implementing when it comes to saving a VGPR. Yep, planning it as a separate patch.

Code rebase

Herald added subscribers: kosarev, jsilvanus. Jun 27 2022, 10:05 AM

Harbormaster completed remote builds in B172247: Diff 440292. Jun 27 2022, 11:54 AM

arsenm added inline comments. Jun 28 2022, 11:51 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
465	Needs to document what "custom" means. Also, the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI. Ideally we would have somewhere else to put it.

Added a meaningful comment for SGPRToVGPRCustomSpills.

Harbormaster completed remote builds in B172561: Diff 440730. Jun 28 2022, 2:07 PM

arsenm added inline comments. Jun 28 2022, 3:41 PM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
469	Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields

cdevadas removed a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog. Jun 29 2022, 9:10 AM

cdevadas removed a child revision: D124196: [AMDGPU][SILowerSGPRSpills] Spill SGPRs to virtual VGPRs. Jun 29 2022, 9:17 AM

cdevadas added inline comments. Sep 23 2022, 5:37 AM

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
469	Currently, the function `SIMachineFunctionInfo::allocateSGPRSpillToVGPR` needs this variable to choose which lane count to use: the one for the SILowerSGPRSpills pass or the one for the custom spills made later during FrameLowering. I'm planning to move these functions out of SIMachineFunctionInfo entirely, which would remove the need for these variables.

Rebase

Harbormaster completed remote builds in B188383: Diff 462455. Sep 23 2022, 5:39 AM

Ping.

Ping

cdevadas added a parent revision: D124194: [AMDGPU] Correctly set IsKill flag for VGPR spills in the prolog. Oct 27 2022, 11:33 PM

cdevadas added a child revision: D132436: [AMDGPU][SIFrameLowering] Unify PEI SGPR spill saves and restores.

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.
They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.
How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)
Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.
A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
466–469	Simplify this further to a simpler for loop and finally `SGPRToVGPRSpills.clear()`
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
462–463	s/wave index/lane index/?
465	Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it accordingly.

The patch is reasonable in terms of what it does, by the way, just that the code is a mess and I think it should be cleaned up reasonably while it's being touched anyway.
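
One way the suggested grouping could look is sketched below. This is a hypothetical, self-contained model rather than LLVM code: `SpilledReg`, `SGPRToVGPRSpillState`, and `allocateLanes` are illustrative names, plain `unsigned` values stand in for `Register`, and the caller supplies fresh VGPRs through a callback instead of the real allocation logic. The struct keeps the FrameIndex to (VGPR, lane) map together with an explicitly tracked "open" VGPR and next lane, so nothing depends on `WWMSpills.back()`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

struct SpilledReg {
  unsigned VGPR;
  unsigned Lane;
};

// All state for one kind of SGPR-to-VGPR spilling lives in one place.
struct SGPRToVGPRSpillState {
  std::map<int, std::vector<SpilledReg>> Spills; // FrameIndex -> lanes
  unsigned OpenVGPR = 0;    // VGPR currently being filled
  bool HasOpenVGPR = false; // false until the first VGPR is obtained
  unsigned NextLane = 0;    // next free lane in OpenVGPR

  // Assign NumLanes lanes to frame index FI. Returns false if FI already has
  // an assignment or the callback cannot provide another VGPR.
  bool allocateLanes(int FI, unsigned NumLanes, unsigned WaveSize,
                     const std::function<bool(unsigned &)> &GetFreshVGPR) {
    if (Spills.count(FI))
      return false;
    // Work on copies so a failed allocation leaves this struct untouched
    // (VGPRs the callback already handed out are the caller's to reclaim).
    unsigned VGPR = OpenVGPR, Lane = NextLane;
    bool HasVGPR = HasOpenVGPR;
    std::vector<SpilledReg> Lanes;
    for (unsigned I = 0; I < NumLanes; ++I) {
      if (!HasVGPR || Lane == WaveSize) {
        if (!GetFreshVGPR(VGPR))
          return false;
        HasVGPR = true;
        Lane = 0;
      }
      Lanes.push_back({VGPR, Lane++});
    }
    OpenVGPR = VGPR;
    HasOpenVGPR = HasVGPR;
    NextLane = Lane;
    Spills[FI] = std::move(Lanes);
    return true;
  }
};
```

With this shape, the "custom / non-custom" distinction becomes two instances of the same struct (one per spill site) instead of parallel maps and counters selected by a flag.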

In D124195#3892023, @nhaehnle wrote:

Almost everything surrounding allocateSGPRSpillToVGPR and its companion methods is horrible from a coding style perspective. Now, a lot of this horribleness is pre-existing; that said, can you please get it while you're working on this anyway? Couple of things come to mind:

allocateVGPRForSGPRSpills and allocateVGPRForCustomSGPRSpills are very specialized and interact in uncomfortable ways with ambient state. They absolutely must not be public.

They are both basically the same method, and their existence only makes sense in the context of the basically identically named allocateSGPRSpillToVGPR. Please just merge those methods and inline them, so only allocateSGPRSpillToVGPR remains.

How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)

Relying on WWMSpills.back() to get the most recently used spill VGPR makes me rather uncomfortable. Since there's persistent tracking of allocated lanes already in the form of NumVGPR[Custom]SpillLanes, please just track the currently "open" VGPR explicitly as well.

A lot of data is tied to the "custom / non-custom" distinction (which does need a better name). How about defining a struct SIMachineFunctionInfo::SGPRToVGPRSpills and moving all the related data in there (FrameIndex to (VGPR,lane) map, next (VGPR, lane) pair)? That will make the case distinction flow more nicely.

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.
I hope that is OK. For now, I will replace the term “custom” in this review with a better name.
I don’t like the name “custom” either, but I couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

In D124195#3892634, @cdevadas wrote:

Ideally, D124195 and D132436 are mostly code-refactor and enabler patches for spilling SGPRs into virtual VGPR lanes and they both should have gone in the final patch D124196 that does the spill to virtual VGPRs. I want to use the convention SGPRSpillToVirtVGPRLanes (for SILowerSGPRSpills) and SGPRSpillToPhysVGPRLanes (for SIFrameLowering) for the two maps that track the spill info. But combining them into a single review would make the patch more complex with too many things in one place. So, I have split them into separate reviews. At this point, the SGPR spills at both places go into physical VGPR lanes and I can’t use the aforementioned names for the maps. The original plan was to have a code clean-up after all these patches landed. Yes, SIMachineFunctionInfo is currently in a bad shape. I want to move out the spill related tables and methods and place them into SILowerSGPRSpills and SIFrameLowering passes. Yes, planning to introduce a structure (just like SIMachineFunctionInfo::SGPRToVGPRSpills). I can incorporate all the suggestions you mentioned here in the post-cleanup patch.
At this point, there is a lot of common code for spill handling. But after they become spill to Virtual vs Physical VGPRs, the bookkeeping differs, and we can have a better cleanup.

Hmm, I suppose we can live with that. I keep wishing for better ways to review patch series like this one. I miss e-mail based reviews :(

I hope that is OK. For now, I will replace the term “custom” in this review with a better name.
I don’t like the name “custom” either, but I couldn’t find a better short name.
How about PrologEpilogSGPRSpillToVGPRLanes instead of SGPRToVGPRCustomSpills?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

Yeah, it's long but that name is at least precise :) I guess longer term it just becomes spill-to-virtual and spill-to-physical as you said?

That's right.

Removed the prefix "Custom" from the SGPR spills during PrologEpilogInserter and used a meaningful name instead.

Harbormaster completed remote builds in B195426: Diff 472236. Nov 1 2022, 1:45 AM

Made allocateVGPRForSGPRSpills & allocateVGPRForPrologEpilogSGPRSpills methods private.

Harbormaster completed remote builds in B195453: Diff 472278. Nov 1 2022, 6:40 AM

In D124195#3892023, @nhaehnle wrote:

How about renaming allocateSGPRSpillToVGPR to allocateSGPRSpillToVGPRLanes? This accounts for the fact that an SGPR spill is neither allocated to an entire VGPR, nor is an SGPR spill necessarily allocated to a single VGPR (it could cross multiple VGPRs depending on how the lane allocation works out)

I believe this is actually an optimization we're regressing on with the switch to spilling to virtual VGPRs. It will need to be reimplemented as a new optimization

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
339–342	Can we defer this until after all the spills are handled?

Deferred adding lane VGPR into BBLiveIns until all SGPR spills are handled.

Harbormaster completed remote builds in B195606: Diff 472474. Nov 1 2022, 6:44 PM

arsenm added inline comments. Nov 1 2022, 6:59 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	Actually, do we really need to do this anymore? If they were allocated from virtual registers, they should have correct live-in lists already.

cdevadas added inline comments. Nov 1 2022, 7:27 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	They are needed for prolog/epilog spill insertion. If we don't mark them liveIn, there will be a MIR verifier error indicating the use of undefined registers in spill instructions.

Ping

arsenm added inline comments. Nov 7 2022, 8:00 AM

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415	Why the behavior change? Is this restored in a later patch?

cdevadas added inline comments. Nov 7 2022, 8:23 AM

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll
415	This has already been discussed; Jay asked about the same thing earlier in this review. I'm planning a follow-up patch to regain it. Using the VRM map, the unused lanes of the last VGPR virtual register allocated for SGPR spilling can be tracked and then used later during FrameLowering when spilling FP/BP.

Ping

arsenm added inline comments. Nov 14 2022, 11:54 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	This feels too coarse-grained. The whole point of doing this was to allocate these like normal virtual registers, which should then have naturally set live-ins already. Is this only handling the prolog/epilog cases? It should only need to do anything for those.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
298–305	I think this referenced error cannot happen anymore
378	IsPEI feels like the wrong name. IsPrologEpilog would be a bit better but not great

cdevadas added inline comments. Nov 14 2022, 10:08 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	Yes, they are needed only for prolog/epilog spill cases.

Renamed IsPEI to IsPrologEpilog & removed the unwanted comment.

Harbormaster completed remote builds in B197669: Diff 475339. Nov 14 2022, 10:15 PM

arsenm added inline comments. Nov 14 2022, 10:53 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	But getWWMSpills covers everything? Is this adding excess live-ins?

cdevadas added inline comments. Nov 14 2022, 11:06 PM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
1283–1289	It doesn't necessarily add the live-ins at the entry block. We insert the spill to a virt-VGPR at a block, properly adding the IMPLICIT_DEF at its dominator block. The physical VGPR allocated for this virt-VGPR should be added to the prolog block live-ins; otherwise the verifier would complain that its spill store uses an undefined register.
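
The verifier concern can be illustrated with a toy model. The sketch below is not the LLVM MachineVerifier; `Inst`, `Block`, and `findUndefinedUse` are hypothetical names, and registers are plain integers. It walks a single block and reports the first register that is read before any definition in that block and is not in the block's live-in set, which is the situation described for a prolog spill store of a VGPR whose only definition (the IMPLICIT_DEF) lives in another block.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy instruction: the registers it reads and the registers it writes.
struct Inst {
  std::vector<unsigned> Uses;
  std::vector<unsigned> Defs;
};

// Toy basic block with an explicit live-in set.
struct Block {
  std::set<unsigned> LiveIns;
  std::vector<Inst> Insts;
};

// Returns true and sets Reg if some instruction reads a register that is
// neither live-in nor defined earlier in the block.
bool findUndefinedUse(const Block &B, unsigned &Reg) {
  std::set<unsigned> Defined = B.LiveIns;
  for (const Inst &I : B.Insts) {
    for (unsigned R : I.Uses) {
      if (!Defined.count(R)) {
        Reg = R;
        return true;
      }
    }
    Defined.insert(I.Defs.begin(), I.Defs.end());
  }
  return false;
}
```

In this model, a prolog block whose first instruction stores VGPR 40 to the stack is flagged until 40 is added to the block's live-ins, which corresponds to the extra live-in marking being defended in this thread.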

arsenm accepted this revision. Nov 17 2022, 5:33 PM

This revision is now accepted and ready to land. Nov 17 2022, 5:33 PM

Code rebase

Harbormaster completed remote builds in B203143: Diff 482885. Dec 14 2022, 9:06 AM

arsenm accepted this revision. Dec 14 2022, 10:10 AM

This revision was landed with ongoing or failed builds. Dec 16 2022, 10:20 PM

Closed by commit rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI (authored by cdevadas).

This revision was automatically updated to reflect the committed changes.

cdevadas added a commit: rGb25b4c0ab4ad: [AMDGPU] Separate out SGPR spills to VGPR lanes during PEI.

Revision Contents

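Path	Size
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp	35 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h	24 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp	139 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll	6 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll	24 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll	6 lines
llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll	6 lines
llvm/test/CodeGen/AMDGPU/amdpal-callable.ll	20 lines
llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll	16 lines
llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll	16 lines
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll	36 lines
llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll	24 lines
llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll	6 lines
llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll	6 lines
llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll	8 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll	2910 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll	184 lines
llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll	48 lines
llvm/test/CodeGen/AMDGPU/indirect-call.ll	80 lines
llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll	6 lines
llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll	12 lines
llvm/test/CodeGen/AMDGPU/nested-calls.ll	6 lines
llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll	6 lines
llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir	6 lines
llvm/test/CodeGen/AMDGPU/save-fp.ll	4 lines
llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll	8 lines
llvm/test/CodeGen/AMDGPU/sibling-call.ll	8 lines
llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll	6 lines
llvm/test/CodeGen/AMDGPU/stack-realign.ll	10 lines
llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll	4 lines
llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll	9 lines
llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll	38 lines
llvm/test/CodeGen/AMDGPU/wave32.ll	9 lines
llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll	16 lines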

Diff 440730

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ (63 lines hidden) @@ static void getVGPRSpillLaneOrTempRegister(MachineFunction &MF,
   SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
   MachineFrameInfo &FrameInfo = MF.getFrameInfo();

   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   const SIRegisterInfo *TRI = ST.getRegisterInfo();

   // We need to save and restore the current FP/BP.

-  // 1: If there is already a VGPR with free lanes, use it. We
-  // may already have to pay the penalty for spilling a CSR VGPR.
-  if (MFI->haveFreeLanesForSGPRSpill(MF, 1)) {
-    int NewFI = FrameInfo.CreateStackObject(4, Align(4), true, nullptr,
-                                            TargetStackID::SGPRSpill);
-
-    if (!MFI->allocateSGPRSpillToVGPR(MF, NewFI))
-      llvm_unreachable("allocate SGPR spill should have worked");
-
-    FrameIndex = NewFI;
-
-    LLVM_DEBUG(auto Spill = MFI->getSGPRToVGPRSpills(NewFI).front();
-               dbgs() << "Spilling " << (IsFP ? "FP" : "BP") << " to "
-                      << printReg(Spill.VGPR, TRI) << ':' << Spill.Lane
-                      << '\n');
-    return;
-  }
-
-  // 2: Next, try to save the FP/BP in an unused SGPR.
+  // 1: Try to save the FP/BP in an unused SGPR.
   TempSGPR = findScratchNonCalleeSaveRegister(
       MF.getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0_XEXECRegClass, true);

   if (!TempSGPR) {
     int NewFI = FrameInfo.CreateStackObject(4, Align(4), true, nullptr,
                                             TargetStackID::SGPRSpill);

-    if (TRI->spillSGPRToVGPR() && MFI->allocateSGPRSpillToVGPR(MF, NewFI)) {
-      // 3: There's no free lane to spill, and no free register to save FP/BP,
+    if (TRI->spillSGPRToVGPR() &&
+        MFI->allocateSGPRSpillToVGPR(MF, NewFI, /* IsPEI */ true)) {
+      // 2: There's no free lane to spill, and no free register to save FP/BP,
       // so we're forced to spill another VGPR to use for the spill.
-      auto Spill = MFI->getSGPRToVGPRSpills(NewFI).front();
-      MFI->allocateWWMSpill(MF, Spill.VGPR);
-
       FrameIndex = NewFI;

       LLVM_DEBUG(
+          auto Spill = MFI->getSGPRToVGPRCustomSpills(NewFI).front();
           dbgs() << (IsFP ? "FP" : "BP") << " requires fallback spill to "
                  << printReg(Spill.VGPR, TRI) << ':' << Spill.Lane << '\n';);
     } else {
       // Remove dead <NewFI> index
       MF.getFrameInfo().RemoveStackObject(NewFI);
-      // 4: If all else fails, spill the FP/BP to memory.
+      // 3: If all else fails, spill the FP/BP to memory.
       FrameIndex = FrameInfo.CreateSpillStackObject(4, Align(4));
       LLVM_DEBUG(dbgs() << "Reserved FI " << FrameIndex << " for spilling "
                         << (IsFP ? "FP" : "BP") << '\n');
     }
   } else {
     LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "
                       << printReg(TempSGPR, TRI) << '\n');
   }
@@ (693 lines hidden) @@ buildPrologSpill(ST, TRI, *FuncInfo, LiveRegs, MF, MBB, MBBI, DL, TmpVGPR,
                      FI);
   };

   auto SaveSGPRToVGPRLane = [&](Register Reg, const int FI) {
     assert(!MFI.isDeadObjectIndex(FI));

     assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
     ArrayRef<SIRegisterInfo::SpilledReg> Spill =
-        FuncInfo->getSGPRToVGPRSpills(FI);
+        FuncInfo->getSGPRToVGPRCustomSpills(FI);
     assert(Spill.size() == 1);

     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_WRITELANE_B32), Spill[0].VGPR)
         .addReg(Reg)
         .addImm(Spill[0].Lane)
         .addReg(Spill[0].VGPR, RegState::Undef);
   };

@@ (172 lines hidden) @@ buildEpilogRestore(ST, TRI, *FuncInfo, LiveRegs, MF, MBB, MBBI, DL, TmpVGPR,
                        FI);
     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), Reg)
         .addReg(TmpVGPR, RegState::Kill);
   };

   auto RestoreSGPRFromVGPRLane = [&](Register Reg, const int FI) {
     assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
     ArrayRef<SIRegisterInfo::SpilledReg> Spill =
-        FuncInfo->getSGPRToVGPRSpills(FI);
+        FuncInfo->getSGPRToVGPRCustomSpills(FI);
     assert(Spill.size() == 1);
     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READLANE_B32), Reg)
         .addReg(Spill[0].VGPR)
         .addImm(Spill[0].Lane);
   };

   if (FPSaveIndex) {
     const int FramePtrFI = *FPSaveIndex;
@@ (274 lines hidden) @@ if (MFI->SGPRForFPSaveRestoreCopy)
       LiveRegs.addReg(MFI->SGPRForFPSaveRestoreCopy);

     assert(!MFI->SGPRForBPSaveRestoreCopy &&
            !MFI->BasePointerSaveIndex && "Re-reserving spill slot for BP");
     getVGPRSpillLaneOrTempRegister(MF, LiveRegs, MFI->SGPRForBPSaveRestoreCopy,
                                    MFI->BasePointerSaveIndex, false);
   }
 }

 void SIFrameLowering::determineCalleeSavesSGPR(MachineFunction &MF,
                                                BitVector &SavedRegs,
                                                RegScavenger *RS) const {
   TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
   const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
   if (MFI->isEntryFunction())
     return;

   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   const SIRegisterInfo *TRI = ST.getRegisterInfo();

   // The SP is specifically managed and we don't want extra spills of it.
   SavedRegs.reset(MFI->getStackPtrOffsetReg());

@@ (190 lines hidden) @@
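
The reordered strategy on the new side of the `getVGPRSpillLaneOrTempRegister` hunk can be summarized as a three-step fallback. The sketch below is a simplified, hypothetical model: `SaveKind`, `chooseFPBPSave`, and the three boolean inputs stand in for the real queries (`findScratchNonCalleeSaveRegister`, `TRI->spillSGPRToVGPR()`, and `allocateSGPRSpillToVGPR`), and it only reports which option would be picked.

```cpp
#include <cassert>

// Where the FP/BP ends up being saved in this simplified model.
enum class SaveKind { CopyToSGPR, VGPRLane, Memory };

SaveKind chooseFPBPSave(bool HasFreeScratchSGPR, bool SpillSGPRToVGPREnabled,
                        bool VGPRLaneAllocated) {
  // 1: Prefer a plain copy into an unused SGPR.
  if (HasFreeScratchSGPR)
    return SaveKind::CopyToSGPR;
  // 2: Otherwise use a lane of a VGPR set aside for prolog/epilog spills.
  if (SpillSGPRToVGPREnabled && VGPRLaneAllocated)
    return SaveKind::VGPRLane;
  // 3: If all else fails, spill to memory.
  return SaveKind::Memory;
}
```

Compared with the old side of the diff, the former first step (reuse a VGPR that already has free lanes) is gone, which is why some tests in this revision show an extra VGPR being used.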

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h

Show First 20 Lines • Show All 453 Lines • ▼ Show 20 Lines
public:		public:
struct VGPRSpillToAGPR {		struct VGPRSpillToAGPR {
SmallVector<MCPhysReg, 32> Lanes;		SmallVector<MCPhysReg, 32> Lanes;
bool FullyAllocated = false;		bool FullyAllocated = false;
bool IsDead = false;		bool IsDead = false;
};		};

private:		private:
// Track VGPR + wave index for each subregister of the SGPR spilled to		// To track VGPR + wave index for each subregister of the SGPR spilled to
// frameindex key.		// frameindex key during SILowerSGPRSpills pass.
		nhaehnleUnsubmitted Not Done Reply Inline Actions s/wave index/lane index/? nhaehnle: s/wave index/lane index/?
DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRToVGPRSpills;		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRToVGPRSpills;
		// To track VGPR + wave index for spilling special SGPRs like Frame Pointer
		arsenmUnsubmitted Not Done Reply Inline Actions Needs to document what "custom" means. Also the fact that it's not serialized makes me nervous. However, that should be OK since this is only set and read in PEI so it should be OK. Ideally we would have somewhere else to put it arsenm: Needs to document what "custom" means. Also the fact that it's not serialized makes me nervous.
		nhaehnleUnsubmitted Not Done Reply Inline Actions Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it accordingly. nhaehnle: Agree that "custom" is a bad name. It seems to be for prolog/epilog purposes, name it…
		// identified during PrologEpilogInserter.
		DenseMap<int, std::vector<SIRegisterInfo::SpilledReg>> SGPRToVGPRCustomSpills;
unsigned NumVGPRSpillLanes = 0;		unsigned NumVGPRSpillLanes = 0;
		unsigned NumVGPRCustomSpillLanes = 0;
		arsenm: Isn't this count implied by SGPRToVGPRCustomSpills.size()? I'd like to avoid multiplying the number of unserialized fields
		cdevadas (author): Currently, the function `SIMachineFunctionInfo::allocateSGPRSpillToVGPR` needs this variable to choose the num spills between the SILowerSGPRSpills pass and the custom spills later during FrameLowering. I'm planning to move these functions entirely out of SIMachineFunctionInfo and can avoid these variables entirely.
SmallVector<Register, 2> SpillVGPRs;		SmallVector<Register, 2> SpillVGPRs;
using WWMSpillsMap = MapVector<Register, int>;		using WWMSpillsMap = MapVector<Register, int>;
// To track the registers used in instructions that can potentially modify the		// To track the registers used in instructions that can potentially modify the
// inactive lanes. The WWM instructions and the writelane instructions for		// inactive lanes. The WWM instructions and the writelane instructions for
// spilling SGPRs to VGPRs fall under such category of operations. The VGPRs		// spilling SGPRs to VGPRs fall under such category of operations. The VGPRs
// modified by them should be spilled/restored at function prolog/epilog to		// modified by them should be spilled/restored at function prolog/epilog to
// avoid any undesired outcome. Each entry in this map holds a pair of values,		// avoid any undesired outcome. Each entry in this map holds a pair of values,
// the VGPR and its stack slot index.		// the VGPR and its stack slot index.
(66 lines not shown)	return (I == SGPRToVGPRSpills.end())
? ArrayRef<SIRegisterInfo::SpilledReg>()		? ArrayRef<SIRegisterInfo::SpilledReg>()
: makeArrayRef(I->second);		: makeArrayRef(I->second);
}		}

ArrayRef<Register> getSGPRSpillVGPRs() const { return SpillVGPRs; }		ArrayRef<Register> getSGPRSpillVGPRs() const { return SpillVGPRs; }
const WWMSpillsMap &getWWMSpills() const { return WWMSpills; }		const WWMSpillsMap &getWWMSpills() const { return WWMSpills; }
const ReservedRegSet &getWWMReservedRegs() const { return WWMReservedRegs; }		const ReservedRegSet &getWWMReservedRegs() const { return WWMReservedRegs; }

		ArrayRef<SIRegisterInfo::SpilledReg>
		getSGPRToVGPRCustomSpills(int FrameIndex) const {
		auto I = SGPRToVGPRCustomSpills.find(FrameIndex);
		return (I == SGPRToVGPRCustomSpills.end())
		? ArrayRef<SIRegisterInfo::SpilledReg>()
		: makeArrayRef(I->second);
		}

void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,		void allocateWWMSpill(MachineFunction &MF, Register VGPR, uint64_t Size = 4,
Align Alignment = Align(4));		Align Alignment = Align(4));

ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {		ArrayRef<MCPhysReg> getAGPRSpillVGPRs() const {
return SpillAGPR;		return SpillAGPR;
}		}

ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {		ArrayRef<MCPhysReg> getVGPRSpillAGPRs() const {
return SpillVGPR;		return SpillVGPR;
}		}

MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {		MCPhysReg getVGPRToAGPRSpill(int FrameIndex, unsigned Lane) const {
auto I = VGPRToAGPRSpills.find(FrameIndex);		auto I = VGPRToAGPRSpills.find(FrameIndex);
return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister		return (I == VGPRToAGPRSpills.end()) ? (MCPhysReg)AMDGPU::NoRegister
: I->second.Lanes[Lane];		: I->second.Lanes[Lane];
}		}

void setVGPRToAGPRSpillDead(int FrameIndex) {		void setVGPRToAGPRSpillDead(int FrameIndex) {
auto I = VGPRToAGPRSpills.find(FrameIndex);		auto I = VGPRToAGPRSpills.find(FrameIndex);
if (I != VGPRToAGPRSpills.end())		if (I != VGPRToAGPRSpills.end())
I->second.IsDead = true;		I->second.IsDead = true;
}		}

bool haveFreeLanesForSGPRSpill(const MachineFunction &MF,		bool allocateVGPRForSGPRSpills(MachineFunction &MF, int FI,
unsigned NumLane) const;		unsigned LaneIndex);
bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI);		bool allocateVGPRForCustomSGPRSpills(MachineFunction &MF, int FI,
		unsigned LaneIndex);
		bool allocateSGPRSpillToVGPR(MachineFunction &MF, int FI, bool IsPEI = false);
bool allocateVGPRSpillToAGPR(MachineFunction &MF, int FI, bool isAGPRtoVGPR);		bool allocateVGPRSpillToAGPR(MachineFunction &MF, int FI, bool isAGPRtoVGPR);

/// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill		/// If \p ResetSGPRSpillStackIDs is true, reset the stack ID from sgpr-spill
/// to the default stack.		/// to the default stack.
bool removeDeadFrameIndices(MachineFrameInfo &MFI,		bool removeDeadFrameIndices(MachineFrameInfo &MFI,
bool ResetSGPRSpillStackIDs);		bool ResetSGPRSpillStackIDs);

int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);		int getScavengeFI(MachineFrameInfo &MFI, const SIRegisterInfo &TRI);
(390 lines not shown)

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

(275 lines not shown)	bool SIMachineFunctionInfo::isCalleeSavedReg(const MCPhysReg *CSRegs,
for (unsigned I = 0; CSRegs[I]; ++I) {		for (unsigned I = 0; CSRegs[I]; ++I) {
if (CSRegs[I] == Reg)		if (CSRegs[I] == Reg)
return true;		return true;
}		}

return false;		return false;
}		}

/// \p returns true if \p NumLanes slots are available in VGPRs already used for		bool SIMachineFunctionInfo::allocateVGPRForSGPRSpills(MachineFunction &MF,
/// SGPR spilling.		int FI,
//		unsigned LaneIndex) {
// FIXME: This only works after processFunctionBeforeFrameFinalized
bool SIMachineFunctionInfo::haveFreeLanesForSGPRSpill(const MachineFunction &MF,
unsigned NumNeed) const {
const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
unsigned WaveSize = ST.getWavefrontSize();
return NumVGPRSpillLanes + NumNeed <= WaveSize * SpillVGPRs.size();
}

/// Reserve a slice of a VGPR to support spilling for FrameIndex \p FI.
bool SIMachineFunctionInfo::allocateSGPRSpillToVGPR(MachineFunction &MF,
int FI) {
std::vector<SIRegisterInfo::SpilledReg> &SpillLanes = SGPRToVGPRSpills[FI];

// This has already been allocated.
if (!SpillLanes.empty())
return true;

const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
const SIRegisterInfo *TRI = ST.getRegisterInfo();		const SIRegisterInfo *TRI = ST.getRegisterInfo();
MachineFrameInfo &FrameInfo = MF.getFrameInfo();
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
unsigned WaveSize = ST.getWavefrontSize();

unsigned Size = FrameInfo.getObjectSize(FI);
unsigned NumLanes = Size / 4;

if (NumLanes > WaveSize)
return false;

assert(Size >= 4 && "invalid sgpr spill size");
assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");

// Make sure to handle the case where a wide SGPR spill may span between two
// VGPRs.
for (unsigned I = 0; I < NumLanes; ++I, ++NumVGPRSpillLanes) {
Register LaneVGPR;		Register LaneVGPR;
unsigned VGPRIndex = (NumVGPRSpillLanes % WaveSize);		if (!LaneIndex) {

if (VGPRIndex == 0) {
LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);		LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
if (LaneVGPR == AMDGPU::NoRegister) {		if (LaneVGPR == AMDGPU::NoRegister) {
// We have no VGPRs left for spilling SGPRs. Reset because we will not		// We have no VGPRs left for spilling SGPRs. Reset because we will not
// partially spill the SGPR to VGPRs.		// partially spill the SGPR to VGPRs.
SGPRToVGPRSpills.erase(FI);		SGPRToVGPRSpills.erase(FI);
NumVGPRSpillLanes -= I;

// FIXME: We can run out of free registers with split allocation if		// FIXME: We can run out of free registers with split allocation if
// IPRA is enabled and a called function already uses every VGPR.		// IPRA is enabled and a called function already uses every VGPR.
#if 0		#if 0
DiagnosticInfoResourceLimit DiagOutOfRegs(MF.getFunction(),		DiagnosticInfoResourceLimit DiagOutOfRegs(MF.getFunction(),
"VGPRs for SGPR spilling",		"VGPRs for SGPR spilling",
0, DS_Error);		0, DS_Error);
MF.getFunction().getContext().diagnose(DiagOutOfRegs);		MF.getFunction().getContext().diagnose(DiagOutOfRegs);
#endif		#endif
		arsenm: I think this referenced error cannot happen anymore
return false;		return false;
}		}

SpillVGPRs.push_back(LaneVGPR);		SpillVGPRs.push_back(LaneVGPR);

// Add this register as live-in to all blocks to avoid machine verifier		// Add this register as live-in to all blocks to avoid machine verifier
// complaining about use of an undefined physical register.		// complaining about use of an undefined physical register.
for (MachineBasicBlock &BB : MF)		for (MachineBasicBlock &BB : MF)
BB.addLiveIn(LaneVGPR);		BB.addLiveIn(LaneVGPR);
} else {		} else {
LaneVGPR = SpillVGPRs.back();		LaneVGPR = SpillVGPRs.back();
}		}

SpillLanes.push_back(SIRegisterInfo::SpilledReg(LaneVGPR, VGPRIndex));		SGPRToVGPRSpills[FI].push_back(
		SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));
		return true;
		}

		bool SIMachineFunctionInfo::allocateVGPRForCustomSGPRSpills(
		MachineFunction &MF, int FI, unsigned LaneIndex) {
		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();
		MachineRegisterInfo &MRI = MF.getRegInfo();
		Register LaneVGPR;
		if (!LaneIndex) {
		LaneVGPR = TRI->findUnusedRegister(MRI, &AMDGPU::VGPR_32RegClass, MF);
		if (LaneVGPR == AMDGPU::NoRegister) {
		// We have no VGPRs left for spilling SGPRs. Reset because we will not
		// partially spill the SGPR to VGPRs.
		SGPRToVGPRCustomSpills.erase(FI);
		return false;
		}

		allocateWWMSpill(MF, LaneVGPR);
		for (MachineBasicBlock &MBB : MF) {
		MBB.addLiveIn(LaneVGPR);
		MBB.sortUniqueLiveIns();
		}
		arsenm: Can we defer this until after all the spills are handled?
		} else {
		LaneVGPR = WWMSpills.back().first;
		}

		SGPRToVGPRCustomSpills[FI].push_back(
		SIRegisterInfo::SpilledReg(LaneVGPR, LaneIndex));
		return true;
		}

		bool SIMachineFunctionInfo::allocateSGPRSpillToVGPR(MachineFunction &MF, int FI,
		bool IsPEI) {
		std::vector<SIRegisterInfo::SpilledReg> &SpillLanes =
		IsPEI ? SGPRToVGPRCustomSpills[FI] : SGPRToVGPRSpills[FI];

		// This has already been allocated.
		if (!SpillLanes.empty())
		return true;

		const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
		const SIRegisterInfo *TRI = ST.getRegisterInfo();
		MachineFrameInfo &FrameInfo = MF.getFrameInfo();
		unsigned WaveSize = ST.getWavefrontSize();

		unsigned Size = FrameInfo.getObjectSize(FI);
		unsigned NumLanes = Size / 4;

		if (NumLanes > WaveSize)
		return false;

		assert(Size >= 4 && "invalid sgpr spill size");
		assert(TRI->spillSGPRToVGPR() && "not spilling SGPRs to VGPRs");

		unsigned &NumSpillLanes = IsPEI ? NumVGPRCustomSpillLanes : NumVGPRSpillLanes;

		for (unsigned I = 0; I < NumLanes; ++I, ++NumSpillLanes) {
		unsigned LaneIndex = (NumSpillLanes % WaveSize);
		arsenm: IsPEI feels like the wrong name. IsPrologEpilog would be a bit better but not great

		bool Allocated = IsPEI ? allocateVGPRForCustomSGPRSpills(MF, FI, LaneIndex)
		: allocateVGPRForSGPRSpills(MF, FI, LaneIndex);
		if (!Allocated) {
		NumSpillLanes -= I;
		return false;
		}
}		}

return true;		return true;
}		}

/// Reserve AGPRs or VGPRs to support spilling for FrameIndex \p FI.		/// Reserve AGPRs or VGPRs to support spilling for FrameIndex \p FI.
/// Either AGPR is spilled to VGPR to vice versa.		/// Either AGPR is spilled to VGPR to vice versa.
/// Returns true if a \p FI can be eliminated completely.		/// Returns true if a \p FI can be eliminated completely.
(59 lines not shown)	for (int I = NumLanes - 1; I >= 0; --I) {
Spill.Lanes[I] = *NextSpillReg++;		Spill.Lanes[I] = *NextSpillReg++;
}		}

return Spill.FullyAllocated;		return Spill.FullyAllocated;
}		}

bool SIMachineFunctionInfo::removeDeadFrameIndices(		bool SIMachineFunctionInfo::removeDeadFrameIndices(
MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {		MachineFrameInfo &MFI, bool ResetSGPRSpillStackIDs) {
// Remove dead frame indices from function frame, however keep FP & BP since		// Remove dead frame indices from function frame. And also make sure to remove
// spills for them haven't been inserted yet. And also make sure to remove the		// the frame indices from `SGPRToVGPRSpills` data structure, otherwise, it
// frame indices from `SGPRToVGPRSpills` data structure, otherwise, it could		// could result in an unexpected side effect and bug, in case of any
// result in an unexpected side effect and bug, in case of any re-mapping of		// re-mapping of freed frame indices by later pass(es) like "stack slot
// freed frame indices by later pass(es) like "stack slot coloring".		// coloring".
for (auto &R : make_early_inc_range(SGPRToVGPRSpills)) {		for (auto &R : make_early_inc_range(SGPRToVGPRSpills)) {
if (R.first != FramePointerSaveIndex && R.first != BasePointerSaveIndex) {
MFI.RemoveStackObject(R.first);		MFI.RemoveStackObject(R.first);
SGPRToVGPRSpills.erase(R.first);		SGPRToVGPRSpills.erase(R.first);
}		}
		nhaehnle: Simplify this further to a simpler for loop and finally `SGPRToVGPRSpills.clear()`
}

bool HaveSGPRToMemory = false;		bool HaveSGPRToMemory = false;

if (ResetSGPRSpillStackIDs) {		if (ResetSGPRSpillStackIDs) {
// All other SPGRs must be allocated on the default stack, so reset the		// All other SPGRs must be allocated on the default stack, so reset the
// stack ID.		// stack ID.
for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;		for (int i = MFI.getObjectIndexBegin(), e = MFI.getObjectIndexEnd(); i != e;
++i) {		++i) {
(266 lines not shown)
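
nhaehnle's suggestion for `removeDeadFrameIndices` can be sketched roughly as below. This is only a stand-alone illustration, not the code that landed: `MockFrameInfo`, `SpilledReg`, `SpillMap` and `removeDeadSGPRSpillIndices` are hypothetical stand-ins for the LLVM types, and it assumes that FP/BP save slots no longer live in the map, so every entry can be dropped.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for MachineFrameInfo: it only records which frame
// indices were removed.
struct MockFrameInfo {
  std::set<int> Removed;
  void RemoveStackObject(int FI) { Removed.insert(FI); }
};

// Hypothetical stand-in for SIRegisterInfo::SpilledReg.
struct SpilledReg {
  unsigned VGPR;
  unsigned Lane;
};

using SpillMap = std::map<int, std::vector<SpilledReg>>;

// Assumption: FP/BP spills are tracked elsewhere, so no entry needs to be
// kept. Remove every stack object, then clear the map in one step instead
// of erasing entries while iterating.
inline void removeDeadSGPRSpillIndices(MockFrameInfo &MFI, SpillMap &Spills) {
  for (const auto &Entry : Spills)
    MFI.RemoveStackObject(Entry.first);
  Spills.clear();
}
```

Clearing after the loop avoids the need for an early-increment range, since nothing is erased during iteration.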

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck %s			; RUN: llc -global-isel -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -o - %s | FileCheck %s

	declare hidden i32 addrspace(1)* @ext(i8 addrspace(1)*)			declare hidden i32 addrspace(1)* @ext(i8 addrspace(1)*)

	define i32 addrspace(1)* @call_assert_align() {			define i32 addrspace(1)* @call_assert_align() {
	; CHECK-LABEL: call_assert_align:			; CHECK-LABEL: call_assert_align:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				foad: Seems like a regression. Does this get fixed by a later patch?
				cdevadas (author): Yes, it is. With spilling SGPRs into virtual VGPR lanes, it won't directly be possible to track the unused lanes of the physical VGPR allocated for the last virtual register created during `SILowerSGPRSpills` pass. Going to insert a custom pass in the VGPR regalloc pipeline to map the physReg from virtRegMap. In that way, we can reuse the VGPR for any custom SGPR spills during PEI if free lanes are available. However, this regression can only be avoided for higher optimization levels. The `regallocfast` doesn't provide a way to correctly map a virtual to PhysReg and we can't avoid this extra VGPR usage when compiled for -O0.
				arsenm: I'm not sure a separate pass using VirtRegMap is the right solution to merging spill VGPRs of different SGPRs, but either way this is a separate optimization that needs to be re-implemented.
				cdevadas (author): It's worth implementing when it comes to saving a VGPR. Yep, planning it as a separate patch.
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 2			; CHECK-NEXT: v_writelane_b32 v41, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, ext@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s17, s17, ext@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s17, s17, ext@rel32@hi+12
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: v_mov_b32_e32 v2, 0			; CHECK-NEXT: v_mov_b32_e32 v2, 0
	; CHECK-NEXT: global_store_dword v[0:1], v2, off			; CHECK-NEXT: global_store_dword v[0:1], v2, off
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 2			; CHECK-NEXT: v_readlane_b32 s33, v41, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call align 4 i32 addrspace(1)* @ext(i8 addrspace(1)* null)			%call = call align 4 i32 addrspace(1)* @ext(i8 addrspace(1)* null)
	store volatile i32 0, i32 addrspace(1)* %call			store volatile i32 0, i32 addrspace(1)* %call
	ret i32 addrspace(1)* %call			ret i32 addrspace(1)* %call
	}			}
	(15 lines not shown)
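
The extra `v41` spill discussed in the thread above follows from the two lane counters now being independent. The toy model below is a sketch of that effect only; it is not LLVM code, and `LanePool`, `Slot` and the two helper functions are hypothetical names. It assumes a wave size of 64 and that `v40` is the first free VGPR, as in this test.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model, not LLVM code: a VGPR is just a number and a spill slot is a
// (VGPR number, lane index) pair. All names here are hypothetical.
using Slot = std::pair<unsigned, unsigned>;

struct LanePool {
  unsigned WaveSize;         // lanes available per VGPR
  unsigned NumLanesUsed = 0; // plays the role of the per-pool lane counter
  std::vector<unsigned> VGPRs;

  explicit LanePool(unsigned WS) : WaveSize(WS) {}

  // Hand out the next lane; take a fresh VGPR whenever the lane index
  // wraps around to 0.
  Slot allocateLane(unsigned &NextFreeVGPR) {
    assert(WaveSize > 0 && "wave size must be non-zero");
    unsigned LaneIndex = NumLanesUsed % WaveSize;
    if (LaneIndex == 0)
      VGPRs.push_back(NextFreeVGPR++);
    ++NumLanesUsed;
    return {VGPRs.back(), LaneIndex};
  }
};

// Before the patch: one shared pool, so s30, s31 and the frame pointer s33
// all land in the same VGPR.
inline std::vector<Slot> spillWithSharedPool() {
  unsigned NextFreeVGPR = 40;
  LanePool Shared(64);
  return {Shared.allocateLane(NextFreeVGPR),  // s30
          Shared.allocateLane(NextFreeVGPR),  // s31
          Shared.allocateLane(NextFreeVGPR)}; // s33 (FP)
}

// After the patch: the prolog/epilog pool starts counting from lane 0 and
// therefore requests its own VGPR for the frame pointer.
inline std::vector<Slot> spillWithSeparatePools() {
  unsigned NextFreeVGPR = 40;
  LanePool Regular(64), PrologEpilog(64);
  return {Regular.allocateLane(NextFreeVGPR),       // s30
          Regular.allocateLane(NextFreeVGPR),       // s31
          PrologEpilog.allocateLane(NextFreeVGPR)}; // s33 (FP)
}
```

In this model the shared pool places the frame pointer in lane 2 of `v40`, while the separate pools place it in lane 0 of `v41`, which mirrors the `v_writelane_b32` change in the CHECK lines.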

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

	(232 lines not shown)
	}			}

	define void @func_caller_stack() {			define void @func_caller_stack() {
	; MUBUF-LABEL: func_caller_stack:			; MUBUF-LABEL: func_caller_stack:
	; MUBUF: ; %bb.0:			; MUBUF: ; %bb.0:
	; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: v_writelane_b32 v40, s33, 2			; MUBUF-NEXT: v_writelane_b32 v41, s33, 0
	; MUBUF-NEXT: s_mov_b32 s33, s32			; MUBUF-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_mov_b32_e32 v0, 9			; MUBUF-NEXT: v_mov_b32_e32 v0, 9
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; MUBUF-NEXT: v_mov_b32_e32 v0, 10			; MUBUF-NEXT: v_mov_b32_e32 v0, 10
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; MUBUF-NEXT: v_mov_b32_e32 v0, 11			; MUBUF-NEXT: v_mov_b32_e32 v0, 11
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
	; MUBUF-NEXT: v_mov_b32_e32 v0, 12			; MUBUF-NEXT: v_mov_b32_e32 v0, 12
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4
	; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:16
	; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]			; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; MUBUF-NEXT: v_readlane_b32 s31, v40, 1			; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
	; MUBUF-NEXT: v_readlane_b32 s30, v40, 0			; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
	; MUBUF-NEXT: v_readlane_b32 s33, v40, 2			; MUBUF-NEXT: v_readlane_b32 s33, v41, 0
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; MUBUF-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_caller_stack:			; FLATSCR-LABEL: func_caller_stack:
	; FLATSCR: ; %bb.0:			; FLATSCR: ; %bb.0:
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; FLATSCR-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: v_writelane_b32 v40, s33, 2			; FLATSCR-NEXT: v_writelane_b32 v41, s33, 0
	; FLATSCR-NEXT: s_mov_b32 s33, s32			; FLATSCR-NEXT: s_mov_b32 s33, s32
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 9			; FLATSCR-NEXT: v_mov_b32_e32 v0, 9
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:4
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 10			; FLATSCR-NEXT: v_mov_b32_e32 v0, 10
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:8
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 11			; FLATSCR-NEXT: v_mov_b32_e32 v0, 11
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:12
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 12			; FLATSCR-NEXT: v_mov_b32_e32 v0, 12
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: s_getpc_b64 s[0:1]			; FLATSCR-NEXT: s_getpc_b64 s[0:1]
	; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4			; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_v16i32_v4i32@rel32@lo+4
	; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12			; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
	; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16			; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:16
	; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]			; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1			; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
	; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0			; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_add_i32 s32, s32, -16
	; FLATSCR-NEXT: v_readlane_b32 s33, v40, 2			; FLATSCR-NEXT: v_readlane_b32 s33, v41, 0
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
				; FLATSCR-NEXT: scratch_load_dword v41, off, s32 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	call void @external_void_func_v16i32_v16i32_v4i32(<16 x i32> undef, <16 x i32> undef, <4 x i32> <i32 9, i32 10, i32 11, i32 12>)			call void @external_void_func_v16i32_v16i32_v4i32(<16 x i32> undef, <16 x i32> undef, <4 x i32> <i32 9, i32 10, i32 11, i32 12>)
	ret void			ret void
	}			}

	define void @func_caller_byval([16 x i32] addrspace(5)* %argptr) {			define void @func_caller_byval([16 x i32] addrspace(5)* %argptr) {
	; MUBUF-LABEL: func_caller_byval:			; MUBUF-LABEL: func_caller_byval:
	; MUBUF: ; %bb.0:			; MUBUF: ; %bb.0:
	; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; MUBUF-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v1, v0, s[0:3], 0 offen
	; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v2, v0, s[0:3], 0 offen offset:4
	; MUBUF-NEXT: v_writelane_b32 v40, s33, 2			; MUBUF-NEXT: v_writelane_b32 v41, s33, 0
	; MUBUF-NEXT: s_mov_b32 s33, s32			; MUBUF-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_addk_i32 s32, 0x400			; MUBUF-NEXT: s_addk_i32 s32, 0x400
	; MUBUF-NEXT: v_writelane_b32 v40, s30, 0			; MUBUF-NEXT: v_writelane_b32 v40, s30, 0
	; MUBUF-NEXT: v_writelane_b32 v40, s31, 1			; MUBUF-NEXT: v_writelane_b32 v40, s31, 1
	; MUBUF-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, external_void_func_byval@rel32@lo+4
	; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_byval@rel32@hi+12
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	(48 lines not shown)
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:56			; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:56
	; MUBUF-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:60			; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:60
	; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]			; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; MUBUF-NEXT: v_readlane_b32 s31, v40, 1			; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
	; MUBUF-NEXT: v_readlane_b32 s30, v40, 0			; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
	; MUBUF-NEXT: v_readlane_b32 s33, v40, 2			; MUBUF-NEXT: v_readlane_b32 s33, v41, 0
	; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1			; MUBUF-NEXT: s_or_saveexec_b64 s[4:5], -1
	; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; MUBUF-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; MUBUF-NEXT: s_mov_b64 exec, s[4:5]			; MUBUF-NEXT: s_mov_b64 exec, s[4:5]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_caller_byval:			; FLATSCR-LABEL: func_caller_byval:
	; FLATSCR: ; %bb.0:			; FLATSCR: ; %bb.0:
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; FLATSCR-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; FLATSCR-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off			; FLATSCR-NEXT: scratch_load_dwordx2 v[1:2], v0, off
	; FLATSCR-NEXT: v_writelane_b32 v40, s33, 2			; FLATSCR-NEXT: v_writelane_b32 v41, s33, 0
	; FLATSCR-NEXT: s_mov_b32 s33, s32			; FLATSCR-NEXT: s_mov_b32 s33, s32
	; FLATSCR-NEXT: s_add_i32 s32, s32, 16			; FLATSCR-NEXT: s_add_i32 s32, s32, 16
	; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0			; FLATSCR-NEXT: v_writelane_b32 v40, s30, 0
	; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1			; FLATSCR-NEXT: v_writelane_b32 v40, s31, 1
	; FLATSCR-NEXT: s_getpc_b64 s[0:1]			; FLATSCR-NEXT: s_getpc_b64 s[0:1]
	; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4			; FLATSCR-NEXT: s_add_u32 s0, s0, external_void_func_byval@rel32@lo+4
	; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12			; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_byval@rel32@hi+12
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	(18 lines not shown)
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32 offset:48			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s32 offset:48
	; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], v0, off offset:56			; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], v0, off offset:56
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s32 offset:56			; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s32 offset:56
	; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]			; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1			; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
	; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0			; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_add_i32 s32, s32, -16
	; FLATSCR-NEXT: v_readlane_b32 s33, v40, 2			; FLATSCR-NEXT: v_readlane_b32 s33, v41, 0
	; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1			; FLATSCR-NEXT: s_or_saveexec_b64 s[0:1], -1
	; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
				; FLATSCR-NEXT: scratch_load_dword v41, off, s32 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]			; FLATSCR-NEXT: s_mov_b64 exec, s[0:1]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	%cast = bitcast [16 x i32] addrspace(5)* %argptr to i8 addrspace(5)*			%cast = bitcast [16 x i32] addrspace(5)* %argptr to i8 addrspace(5)*
	call void @external_void_func_byval([16 x i32] addrspace(5)* byval([16 x i32]) %argptr)			call void @external_void_func_byval([16 x i32] addrspace(5)* byval([16 x i32]) %argptr)
	ret void			ret void
	}			}

	declare void @llvm.memset.p5i8.i32(i8 addrspace(5)* nocapture writeonly, i8, i32, i1 immarg) #1			declare void @llvm.memset.p5i8.i32(i8 addrspace(5)* nocapture writeonly, i8, i32, i1 immarg) #1

	attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }			attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }
	attributes #1 = { argmemonly nofree nounwind willreturn writeonly }			attributes #1 = { argmemonly nofree nounwind willreturn writeonly }

llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll


	; This would crash from using the wrong insert point			; This would crash from using the wrong insert point
	define void @sink_null_insert_pt(i32 addrspace(4)* %arg0) {			define void @sink_null_insert_pt(i32 addrspace(4)* %arg0) {
	; GFX9-LABEL: sink_null_insert_pt:			; GFX9-LABEL: sink_null_insert_pt:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[16:17], -1			; GFX9-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[16:17]			; GFX9-NEXT: s_mov_b64 exec, s[16:17]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dword v0, v[0:1], off glc			; GFX9-NEXT: global_load_dword v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], 0			; GFX9-NEXT: s_swappc_b64 s[30:31], 0
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load0 = load volatile i32, i32 addrspace(1)* null, align 4			%load0 = load volatile i32, i32 addrspace(1)* null, align 4
	br label %bb1			br label %bb1

	bb1:			bb1:
	call void null()			call void null()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll

	; does not require the implicit arguments to the function. Make sure			; does not require the implicit arguments to the function. Make sure
	; we do not crash.			; we do not crash.
	define void @parent_func_missing_inputs() #0 {			define void @parent_func_missing_inputs() #0 {
	; FIXEDABI-LABEL: parent_func_missing_inputs:			; FIXEDABI-LABEL: parent_func_missing_inputs:
	; FIXEDABI: ; %bb.0:			; FIXEDABI: ; %bb.0:
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[16:17], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[16:17], -1
	; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; FIXEDABI-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; FIXEDABI-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; FIXEDABI-NEXT: s_mov_b64 exec, s[16:17]			; FIXEDABI-NEXT: s_mov_b64 exec, s[16:17]
	; FIXEDABI-NEXT: v_writelane_b32 v40, s33, 2			; FIXEDABI-NEXT: v_writelane_b32 v41, s33, 0
	; FIXEDABI-NEXT: s_mov_b32 s33, s32			; FIXEDABI-NEXT: s_mov_b32 s33, s32
	; FIXEDABI-NEXT: s_addk_i32 s32, 0x400			; FIXEDABI-NEXT: s_addk_i32 s32, 0x400
	; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0			; FIXEDABI-NEXT: v_writelane_b32 v40, s30, 0
	; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1			; FIXEDABI-NEXT: v_writelane_b32 v40, s31, 1
	; FIXEDABI-NEXT: s_getpc_b64 s[16:17]			; FIXEDABI-NEXT: s_getpc_b64 s[16:17]
	; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4			; FIXEDABI-NEXT: s_add_u32 s16, s16, requires_all_inputs@rel32@lo+4
	; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12			; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12
	; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]			; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1			; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1
	; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0			; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0
	; FIXEDABI-NEXT: s_addk_i32 s32, 0xfc00			; FIXEDABI-NEXT: s_addk_i32 s32, 0xfc00
	; FIXEDABI-NEXT: v_readlane_b32 s33, v40, 2			; FIXEDABI-NEXT: v_readlane_b32 s33, v41, 0
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[4:5], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[4:5], -1
	; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; FIXEDABI-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; FIXEDABI-NEXT: s_mov_b64 exec, s[4:5]			; FIXEDABI-NEXT: s_mov_b64 exec, s[4:5]
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0)
	; FIXEDABI-NEXT: s_setpc_b64 s[30:31]			; FIXEDABI-NEXT: s_setpc_b64 s[30:31]
	call void @requires_all_inputs()			call void @requires_all_inputs()
	ret void			ret void
	}			}

	define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {			define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll

	; GCN-NEXT: .vgpr_count: 0x3{{$}}			; GCN-NEXT: .vgpr_count: 0x3{{$}}
	; GCN-NEXT: no_stack:			; GCN-NEXT: no_stack:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x20{{$}}			; GCN-NEXT: .sgpr_count: 0x20{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
	; GCN-NEXT: .vgpr_count: 0x1{{$}}			; GCN-NEXT: .vgpr_count: 0x1{{$}}
	; GCN-NEXT: no_stack_call:			; GCN-NEXT: no_stack_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x24{{$}}			; GCN-NEXT: .sgpr_count: 0x25{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x3{{$}}			; GCN-NEXT: .vgpr_count: 0x3{{$}}
	; GCN-NEXT: no_stack_extern_call:			; GCN-NEXT: no_stack_extern_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: no_stack_extern_call_many_args:			; GCN-NEXT: no_stack_extern_call_many_args:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: no_stack_indirect_call:			; GCN-NEXT: no_stack_indirect_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_lds:			; GCN-NEXT: simple_lds:
	; GCN-NEXT: .lds_size: 0x100{{$}}			; GCN-NEXT: .lds_size: 0x100{{$}}
	; GCN-NEXT: .sgpr_count: 0x20{{$}}			; GCN-NEXT: .sgpr_count: 0x20{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
	; GCN-NEXT: .vgpr_count: 0x1{{$}}			; GCN-NEXT: .vgpr_count: 0x1{{$}}
	; GCN-NEXT: simple_lds_recurse:			; GCN-NEXT: simple_lds_recurse:
	; GCN-NEXT: .lds_size: 0x100{{$}}			; GCN-NEXT: .lds_size: 0x100{{$}}
	; GCN-NEXT: .sgpr_count: 0x26{{$}}			; GCN-NEXT: .sgpr_count: 0x26{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
	; GCN-NEXT: .vgpr_count: 0x29{{$}}			; GCN-NEXT: .vgpr_count: 0x2a{{$}}
	; GCN-NEXT: simple_stack:			; GCN-NEXT: simple_stack:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x21{{$}}			; GCN-NEXT: .sgpr_count: 0x21{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}
	; GCN-NEXT: .vgpr_count: 0x2{{$}}			; GCN-NEXT: .vgpr_count: 0x2{{$}}
	; GCN-NEXT: simple_stack_call:			; GCN-NEXT: simple_stack_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x24{{$}}			; GCN-NEXT: .sgpr_count: 0x25{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x4{{$}}			; GCN-NEXT: .vgpr_count: 0x4{{$}}
	; GCN-NEXT: simple_stack_extern_call:			; GCN-NEXT: simple_stack_extern_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_stack_indirect_call:			; GCN-NEXT: simple_stack_indirect_call:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GFX8-NEXT: .sgpr_count: 0x28{{$}}			; GFX8-NEXT: .sgpr_count: 0x28{{$}}
	; GFX9-NEXT: .sgpr_count: 0x2c{{$}}			; GFX9-NEXT: .sgpr_count: 0x2c{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x30{{$}}
	; GCN-NEXT: .vgpr_count: 0x2b{{$}}			; GCN-NEXT: .vgpr_count: 0x2c{{$}}
	; GCN-NEXT: simple_stack_recurse:			; GCN-NEXT: simple_stack_recurse:
	; GCN-NEXT: .lds_size: 0{{$}}			; GCN-NEXT: .lds_size: 0{{$}}
	; GCN-NEXT: .sgpr_count: 0x26{{$}}			; GCN-NEXT: .sgpr_count: 0x26{{$}}
	; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}			; GCN-NEXT: .stack_frame_size_in_bytes: 0x20{{$}}
	; GCN-NEXT: .vgpr_count: 0x2a{{$}}			; GCN-NEXT: .vgpr_count: 0x2b{{$}}
	; GCN-NEXT: ...			; GCN-NEXT: ...

llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,CI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,CI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-BUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa --amdhsa-code-object-version=2 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,VI-BUG %s

	; Make sure to run a GPU with the SGPR allocation bug.			; Make sure to run a GPU with the SGPR allocation bug.

	; GCN-LABEL: {{^}}use_vcc:			; GCN-LABEL: {{^}}use_vcc:
	; GCN: ; NumSgprs: 34			; GCN: ; NumSgprs: 34
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_vcc() #1 {			define void @use_vcc() #1 {
	call void asm sideeffect "", "~{vcc}" () #0			call void asm sideeffect "", "~{vcc}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_vcc:			; GCN-LABEL: {{^}}indirect_use_vcc:
	; GCN: v_writelane_b32 v40, s33, 2			; GCN: v_writelane_b32 v41, s33, 0
	; GCN: v_writelane_b32 v40, s30, 0			; GCN: v_writelane_b32 v40, s30, 0
	; GCN: v_writelane_b32 v40, s31, 1			; GCN: v_writelane_b32 v40, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: v_readlane_b32 s31, v40, 1			; GCN: v_readlane_b32 s31, v40, 1
	; GCN: v_readlane_b32 s30, v40, 0			; GCN: v_readlane_b32 s30, v40, 0
	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v41, 0
	; GCN: s_setpc_b64 s[30:31]			; GCN: s_setpc_b64 s[30:31]
	; GCN: ; NumSgprs: 36			; GCN: ; NumSgprs: 36
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_vcc() #1 {			define void @indirect_use_vcc() #1 {
	call void @use_vcc()			call void @use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI-NOBUG: ; NumSgprs: 40			; VI-NOBUG: ; NumSgprs: 40
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2level_use_vcc_kernel(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @indirect_2level_use_vcc_kernel(i32 addrspace(1)* %out) #0 {
	call void @indirect_use_vcc()			call void @indirect_use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_flat_scratch:			; GCN-LABEL: {{^}}use_flat_scratch:
	; CI: ; NumSgprs: 36			; CI: ; NumSgprs: 36
	; VI: ; NumSgprs: 38			; VI: ; NumSgprs: 38
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_flat_scratch() #1 {			define void @use_flat_scratch() #1 {
	call void asm sideeffect "", "~{flat_scratch}" () #0			call void asm sideeffect "", "~{flat_scratch}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_flat_scratch:			; GCN-LABEL: {{^}}indirect_use_flat_scratch:
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI: ; NumSgprs: 40			; VI: ; NumSgprs: 40
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_flat_scratch() #1 {			define void @indirect_use_flat_scratch() #1 {
	call void @use_flat_scratch()			call void @use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 38			; CI: ; NumSgprs: 38
	; VI-NOBUG: ; NumSgprs: 40			; VI-NOBUG: ; NumSgprs: 40
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(i32 addrspace(1)* %out) #0 {
	call void @indirect_use_flat_scratch()			call void @indirect_use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_10_vgpr:			; GCN-LABEL: {{^}}use_10_vgpr:
	; GCN: ; NumVgprs: 10			; GCN: ; NumVgprs: 10
	define void @use_10_vgpr() #1 {			define void @use_10_vgpr() #1 {
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4}"() #0			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4}"() #0
	call void asm sideeffect "", "~{v5},~{v6},~{v7},~{v8},~{v9}"() #0			call void asm sideeffect "", "~{v5},~{v6},~{v7},~{v8},~{v9}"() #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_10_vgpr:			; GCN-LABEL: {{^}}indirect_use_10_vgpr:
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define void @indirect_use_10_vgpr() #0 {			define void @indirect_use_10_vgpr() #0 {
	call void @use_10_vgpr()			call void @use_10_vgpr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2_level_use_10_vgpr:			; GCN-LABEL: {{^}}indirect_2_level_use_10_vgpr:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; GCN: ; NumVgprs: 41			; GCN: ; NumVgprs: 42
	define amdgpu_kernel void @indirect_2_level_use_10_vgpr() #0 {			define amdgpu_kernel void @indirect_2_level_use_10_vgpr() #0 {
	call void @indirect_use_10_vgpr()			call void @indirect_use_10_vgpr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_50_vgpr:			; GCN-LABEL: {{^}}use_50_vgpr:
	; GCN: ; NumVgprs: 50			; GCN: ; NumVgprs: 50
	define void @use_50_vgpr() #1 {			define void @use_50_vgpr() #1 {

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:		; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
; MUBUF: buffer_store_dword		; MUBUF: buffer_store_dword
		; MUBUF: buffer_store_dword
		; FLATSCR: scratch_store_dword
; FLATSCR: scratch_store_dword		; FLATSCR: scratch_store_dword
; GCN: v_writelane_b32 v40, s33, 4
; GCN: v_writelane_b32 v40, s30, 0		; GCN: v_writelane_b32 v40, s30, 0
; GCN: v_writelane_b32 v40, s31, 1		; GCN: v_writelane_b32 v40, s31, 1
		; GCN: v_writelane_b32 v41, s33, 0
; GCN: v_writelane_b32 v40, s34, 2		; GCN: v_writelane_b32 v40, s34, 2
; GCN: v_writelane_b32 v40, s35, 3		; GCN: v_writelane_b32 v40, s35, 3

; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN: v_readlane_b32 s35, v40, 3		; GCN: v_readlane_b32 s35, v40, 3
; GCN: v_readlane_b32 s34, v40, 2		; GCN: v_readlane_b32 s34, v40, 2
; MUBUF-DAG: v_readlane_b32 s31, v40, 1		; MUBUF-DAG: v_readlane_b32 s31, v40, 1
; MUBUF-DAG: v_readlane_b32 s30, v40, 0		; MUBUF-DAG: v_readlane_b32 s30, v40, 0
; FLATSCR-DAG: v_readlane_b32 s31, v40, 1		; FLATSCR-DAG: v_readlane_b32 s31, v40, 1
; FLATSCR-DAG: v_readlane_b32 s30, v40, 0		; FLATSCR-DAG: v_readlane_b32 s30, v40, 0

; GCN: v_readlane_b32 s33, v40, 4		; GCN: v_readlane_b32 s33, v41, 0
; MUBUF: buffer_load_dword		; MUBUF: buffer_load_dword
		; MUBUF: buffer_load_dword
		; FLATSCR: scratch_load_dword
; FLATSCR: scratch_load_dword		; FLATSCR: scratch_load_dword
; GCN: s_setpc_b64 s[30:31]		; GCN: s_setpc_b64 s[30:31]
define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {		define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void asm sideeffect "", ""() #0		call void asm sideeffect "", ""() #0
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:		; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:
; MUBUF: buffer_store_dword v40		; MUBUF: buffer_store_dword v40
		; MUBUF: buffer_store_dword v41
; FLATSCR: scratch_store_dword off, v40		; FLATSCR: scratch_store_dword off, v40
; GCN: v_writelane_b32 v40, s33, 4		; FLATSCR: scratch_store_dword off, v41
		; GCN: v_writelane_b32 v41, s33, 0

; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64

; GCN: v_readlane_b32 s33, v40, 4		; GCN: v_readlane_b32 s33, v41, 0
; MUBUF: buffer_load_dword v40		; MUBUF: buffer_load_dword v40
		; MUBUF: buffer_load_dword v41
; FLATSCR: scratch_load_dword v40		; FLATSCR: scratch_load_dword v40
		; FLATSCR: scratch_load_dword v41
define void @test_func_call_external_void_funcx2() #0 {		define void @test_func_call_external_void_funcx2() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:		; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:
; GCN: s_waitcnt		; GCN: s_waitcnt

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

define void @callee_with_stack_no_fp_elim_non_leaf() #2 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack_and_call:		; GCN-LABEL: {{^}}callee_with_stack_and_call:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR_1:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR_1:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN: v_writelane_b32 [[CSR_VGPR_1]], s33, 0
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; MUBUF-DAG: s_addk_i32 s32, 0x400{{$}}		; MUBUF-DAG: s_addk_i32 s32, 0x400{{$}}
; FLATSCR-DAG: s_add_i32 s32, s32, 16{{$}}		; FLATSCR-DAG: s_add_i32 s32, s32, 16{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,

; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]
; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]

; MUBUF: s_addk_i32 s32, 0xfc00{{$}}		; MUBUF: s_addk_i32 s32, 0xfc00{{$}}
; FLATSCR: s_add_i32 s32, s32, -16{{$}}		; FLATSCR: s_add_i32 s32, s32, -16{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR_1]], 0
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR_1]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR_1]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)

; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @callee_with_stack_and_call() #0 {		define void @callee_with_stack_and_call() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; Should be able to copy incoming stack pointer directly to inner		; Should be able to copy incoming stack pointer directly to inner
; call's stack pointer argument.		; call's stack pointer argument.

; There is stack usage only because of the need to evict a VGPR for		; There is stack usage only because of the need to evict a VGPR for
; spilling CSR SGPRs.		; spilling CSR SGPRs.

; GCN-LABEL: {{^}}callee_no_stack_with_call:		; GCN-LABEL: {{^}}callee_no_stack_with_call:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR_1:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR_1:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; MUBUF-DAG: s_addk_i32 s32, 0x400		; MUBUF-DAG: s_addk_i32 s32, 0x400
; FLATSCR-DAG: s_add_i32 s32, s32, 16		; FLATSCR-DAG: s_add_i32 s32, s32, 16
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]		; GCN-DAG: v_writelane_b32 [[CSR_VGPR_1]], s33, [[FP_SPILL_LANE:[0-9]+]]

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0		; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0
; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1		; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1

; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR_1]], [[FP_SPILL_LANE]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR_1]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR_1]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @callee_no_stack_with_call() #0 {		define void @callee_no_stack_with_call() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}


; Use a copy to a free SGPR instead of introducing a second CSR VGPR.		; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:		; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 v0, s33, 63
; GCN-COUNT-60: v_writelane_b32 v0		; GCN-COUNT-60: v_writelane_b32 v0
		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0
; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:4		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:4
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:4		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:4
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_writelane_b32 v0		; GCN: v_writelane_b32 v0

; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, v0, 63		; GCN-NEXT: s_mov_b32 s33, [[TMP_SGPR]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @last_lane_vgpr_for_fp_csr() #1 {		define void @last_lane_vgpr_for_fp_csr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[87 lines not shown]
}		}

; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; MUBUF: s_addk_i32 s32, 0x300		; MUBUF: s_addk_i32 s32, 0x300
; FLATSCR: s_add_i32 s32, s32, 12		; FLATSCR: s_add_i32 s32, s32, 12
; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR: scratch_store_dword off, [[ZERO]], s33{{$}}
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1		; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1
; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0		; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0
; MUBUF: s_addk_i32 s32, 0xfd00		; MUBUF: s_addk_i32 s32, 0xfd00
; FLATSCR: s_add_i32 s32, s32, -12		; FLATSCR: s_add_i32 s32, s32, -12
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
		arsenm: Why the behavior change? Is this restored in a later patch?
		cdevadas (author): This has already been discussed; Jay asked the same question earlier in this review. I'm planning a follow-up patch to regain it. Using the VRM map, the unused lanes of the last VGPR virtual register allocated for SGPR spilling can be tracked and then used later, during FrameLowering, when spilling the FP/BP.
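		The lane bookkeeping discussed in this thread can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not code from the patch: `LaneSlot` and `SpillLaneTracker` are hypothetical names, and the model ignores everything except which spill VGPR and which lane the next SGPR lands in. It shows why a shared tracker puts the FP in lane 2 of the VGPR that already holds s30/s31, while a separate tracker for PEI starts a fresh VGPR at lane 0, which matches the `v40, s33, 2` to `v41, s33, 0` changes seen in several of the tests in this diff.

```cpp
// Hypothetical model, not LLVM code: one spill VGPR holds WaveSize lanes.
struct LaneSlot {
  unsigned VGPRIndex; // which spill VGPR (0 = first one allocated)
  unsigned Lane;      // lane inside that VGPR
};

class SpillLaneTracker {
  unsigned WaveSize;
  unsigned NumLanesUsed = 0; // lanes handed out across all spill VGPRs

public:
  explicit SpillLaneTracker(unsigned WaveSize) : WaveSize(WaveSize) {}

  // Hand out the next lane, moving to a new VGPR once the current one is full.
  LaneSlot allocateLane() {
    LaneSlot Slot{NumLanesUsed / WaveSize, NumLanesUsed % WaveSize};
    ++NumLanesUsed;
    return Slot;
  }

  // True if the most recently opened VGPR still has an unused lane,
  // i.e. a later FP/BP spill could reuse it instead of a new VGPR.
  bool hasFreeLaneInLastVGPR() const {
    return NumLanesUsed != 0 && NumLanesUsed % WaveSize != 0;
  }

  // Number of spill VGPRs opened so far.
  unsigned numVGPRs() const {
    return (NumLanesUsed + WaveSize - 1) / WaveSize;
  }
};
```

		With one tracker shared by both phases, spilling s30 and s31 and then the FP yields lanes 0, 1 and 2 of the same VGPR. With one tracker per phase, the FP gets lane 0 of a VGPR of its own, at the cost of an extra VGPR save and restore in the prologue and epilogue; the follow-up described above would amount to consulting `hasFreeLaneInLastVGPR()` on the first tracker before opening a new VGPR.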
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @no_unused_non_csr_sgpr_for_fp() #1 {		define void @no_unused_non_csr_sgpr_for_fp() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[11 lines not shown]

; Need a new CSR VGPR to satisfy the FP spill.		; Need a new CSR VGPR to satisfy the FP spill.
; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x300{{$}}		; MUBUF: s_addk_i32 s32, 0x300{{$}}
; FLATSCR: s_add_i32 s32, s32, 12{{$}}		; FLATSCR: s_add_i32 s32, s32, 12{{$}}

; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_addk_i32 s32, 0xfd00{{$}}		; MUBUF: s_addk_i32 s32, 0xfd00{{$}}
; FLATSCR: s_add_i32 s32, s32, -12{{$}}		; FLATSCR: s_add_i32 s32, s32, -12{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {		define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
[20 lines not shown]
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: s_mov_b32 vcc_lo, s33
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}
; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}		; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}		; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}
; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}		; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: s_mov_b32 s33, vcc_lo
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40100
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1004
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
[23 lines not shown]
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define internal void @local_empty_func() #0 {		define internal void @local_empty_func() #0 {
ret void		ret void
}		}

; An FP is needed, despite not needing any spills		; An FP is needed, despite not needing any spills
; TODO: Ccould see callee does not use stack and omit FP.		; TODO: Ccould see callee does not use stack and omit FP.
; GCN-LABEL: {{^}}ipra_call_with_stack:		; GCN-LABEL: {{^}}ipra_call_with_stack:
; GCN: v_writelane_b32 v0, s33, 2		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; MUBUF: s_addk_i32 s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_i32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; MUBUF: s_addk_i32 s32, 0xfc00		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_i32 s32, s32, -16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN: v_readlane_b32 s33, v0, 2		; GCN: s_mov_b32 s33, [[TMP_SGPR]]
define void @ipra_call_with_stack() #0 {		define void @ipra_call_with_stack() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @local_empty_func()		call void @local_empty_func()
ret void		ret void
}		}

; With no free registers, we must spill the FP to memory.		; With no free registers, we must spill the FP to memory.
[141 lines not shown]

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	[23 lines not shown]


	define float @call_split_type_used_outside_block_v2f32() #0 {			define float @call_split_type_used_outside_block_v2f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v2f32:			; GCN-LABEL: call_split_type_used_outside_block_v2f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <2 x float> @func_v2f32()			%split.ret.type = call <2 x float> @func_v2f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <2 x float> %split.ret.type, i32 0			%extract = extractelement <2 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define float @call_split_type_used_outside_block_v3f32() #0 {			define float @call_split_type_used_outside_block_v3f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v3f32:			; GCN-LABEL: call_split_type_used_outside_block_v3f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <3 x float> @func_v3f32()			%split.ret.type = call <3 x float> @func_v3f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <3 x float> %split.ret.type, i32 0			%extract = extractelement <3 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define half @call_split_type_used_outside_block_v4f16() #0 {			define half @call_split_type_used_outside_block_v4f16() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v4f16:			; GCN-LABEL: call_split_type_used_outside_block_v4f16:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <4 x half> @func_v4f16()			%split.ret.type = call <4 x half> @func_v4f16()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <4 x half> %split.ret.type, i32 0			%extract = extractelement <4 x half> %split.ret.type, i32 0
	ret half %extract			ret half %extract
	}			}

	define { i32, half } @call_split_type_used_outside_block_struct() #0 {			define { i32, half } @call_split_type_used_outside_block_struct() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_struct:			; GCN-LABEL: call_split_type_used_outside_block_struct:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_getpc_b64 s[16:17]			; GCN-NEXT: s_getpc_b64 s[16:17]
	; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s16, s16, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_struct@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_mov_b32_e32 v1, v4			; GCN-NEXT: v_mov_b32_e32 v1, v4
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()			%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()
	br label %bb1			br label %bb1

	bb1:			bb1:
	[125 lines not shown]

llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll

	[12 lines not shown]
	; CHECK: .Lfunc_begin0:			; CHECK: .Lfunc_begin0:
	; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0			; CHECK-NEXT: .loc 1 288 0 ; dummy:288:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 15
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: v_writelane_b32 v40, s34, 2			; CHECK-NEXT: v_writelane_b32 v40, s34, 2
	; CHECK-NEXT: v_writelane_b32 v40, s35, 3			; CHECK-NEXT: v_writelane_b32 v40, s35, 3
	; CHECK-NEXT: v_writelane_b32 v40, s36, 4			; CHECK-NEXT: v_writelane_b32 v40, s36, 4
	; CHECK-NEXT: v_writelane_b32 v40, s37, 5			; CHECK-NEXT: v_writelane_b32 v40, s37, 5
	; CHECK-NEXT: v_writelane_b32 v40, s38, 6			; CHECK-NEXT: v_writelane_b32 v40, s38, 6
	; CHECK-NEXT: v_writelane_b32 v40, s39, 7			; CHECK-NEXT: v_writelane_b32 v40, s39, 7
	; CHECK-NEXT: v_writelane_b32 v40, s40, 8			; CHECK-NEXT: v_writelane_b32 v40, s40, 8
	; CHECK-NEXT: v_writelane_b32 v40, s41, 9			; CHECK-NEXT: v_writelane_b32 v40, s41, 9
	; CHECK-NEXT: v_writelane_b32 v40, s42, 10			; CHECK-NEXT: v_writelane_b32 v40, s42, 10
	; CHECK-NEXT: v_writelane_b32 v40, s43, 11			; CHECK-NEXT: v_writelane_b32 v40, s43, 11
				; CHECK-NEXT: v_writelane_b32 v42, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_addk_i32 s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v40, s44, 12			; CHECK-NEXT: v_writelane_b32 v40, s44, 12
	; CHECK-NEXT: v_writelane_b32 v40, s46, 13			; CHECK-NEXT: v_writelane_b32 v40, s46, 13
	; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]			; CHECK-NEXT: s_mov_b64 s[40:41], s[4:5]
	; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef			; CHECK-NEXT: ;DEBUG_VALUE: dummy:dummy <- undef
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9			; CHECK-NEXT: .loc 1 49 9 prologue_end ; dummy:49:9
	[39 lines not shown]
	; CHECK-NEXT: v_readlane_b32 s38, v40, 6			; CHECK-NEXT: v_readlane_b32 s38, v40, 6
	; CHECK-NEXT: v_readlane_b32 s37, v40, 5			; CHECK-NEXT: v_readlane_b32 s37, v40, 5
	; CHECK-NEXT: v_readlane_b32 s36, v40, 4			; CHECK-NEXT: v_readlane_b32 s36, v40, 4
	; CHECK-NEXT: v_readlane_b32 s35, v40, 3			; CHECK-NEXT: v_readlane_b32 s35, v40, 3
	; CHECK-NEXT: v_readlane_b32 s34, v40, 2			; CHECK-NEXT: v_readlane_b32 s34, v40, 2
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 15			; CHECK-NEXT: v_readlane_b32 s33, v42, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	%2 = call ptr @__kmpc_alloc_shared(), !dbg !43			%2 = call ptr @__kmpc_alloc_shared(), !dbg !43
	%3 = call ptr @__kmpc_alloc_shared()			%3 = call ptr @__kmpc_alloc_shared()
	store i32 0, ptr %3, align 4			store i32 0, ptr %3, align 4
	call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43			call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43
	[52 lines not shown]

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=true < %s | FileCheck -check-prefix=SPILL-TO-VGPR %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=true < %s | FileCheck -check-prefix=SPILL-TO-VGPR %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=false < %s | FileCheck -check-prefix=NO-SPILL-TO-VGPR %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-spill-sgpr-to-vgpr=false < %s | FileCheck -check-prefix=NO-SPILL-TO-VGPR %s

	; Check frame setup where SGPR spills to VGPRs are disabled or enabled.			; Check frame setup where SGPR spills to VGPRs are disabled or enabled.

	declare hidden void @external_void_func_void() #0			declare hidden void @external_void_func_void() #0

	define void @callee_with_stack_and_call() #0 {			define void @callee_with_stack_and_call() #0 {
	; SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; SPILL-TO-VGPR: ; %bb.0:			; SPILL-TO-VGPR: ; %bb.0:
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; SPILL-TO-VGPR-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s33, 2			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v41, s33, 0
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32			; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0
	; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0			; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33			; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]			; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]
	; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]			; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v40, 2			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v41, 0
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1
	; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; SPILL-TO-VGPR-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]			; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]
	;			;
	; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; NO-SPILL-TO-VGPR: ; %bb.0:			; NO-SPILL-TO-VGPR: ; %bb.0:
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33			; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33
	[... 56 lines not shown ...]

llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=SDAG -enable-var-scope %s
	; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s			; RUN: llc -global-isel -mtriple=amdgcn--amdpal -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GISEL -enable-var-scope %s

	declare void @extern_c_func()			declare void @extern_c_func()

	define amdgpu_gfx void @gfx_func() {			define amdgpu_gfx void @gfx_func() {
	; SDAG-LABEL: gfx_func:			; SDAG-LABEL: gfx_func:
	; SDAG: ; %bb.0:			; SDAG: ; %bb.0:
	; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; SDAG-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: v_writelane_b32 v40, s33, 28
	; SDAG-NEXT: v_writelane_b32 v40, s4, 0			; SDAG-NEXT: v_writelane_b32 v40, s4, 0
	; SDAG-NEXT: v_writelane_b32 v40, s5, 1			; SDAG-NEXT: v_writelane_b32 v40, s5, 1
	; SDAG-NEXT: v_writelane_b32 v40, s6, 2			; SDAG-NEXT: v_writelane_b32 v40, s6, 2
	; SDAG-NEXT: v_writelane_b32 v40, s7, 3			; SDAG-NEXT: v_writelane_b32 v40, s7, 3
	; SDAG-NEXT: v_writelane_b32 v40, s8, 4			; SDAG-NEXT: v_writelane_b32 v40, s8, 4
	; SDAG-NEXT: v_writelane_b32 v40, s9, 5			; SDAG-NEXT: v_writelane_b32 v40, s9, 5
	; SDAG-NEXT: v_writelane_b32 v40, s10, 6			; SDAG-NEXT: v_writelane_b32 v40, s10, 6
	; SDAG-NEXT: v_writelane_b32 v40, s11, 7			; SDAG-NEXT: v_writelane_b32 v40, s11, 7
	; SDAG-NEXT: v_writelane_b32 v40, s12, 8			; SDAG-NEXT: v_writelane_b32 v40, s12, 8
	; SDAG-NEXT: v_writelane_b32 v40, s13, 9			; SDAG-NEXT: v_writelane_b32 v40, s13, 9
	; SDAG-NEXT: v_writelane_b32 v40, s14, 10			; SDAG-NEXT: v_writelane_b32 v40, s14, 10
	; SDAG-NEXT: v_writelane_b32 v40, s15, 11			; SDAG-NEXT: v_writelane_b32 v40, s15, 11
	; SDAG-NEXT: v_writelane_b32 v40, s16, 12			; SDAG-NEXT: v_writelane_b32 v40, s16, 12
	; SDAG-NEXT: v_writelane_b32 v40, s17, 13			; SDAG-NEXT: v_writelane_b32 v40, s17, 13
	; SDAG-NEXT: v_writelane_b32 v40, s18, 14			; SDAG-NEXT: v_writelane_b32 v40, s18, 14
	; SDAG-NEXT: v_writelane_b32 v40, s19, 15			; SDAG-NEXT: v_writelane_b32 v40, s19, 15
	; SDAG-NEXT: v_writelane_b32 v40, s20, 16			; SDAG-NEXT: v_writelane_b32 v40, s20, 16
	; SDAG-NEXT: v_writelane_b32 v40, s21, 17			; SDAG-NEXT: v_writelane_b32 v40, s21, 17
	; SDAG-NEXT: v_writelane_b32 v40, s22, 18			; SDAG-NEXT: v_writelane_b32 v40, s22, 18
	; SDAG-NEXT: v_writelane_b32 v40, s23, 19			; SDAG-NEXT: v_writelane_b32 v40, s23, 19
				; SDAG-NEXT: s_mov_b32 s36, s33
	; SDAG-NEXT: s_mov_b32 s33, s32			; SDAG-NEXT: s_mov_b32 s33, s32
	; SDAG-NEXT: s_addk_i32 s32, 0x400			; SDAG-NEXT: s_addk_i32 s32, 0x400
	; SDAG-NEXT: v_writelane_b32 v40, s24, 20			; SDAG-NEXT: v_writelane_b32 v40, s24, 20
	; SDAG-NEXT: v_writelane_b32 v40, s25, 21			; SDAG-NEXT: v_writelane_b32 v40, s25, 21
	; SDAG-NEXT: s_getpc_b64 s[34:35]			; SDAG-NEXT: s_getpc_b64 s[34:35]
	; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; SDAG-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; SDAG-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; SDAG-NEXT: v_writelane_b32 v40, s26, 22			; SDAG-NEXT: v_writelane_b32 v40, s26, 22
	[... 30 lines not shown ...]
	; SDAG-NEXT: v_readlane_b32 s10, v40, 6			; SDAG-NEXT: v_readlane_b32 s10, v40, 6
	; SDAG-NEXT: v_readlane_b32 s9, v40, 5			; SDAG-NEXT: v_readlane_b32 s9, v40, 5
	; SDAG-NEXT: v_readlane_b32 s8, v40, 4			; SDAG-NEXT: v_readlane_b32 s8, v40, 4
	; SDAG-NEXT: v_readlane_b32 s7, v40, 3			; SDAG-NEXT: v_readlane_b32 s7, v40, 3
	; SDAG-NEXT: v_readlane_b32 s6, v40, 2			; SDAG-NEXT: v_readlane_b32 s6, v40, 2
	; SDAG-NEXT: v_readlane_b32 s5, v40, 1			; SDAG-NEXT: v_readlane_b32 s5, v40, 1
	; SDAG-NEXT: v_readlane_b32 s4, v40, 0			; SDAG-NEXT: v_readlane_b32 s4, v40, 0
	; SDAG-NEXT: s_addk_i32 s32, 0xfc00			; SDAG-NEXT: s_addk_i32 s32, 0xfc00
	; SDAG-NEXT: v_readlane_b32 s33, v40, 28			; SDAG-NEXT: s_mov_b32 s33, s36
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: s_waitcnt vmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0)
	; SDAG-NEXT: s_setpc_b64 s[30:31]			; SDAG-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: gfx_func:			; GISEL-LABEL: gfx_func:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 28
	; GISEL-NEXT: v_writelane_b32 v40, s4, 0			; GISEL-NEXT: v_writelane_b32 v40, s4, 0
	; GISEL-NEXT: v_writelane_b32 v40, s5, 1			; GISEL-NEXT: v_writelane_b32 v40, s5, 1
	; GISEL-NEXT: v_writelane_b32 v40, s6, 2			; GISEL-NEXT: v_writelane_b32 v40, s6, 2
	; GISEL-NEXT: v_writelane_b32 v40, s7, 3			; GISEL-NEXT: v_writelane_b32 v40, s7, 3
	; GISEL-NEXT: v_writelane_b32 v40, s8, 4			; GISEL-NEXT: v_writelane_b32 v40, s8, 4
	; GISEL-NEXT: v_writelane_b32 v40, s9, 5			; GISEL-NEXT: v_writelane_b32 v40, s9, 5
	; GISEL-NEXT: v_writelane_b32 v40, s10, 6			; GISEL-NEXT: v_writelane_b32 v40, s10, 6
	; GISEL-NEXT: v_writelane_b32 v40, s11, 7			; GISEL-NEXT: v_writelane_b32 v40, s11, 7
	; GISEL-NEXT: v_writelane_b32 v40, s12, 8			; GISEL-NEXT: v_writelane_b32 v40, s12, 8
	; GISEL-NEXT: v_writelane_b32 v40, s13, 9			; GISEL-NEXT: v_writelane_b32 v40, s13, 9
	; GISEL-NEXT: v_writelane_b32 v40, s14, 10			; GISEL-NEXT: v_writelane_b32 v40, s14, 10
	; GISEL-NEXT: v_writelane_b32 v40, s15, 11			; GISEL-NEXT: v_writelane_b32 v40, s15, 11
	; GISEL-NEXT: v_writelane_b32 v40, s16, 12			; GISEL-NEXT: v_writelane_b32 v40, s16, 12
	; GISEL-NEXT: v_writelane_b32 v40, s17, 13			; GISEL-NEXT: v_writelane_b32 v40, s17, 13
	; GISEL-NEXT: v_writelane_b32 v40, s18, 14			; GISEL-NEXT: v_writelane_b32 v40, s18, 14
	; GISEL-NEXT: v_writelane_b32 v40, s19, 15			; GISEL-NEXT: v_writelane_b32 v40, s19, 15
	; GISEL-NEXT: v_writelane_b32 v40, s20, 16			; GISEL-NEXT: v_writelane_b32 v40, s20, 16
	; GISEL-NEXT: v_writelane_b32 v40, s21, 17			; GISEL-NEXT: v_writelane_b32 v40, s21, 17
	; GISEL-NEXT: v_writelane_b32 v40, s22, 18			; GISEL-NEXT: v_writelane_b32 v40, s22, 18
	; GISEL-NEXT: v_writelane_b32 v40, s23, 19			; GISEL-NEXT: v_writelane_b32 v40, s23, 19
				; GISEL-NEXT: s_mov_b32 s36, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s24, 20			; GISEL-NEXT: v_writelane_b32 v40, s24, 20
	; GISEL-NEXT: v_writelane_b32 v40, s25, 21			; GISEL-NEXT: v_writelane_b32 v40, s25, 21
	; GISEL-NEXT: s_getpc_b64 s[34:35]			; GISEL-NEXT: s_getpc_b64 s[34:35]
	; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4			; GISEL-NEXT: s_add_u32 s34, s34, extern_c_func@gotpcrel32@lo+4
	; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12			; GISEL-NEXT: s_addc_u32 s35, s35, extern_c_func@gotpcrel32@hi+12
	; GISEL-NEXT: v_writelane_b32 v40, s26, 22			; GISEL-NEXT: v_writelane_b32 v40, s26, 22
	[... 30 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s10, v40, 6			; GISEL-NEXT: v_readlane_b32 s10, v40, 6
	; GISEL-NEXT: v_readlane_b32 s9, v40, 5			; GISEL-NEXT: v_readlane_b32 s9, v40, 5
	; GISEL-NEXT: v_readlane_b32 s8, v40, 4			; GISEL-NEXT: v_readlane_b32 s8, v40, 4
	; GISEL-NEXT: v_readlane_b32 s7, v40, 3			; GISEL-NEXT: v_readlane_b32 s7, v40, 3
	; GISEL-NEXT: v_readlane_b32 s6, v40, 2			; GISEL-NEXT: v_readlane_b32 s6, v40, 2
	; GISEL-NEXT: v_readlane_b32 s5, v40, 1			; GISEL-NEXT: v_readlane_b32 s5, v40, 1
	; GISEL-NEXT: v_readlane_b32 s4, v40, 0			; GISEL-NEXT: v_readlane_b32 s4, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 28			; GISEL-NEXT: s_mov_b32 s33, s36
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void @extern_c_func()			call void @extern_c_func()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

	[... 92 lines not shown ...]
	declare hidden amdgpu_gfx void @external_void_func_v16i8(<16 x i8>) #0			declare hidden amdgpu_gfx void @external_void_func_v16i8(<16 x i8>) #0

	define amdgpu_gfx void @test_call_external_void_func_i1_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_imm:			; GFX9-LABEL: test_call_external_void_func_i1_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm:			; GFX10-LABEL: test_call_external_void_func_i1_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1(i1 true)			call amdgpu_gfx void @external_void_func_i1(i1 true)
	ret void			ret void
	}			}
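
The GFX9 checks for test_call_external_void_func_i1_imm above move the frame-pointer save from `v_writelane_b32 v40, s33, 2` to `v_writelane_b32 v41, s33, 0`, which matches the summary's description of PEI no longer sharing the free-lane search with SILowerSGPRSpills. The following is only a toy model of that bookkeeping, not the code in SIFrameLowering.cpp or SIMachineFunctionInfo.cpp: `LanePool`, `shared_pool_layout`, and `separate_pool_layout` are hypothetical names, the starting VGPR number 40 is read off the checks, and giving the PEI pool the next VGPR number is an assumption made so the example reproduces the numbers seen here.

```python
# Toy model only: every name here is hypothetical, not an LLVM API.
WAVE_SIZE = 64  # lanes per VGPR in wave64, as in the GFX9 checks


class LanePool:
    """Hands out (vgpr, lane) slots, claiming a new VGPR when one fills up."""

    def __init__(self, first_vgpr):
        self.next_vgpr = first_vgpr  # number of the next VGPR to claim
        self.vgprs = []              # VGPRs this pool has claimed so far
        self.used = 0                # total lanes handed out

    def alloc(self):
        lane = self.used % WAVE_SIZE
        if lane == 0:
            # The current VGPR is full (or none is claimed yet): take a new one.
            self.vgprs.append(self.next_vgpr)
            self.next_vgpr += 1
        self.used += 1
        return (self.vgprs[-1], lane)


def shared_pool_layout(csr_sgprs, fp="s33"):
    """Left-hand side of the diff: the FP save takes the next lane of the
    same VGPR that holds the callee-saved SGPR spills."""
    pool = LanePool(first_vgpr=40)
    layout = {reg: pool.alloc() for reg in csr_sgprs}
    layout[fp] = pool.alloc()
    return layout


def separate_pool_layout(csr_sgprs, fp="s33"):
    """Right-hand side of the diff: the FP save comes from its own pool, so it
    lands in lane 0 of a different VGPR (assumed to be the next one)."""
    spill_pool = LanePool(first_vgpr=40)
    layout = {reg: spill_pool.alloc() for reg in csr_sgprs}
    pei_pool = LanePool(first_vgpr=spill_pool.next_vgpr)
    layout[fp] = pei_pool.alloc()
    return layout


if __name__ == "__main__":
    print(shared_pool_layout(["s30", "s31"]))
    print(separate_pool_layout(["s30", "s31"]))
```

In this model the separate pool costs one extra VGPR, which is consistent with the additional `buffer_store_dword v41` / `buffer_load_dword v41` pair that appears on the right-hand side of the checks.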

	define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_signext:			; GFX9-LABEL: test_call_external_void_func_i1_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_signext:			; GFX10-LABEL: test_call_external_void_func_i1_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, i1 addrspace(1)* undef			%var = load volatile i1, i1 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)			call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_zeroext:			; GFX9-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, i1 addrspace(1)* undef			%var = load volatile i1, i1 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)			call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm:			; GFX9-LABEL: test_call_external_void_func_i8_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm:			; GFX10-LABEL: test_call_external_void_func_i8_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8(i8 123)			call amdgpu_gfx void @external_void_func_i8(i8 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_signext:			; GFX9-LABEL: test_call_external_void_func_i8_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_signext:			; GFX10-LABEL: test_call_external_void_func_i8_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, i8 addrspace(1)* undef			%var = load volatile i8, i8 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)			call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_zeroext:			; GFX9-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, i8 addrspace(1)* undef			%var = load volatile i8, i8 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)			call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm:			; GFX9-LABEL: test_call_external_void_func_i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm:			; GFX10-LABEL: test_call_external_void_func_i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16(i16 123)			call amdgpu_gfx void @external_void_func_i16(i16 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_signext:			; GFX9-LABEL: test_call_external_void_func_i16_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_signext:			; GFX10-LABEL: test_call_external_void_func_i16_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, i16 addrspace(1)* undef			%var = load volatile i16, i16 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)			call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_zeroext:			; GFX9-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, i16 addrspace(1)* undef			%var = load volatile i16, i16 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)			call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm:			; GFX9-LABEL: test_call_external_void_func_i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm:			; GFX10-LABEL: test_call_external_void_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32(i32 42)			call amdgpu_gfx void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm:			; GFX9-LABEL: test_call_external_void_func_i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm:			; GFX10-LABEL: test_call_external_void_func_i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64(i64 123)			call amdgpu_gfx void @external_void_func_i64(i64 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64:			; GFX9-LABEL: test_call_external_void_func_v2i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64:			; GFX10-LABEL: test_call_external_void_func_v2i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, <2 x i64> addrspace(1)* null			%val = load <2 x i64>, <2 x i64> addrspace(1)* null
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64:			; GFX9-LABEL: test_call_external_void_func_v3i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64:			; GFX10-LABEL: test_call_external_void_func_v3i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, <2 x i64> addrspace(1)* null			%load = load <2 x i64>, <2 x i64> addrspace(1)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)			call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i64:			; GFX9-LABEL: test_call_external_void_func_v4i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: v_mov_b32_e32 v6, 3			; GFX9-NEXT: v_mov_b32_e32 v6, 3
	; GFX9-NEXT: v_mov_b32_e32 v7, 4			; GFX9-NEXT: v_mov_b32_e32 v7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64:			; GFX10-LABEL: test_call_external_void_func_v4i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: v_mov_b32_e32 v6, 3			; GFX10-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v7, 4			; GFX10-NEXT: v_mov_b32_e32 v7, 4
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, <2 x i64> addrspace(1)* null			%load = load <2 x i64>, <2 x i64> addrspace(1)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)			call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm:			; GFX9-LABEL: test_call_external_void_func_f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm:			; GFX10-LABEL: test_call_external_void_func_f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16(half 4.0)			call amdgpu_gfx void @external_void_func_f16(half 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm:			; GFX9-LABEL: test_call_external_void_func_f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 4.0			; GFX9-NEXT: v_mov_b32_e32 v0, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm:			; GFX10-LABEL: test_call_external_void_func_f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-NEXT: v_mov_b32_e32 v0, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32(float 4.0)			call amdgpu_gfx void @external_void_func_f32(float 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: v_mov_b32_e32 v3, -1.0			; GFX9-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX9-NEXT: v_mov_b32_e32 v4, 0.5			; GFX9-NEXT: v_mov_b32_e32 v4, 0.5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-NEXT: v_mov_b32_e32 v4, 0.5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm:			; GFX9-LABEL: test_call_external_void_func_f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm:			; GFX10-LABEL: test_call_external_void_func_f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64(double 4.0)			call amdgpu_gfx void @external_void_func_f64(double 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: v_mov_b32_e32 v4, 0			; GFX9-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0			; GFX10-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16:			; GFX9-LABEL: test_call_external_void_func_v2i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16:			; GFX10-LABEL: test_call_external_void_func_v2i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, <2 x i16> addrspace(1)* undef			%val = load <2 x i16>, <2 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)			call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16:			; GFX9-LABEL: test_call_external_void_func_v3i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16:			; GFX10-LABEL: test_call_external_void_func_v3i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, <3 x i16> addrspace(1)* undef			%val = load <3 x i16>, <3 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16:			; GFX9-LABEL: test_call_external_void_func_v3f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16:			; GFX10-LABEL: test_call_external_void_func_v3f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, <3 x half> addrspace(1)* undef			%val = load <3 x half>, <3 x half> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 3			; GFX9-NEXT: v_mov_b32_e32 v1, 3
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 3			; GFX10-NEXT: v_mov_b32_e32 v1, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16:			; GFX9-LABEL: test_call_external_void_func_v4i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16:			; GFX10-LABEL: test_call_external_void_func_v4i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, <4 x i16> addrspace(1)* undef			%val = load <4 x i16>, <4 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16:			; GFX9-LABEL: test_call_external_void_func_v2f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16:			; GFX10-LABEL: test_call_external_void_func_v2f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, <2 x half> addrspace(1)* undef			%val = load <2 x half>, <2 x half> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)			call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32:			; GFX9-LABEL: test_call_external_void_func_v2i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32:			; GFX10-LABEL: test_call_external_void_func_v2i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, <2 x i32> addrspace(1)* undef			%val = load <2 x i32>, <2 x i32> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: v_mov_b32_e32 v3, 6			; GFX9-NEXT: v_mov_b32_e32 v3, 6
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-NEXT: v_mov_b32_e32 v3, 6			; GFX10-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)			call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32:			; GFX9-LABEL: test_call_external_void_func_v4i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32:			; GFX10-LABEL: test_call_external_void_func_v4i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, <4 x i32> addrspace(1)* undef			%val = load <4 x i32>, <4 x i32> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32:			; GFX9-LABEL: test_call_external_void_func_v8i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v8, 0			; GFX9-NEXT: v_mov_b32_e32 v8, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32:			; GFX10-LABEL: test_call_external_void_func_v8i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v8, 0			; GFX10-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[34:35] offset:16
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <8 x i32> addrspace(1)*, <8 x i32> addrspace(1)* addrspace(4)* undef			%ptr = load <8 x i32> addrspace(1)*, <8 x i32> addrspace(1)* addrspace(4)* undef
	%val = load <8 x i32>, <8 x i32> addrspace(1)* %ptr			%val = load <8 x i32>, <8 x i32> addrspace(1)* %ptr
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: v_mov_b32_e32 v5, 6			; GFX9-NEXT: v_mov_b32_e32 v5, 6
	; GFX9-NEXT: v_mov_b32_e32 v6, 7			; GFX9-NEXT: v_mov_b32_e32 v6, 7
	; GFX9-NEXT: v_mov_b32_e32 v7, 8			; GFX9-NEXT: v_mov_b32_e32 v7, 8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-NEXT: v_mov_b32_e32 v5, 6			; GFX10-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-NEXT: v_mov_b32_e32 v6, 7			; GFX10-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-NEXT: v_mov_b32_e32 v7, 8			; GFX10-NEXT: v_mov_b32_e32 v7, 8
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i32:			; GFX9-LABEL: test_call_external_void_func_v16i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v16, 0			; GFX9-NEXT: v_mov_b32_e32 v16, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32:			; GFX10-LABEL: test_call_external_void_func_v16i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v16, 0			; GFX10-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[34:35] offset:48
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x3			; GFX10-SCRATCH-NEXT: s_clause 0x3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <16 x i32> addrspace(1), <16 x i32> addrspace(1) addrspace(4)* undef			%ptr = load <16 x i32> addrspace(1), <16 x i32> addrspace(1) addrspace(4)* undef
	%val = load <16 x i32>, <16 x i32> addrspace(1)* %ptr			%val = load <16 x i32>, <16 x i32> addrspace(1)* %ptr
	call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)			call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32:			; GFX9-LABEL: test_call_external_void_func_v32i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v28, 0			; GFX9-NEXT: v_mov_b32_e32 v28, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32:			; GFX10-LABEL: test_call_external_void_func_v32i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef			%ptr = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef
	%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr			%val = load <32 x i32>, <32 x i32> addrspace(1)* %ptr
	call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)			call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_i32:			; GFX9-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v28, 0			; GFX9-NEXT: v_mov_b32_e32 v28, 0
	; GFX9-NEXT: global_load_dword v32, v[0:1], off			; GFX9-NEXT: global_load_dword v32, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v28, s[34:35]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[34:35] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[34:35] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[34:35] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[34:35] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[34:35] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[34:35] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[34:35] offset:112
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12
	; GFX9-NEXT: s_waitcnt vmcnt(8)			; GFX9-NEXT: s_waitcnt vmcnt(8)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: global_load_dword v33, v[0:1], off			; GFX10-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[34:35]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[34:35] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[34:35] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[34:35] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[34:35] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[34:35] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[34:35] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[34:35] offset:112
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt vmcnt(8)			; GFX10-NEXT: s_waitcnt vmcnt(8)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef			%ptr0 = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef
	%val0 = load <32 x i32>, <32 x i32> addrspace(1)* %ptr0			%val0 = load <32 x i32>, <32 x i32> addrspace(1)* %ptr0
	%val1 = load i32, i32 addrspace(1)* undef			%val1 = load i32, i32 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_external_i32_func_i32_imm(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_external_i32_func_i32_imm:			; GFX9-LABEL: test_call_external_i32_func_i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v43, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v41, v0			; GFX9-NEXT: v_mov_b32_e32 v41, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_mov_b32_e32 v42, v1			; GFX9-NEXT: v_mov_b32_e32 v42, v1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: global_store_dword v[41:42], v0, off			; GFX9-NEXT: global_store_dword v[41:42], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v43, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v43, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: v_mov_b32_e32 v42, v1			; GFX10-NEXT: v_mov_b32_e32 v42, v1
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_i32_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_i32_func_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: global_store_dword v[41:42], v0, off			; GFX10-NEXT: global_store_dword v[41:42], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v43, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v43, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v43, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s33 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v42, v1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v42, v1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: global_store_dword v[41:42], v0, off			; GFX10-SCRATCH-NEXT: global_store_dword v[41:42], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s33
	; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v43, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v43, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)			%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
	store volatile i32 %val, i32 addrspace(1)* %out			store volatile i32 %val, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_struct_i8_i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_struct_i8_i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]			; GFX9-NEXT: global_load_ubyte v0, v2, s[34:35]
	; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4			; GFX9-NEXT: global_load_dword v1, v2, s[34:35] offset:4
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]			; GFX10-NEXT: global_load_ubyte v0, v2, s[34:35]
	; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4			; GFX10-NEXT: global_load_dword v1, v2, s[34:35] offset:4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4			; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load { i8, i32 } addrspace(1)*, { i8, i32 } addrspace(1)* addrspace(4)* undef			%ptr0 = load { i8, i32 } addrspace(1)*, { i8, i32 } addrspace(1)* addrspace(4)* undef
	%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0			%val = load { i8, i32 }, { i8, i32 } addrspace(1)* %ptr0
	call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)			call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_byval_struct_i8_i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_byval_struct_i8_i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
				; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = alloca { i8, i32 }, align 4, addrspace(5)			%val = alloca { i8, i32 }, align 4, addrspace(5)
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %val, i32 0, i32 1
	store i8 3, i8 addrspace(5)* %gep0			store i8 3, i8 addrspace(5)* %gep0
	store i32 8, i32 addrspace(5)* %gep1			store i32 8, i32 addrspace(5)* %gep1
	call amdgpu_gfx void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* byval({ i8, i32 }) %val)			call amdgpu_gfx void @external_void_func_byval_struct_i8_i32({ i8, i32 } addrspace(5)* byval({ i8, i32 }) %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX9-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_add_u32_e32 v0, 8, v0			; GFX9-NEXT: v_add_u32_e32 v0, 8, v0
	; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_byte v[0:1], v0, off			; GFX9-NEXT: global_store_byte v[0:1], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dword v[0:1], v1, off			; GFX9-NEXT: global_store_dword v[0:1], v1, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33
				; GFX10-NEXT: s_addk_i32 s32, 0x400
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_byte v[0:1], v0, off			; GFX10-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: global_store_dword v[0:1], v1, off			; GFX10-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:20
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:20 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8			; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8			; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8
	; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12			; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off			; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off			; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:20
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%in.val = alloca { i8, i32 }, align 4, addrspace(5)			%in.val = alloca { i8, i32 }, align 4, addrspace(5)
	%out.val = alloca { i8, i32 }, align 4, addrspace(5)			%out.val = alloca { i8, i32 }, align 4, addrspace(5)
	%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0			%in.gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 0
	%in.gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 1			%in.gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %in.val, i32 0, i32 1
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i8() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i8() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i8:			; GFX9-LABEL: test_call_external_void_func_v16i8:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
	; GFX9-NEXT: v_mov_b32_e32 v12, v3			; GFX9-NEXT: v_mov_b32_e32 v12, v3
	; GFX9-NEXT: v_mov_b32_e32 v1, v16			; GFX9-NEXT: v_mov_b32_e32 v1, v16
	; GFX9-NEXT: v_mov_b32_e32 v2, v17			; GFX9-NEXT: v_mov_b32_e32 v2, v17
	; GFX9-NEXT: v_mov_b32_e32 v3, v18			; GFX9-NEXT: v_mov_b32_e32 v3, v18
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i8:			; GFX10-LABEL: test_call_external_void_func_v16i8:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[34:35]
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v16i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v16i8@rel32@hi+12
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v12, v3			; GFX10-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-NEXT: v_mov_b32_e32 v1, v16			; GFX10-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-NEXT: v_mov_b32_e32 v2, v17			; GFX10-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-NEXT: v_mov_b32_e32 v3, v18			; GFX10-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <16 x i8> addrspace(1), <16 x i8> addrspace(1) addrspace(4)* undef			%ptr = load <16 x i8> addrspace(1), <16 x i8> addrspace(1) addrspace(4)* undef
	%val = load <16 x i8>, <16 x i8> addrspace(1)* %ptr			%val = load <16 x i8>, <16 x i8> addrspace(1)* %ptr
	call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)			call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)
	ret void			ret void
	}			}

	define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {			define void @tail_call_byval_align16(<32 x i32> %val, double %tmp) #0 {
	; GFX9-LABEL: tail_call_byval_align16:			; GFX9-LABEL: tail_call_byval_align16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 32			; GFX9-NEXT: s_mov_b32 s6, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20			; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:20
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:16
	; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33			; GFX9-NEXT: buffer_load_dword v31, off, s[0:3], s33
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_writelane_b32 v40, s34, 2			; GFX9-NEXT: v_writelane_b32 v40, s34, 2
	; GFX9-NEXT: v_writelane_b32 v40, s35, 3			; GFX9-NEXT: v_writelane_b32 v40, s35, 3
	; GFX9-NEXT: v_readlane_b32 s38, v40, 6			; GFX9-NEXT: v_readlane_b32 s38, v40, 6
	; GFX9-NEXT: v_readlane_b32 s37, v40, 5			; GFX9-NEXT: v_readlane_b32 s37, v40, 5
	; GFX9-NEXT: v_readlane_b32 s36, v40, 4			; GFX9-NEXT: v_readlane_b32 s36, v40, 4
	; GFX9-NEXT: v_readlane_b32 s35, v40, 3			; GFX9-NEXT: v_readlane_b32 s35, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 32			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: tail_call_byval_align16:			; GFX10-LABEL: tail_call_byval_align16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry

	; inreg arguments are put in sgprs			; inreg arguments are put in sgprs
	define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)			call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)			call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)			call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s4, 42			; GFX9-NEXT: s_mov_b32 s4, 42
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 42
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 42
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)			call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: s_mov_b32 s5, 0			; GFX9-NEXT: s_mov_b32 s5, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)			call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, <2 x i64> addrspace(4)* null			%val = load <2 x i64>, <2 x i64> addrspace(4)* null
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 8
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 6			; GFX9-NEXT: v_writelane_b32 v40, s30, 6
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 7			; GFX9-NEXT: v_writelane_b32 v40, s31, 7
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 7			; GFX9-NEXT: v_readlane_b32 s31, v40, 7
	; GFX9-NEXT: v_readlane_b32 s30, v40, 6			; GFX9-NEXT: v_readlane_b32 s30, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 8			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 1			; GFX10-NEXT: s_mov_b32 s8, 1
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 2			; GFX10-NEXT: s_mov_b32 s9, 2
	; GFX10-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, <2 x i64> addrspace(4)* null			%load = load <2 x i64>, <2 x i64> addrspace(4)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_mov_b64 s[34:35], 0			; GFX9-NEXT: s_mov_b64 s[34:35], 0
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: s_mov_b32 s10, 3			; GFX9-NEXT: s_mov_b32 s10, 3
	; GFX9-NEXT: s_mov_b32 s11, 4			; GFX9-NEXT: s_mov_b32 s11, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 9			; GFX9-NEXT: v_writelane_b32 v40, s31, 9
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 9			; GFX9-NEXT: v_readlane_b32 s31, v40, 9
	; GFX9-NEXT: v_readlane_b32 s30, v40, 8			; GFX9-NEXT: v_readlane_b32 s30, v40, 8
	; GFX9-NEXT: v_readlane_b32 s11, v40, 7			; GFX9-NEXT: v_readlane_b32 s11, v40, 7
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b64 s[34:35], 0			; GFX10-NEXT: s_mov_b64 s[34:35], 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	[13 unchanged lines collapsed in the diff view]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	[13 unchanged lines collapsed in the diff view]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, <2 x i64> addrspace(4)* null			%load = load <2 x i64>, <2 x i64> addrspace(4)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_movk_i32 s4, 0x4400			; GFX9-NEXT: s_movk_i32 s4, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_movk_i32 s4, 0x4400
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)			call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s4, 4.0			; GFX9-NEXT: s_mov_b32 s4, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 4.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)			call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 5
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 3			; GFX9-NEXT: v_writelane_b32 v40, s30, 3
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 4			; GFX9-NEXT: v_writelane_b32 v40, s31, 4
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 4			; GFX9-NEXT: v_readlane_b32 s31, v40, 4
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 5			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 7
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 5			; GFX9-NEXT: v_writelane_b32 v40, s30, 5
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: s_mov_b32 s7, -1.0			; GFX9-NEXT: s_mov_b32 s7, -1.0
	; GFX9-NEXT: s_mov_b32 s8, 0.5			; GFX9-NEXT: s_mov_b32 s8, 0.5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 6			; GFX9-NEXT: v_writelane_b32 v40, s31, 6
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 6			; GFX9-NEXT: v_readlane_b32 s31, v40, 6
	; GFX9-NEXT: v_readlane_b32 s30, v40, 5			; GFX9-NEXT: v_readlane_b32 s30, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 7			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1.0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, -1.0			; GFX10-NEXT: s_mov_b32 s7, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 0.5			; GFX10-NEXT: s_mov_b32 s8, 0.5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 0x40100000			; GFX9-NEXT: s_mov_b32 s5, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)			call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 8
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 6			; GFX9-NEXT: v_writelane_b32 v40, s30, 6
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: s_mov_b32 s8, 0			; GFX9-NEXT: s_mov_b32 s8, 0
	; GFX9-NEXT: s_mov_b32 s9, 0x40200000			; GFX9-NEXT: s_mov_b32 s9, 0x40200000
	; GFX9-NEXT: v_writelane_b32 v40, s31, 7			; GFX9-NEXT: v_writelane_b32 v40, s31, 7
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 7			; GFX9-NEXT: v_readlane_b32 s31, v40, 7
	; GFX9-NEXT: v_readlane_b32 s30, v40, 6			; GFX9-NEXT: v_readlane_b32 s30, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 8			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 0			; GFX10-NEXT: s_mov_b32 s8, 0
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 7
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-NEXT: s_mov_b32 s33, s32
				; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, <2 x i16> addrspace(4)* undef			%val = load <2 x i16>, <2 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, <3 x i16> addrspace(4)* undef			%val = load <3 x i16>, <3 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, <3 x half> addrspace(4)* undef			%val = load <3 x half>, <3 x half> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 3			; GFX9-NEXT: s_mov_b32 s5, 3
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 3
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 3
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x40003c00			; GFX9-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX9-NEXT: s_movk_i32 s5, 0x4400			; GFX9-NEXT: s_movk_i32 s5, 0x4400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, <4 x i16> addrspace(4)* undef			%val = load <4 x i16>, <4 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 0x40003			; GFX9-NEXT: s_mov_b32 s5, 0x40003
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0			; GFX10-NEXT: s_load_dword s4, s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-NEXT: s_mov_b32 s33, s32
				; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, <2 x half> addrspace(4)* undef			%val = load <2 x half>, <2 x half> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, <2 x i32> addrspace(4)* undef			%val = load <2 x i32>, <2 x i32> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 5
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 3			; GFX9-NEXT: v_writelane_b32 v40, s30, 3
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 4			; GFX9-NEXT: v_writelane_b32 v40, s31, 4
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 4			; GFX9-NEXT: v_readlane_b32 s31, v40, 4
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 5			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 4
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: s_mov_b32 s7, 6			; GFX9-NEXT: s_mov_b32 s7, 6
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 3
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 6			; GFX10-NEXT: s_mov_b32 s7, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)			call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[34:35], 0x0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, <4 x i32> addrspace(4)* undef			%val = load <4 x i32>, <4 x i32> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 6
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 4			; GFX9-NEXT: v_writelane_b32 v40, s30, 4
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: v_writelane_b32 v40, s31, 5			; GFX9-NEXT: v_writelane_b32 v40, s31, 5
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 5			; GFX9-NEXT: v_readlane_b32 s31, v40, 5
	; GFX9-NEXT: v_readlane_b32 s30, v40, 4			; GFX9-NEXT: v_readlane_b32 s30, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 6			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 5
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 7
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 5			; GFX9-NEXT: v_writelane_b32 v40, s30, 5
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	; GFX9-NEXT: v_writelane_b32 v40, s31, 6			; GFX9-NEXT: v_writelane_b32 v40, s31, 6
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 6			; GFX9-NEXT: v_readlane_b32 s31, v40, 6
	; GFX9-NEXT: v_readlane_b32 s30, v40, 5			; GFX9-NEXT: v_readlane_b32 s30, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 7			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 6
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx8 s[4:11], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx8 s[4:11], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 9			; GFX9-NEXT: v_writelane_b32 v40, s31, 9
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 9			; GFX9-NEXT: v_readlane_b32 s31, v40, 9
	; GFX9-NEXT: v_readlane_b32 s30, v40, 8			; GFX9-NEXT: v_readlane_b32 s30, v40, 8
	; GFX9-NEXT: v_readlane_b32 s11, v40, 7			; GFX9-NEXT: v_readlane_b32 s11, v40, 7
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	Show All 10 Lines
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	Show All 10 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <8 x i32> addrspace(4)*, <8 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <8 x i32> addrspace(4)*, <8 x i32> addrspace(4)* addrspace(4)* undef
	%val = load <8 x i32>, <8 x i32> addrspace(4)* %ptr			%val = load <8 x i32>, <8 x i32> addrspace(4)* %ptr
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 8			; GFX9-NEXT: v_writelane_b32 v40, s30, 8
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	Show All 11 Lines
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: s_mov_b32 s4, 1
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
				; GFX10-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: s_mov_b32 s9, 6			; GFX10-NEXT: s_mov_b32 s9, 6
	Show All 10 Lines
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
				; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6
	[10 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 18
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	; GFX9-NEXT: v_writelane_b32 v40, s12, 8			; GFX9-NEXT: v_writelane_b32 v40, s12, 8
	; GFX9-NEXT: v_writelane_b32 v40, s13, 9			; GFX9-NEXT: v_writelane_b32 v40, s13, 9
	; GFX9-NEXT: v_writelane_b32 v40, s14, 10			; GFX9-NEXT: v_writelane_b32 v40, s14, 10
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s15, 11			; GFX9-NEXT: v_writelane_b32 v40, s15, 11
	; GFX9-NEXT: v_writelane_b32 v40, s16, 12			; GFX9-NEXT: v_writelane_b32 v40, s16, 12
	; GFX9-NEXT: v_writelane_b32 v40, s17, 13			; GFX9-NEXT: v_writelane_b32 v40, s17, 13
	; GFX9-NEXT: v_writelane_b32 v40, s18, 14			; GFX9-NEXT: v_writelane_b32 v40, s18, 14
	; GFX9-NEXT: v_writelane_b32 v40, s19, 15			; GFX9-NEXT: v_writelane_b32 v40, s19, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 16			; GFX9-NEXT: v_writelane_b32 v40, s30, 16
	; GFX9-NEXT: v_writelane_b32 v40, s31, 17			; GFX9-NEXT: v_writelane_b32 v40, s31, 17
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_v16i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32_inreg@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	[11 unchanged lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 18			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[26 unchanged lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	[26 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <16 x i32> addrspace(4)*, <16 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <16 x i32> addrspace(4)*, <16 x i32> addrspace(4)* addrspace(4)* undef
	%val = load <16 x i32>, <16 x i32> addrspace(4)* %ptr			%val = load <16 x i32>, <16 x i32> addrspace(4)* %ptr
	call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 28
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	[12 unchanged lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s23, 19			; GFX9-NEXT: v_writelane_b32 v40, s23, 19
	; GFX9-NEXT: v_writelane_b32 v40, s24, 20			; GFX9-NEXT: v_writelane_b32 v40, s24, 20
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s25, 21			; GFX9-NEXT: v_writelane_b32 v40, s25, 21
	; GFX9-NEXT: v_writelane_b32 v40, s26, 22			; GFX9-NEXT: v_writelane_b32 v40, s26, 22
	; GFX9-NEXT: v_writelane_b32 v40, s27, 23			; GFX9-NEXT: v_writelane_b32 v40, s27, 23
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s28, 24			; GFX9-NEXT: v_writelane_b32 v40, s28, 24
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	; GFX9-NEXT: v_writelane_b32 v40, s29, 25			; GFX9-NEXT: v_writelane_b32 v40, s29, 25
	; GFX9-NEXT: v_mov_b32_e32 v1, s47			; GFX9-NEXT: v_mov_b32_e32 v1, s47
	; GFX9-NEXT: v_mov_b32_e32 v2, s48			; GFX9-NEXT: v_mov_b32_e32 v2, s48
	[46 unchanged lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 28			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[71 unchanged lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	[67 unchanged lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef
	%val = load <32 x i32>, <32 x i32> addrspace(4)* %ptr			%val = load <32 x i32>, <32 x i32> addrspace(4)* %ptr
	call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v32i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v32i32_i32_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 28
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
	; GFX9-NEXT: v_writelane_b32 v40, s6, 2			; GFX9-NEXT: v_writelane_b32 v40, s6, 2
	; GFX9-NEXT: v_writelane_b32 v40, s7, 3			; GFX9-NEXT: v_writelane_b32 v40, s7, 3
	; GFX9-NEXT: v_writelane_b32 v40, s8, 4			; GFX9-NEXT: v_writelane_b32 v40, s8, 4
	; GFX9-NEXT: v_writelane_b32 v40, s9, 5			; GFX9-NEXT: v_writelane_b32 v40, s9, 5
	; GFX9-NEXT: v_writelane_b32 v40, s10, 6			; GFX9-NEXT: v_writelane_b32 v40, s10, 6
	; GFX9-NEXT: v_writelane_b32 v40, s11, 7			; GFX9-NEXT: v_writelane_b32 v40, s11, 7
	[13 unchanged lines not shown]
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s52, s[34:35], 0x0			; GFX9-NEXT: s_load_dword s52, s[34:35], 0x0
	; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35			; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35
	; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35			; GFX9-NEXT: ; kill: killed $sgpr34_sgpr35
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[34:35], 0x40
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s24, 20			; GFX9-NEXT: v_writelane_b32 v40, s24, 20
	; GFX9-NEXT: v_writelane_b32 v40, s25, 21			; GFX9-NEXT: v_writelane_b32 v40, s25, 21
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s26, 22			; GFX9-NEXT: v_writelane_b32 v40, s26, 22
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s52			; GFX9-NEXT: v_mov_b32_e32 v0, s52
	; GFX9-NEXT: v_writelane_b32 v40, s27, 23			; GFX9-NEXT: v_writelane_b32 v40, s27, 23
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	[50 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s10, v40, 6			; GFX9-NEXT: v_readlane_b32 s10, v40, 6
	; GFX9-NEXT: v_readlane_b32 s9, v40, 5			; GFX9-NEXT: v_readlane_b32 s9, v40, 5
	; GFX9-NEXT: v_readlane_b32 s8, v40, 4			; GFX9-NEXT: v_readlane_b32 s8, v40, 4
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 28			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-NEXT: v_writelane_b32 v40, s12, 8
	[76 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 28			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s6, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s7, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s8, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s9, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s10, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s11, 7
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s12, 8
	[72 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s10, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s9, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s8, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s7, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef			%ptr0 = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef
	%val0 = load <32 x i32>, <32 x i32> addrspace(4)* %ptr0			%val0 = load <32 x i32>, <32 x i32> addrspace(4)* %ptr0
	%val1 = load i32, i32 addrspace(4)* undef			%val1 = load i32, i32 addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_passed_arg_alignment_v32i32_f64(<32 x i32> %val, double %tmp) #0 {			define amdgpu_gfx void @stack_passed_arg_alignment_v32i32_f64(<32 x i32> %val, double %tmp) #0 {
	; GFX9-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX9-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, stack_passed_f64_arg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, stack_passed_f64_arg@rel32@hi+12
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:12
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:12 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33
				; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:12
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)			call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_12xv3i32() #0 {			define amdgpu_gfx void @stack_12xv3i32() #0 {
	; GFX9-LABEL: stack_12xv3i32:			; GFX9-LABEL: stack_12xv3i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 12			; GFX9-NEXT: v_mov_b32_e32 v0, 12
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 14			; GFX9-NEXT: v_mov_b32_e32 v0, 14
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	[35 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_12xv3i32:			; GFX10-LABEL: stack_12xv3i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 12			; GFX10-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-NEXT: v_mov_b32_e32 v1, 13			; GFX10-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-NEXT: v_mov_b32_e32 v2, 14			; GFX10-NEXT: v_mov_b32_e32 v2, 14
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_mov_b32_e32 v3, 15			; GFX10-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12			; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12
	[32 lines not shown]
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_12xv3i32:			; GFX10-SCRATCH-LABEL: stack_12xv3i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1
				; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v9, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v9, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v10, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v10, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v11, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v11, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, 4
	[19 lines not shown]
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_12xv3i32(			call amdgpu_gfx void @external_void_func_12xv3i32(
	<3 x i32><i32 0, i32 0, i32 0>,			<3 x i32><i32 0, i32 0, i32 0>,
	<3 x i32><i32 1, i32 1, i32 1>,			<3 x i32><i32 1, i32 1, i32 1>,
	[11 lines not shown]
	}			}

	define amdgpu_gfx void @stack_8xv5i32() #0 {			define amdgpu_gfx void @stack_8xv5i32() #0 {
	; GFX9-LABEL: stack_8xv5i32:			; GFX9-LABEL: stack_8xv5i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 9			; GFX9-NEXT: v_mov_b32_e32 v0, 9
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 10			; GFX9-NEXT: v_mov_b32_e32 v0, 10
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5i32:			; GFX10-LABEL: stack_8xv5i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 8			; GFX10-NEXT: v_mov_b32_e32 v0, 8
	; GFX10-NEXT: v_mov_b32_e32 v1, 9			; GFX10-NEXT: v_mov_b32_e32 v1, 9
	; GFX10-NEXT: v_mov_b32_e32 v2, 10			; GFX10-NEXT: v_mov_b32_e32 v2, 10
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 11			; GFX10-NEXT: v_mov_b32_e32 v0, 11
	; GFX10-NEXT: v_mov_b32_e32 v1, 12			; GFX10-NEXT: v_mov_b32_e32 v1, 12
	; GFX10-NEXT: v_mov_b32_e32 v2, 13			; GFX10-NEXT: v_mov_b32_e32 v2, 13
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5i32:			; GFX10-SCRATCH-LABEL: stack_8xv5i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5i32(			call amdgpu_gfx void @external_void_func_8xv5i32(
	<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,			<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,
	<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,			<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,
	<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,			<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,
	<5 x i32><i32 3, i32 3, i32 3, i32 3, i32 3>,			<5 x i32><i32 3, i32 3, i32 3, i32 3, i32 3>,
	<5 x i32><i32 4, i32 4, i32 4, i32 4, i32 4>,			<5 x i32><i32 4, i32 4, i32 4, i32 4, i32 4>,
	<5 x i32><i32 5, i32 5, i32 5, i32 5, i32 5>,			<5 x i32><i32 5, i32 5, i32 5, i32 5, i32 5>,
	<5 x i32><i32 6, i32 7, i32 8, i32 9, i32 10>,			<5 x i32><i32 6, i32 7, i32 8, i32 9, i32 10>,
	<5 x i32><i32 11, i32 12, i32 13, i32 14, i32 15>)			<5 x i32><i32 11, i32 12, i32 13, i32 14, i32 15>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @stack_8xv5f32() #0 {			define amdgpu_gfx void @stack_8xv5f32() #0 {
	; GFX9-LABEL: stack_8xv5f32:			; GFX9-LABEL: stack_8xv5f32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5f32:			; GFX10-LABEL: stack_8xv5f32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5f32:			; GFX10-SCRATCH-LABEL: stack_8xv5f32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
				; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s32 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x41400000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x41400000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000
				; GFX10-SCRATCH-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: s_clause 0x1
				; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32
				; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s32 offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5f32(			call amdgpu_gfx void @external_void_func_8xv5f32(
	<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,			<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,
	<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,			<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9 %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9 %s
	; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1010 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10 %s			; RUN: llc -mtriple=amdgcn--amdpal -mcpu=gfx1010 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX10 %s

	declare hidden amdgpu_gfx void @external_void_func_void() #0			declare hidden amdgpu_gfx void @external_void_func_void() #0

	define amdgpu_gfx void @test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {			define amdgpu_gfx void @test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
	; GFX9-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX9-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s5, 1			; GFX9-NEXT: v_writelane_b32 v40, s5, 1
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s5, 1			; GFX10-NEXT: v_writelane_b32 v40, s5, 1
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 3			; GFX10-NEXT: v_readlane_b32 s31, v40, 3
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "", ""() #0			call void asm sideeffect "", ""() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_mayclobber_s31(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX9-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s31			; GFX9-NEXT: ; def s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s31			; GFX9-NEXT: s_mov_b32 s4, s31
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s31, s4			; GFX9-NEXT: s_mov_b32 s31, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s31			; GFX9-NEXT: ; use s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s31			; GFX10-NEXT: ; def s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_mov_b32 s4, s31			; GFX10-NEXT: s_mov_b32 s4, s31
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_mov_b32 s31, s4			; GFX10-NEXT: s_mov_b32 s31, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s31			; GFX10-NEXT: ; use s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%s31 = call i32 asm sideeffect "; def $0", "={s31}"()			%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s31}"(i32 %s31)			call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v31			; GFX9-NEXT: ; def v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_mov_b32_e32 v41, v31			; GFX9-NEXT: v_mov_b32_e32 v41, v31
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_mov_b32_e32 v31, v41			; GFX9-NEXT: v_mov_b32_e32 v31, v41
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v31			; GFX9-NEXT: ; use v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v31			; GFX10-NEXT: ; def v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v41, v31			; GFX10-NEXT: v_mov_b32_e32 v41, v31
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_mov_b32_e32 v31, v41			; GFX10-NEXT: v_mov_b32_e32 v31, v41
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v31			; GFX10-NEXT: ; use v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%v31 = call i32 asm sideeffect "; def $0", "={v31}"()			%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v31}"(i32 %v31)			call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
	ret void			ret void
	}			}


	define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_s33(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_s33:			; GFX9-LABEL: test_call_void_func_void_preserves_s33:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s33			; GFX9-NEXT: ; def s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s33			; GFX9-NEXT: ; use s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s33:			; GFX10-LABEL: test_call_void_func_void_preserves_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s33			; GFX10-NEXT: ; def s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s33			; GFX10-NEXT: ; use s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%s33 = call i32 asm sideeffect "; def $0", "={s33}"()			%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s33}"(i32 %s33)			call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_s34:			; GFX9-LABEL: test_call_void_func_void_preserves_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s34			; GFX9-NEXT: ; def s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: s_mov_b32 s4, s34			; GFX9-NEXT: s_mov_b32 s4, s34
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: s_mov_b32 s34, s4			; GFX9-NEXT: s_mov_b32 s34, s4
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s34			; GFX9-NEXT: ; use s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s34:			; GFX10-LABEL: test_call_void_func_void_preserves_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[36:37]			; GFX10-NEXT: s_getpc_b64 s[36:37]
	; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s36, s36, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s37, s37, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s34			; GFX10-NEXT: ; def s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s34			; GFX10-NEXT: s_mov_b32 s4, s34
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX10-NEXT: s_mov_b32 s34, s4			; GFX10-NEXT: s_mov_b32 s34, s4
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s34			; GFX10-NEXT: ; use s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%s34 = call i32 asm sideeffect "; def $0", "={s34}"()			%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s34}"(i32 %s34)			call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_v40:			; GFX9-LABEL: test_call_void_func_void_preserves_v40:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v41, s33, 2			; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v41, s30, 0			; GFX9-NEXT: v_writelane_b32 v41, s30, 0
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v41, s31, 1			; GFX9-NEXT: v_writelane_b32 v41, s31, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v40			; GFX9-NEXT: ; def v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v40			; GFX9-NEXT: ; use v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v41, 1			; GFX9-NEXT: v_readlane_b32 s31, v41, 1
	; GFX9-NEXT: v_readlane_b32 s30, v41, 0			; GFX9-NEXT: v_readlane_b32 s30, v41, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v41, 2			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_v40:			; GFX10-LABEL: test_call_void_func_void_preserves_v40:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v41, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s30, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v40			; GFX10-NEXT: ; def v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v41, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v40			; GFX10-NEXT: ; use v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v41, 1			; GFX10-NEXT: v_readlane_b32 s31, v41, 1
	; GFX10-NEXT: v_readlane_b32 s30, v41, 0			; GFX10-NEXT: v_readlane_b32 s30, v41, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v41, 2			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%v40 = call i32 asm sideeffect "; def $0", "={v40}"()			%v40 = call i32 asm sideeffect "; def $0", "={v40}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v40}"(i32 %v40)			call void asm sideeffect "; use $0", "{v40}"(i32 %v40)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s33() #0 {
	; GFX9-LABEL: test_call_void_func_void_clobber_s33:			; GFX9-LABEL: test_call_void_func_void_clobber_s33:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s33:			; GFX10-LABEL: test_call_void_func_void_clobber_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s33@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s33()			call amdgpu_gfx void @void_func_void_clobber_s33()
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {
	; GFX9-LABEL: test_call_void_func_void_clobber_s34:			; GFX9-LABEL: test_call_void_func_void_clobber_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s34:			; GFX10-LABEL: test_call_void_func_void_clobber_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, void_func_void_clobber_s34@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s34()			call amdgpu_gfx void @void_func_void_clobber_s34()
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v41, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_mov_b32 s4, s40			; GFX9-NEXT: s_mov_b32 s4, s40
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s4			; GFX9-NEXT: ; use s4
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v41, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s40			; GFX10-NEXT: s_mov_b32 s4, s40
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s4			; GFX10-NEXT: ; use s4
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s4, 0			; GFX9-NEXT: v_writelane_b32 v40, s4, 0
				; GFX9-NEXT: v_writelane_b32 v42, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v41			; GFX9-NEXT: ; use v41
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v42, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s4, 0
				; GFX10-NEXT: v_writelane_b32 v42, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s4, 0
	; GFX10-NEXT: s_mov_b32 s4, s40			; GFX10-NEXT: s_mov_b32 s4, s40
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v32			; GFX10-NEXT: ; def v32
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_mov_b32_e32 v41, v32			; GFX10-NEXT: v_mov_b32_e32 v41, v32
				; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s4			; GFX10-NEXT: ; use s4
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v41			; GFX10-NEXT: ; use v41
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v42, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4
				; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0			%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	call void asm sideeffect "; use $0", "v"(i32 %v32) #0			call void asm sideeffect "; use $0", "v"(i32 %v32) #0
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind noinline }			attributes #1 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

	define amdgpu_gfx void @call_i1() #0 {			define amdgpu_gfx void @call_i1() #0 {
	; GFX9-LABEL: call_i1:			; GFX9-LABEL: call_i1:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i1:			; GFX10-LABEL: call_i1:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i1@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i1@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i1 @return_i1()			call amdgpu_gfx i1 @return_i1()

	define amdgpu_gfx void @call_i16() #0 {			define amdgpu_gfx void @call_i16() #0 {
	; GFX9-LABEL: call_i16:			; GFX9-LABEL: call_i16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i16:			; GFX10-LABEL: call_i16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_i16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_i16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i16 @return_i16()			call amdgpu_gfx i16 @return_i16()
	[... 19 lines not shown ...]

	define amdgpu_gfx void @call_2xi16() #0 {			define amdgpu_gfx void @call_2xi16() #0 {
	; GFX9-LABEL: call_2xi16:			; GFX9-LABEL: call_2xi16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v1, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v1, s30, 0			; GFX9-NEXT: v_writelane_b32 v1, s30, 0
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v1, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_2xi16:			; GFX10-LABEL: call_2xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v1, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_2xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_2xi16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s30, 0			; GFX10-NEXT: v_writelane_b32 v1, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v1, s31, 1			; GFX10-NEXT: v_writelane_b32 v1, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v1, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <2 x i16> @return_2xi16()			call amdgpu_gfx <2 x i16> @return_2xi16()
	[... 21 lines not shown ...]

	define amdgpu_gfx void @call_3xi16() #0 {			define amdgpu_gfx void @call_3xi16() #0 {
	; GFX9-LABEL: call_3xi16:			; GFX9-LABEL: call_3xi16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_3xi16:			; GFX10-LABEL: call_3xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_addk_i32 s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_3xi16@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_3xi16@gotpcrel32@hi+12
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
				; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <3 x i16> @return_3xi16()			call amdgpu_gfx <3 x i16> @return_3xi16()
	[... 1,047 lines not shown ...]

	define amdgpu_gfx void @call_512xi32() #0 {			define amdgpu_gfx void @call_512xi32() #0 {
	; GFX9-LABEL: call_512xi32:			; GFX9-LABEL: call_512xi32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: v_writelane_b32 v2, s33, 2			; GFX9-NEXT: s_mov_b32 s36, s33
	; GFX9-NEXT: s_add_i32 s33, s32, 0x1ffc0			; GFX9-NEXT: s_add_i32 s33, s32, 0x1ffc0
	; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000			; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000
	; GFX9-NEXT: s_add_i32 s32, s32, 0x60000			; GFX9-NEXT: s_add_i32 s32, s32, 0x60000
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX9-NEXT: v_writelane_b32 v2, s30, 0			; GFX9-NEXT: v_writelane_b32 v2, s30, 0
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000			; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000
	; GFX9-NEXT: v_readlane_b32 s33, v2, 2			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_512xi32:			; GFX10-LABEL: call_512xi32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: v_writelane_b32 v2, s33, 2			; GFX10-NEXT: s_mov_b32 s36, s33
	; GFX10-NEXT: s_add_i32 s33, s32, 0xffe0			; GFX10-NEXT: s_add_i32 s33, s32, 0xffe0
	; GFX10-NEXT: s_add_i32 s32, s32, 0x30000			; GFX10-NEXT: s_add_i32 s32, s32, 0x30000
	; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000			; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, return_512xi32@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, return_512xi32@gotpcrel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v2, s30, 0			; GFX10-NEXT: v_writelane_b32 v2, s30, 0
	; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0			; GFX10-NEXT: s_load_dwordx2 s[34:35], s[34:35], 0x0
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: v_writelane_b32 v2, s31, 1			; GFX10-NEXT: v_writelane_b32 v2, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000			; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000
	; GFX10-NEXT: v_readlane_b32 s33, v2, 2			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:2048 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <512 x i32> @return_512xi32()			call amdgpu_gfx <512 x i32> @return_512xi32()
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	[... 389 lines not shown ...]
	}			}

	define void @test_indirect_call_vgpr_ptr(void()* %fptr) {			define void @test_indirect_call_vgpr_ptr(void()* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr:			; GCN-LABEL: test_indirect_call_vgpr_ptr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 17			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 48 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 17			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 17			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 48 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 17			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr()			call void %fptr()
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {			define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 17			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 51 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 17			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 17			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[... 49 lines not shown ...]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 17			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr(i32 123)			call void %fptr(i32 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:			; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 17			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[... 50 lines not shown ...]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 17			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 17			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[50 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 17			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%a = call i32 %fptr()			%a = call i32 %fptr()
	%b = add i32 %a, 1			%b = add i32 %a, 1
	ret i32 %b			ret i32 %b
	}			}

	define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {			define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 19			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[59 lines not shown]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 19			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GISEL: ; %bb.0: ; %bb0			; GISEL: ; %bb.0: ; %bb0
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1			; GISEL-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[16:17]			; GISEL-NEXT: s_mov_b64 exec, s[16:17]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 19			; GISEL-NEXT: v_writelane_b32 v41, s33, 0
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[59 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 19			; GISEL-NEXT: v_readlane_b32 s33, v41, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	br i1 %cond, label %bb1, label %bb2			br i1 %cond, label %bb1, label %bb2

	bb1:			bb1:
	call void %fptr()			call void %fptr()
	br label %bb2			br label %bb2

	bb2:			bb2:
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {			define void @test_indirect_call_vgpr_ptr_inreg_arg(void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s5, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[64 lines not shown]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s5
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s5, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[64 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s5
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 inreg 123)			call amdgpu_gfx void %fptr(i32 inreg 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	[68 lines not shown]
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	[68 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 %i)			call amdgpu_gfx void %fptr(i32 %i)
	ret i32 %i			ret i32 %i
	}			}

	; Use a variable inside a waterfall loop and use the return variable after the loop.			; Use a variable inside a waterfall loop and use the return variable after the loop.
	; TODO The argument and return variable could be in the same physical register, but the register			; TODO The argument and return variable could be in the same physical register, but the register
	; allocator is not able to do that because the return value clashes with the liverange of an			; allocator is not able to do that because the return value clashes with the liverange of an
	; IMPLICIT_DEF of the argument.			; IMPLICIT_DEF of the argument.
	define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_arg_and_return(i32 %i, i32(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[66 lines not shown]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[66 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%ret = call amdgpu_gfx i32 %fptr(i32 %i)			%ret = call amdgpu_gfx i32 %fptr(i32 %i)
	ret i32 %ret			ret i32 %ret
	}			}

	; Calling a vgpr can never be a tail call.			; Calling a vgpr can never be a tail call.
	define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {			define void @test_indirect_tail_call_vgpr_ptr(void()* %fptr) {
	; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:			; GCN-LABEL: test_indirect_tail_call_vgpr_ptr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 32			; GCN-NEXT: s_mov_b32 s10, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_writelane_b32 v40, s34, 2			; GCN-NEXT: v_writelane_b32 v40, s34, 2
	; GCN-NEXT: v_writelane_b32 v40, s35, 3			; GCN-NEXT: v_writelane_b32 v40, s35, 3
	; GCN-NEXT: v_writelane_b32 v40, s36, 4			; GCN-NEXT: v_writelane_b32 v40, s36, 4
	; GCN-NEXT: v_writelane_b32 v40, s37, 5			; GCN-NEXT: v_writelane_b32 v40, s37, 5
	[63 lines not shown]
	; GCN-NEXT: v_readlane_b32 s38, v40, 6			; GCN-NEXT: v_readlane_b32 s38, v40, 6
	; GCN-NEXT: v_readlane_b32 s37, v40, 5			; GCN-NEXT: v_readlane_b32 s37, v40, 5
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 32			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GISEL-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: v_writelane_b32 v40, s33, 32			; GISEL-NEXT: s_mov_b32 s10, s33
	; GISEL-NEXT: s_mov_b32 s33, s32			; GISEL-NEXT: s_mov_b32 s33, s32
	; GISEL-NEXT: s_addk_i32 s32, 0x400			; GISEL-NEXT: s_addk_i32 s32, 0x400
	; GISEL-NEXT: v_writelane_b32 v40, s30, 0			; GISEL-NEXT: v_writelane_b32 v40, s30, 0
	; GISEL-NEXT: v_writelane_b32 v40, s31, 1			; GISEL-NEXT: v_writelane_b32 v40, s31, 1
	; GISEL-NEXT: v_writelane_b32 v40, s34, 2			; GISEL-NEXT: v_writelane_b32 v40, s34, 2
	; GISEL-NEXT: v_writelane_b32 v40, s35, 3			; GISEL-NEXT: v_writelane_b32 v40, s35, 3
	; GISEL-NEXT: v_writelane_b32 v40, s36, 4			; GISEL-NEXT: v_writelane_b32 v40, s36, 4
	; GISEL-NEXT: v_writelane_b32 v40, s37, 5			; GISEL-NEXT: v_writelane_b32 v40, s37, 5
	[63 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s38, v40, 6			; GISEL-NEXT: v_readlane_b32 s38, v40, 6
	; GISEL-NEXT: v_readlane_b32 s37, v40, 5			; GISEL-NEXT: v_readlane_b32 s37, v40, 5
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_addk_i32 s32, 0xfc00
	; GISEL-NEXT: v_readlane_b32 s33, v40, 32			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	tail call amdgpu_gfx void %fptr()			tail call amdgpu_gfx void %fptr()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

	[184 lines not shown]
	}			}

	define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {			define void @slsr1_1(i32 %b.arg, i32 %s.arg) #0 {
	; GFX9-LABEL: slsr1_1:			; GFX9-LABEL: slsr1_1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
				; GFX9-NEXT: v_writelane_b32 v44, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_writelane_b32 v40, s34, 2			; GFX9-NEXT: v_writelane_b32 v40, s34, 2
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s35, 3			; GFX9-NEXT: v_writelane_b32 v40, s35, 3
	[15 lines not shown]
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s35, v40, 3			; GFX9-NEXT: v_readlane_b32 s35, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v44, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%b = and i32 %b.arg, 16777215			%b = and i32 %b.arg, 16777215
	%s = and i32 %s.arg, 16777215			%s = and i32 %s.arg, 16777215

	; CHECK-LABEL: @slsr1(			; CHECK-LABEL: @slsr1(
	; foo(b * s);			; foo(b * s);
	[26 lines not shown]

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll

	[24 lines not shown]
	; redundant spills of s33 or assert.			; redundant spills of s33 or assert.
	define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {			define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {
	; CHECK-LABEL: csr_vgpr_spill_fp_callee:			; CHECK-LABEL: csr_vgpr_spill_fp_callee:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v1, s33, 2			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: v_writelane_b32 v1, s30, 0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v1, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v1, 2			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	call fastcc void @callee_has_fp()			call fastcc void @callee_has_fp()
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	[87 lines not shown]

	define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {			define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:			; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v1, s33, 2			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v1, s30, 0			; CHECK-NEXT: v_writelane_b32 v1, s30, 0
	; CHECK-NEXT: v_writelane_b32 v1, s31, 1			; CHECK-NEXT: v_writelane_b32 v1, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v1, 2			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @tail_call()			%call = call i32 @tail_call()
	ret i32 %call			ret i32 %call
	}			}

	define hidden i32 @caller_save_vgpr_spill_fp() #0 {			define hidden i32 @caller_save_vgpr_spill_fp() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp:			; CHECK-LABEL: caller_save_vgpr_spill_fp:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: v_writelane_b32 v2, s33, 2			; CHECK-NEXT: s_mov_b32 s7, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: v_writelane_b32 v2, s30, 0			; CHECK-NEXT: v_writelane_b32 v2, s30, 0
	; CHECK-NEXT: v_writelane_b32 v2, s31, 1			; CHECK-NEXT: v_writelane_b32 v2, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, caller_save_vgpr_spill_fp_tail_call@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, caller_save_vgpr_spill_fp_tail_call@rel32@hi+12
	; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]			; CHECK-NEXT: s_mov_b64 s[10:11], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_mov_b64 s[8:9], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]			; CHECK-NEXT: s_mov_b64 s[0:1], s[8:9]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]			; CHECK-NEXT: s_mov_b64 s[2:3], s[10:11]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: v_readlane_b32 s31, v2, 1			; CHECK-NEXT: v_readlane_b32 s31, v2, 1
	; CHECK-NEXT: v_readlane_b32 s30, v2, 0			; CHECK-NEXT: v_readlane_b32 s30, v2, 0
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v2, 2			; CHECK-NEXT: s_mov_b32 s33, s7
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @caller_save_vgpr_spill_fp_tail_call()			%call = call i32 @caller_save_vgpr_spill_fp_tail_call()
	ret i32 %call			ret i32 %call
	[26 lines not shown]

llvm/test/CodeGen/AMDGPU/nested-calls.ll

	; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=hawaii -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=hawaii -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefix=GCN %s

	; Test calls when called by other callable functions rather than			; Test calls when called by other callable functions rather than
	; kernels.			; kernels.

	declare void @external_void_func_i32(i32) #0			declare void @external_void_func_i32(i32) #0

	; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm:			; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm:
	; GCN: s_waitcnt			; GCN: s_waitcnt

	; Spill CSR VGPR used for SGPR spilling			; Spill CSR VGPR used for SGPR spilling
	; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-DAG: v_writelane_b32 v40, s33, 2			; GCN-DAG: v_writelane_b32 v41, s33, 0
	; GCN-DAG: s_mov_b32 s33, s32			; GCN-DAG: s_mov_b32 s33, s32
	; GCN-DAG: s_addk_i32 s32, 0x400			; GCN-DAG: s_addk_i32 s32, 0x400
	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_readlane_b32 s31, v40, 1			; GCN: v_readlane_b32 s31, v40, 1
	; GCN: v_readlane_b32 s30, v40, 0			; GCN: v_readlane_b32 s30, v40, 0

	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @test_func_call_external_void_func_i32_imm() #0 {			define void @test_func_call_external_void_func_i32_imm() #0 {
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	[21 lines not shown]

llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll

	[9 lines not shown]
	; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382			; CHECK-NEXT: .file 0 "/tmp" "lane-info.cpp" md5 0x4ab9b75a30baffdf0f6f536a80e3e382
	; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0			; CHECK-NEXT: .loc 0 30 0 ; lane-info.cpp:30:0
	; CHECK-NEXT: .cfi_sections .debug_frame			; CHECK-NEXT: .cfi_sections .debug_frame
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0: ; %entry			; CHECK-NEXT: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1			; CHECK-NEXT: s_or_saveexec_b64 s[16:17], -1
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; CHECK-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 exec, s[16:17]			; CHECK-NEXT: s_mov_b64 exec, s[16:17]
	; CHECK-NEXT: v_writelane_b32 v40, s33, 2			; CHECK-NEXT: v_writelane_b32 v41, s33, 0
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x400			; CHECK-NEXT: s_add_i32 s32, s32, 0x400
	; CHECK-NEXT: .Ltmp0:			; CHECK-NEXT: .Ltmp0:
	; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3			; CHECK-NEXT: .loc 0 31 3 prologue_end ; lane-info.cpp:31:3
	; CHECK-NEXT: v_writelane_b32 v40, s30, 0			; CHECK-NEXT: v_writelane_b32 v40, s30, 0
	; CHECK-NEXT: v_writelane_b32 v40, s31, 1			; CHECK-NEXT: v_writelane_b32 v40, s31, 1
	; CHECK-NEXT: s_getpc_b64 s[16:17]			; CHECK-NEXT: s_getpc_b64 s[16:17]
	; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4			; CHECK-NEXT: s_add_u32 s16, s16, _ZL13sleep_foreverv@gotpcrel32@lo+4
	; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12			; CHECK-NEXT: s_addc_u32 s17, s17, _ZL13sleep_foreverv@gotpcrel32@hi+12
	; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0			; CHECK-NEXT: s_load_dwordx2 s[16:17], s[16:17], 0x0
	; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]			; CHECK-NEXT: s_mov_b64 s[22:23], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]			; CHECK-NEXT: s_mov_b64 s[20:21], s[0:1]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: .Ltmp1:			; CHECK-NEXT: .Ltmp1:
	; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; CHECK-NEXT: v_readlane_b32 s33, v40, 2			; CHECK-NEXT: v_readlane_b32 s33, v41, 0
	; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_or_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; CHECK-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	entry:			entry:
	call void @_ZL13sleep_foreverv(), !dbg !1646			call void @_ZL13sleep_foreverv(), !dbg !1646
	ret void, !dbg !1647			ret void, !dbg !1647
	}			}
	[20 lines not shown]

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir

[18 lines not shown]	machineFunctionInfo:
frameOffsetReg: $sgpr33		frameOffsetReg: $sgpr33
stackPtrOffsetReg: $sgpr32		stackPtrOffsetReg: $sgpr32

body: \|		body: \|
bb.0:		bb.0:
liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255		liveins: 
$vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255

; GFX8-LABEL: name: pei_scavenge_vgpr_spill		; GFX8-LABEL: name: pei_scavenge_vgpr_spill
; GFX8: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX8: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX8-NEXT: {{ $}}		; GFX8-NEXT: {{ $}}
; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX8-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)		; GFX8-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX8-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX8-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX8-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc		; GFX8-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
; GFX8-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc		; GFX8-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc
[10 lines not shown]	bb.0:
; GFX8-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0		; GFX8-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0
; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX8-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX8-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX8-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)		; GFX8-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)
; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-LABEL: name: pei_scavenge_vgpr_spill
; GFX9: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX9: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-NEXT: {{ $}}		; GFX9-NEXT: {{ $}}
; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)		; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc
; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc
; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec		; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec
; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)		; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)
; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec		; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec
; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc
; GFX9-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0		; GFX9-NEXT: $sgpr33 = V_READLANE_B32 $vgpr2, 0
; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 1048832, implicit-def dead $scc
; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)		; GFX9-NEXT: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)
; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs
; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
; GFX9-FLATSCR: liveins: $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr2		
; GFX9-FLATSCR: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-FLATSCR-NEXT: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 16388, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr6 = S_ADD_I32 $sgpr32, 16388, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)		; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)
; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc
[16 lines not shown]

llvm/test/CodeGen/AMDGPU/save-fp.ll

	; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX908 %s			; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX908 %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s

	define void @foo() {			define void @foo() {
	bb:			bb:
	ret void			ret void
	}			}

	; FIXME: We spill v40 into AGPR, but still save and restore FP			; FIXME: We spill v40 into AGPR, but still save and restore FP
	; which is not needed in this case.			; which is not needed in this case.

	; GCN-LABEL: {{^}}caller:			; GCN-LABEL: {{^}}caller:

	; GCN: v_writelane_b32 v2, s33, 2			; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GFX900: buffer_store_dword			; GFX900: buffer_store_dword
	; GFX908-DAG: v_accvgpr_write_b32			; GFX908-DAG: v_accvgpr_write_b32
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GFX900: buffer_load_dword			; GFX900: buffer_load_dword
	; GFX908: v_accvgpr_read_b32			; GFX908: v_accvgpr_read_b32
	; GCN: v_readlane_b32 s33, v2, 2			; GCN: s_mov_b32 s33, [[TMP_SGPR]]
	define i64 @caller() {			define i64 @caller() {
	bb:			bb:
	call void asm sideeffect "", "~{v40}" ()			call void asm sideeffect "", "~{v40}" ()
	tail call void @foo()			tail call void @foo()
	ret i64 0			ret i64 0
	}			}

llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll

; RUN: llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx803 -O0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx803 -O0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

define void @child_function() #0 {		define void @child_function() #0 {
call void asm sideeffect "", "~{vcc}" () #0		call void asm sideeffect "", "~{vcc}" () #0
ret void		ret void
}		}

; GCN-LABEL: {{^}}spill_sgpr_with_no_lower_vgpr_available:		; GCN-LABEL: {{^}}spill_sgpr_with_no_lower_vgpr_available:
; GCN: buffer_store_dword v255, off, s[0:3], s32		; GCN: buffer_store_dword v255, off, s[0:3], s32
; GCN: v_writelane_b32 v255, s33, 2		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: v_writelane_b32 v255, s30, 0		; GCN: v_writelane_b32 v255, s30, 0
; GCN: v_writelane_b32 v255, s31, 1		; GCN: v_writelane_b32 v255, s31, 1
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]
; GCN: v_readlane_b32 s31, v255, 1		; GCN: v_readlane_b32 s31, v255, 1
; GCN: v_readlane_b32 s30, v255, 0		; GCN: v_readlane_b32 s30, v255, 0
; GCN: v_readlane_b32 s33, v255, 2		; GCN: s_mov_b32 s33, [[TMP_SGPR]]
; GCN: ; NumVgprs: 256		; GCN: ; NumVgprs: 256

define void @spill_sgpr_with_no_lower_vgpr_available() #0 {		define void @spill_sgpr_with_no_lower_vgpr_available() #0 {
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "",		call void asm sideeffect "",
"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}		"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
[23 lines not shown]	define void @spill_sgpr_with_no_lower_vgpr_available() #0 {
,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249}		,~{v240},~{v241},~{v242},~{v243},~{v244},~{v245},~{v246},~{v247},~{v248},~{v249}
,~{v250},~{v251},~{v252},~{v253},~{v254}" () #0		,~{v250},~{v251},~{v252},~{v253},~{v254}" () #0
call void @child_function()		call void @child_function()
ret void		ret void
}		}

; GCN-LABEL: {{^}}spill_to_lowest_available_vgpr:		; GCN-LABEL: {{^}}spill_to_lowest_available_vgpr:
; GCN: buffer_store_dword v254, off, s[0:3], s32		; GCN: buffer_store_dword v254, off, s[0:3], s32
; GCN: v_writelane_b32 v254, s33, 2		; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
; GCN: v_writelane_b32 v254, s30, 0		; GCN: v_writelane_b32 v254, s30, 0
; GCN: v_writelane_b32 v254, s31, 1		; GCN: v_writelane_b32 v254, s31, 1
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]
; GCN: v_readlane_b32 s31, v254, 1		; GCN: v_readlane_b32 s31, v254, 1
; GCN: v_readlane_b32 s30, v254, 0		; GCN: v_readlane_b32 s30, v254, 0
; GCN: v_readlane_b32 s33, v254, 2		; GCN: s_mov_b32 s33, [[TMP_SGPR]]

define void @spill_to_lowest_available_vgpr() #0 {		define void @spill_to_lowest_available_vgpr() #0 {
%alloca = alloca i32, align 4, addrspace(5)		%alloca = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "",		call void asm sideeffect "",
"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}		"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}		,~{v10},~{v11},~{v12},~{v13},~{v14},~{v15},~{v16},~{v17},~{v18},~{v19}
[273 lines not shown]

llvm/test/CodeGen/AMDGPU/sibling-call.ll

[194 lines not shown]	entry:
%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)		%ret = tail call fastcc i32 @i32_fastcc_i32_i32_a32i32(i32 %a, i32 %b, [32 x i32] zeroinitializer)
ret i32 %ret		ret i32 %ret
}		}

; Have another non-tail in the function		; Have another non-tail in the function
; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:		; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:
; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1		; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1
; GCN-NEXT: buffer_store_dword [[CSRV:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword [[CSRV:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
		; GCN-NEXT: buffer_store_dword [[CSRV_1:v[0-9]+]], off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec		; GCN-NEXT: s_mov_b64 exec
; GCN: v_writelane_b32 [[CSRV]], s33, 2		; GCN: v_writelane_b32 [[CSRV_1]], s33, 0
; GCN-DAG: s_addk_i32 s32, 0x400		; GCN-DAG: s_addk_i32 s32, 0x800

; GCN-DAG: s_getpc_b64 s[4:5]		; GCN-DAG: s_getpc_b64 s[4:5]
; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4		; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4
; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12		; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12

; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSRV]], s30, 0
; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSRV]], s31, 1


; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload		; GCN-DAG: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload		; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload

; GCN: s_getpc_b64 s[4:5]		; GCN: s_getpc_b64 s[4:5]
; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4		; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12		; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12

; GCN-DAG: v_readlane_b32 s30, [[CSRV]], 0		; GCN-DAG: v_readlane_b32 s30, [[CSRV]], 0
; GCN-DAG: v_readlane_b32 s31, [[CSRV]], 1		; GCN-DAG: v_readlane_b32 s31, [[CSRV]], 1

; GCN: s_addk_i32 s32, 0xfc00		; GCN: s_addk_i32 s32, 0xf800
; GCN-NEXT: v_readlane_b32 s33,		; GCN-NEXT: v_readlane_b32 s33,
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
		; GCN-NEXT: buffer_load_dword [[CSRV_1]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {		define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
entry:		entry:
%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)		%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)		%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)
ret i32 %ret		ret i32 %ret
}		}
[231 lines not shown]

llvm/test/CodeGen/AMDGPU/spill-csr-frame-ptr-reg-copy.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -stress-regalloc=1 < %s | FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}spill_csr_s5_copy:			; GCN-LABEL: {{^}}spill_csr_s5_copy:
	; GCN: s_or_saveexec_b64			; GCN: s_or_saveexec_b64
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec			; GCN-NEXT: s_mov_b64 exec
	; GCN: v_writelane_b32 v40, s33, 2			; GCN: v_writelane_b32 v41, s33, 0
	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9			; GCN: v_mov_b32_e32 [[K:v[0-9]+]], 9
	; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}			; GCN: buffer_store_dword [[K]], off, s[0:3], s33{{$}}

	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v41, 0
	; GCN: s_or_saveexec_b64			; GCN: s_or_saveexec_b64
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN: s_mov_b64 exec			; GCN: s_mov_b64 exec
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @spill_csr_s5_copy() #0 {			define void @spill_csr_s5_copy() #0 {
	bb:			bb:
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	%tmp = tail call i64 @func() #1			%tmp = tail call i64 @func() #1
	%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp			%tmp1 = getelementptr inbounds i32, i32 addrspace(1)* null, i64 %tmp
	%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4			%tmp2 = load i32, i32 addrspace(1)* %tmp1, align 4
	[9 lines not shown]

llvm/test/CodeGen/AMDGPU/stack-realign.ll

	[151 lines not shown]
	define void @func_call_align1024_bp_gets_vgpr_spill(<32 x i32> %a, i32 %b) #0 {			define void @func_call_align1024_bp_gets_vgpr_spill(<32 x i32> %a, i32 %b) #0 {
	; The test forces the stack to be realigned to a new boundary			; The test forces the stack to be realigned to a new boundary
	; since there is a local object with an alignment of 1024.			; since there is a local object with an alignment of 1024.
	; Should use BP to access the incoming stack arguments.			; Should use BP to access the incoming stack arguments.
	; The BP value is saved/restored with a VGPR spill.			; The BP value is saved/restored with a VGPR spill.

	; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:			; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:
	; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword [[VGPR_REG_1:v[0-9]+]], off, s[0:3], s32 offset:1032 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 [[VGPR_REG]], s33, 2			; GCN-NEXT: v_writelane_b32 [[VGPR_REG_1]], s33, 0
	; GCN-DAG: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0			; GCN-DAG: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0
	; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000			; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000
	; GCN: v_mov_b32_e32 v32, 0			; GCN: v_mov_b32_e32 v32, 0
	; GCN-DAG: v_writelane_b32 [[VGPR_REG]], s34, 3			; GCN-DAG: v_writelane_b32 [[VGPR_REG_1]], s34, 1
	; GCN: s_mov_b32 s34, s32			; GCN: s_mov_b32 s34, s32
	; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024			; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34			; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34
	; GCN-DAG: s_add_i32 s32, s32, 0x30000			; GCN-DAG: s_add_i32 s32, s32, 0x30000
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32
	; GCN: s_swappc_b64 s[30:31],			; GCN: s_swappc_b64 s[30:31],

	; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1			; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1
	; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0			; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0
	; GCN: s_add_i32 s32, s32, 0xfffd0000			; GCN: s_add_i32 s32, s32, 0xfffd0000
	; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG]], 2			; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG_1]], 0
	; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3			; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG_1]], 1
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword [[VGPR_REG_1]], off, s[0:3], s32 offset:1032 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN: s_setpc_b64 s[30:31]			; GCN: s_setpc_b64 s[30:31]
	%temp = alloca i32, align 1024, addrspace(5)			%temp = alloca i32, align 1024, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %temp, align 1024			store volatile i32 0, i32 addrspace(5)* %temp, align 1024
	call void @extern_func(<32 x i32> %a, i32 %b)			call void @extern_func(<32 x i32> %a, i32 %b)
	ret void			ret void
	}			}
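
The realignment checks in this test (`s_add_i32 ..., s32, 0xffc0` followed by `s_and_b32 s33, ..., 0xffff0000`) can be read as a round-up of the stack pointer to the 1024-byte boundary. The sketch below is only a reading of the CHECK lines, not code from the patch. It assumes that the MUBUF stack pointer in s32 is scaled by the wavefront size (64 here) and that flat scratch uses unscaled byte offsets. The helper names `realign_constants` and `realign` and their parameters are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # SGPRs are 32 bits wide


def realign_constants(align_bytes, scale=64):
    """Return (add_imm, and_mask) for rounding a stack pointer up.

    align_bytes: requested per-lane alignment in bytes (a power of two).
    scale: assumed scratch scale factor (64 for wave64 MUBUF scratch,
           1 for flat scratch). This is an assumption, not taken from the patch.
    """
    scaled_align = align_bytes * scale
    add_imm = (align_bytes - 1) * scale      # value added before masking
    and_mask = (-scaled_align) & MASK32      # clears the low bits after the add
    return add_imm, and_mask


def realign(sp, align_bytes, scale=64):
    """Round sp up to the scaled alignment using the two constants."""
    add_imm, and_mask = realign_constants(align_bytes, scale)
    return ((sp + add_imm) & MASK32) & and_mask


if __name__ == "__main__":
    add_imm, and_mask = realign_constants(1024)
    print(hex(add_imm), hex(and_mask))
```

Under these assumptions the helper yields 0xffc0 and 0xffff0000 for the 1024-byte case here, and 524224 / 4294443008 (MUBUF) or 8191 / 4294959104 (flat scratch) for an 8192-byte alignment, which matches the immediates in the pei-scavenge-vgpr-spill.mir checks.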

	[148 lines not shown]

llvm/test/CodeGen/AMDGPU/tail-call-amdgpu-gfx.ll

	[14 lines not shown]

	define amdgpu_gfx float @caller(float %arg0) {			define amdgpu_gfx float @caller(float %arg0) {
	; GCN-LABEL: caller:			; GCN-LABEL: caller:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1			; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v1, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[34:35]			; GCN-NEXT: s_mov_b64 exec, s[34:35]
	; GCN-NEXT: v_writelane_b32 v1, s33, 3
	; GCN-NEXT: v_writelane_b32 v1, s4, 0			; GCN-NEXT: v_writelane_b32 v1, s4, 0
				; GCN-NEXT: s_mov_b32 s36, s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v1, s30, 1			; GCN-NEXT: v_writelane_b32 v1, s30, 1
	; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0			; GCN-NEXT: v_add_f32_e32 v0, 1.0, v0
	; GCN-NEXT: s_mov_b32 s4, 2.0			; GCN-NEXT: s_mov_b32 s4, 2.0
	; GCN-NEXT: v_writelane_b32 v1, s31, 2			; GCN-NEXT: v_writelane_b32 v1, s31, 2
	; GCN-NEXT: s_getpc_b64 s[34:35]			; GCN-NEXT: s_getpc_b64 s[34:35]
	; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4			; GCN-NEXT: s_add_u32 s34, s34, callee@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12			; GCN-NEXT: s_addc_u32 s35, s35, callee@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GCN-NEXT: v_readlane_b32 s31, v1, 2			; GCN-NEXT: v_readlane_b32 s31, v1, 2
	; GCN-NEXT: v_readlane_b32 s30, v1, 1			; GCN-NEXT: v_readlane_b32 s30, v1, 1
	; GCN-NEXT: v_readlane_b32 s4, v1, 0			; GCN-NEXT: v_readlane_b32 s4, v1, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v1, 3			; GCN-NEXT: s_mov_b32 s33, s36
	; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1			; GCN-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[34:35]			; GCN-NEXT: s_mov_b64 exec, s[34:35]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%add = fadd float %arg0, 1.0			%add = fadd float %arg0, 1.0
	%call = tail call amdgpu_gfx float @callee(float %add, float inreg 2.0)			%call = tail call amdgpu_gfx float @callee(float %add, float inreg 2.0)
	ret float %call			ret float %call
	}			}

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s
	; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s			; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s

	define hidden void @widget() {			define hidden void @widget() {
	; GCN-LABEL: widget:			; GCN-LABEL: widget:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: flat_load_dword v0, v[0:1]			; GCN-NEXT: flat_load_dword v0, v[0:1]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	[25 lines not shown]
	; GCN-NEXT: v_mov_b32_e32 v2, 0			; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: flat_store_dword v[0:1], v2			; GCN-NEXT: flat_store_dword v[0:1], v2
	; GCN-NEXT: .LBB0_7: ; %UnifiedReturnBlock			; GCN-NEXT: .LBB0_7: ; %UnifiedReturnBlock
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v41, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	; SI-OPT-LABEL: @widget(			; SI-OPT-LABEL: @widget(
	; SI-OPT-NEXT: bb:			; SI-OPT-NEXT: bb:
	; SI-OPT-NEXT: [[TMP:%.]] = load i32, i32 addrspace(1) null, align 16			; SI-OPT-NEXT: [[TMP:%.]] = load i32, i32 addrspace(1) null, align 16
	; SI-OPT-NEXT: [[TMP1:%.*]] = icmp slt i32 [[TMP]], 21			; SI-OPT-NEXT: [[TMP1:%.*]] = icmp slt i32 [[TMP]], 21
	; SI-OPT-NEXT: br i1 [[TMP1]], label [[BB4:%.]], label [[BB2:%.]]			; SI-OPT-NEXT: br i1 [[TMP1]], label [[BB4:%.]], label [[BB2:%.]]
	... (114 lines not shown) ...
	; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4			; SI-OPT-NEXT: store float 0x7FF8000000000000, float addrspace(5)* null, align 4
	; SI-OPT-NEXT: br label [[BB2]]			; SI-OPT-NEXT: br label [[BB2]]
	;			;
	; GCN-LABEL: blam:			; GCN-LABEL: blam:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v40, s33, 17			; GCN-NEXT: v_writelane_b32 v46, s33, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_addk_i32 s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	... (160 lines not shown) ...

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX9 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GFX10 %s

	declare void @extern_func() #2			declare void @extern_func() #2

	define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {			define <4 x float> @non_preserved_vgpr_tuple8(<8 x i32> %rsrc, <4 x i32> %samp, float %bias, float %zcompare, float %s, float %t, float %clamp) {
	; The vgpr tuple8 operand in image_gather4_c_b_cl instruction need not be			; The vgpr tuple8 operand in image_gather4_c_b_cl instruction need not be
	; preserved across the call and should get 8 scratch registers.			; preserved across the call and should get 8 scratch registers.
	; GFX9-LABEL: non_preserved_vgpr_tuple8:			; GFX9-LABEL: non_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v45, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v36, v16			; GFX9-NEXT: v_mov_b32_e32 v36, v16
	; GFX9-NEXT: v_mov_b32_e32 v35, v15			; GFX9-NEXT: v_mov_b32_e32 v35, v15
	; GFX9-NEXT: v_mov_b32_e32 v34, v14			; GFX9-NEXT: v_mov_b32_e32 v34, v14
	; GFX9-NEXT: v_mov_b32_e32 v33, v13			; GFX9-NEXT: v_mov_b32_e32 v33, v13
	; GFX9-NEXT: v_mov_b32_e32 v32, v12			; GFX9-NEXT: v_mov_b32_e32 v32, v12
	; GFX9-NEXT: s_mov_b32 s5, s4			; GFX9-NEXT: s_mov_b32 s5, s4
	; GFX9-NEXT: s_mov_b32 s6, s4			; GFX9-NEXT: s_mov_b32 s6, s4
	... (30 lines not shown) ...
	; GFX9-NEXT: v_mov_b32_e32 v3, v44			; GFX9-NEXT: v_mov_b32_e32 v3, v44
	; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v44, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v45, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: non_preserved_vgpr_tuple8:			; GFX10-LABEL: non_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v36, v16			; GFX10-NEXT: v_mov_b32_e32 v36, v16
	; GFX10-NEXT: v_mov_b32_e32 v35, v15			; GFX10-NEXT: v_mov_b32_e32 v35, v15
	; GFX10-NEXT: v_mov_b32_e32 v34, v14			; GFX10-NEXT: v_mov_b32_e32 v34, v14
	; GFX10-NEXT: v_mov_b32_e32 v33, v13			; GFX10-NEXT: v_mov_b32_e32 v33, v13
	; GFX10-NEXT: v_mov_b32_e32 v32, v12			; GFX10-NEXT: v_mov_b32_e32 v32, v12
	; GFX10-NEXT: s_mov_b32 s4, 0			; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v45, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_mov_b32 s5, s4			; GFX10-NEXT: s_mov_b32 s5, s4
	; GFX10-NEXT: s_mov_b32 s6, s4			; GFX10-NEXT: s_mov_b32 s6, s4
	; GFX10-NEXT: s_mov_b32 s7, s4			; GFX10-NEXT: s_mov_b32 s7, s4
	; GFX10-NEXT: s_mov_b32 s8, s4			; GFX10-NEXT: s_mov_b32 s8, s4
	; GFX10-NEXT: s_mov_b32 s9, s4			; GFX10-NEXT: s_mov_b32 s9, s4
	; GFX10-NEXT: s_mov_b32 s10, s4			; GFX10-NEXT: s_mov_b32 s10, s4
	; GFX10-NEXT: s_mov_b32 s11, s4			; GFX10-NEXT: s_mov_b32 s11, s4
	... (27 lines not shown) ...
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v44, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:4
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v45, 0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16
				; GFX10-NEXT: buffer_load_dword v45, off, s[0:3], s32 offset:20
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]




	... (18 lines not shown) ...
	; across the call and should get allocated to 8 CSRs.			; across the call and should get allocated to 8 CSRs.
	; Only the lower 5 sub-registers of the tuple are preserved.			; Only the lower 5 sub-registers of the tuple are preserved.
	; The upper 3 sub-registers are unused.			; The upper 3 sub-registers are unused.
	; GFX9-LABEL: call_preserved_vgpr_tuple8:			; GFX9-LABEL: call_preserved_vgpr_tuple8:
	; GFX9: ; %bb.0: ; %main_body			; GFX9: ; %bb.0: ; %main_body
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GFX9-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 10
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_writelane_b32 v40, s36, 2			; GFX9-NEXT: v_writelane_b32 v40, s36, 2
	; GFX9-NEXT: v_writelane_b32 v40, s37, 3			; GFX9-NEXT: v_writelane_b32 v40, s37, 3
	; GFX9-NEXT: v_writelane_b32 v40, s38, 4			; GFX9-NEXT: v_writelane_b32 v40, s38, 4
	; GFX9-NEXT: v_writelane_b32 v40, s39, 5			; GFX9-NEXT: v_writelane_b32 v40, s39, 5
	; GFX9-NEXT: v_writelane_b32 v40, s40, 6			; GFX9-NEXT: v_writelane_b32 v40, s40, 6
	; GFX9-NEXT: v_writelane_b32 v40, s41, 7			; GFX9-NEXT: v_writelane_b32 v40, s41, 7
				; GFX9-NEXT: v_writelane_b32 v46, s33, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_writelane_b32 v40, s42, 8			; GFX9-NEXT: v_writelane_b32 v40, s42, 8
	; GFX9-NEXT: s_mov_b32 s36, 0			; GFX9-NEXT: s_mov_b32 s36, 0
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
	... (33 lines not shown) ...
	; GFX9-NEXT: v_readlane_b32 s40, v40, 6			; GFX9-NEXT: v_readlane_b32 s40, v40, 6
	; GFX9-NEXT: v_readlane_b32 s39, v40, 5			; GFX9-NEXT: v_readlane_b32 s39, v40, 5
	; GFX9-NEXT: v_readlane_b32 s38, v40, 4			; GFX9-NEXT: v_readlane_b32 s38, v40, 4
	; GFX9-NEXT: v_readlane_b32 s37, v40, 3			; GFX9-NEXT: v_readlane_b32 s37, v40, 3
	; GFX9-NEXT: v_readlane_b32 s36, v40, 2			; GFX9-NEXT: v_readlane_b32 s36, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 10			; GFX9-NEXT: v_readlane_b32 s33, v46, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload
				; GFX9-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_preserved_vgpr_tuple8:			; GFX10-LABEL: call_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
				; GFX10-NEXT: buffer_store_dword v46, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 10			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
				; GFX10-NEXT: v_writelane_b32 v46, s33, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v45, off, s[0:3], s33 ; 4-byte Folded Spill
				; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_addk_i32 s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v41, v16			; GFX10-NEXT: v_mov_b32_e32 v41, v16
	; GFX10-NEXT: v_mov_b32_e32 v42, v15			; GFX10-NEXT: v_mov_b32_e32 v42, v15
	; GFX10-NEXT: v_mov_b32_e32 v43, v14			; GFX10-NEXT: v_mov_b32_e32 v43, v14
	; GFX10-NEXT: v_mov_b32_e32 v44, v13
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_mov_b32_e32 v45, v12
	; GFX10-NEXT: v_writelane_b32 v40, s36, 2			; GFX10-NEXT: v_writelane_b32 v40, s36, 2
	; GFX10-NEXT: s_mov_b32 s36, 0			; GFX10-NEXT: s_mov_b32 s36, 0
				; GFX10-NEXT: v_mov_b32_e32 v44, v13
				; GFX10-NEXT: v_mov_b32_e32 v45, v12
	; GFX10-NEXT: v_writelane_b32 v40, s37, 3			; GFX10-NEXT: v_writelane_b32 v40, s37, 3
	; GFX10-NEXT: s_mov_b32 s37, s36			; GFX10-NEXT: s_mov_b32 s37, s36
	; GFX10-NEXT: v_writelane_b32 v40, s38, 4			; GFX10-NEXT: v_writelane_b32 v40, s38, 4
	; GFX10-NEXT: s_mov_b32 s38, s36			; GFX10-NEXT: s_mov_b32 s38, s36
	; GFX10-NEXT: v_writelane_b32 v40, s39, 5			; GFX10-NEXT: v_writelane_b32 v40, s39, 5
	; GFX10-NEXT: s_mov_b32 s39, s36			; GFX10-NEXT: s_mov_b32 s39, s36
	; GFX10-NEXT: v_writelane_b32 v40, s40, 6			; GFX10-NEXT: v_writelane_b32 v40, s40, 6
	; GFX10-NEXT: s_mov_b32 s40, s36			; GFX10-NEXT: s_mov_b32 s40, s36
	... (26 lines not shown) ...
	; GFX10-NEXT: v_readlane_b32 s40, v40, 6			; GFX10-NEXT: v_readlane_b32 s40, v40, 6
	; GFX10-NEXT: v_readlane_b32 s39, v40, 5			; GFX10-NEXT: v_readlane_b32 s39, v40, 5
	; GFX10-NEXT: v_readlane_b32 s38, v40, 4			; GFX10-NEXT: v_readlane_b32 s38, v40, 4
	; GFX10-NEXT: v_readlane_b32 s37, v40, 3			; GFX10-NEXT: v_readlane_b32 s37, v40, 3
	; GFX10-NEXT: v_readlane_b32 s36, v40, 2			; GFX10-NEXT: v_readlane_b32 s36, v40, 2
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 10			; GFX10-NEXT: v_readlane_b32 s33, v46, 0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20 ; 4-byte Folded Reload			; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:20
				; GFX10-NEXT: buffer_load_dword v46, off, s[0:3], s32 offset:24
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]




	... (16 lines not shown) ...

llvm/test/CodeGen/AMDGPU/wave32.ll

	... (1,115 lines not shown) ...

	; GCN-LABEL: {{^}}callee_no_stack_with_call:			; GCN-LABEL: {{^}}callee_no_stack_with_call:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_waitcnt_vscnt			; GCN-NEXT: s_waitcnt_vscnt

	; GFX1064-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}			; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
				; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]

	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v41, s33, 0
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GFX1064: s_addk_i32 s32, 0x400			; GFX1064: s_addk_i32 s32, 0x400
	; GFX1032: s_addk_i32 s32, 0x200			; GFX1032: s_addk_i32 s32, 0x200


	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-DAG: v_readlane_b32 s30, v40, 0			; GCN-DAG: v_readlane_b32 s30, v40, 0
	; GCN-DAG: v_readlane_b32 s31, v40, 1			; GCN-DAG: v_readlane_b32 s31, v40, 1


	; GFX1064: s_addk_i32 s32, 0xfc00			; GFX1064: s_addk_i32 s32, 0xfc00
	; GFX1032: s_addk_i32 s32, 0xfe00			; GFX1032: s_addk_i32 s32, 0xfe00
	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v41, 0
	; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}			; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: s_clause 0x1
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @callee_no_stack_with_call() #1 {			define void @callee_no_stack_with_call() #1 {
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	... (39 lines not shown) ...

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

	... (330 lines not shown) ...
	; GFX9-O0-LABEL: strict_wwm_call:			; GFX9-O0-LABEL: strict_wwm_call:
	; GFX9-O0: ; %bb.0:			; GFX9-O0: ; %bb.0:
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_writelane_b32 v3, s33, 2			; GFX9-O0-NEXT: s_mov_b32 s35, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x400			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0x400
	; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-O0-NEXT: s_mov_b32 s36, s4			; GFX9-O0-NEXT: s_mov_b32 s36, s4
	; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39			; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
	; GFX9-O0-NEXT: s_mov_b32 s37, s5			; GFX9-O0-NEXT: s_mov_b32 s37, s5
	; GFX9-O0-NEXT: s_mov_b32 s38, s6			; GFX9-O0-NEXT: s_mov_b32 s38, s6
	... (17 lines not shown) ...
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2			; GFX9-O0-NEXT: v_add_u32_e64 v1, v1, v2
	; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]			; GFX9-O0-NEXT: s_mov_b64 exec, s[40:41]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4			; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4
	; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00
	; GFX9-O0-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-O0-NEXT: s_mov_b32 s33, s35
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: s_setpc_b64 s[30:31]			; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-O3-LABEL: strict_wwm_call:			; GFX9-O3-LABEL: strict_wwm_call:
	; GFX9-O3: ; %bb.0:			; GFX9-O3: ; %bb.0:
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_writelane_b32 v3, s33, 2
	; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-O3-NEXT: v_writelane_b32 v3, s30, 0
				; GFX9-O3-NEXT: s_mov_b32 s38, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_addk_i32 s32, 0x400			; GFX9-O3-NEXT: s_addk_i32 s32, 0x400
	; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-O3-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8			; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: s_getpc_b64 s[36:37]			; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
	; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called@rel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called@rel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called@rel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called@rel32@hi+12
	; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[36:37]			; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[36:37]
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2			; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-O3-NEXT: v_readlane_b32 s33, v3, 2			; GFX9-O3-NEXT: s_mov_b32 s33, s38
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]			; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
	... (99 lines not shown) ...
	; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:40 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_writelane_b32 v10, s33, 8			; GFX9-O0-NEXT: s_mov_b32 s42, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xc00			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xc00
	; GFX9-O0-NEXT: v_writelane_b32 v10, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v10, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v10, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v10, s31, 1
	; GFX9-O0-NEXT: s_mov_b32 s34, s8			; GFX9-O0-NEXT: s_mov_b32 s34, s8
	; GFX9-O0-NEXT: s_mov_b32 s36, s4			; GFX9-O0-NEXT: s_mov_b32 s36, s4
	; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39			; GFX9-O0-NEXT: ; kill: def $sgpr36 killed $sgpr36 def $sgpr36_sgpr37_sgpr38_sgpr39
	; GFX9-O0-NEXT: s_mov_b32 s37, s5			; GFX9-O0-NEXT: s_mov_b32 s37, s5
	[47 lines not shown]
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O0-NEXT: s_mov_b32 s34, 0			; GFX9-O0-NEXT: s_mov_b32 s34, 0
	; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[36:39], s34 offset:4			; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[36:39], s34 offset:4
	; GFX9-O0-NEXT: v_readlane_b32 s31, v10, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v10, 1
	; GFX9-O0-NEXT: v_readlane_b32 s30, v10, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v10, 0
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff400			; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff400
	; GFX9-O0-NEXT: v_readlane_b32 s33, v10, 8			; GFX9-O0-NEXT: s_mov_b32 s33, s42
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v8, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	[23 lines not shown]
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: v_writelane_b32 v8, s33, 2
	; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0			; GFX9-O3-NEXT: v_writelane_b32 v8, s30, 0
				; GFX9-O3-NEXT: s_mov_b32 s40, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_addk_i32 s32, 0x800			; GFX9-O3-NEXT: s_addk_i32 s32, 0x800
	; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1			; GFX9-O3-NEXT: v_writelane_b32 v8, s31, 1
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: s_getpc_b64 s[36:37]			; GFX9-O3-NEXT: s_getpc_b64 s[36:37]
	; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s36, s36, strict_wwm_called_i64@gotpcrel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s37, s37, strict_wwm_called_i64@gotpcrel32@hi+12
	; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0			; GFX9-O3-NEXT: s_load_dwordx2 s[36:37], s[36:37], 0x0
	[15 lines not shown]
	; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc			; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc
	; GFX9-O3-NEXT: s_mov_b64 exec, s[38:39]			; GFX9-O3-NEXT: s_mov_b64 exec, s[38:39]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: v_readlane_b32 s31, v8, 1			; GFX9-O3-NEXT: v_readlane_b32 s31, v8, 1
	; GFX9-O3-NEXT: v_readlane_b32 s30, v8, 0			; GFX9-O3-NEXT: v_readlane_b32 s30, v8, 0
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800			; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-O3-NEXT: v_readlane_b32 s33, v8, 2			; GFX9-O3-NEXT: s_mov_b32 s33, s40
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	[191 lines not shown]


Diff 440730
