This is an archive of the discontinued LLVM Phabricator instance.

Differential D103322

[AMDGPU] Use s_add_i32 for address additions
Closed · Public

Authored by sebastian-ne on May 28 2021, 8:38 AM.

Details

Reviewers

arsenm
rampitec

Commits

rG96e1fcb1e005: [AMDGPU] Use s_add_i32 for address additions

Summary

This allows the add instruction to be converted to s_addk_i32 and
v_add_nc_u32, instead of needing v_add_co_u32 when converting to a VALU
instruction.
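
The s_addk_i32 form mentioned in the summary takes a 16-bit signed immediate, which is why the updated test checks print a stack pointer decrement of 0x400 as `s_addk_i32 s32, 0xfc00`. A minimal sketch of that arithmetic follows; `fitsSImm16` and `encodeSImm16` are hypothetical helper names used only for illustration, not functions from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: true if Imm is representable as a sign-extended
// 16-bit immediate, the operand form that s_addk_i32 accepts.
constexpr bool fitsSImm16(int64_t Imm) {
  return Imm >= INT16_MIN && Imm <= INT16_MAX;
}

// Hypothetical helper: the raw 16-bit field for a value that fits,
// i.e. the two's complement bit pattern the disassembly shows.
constexpr uint16_t encodeSImm16(int64_t Imm) {
  return static_cast<uint16_t>(Imm);
}

// Stack pointer adjustments that appear in the updated test checks.
static_assert(fitsSImm16(0x400) && encodeSImm16(0x400) == 0x0400, "");
static_assert(fitsSImm16(-0x400) && encodeSImm16(-0x400) == 0xfc00, "");
static_assert(fitsSImm16(-0x200) && encodeSImm16(-0x200) == 0xfe00, "");
static_assert(fitsSImm16(-0x1000) && encodeSImm16(-0x1000) == 0xf000, "");

// 0x8000 is one past the largest positive 16-bit signed value, so an add of
// that amount would have to keep the 32-bit literal form.
static_assert(!fitsSImm16(0x8000), "");
```

Expressing the decrement as an add of a negative constant is what lets small adjustments in both directions use the short encoding.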

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sebastian-ne created this revision. May 28 2021, 8:38 AM

Herald added subscribers: foad, kerbowa, arphaman and 9 others. May 28 2021, 8:38 AM

sebastian-ne requested review of this revision. May 28 2021, 8:38 AM

Herald added a project: Restricted Project. May 28 2021, 8:38 AM

Herald added subscribers: llvm-commits, wdng.

> This allows to convert the add instruction to s_addk_i32

Nice. (But perhaps we should be able to convert s_add_u32 -> s_addk_i32 if scc is dead?)

> and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.

None of the tests show this. Why is it better? Just because it does not clobber vcc?

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
Line 311: Matt usually objects to this extra indentation on the grounds that clang-format is wrong.
Lines 1297–1302: `*=`?

Harbormaster completed remote builds in B106722: Diff 348539. May 28 2021, 9:17 AM

>> This allows to convert the add instruction to s_addk_i32

> Nice. (But perhaps we should be able to convert s_add_u32 -> s_addk_i32 if scc is dead?)

That would be nice, but how can I find out whether SCC is unused? The dead flag is unreliable (at least for GlobalISel, it is not set when the ShrinkInstructions pass runs). In some review, Matt suggested that we should remove the flag altogether.
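
One way to answer the "is SCC unused?" question without relying on the dead flag is a forward scan over the rest of the basic block. The sketch below is a toy model under stated assumptions: `ToyInst`, `SCCState` and `sccAfter` are hypothetical names, and each instruction is reduced to whether it reads or writes SCC. It is not the LLVM API, although LLVM's `MachineBasicBlock::computeRegisterLiveness` performs a bounded scan in a similar spirit.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction; only its SCC behaviour is modelled.
struct ToyInst {
  bool ReadsSCC;
  bool WritesSCC;
};

enum class SCCState { Dead, Live, Unknown };

// Scan forward from the instruction after Pos and classify the SCC value
// that the instruction at Pos produced.
SCCState sccAfter(const std::vector<ToyInst> &Block, std::size_t Pos) {
  for (std::size_t I = Pos + 1; I < Block.size(); ++I) {
    // Check reads first: an instruction such as s_addc_u32 both reads and
    // writes SCC, and the read is what matters for liveness.
    if (Block[I].ReadsSCC)
      return SCCState::Live;
    if (Block[I].WritesSCC)
      return SCCState::Dead; // overwritten before anyone looked at it
  }
  // Fell off the end of the block: the answer depends on live-out
  // information, so a caller should treat this conservatively.
  return SCCState::Unknown;
}
```

A transformation that changes how SCC is set would only be safe for the `Dead` result; `Unknown` has to be handled like `Live` unless successor blocks are also inspected.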

>> and v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU instruction.

> None of the tests show this. Why is it better? Just because it does not clobber vcc?

I hoped it could save a register, but you’re right, it doesn’t change anything.

LGTM modulo Jay's comments.

This revision is now accepted and ready to land. Jun 1 2021, 2:42 PM

arsenm accepted this revision. Jun 1 2021, 2:44 PM

Closed by commit rG96e1fcb1e005: [AMDGPU] Use s_add_i32 for address additions (authored by sebastian-ne). Jun 7 2021, 7:10 AM

This revision was automatically updated to reflect the committed changes.

sebastian-ne marked an inline comment as done.

sebastian-ne added a commit: rG96e1fcb1e005: [AMDGPU] Use s_add_i32 for address additions.

Revision Contents

Path	Size
llvm/
  lib/Target/AMDGPU/
    AMDGPUISelDAGToDAG.cpp	7 lines
    AMDGPUInstructionSelector.cpp	6 lines
    SIFrameLowering.cpp	30 lines
    SIRegisterInfo.cpp	32 lines
  test/CodeGen/AMDGPU/
    GlobalISel/
      dynamic-alloca-uniform.ll	28 lines
      extractelement-stack-lower.ll	18 lines
      flat-scratch.ll	40 lines
      non-entry-alloca.ll	10 lines
    addrspacecast.ll	2 lines
    amdgpu.private-memory.ll	2 lines
    call-constant.ll	4 lines
    call-preserved-registers.ll	4 lines
    callee-frame-setup.ll	106 lines
    callee-special-input-sgprs.ll	2 lines
    callee-special-input-vgprs-packed.ll	4 lines
    callee-special-input-vgprs.ll	4 lines
    cc-update.ll	14 lines
    cross-block-use-is-not-abi-copy.ll	16 lines
    flat-scratch.ll	186 lines
    frame-index-elimination.ll	6 lines
    frame-setup-without-sgpr-to-vgpr-spills.ll	8 lines
    gfx-callable-argument-types.ll	1070 lines
    gfx-callable-preserved-registers.ll	80 lines
    gfx-callable-return-types.ll	12 lines
    indirect-call.ll	24 lines
    local-stack-alloc-block-sp-reference.ll	30 lines
    mul24-pass-ordering.ll	4 lines
    need-fp-from-csr-vgpr-spill.ll	8 lines
    nested-calls.ll	8 lines
    non-entry-alloca.ll	44 lines
    pei-scavenge-sgpr-carry-out.mir	38 lines
    pei-scavenge-sgpr-gfx9.mir	24 lines
    pei-scavenge-sgpr.mir	6 lines
    pei-scavenge-vgpr-spill.mir	40 lines
    sgpr-spill.mir	12 lines
    sibling-call.ll	4 lines
    spill-offset-calculation.ll	18 lines
    spill-scavenge-offset.ll	7 lines
    stack-realign-kernel.ll	6 lines
    stack-realign.ll	42 lines
    unstructured-cfg-def-use-issue.ll	6 lines
    wave32.ll	8 lines
    wwm-reserved-spill.ll	16 lines

Diff 350278

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[1,888 lines not shown] context: if (auto FI = dyn_cast<FrameIndexSDNode>(SAddr)) {
SAddr = CurDAG->getTargetFrameIndex(FI->getIndex(), FI->getValueType(0));		SAddr = CurDAG->getTargetFrameIndex(FI->getIndex(), FI->getValueType(0));
} else if (SAddr.getOpcode() == ISD::ADD &&		} else if (SAddr.getOpcode() == ISD::ADD &&
isa<FrameIndexSDNode>(SAddr.getOperand(0))) {		isa<FrameIndexSDNode>(SAddr.getOperand(0))) {
// Materialize this into a scalar move for scalar address to avoid		// Materialize this into a scalar move for scalar address to avoid
// readfirstlane.		// readfirstlane.
auto FI = cast<FrameIndexSDNode>(SAddr.getOperand(0));		auto FI = cast<FrameIndexSDNode>(SAddr.getOperand(0));
SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),		SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),
FI->getValueType(0));		FI->getValueType(0));
SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_U32, SDLoc(SAddr),		SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_I32, SDLoc(SAddr),
MVT::i32, TFI, SAddr.getOperand(1)),		MVT::i32, TFI, SAddr.getOperand(1)),
0);		0);
}		}

return SAddr;		return SAddr;
}		}

// Match (32-bit SGPR base) + sext(imm offset)		// Match (32-bit SGPR base) + sext(imm offset)
[25 lines not shown] context: std::tie(SplitImmOffset, RemainderOffset) = TII->splitFlatOffset(
COffsetVal, AMDGPUAS::PRIVATE_ADDRESS, SIInstrFlags::FlatScratch);		COffsetVal, AMDGPUAS::PRIVATE_ADDRESS, SIInstrFlags::FlatScratch);

COffsetVal = SplitImmOffset;		COffsetVal = SplitImmOffset;

SDValue AddOffset =		SDValue AddOffset =
SAddr.getOpcode() == ISD::TargetFrameIndex		SAddr.getOpcode() == ISD::TargetFrameIndex
? getMaterializedScalarImm32(Lo_32(RemainderOffset), DL)		? getMaterializedScalarImm32(Lo_32(RemainderOffset), DL)
: CurDAG->getTargetConstant(RemainderOffset, DL, MVT::i32);		: CurDAG->getTargetConstant(RemainderOffset, DL, MVT::i32);
SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_U32, DL, MVT::i32,		SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_I32, DL, MVT::i32,
SAddr, AddOffset), 0);		SAddr, AddOffset),
		0);
}		}

Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i16);		Offset = CurDAG->getTargetConstant(COffsetVal, DL, MVT::i16);

return true;		return true;
}		}

bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,		bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,
[1,223 lines not shown]

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

[3,688 lines not shown] context: if (LHSDef && RHSDef &&
LHSDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX &&		LHSDef->MI->getOpcode() == AMDGPU::G_FRAME_INDEX &&
isSGPR(RHSDef->Reg)) {		isSGPR(RHSDef->Reg)) {
int FI = LHSDef->MI->getOperand(1).getIndex();		int FI = LHSDef->MI->getOperand(1).getIndex();
MachineInstr &I = *Root.getParent();		MachineInstr &I = *Root.getParent();
MachineBasicBlock *BB = I.getParent();		MachineBasicBlock *BB = I.getParent();
const DebugLoc &DL = I.getDebugLoc();		const DebugLoc &DL = I.getDebugLoc();
SAddr = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass);		SAddr = MRI->createVirtualRegister(&AMDGPU::SReg_32RegClass);

BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_ADD_U32), SAddr)		BuildMI(*BB, &I, DL, TII.get(AMDGPU::S_ADD_I32), SAddr)
.addFrameIndex(FI)		.addFrameIndex(FI)
.addReg(RHSDef->Reg);		.addReg(RHSDef->Reg);
}		}
}		}

if (!isSGPR(SAddr))		if (!isSGPR(SAddr))
return None;		return None;

return {{		return {{
[=](MachineInstrBuilder &MIB) { MIB.addReg(SAddr); }, // saddr		[=](MachineInstrBuilder &MIB) { MIB.addReg(SAddr); }, // saddr
[699 lines not shown]

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

[301 lines not shown] context: void SIFrameLowering::emitEntryFunctionFlatScratchInit(
assert(ST.getGeneration() < AMDGPUSubtarget::GFX9);		assert(ST.getGeneration() < AMDGPUSubtarget::GFX9);

// Copy the size in bytes.		// Copy the size in bytes.
BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), AMDGPU::FLAT_SCR_LO)		BuildMI(MBB, I, DL, TII->get(AMDGPU::COPY), AMDGPU::FLAT_SCR_LO)
.addReg(FlatScrInitHi, RegState::Kill);		.addReg(FlatScrInitHi, RegState::Kill);

// Add wave offset in bytes to private base offset.		// Add wave offset in bytes to private base offset.
// See comment in AMDKernelCodeT.h for enable_sgpr_flat_scratch_init.		// See comment in AMDKernelCodeT.h for enable_sgpr_flat_scratch_init.
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_U32), FlatScrInitLo)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_I32), FlatScrInitLo)
.addReg(FlatScrInitLo)		.addReg(FlatScrInitLo)
[Inline comment by foad, not done: Matt usually objects to this extra indentation on the grounds that clang-format is wrong.]
.addReg(ScratchWaveOffsetReg);		.addReg(ScratchWaveOffsetReg);

// Convert offset to 256-byte units.		// Convert offset to 256-byte units.
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_LSHR_B32), AMDGPU::FLAT_SCR_HI)
.addReg(FlatScrInitLo, RegState::Kill)		.addReg(FlatScrInitLo, RegState::Kill)
.addImm(8);		.addImm(8);
}		}

// Note SGPRSpill stack IDs should only be used for SGPR spilling to VGPRs, not		// Note SGPRSpill stack IDs should only be used for SGPR spilling to VGPRs, not
[583 lines not shown] context: if (TRI.hasStackRealignment(MF)) {
const unsigned Alignment = MFI.getMaxAlign().value();		const unsigned Alignment = MFI.getMaxAlign().value();

RoundedSize += Alignment;		RoundedSize += Alignment;
if (LiveRegs.empty()) {		if (LiveRegs.empty()) {
LiveRegs.init(TRI);		LiveRegs.init(TRI);
LiveRegs.addLiveIns(MBB);		LiveRegs.addLiveIns(MBB);
}		}

// s_add_u32 s33, s32, NumBytes		// s_add_i32 s33, s32, NumBytes
// s_and_b32 s33, s33, 0b111...0000		// s_and_b32 s33, s33, 0b111...0000
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_I32), FramePtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm((Alignment - 1) * getScratchScaleFactor(ST))		.addImm((Alignment - 1) * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_AND_B32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_AND_B32), FramePtrReg)
.addReg(FramePtrReg, RegState::Kill)		.addReg(FramePtrReg, RegState::Kill)
.addImm(-Alignment * getScratchScaleFactor(ST))		.addImm(-Alignment * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
FuncInfo->setIsStackRealigned(true);		FuncInfo->setIsStackRealigned(true);
[9 lines not shown] context: void SIFrameLowering::emitPrologue(MachineFunction &MF,
// the incoming arguments.		// the incoming arguments.
if ((HasBP = TRI.hasBasePointer(MF))) {		if ((HasBP = TRI.hasBasePointer(MF))) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), BasePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), BasePtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

if (HasFP && RoundedSize != 0) {		if (HasFP && RoundedSize != 0) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), StackPtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_I32), StackPtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm(RoundedSize * getScratchScaleFactor(ST))		.addImm(RoundedSize * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

assert((!HasFP || (FuncInfo->SGPRForFPSaveRestoreCopy ||		assert((!HasFP || (FuncInfo->SGPRForFPSaveRestoreCopy ||
FuncInfo->FramePointerSaveIndex)) &&		FuncInfo->FramePointerSaveIndex)) &&
"Needed to save FP but didn't save it anywhere");		"Needed to save FP but didn't save it anywhere");
[34 lines not shown] context: void SIFrameLowering::emitEpilogue(MachineFunction &MF,
const Register FramePtrReg = FuncInfo->getFrameOffsetReg();		const Register FramePtrReg = FuncInfo->getFrameOffsetReg();
const Register BasePtrReg =		const Register BasePtrReg =
TRI.hasBasePointer(MF) ? TRI.getBaseRegister() : Register();		TRI.hasBasePointer(MF) ? TRI.getBaseRegister() : Register();

Optional<int> FPSaveIndex = FuncInfo->FramePointerSaveIndex;		Optional<int> FPSaveIndex = FuncInfo->FramePointerSaveIndex;
Optional<int> BPSaveIndex = FuncInfo->BasePointerSaveIndex;		Optional<int> BPSaveIndex = FuncInfo->BasePointerSaveIndex;

if (RoundedSize != 0 && hasFP(MF)) {		if (RoundedSize != 0 && hasFP(MF)) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_SUB_U32), StackPtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_I32), StackPtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm(RoundedSize * getScratchScaleFactor(ST))		.addImm(-static_cast<int64_t>(RoundedSize * getScratchScaleFactor(ST)))
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
}		}

if (FuncInfo->SGPRForFPSaveRestoreCopy) {		if (FuncInfo->SGPRForFPSaveRestoreCopy) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)
.addReg(FuncInfo->SGPRForFPSaveRestoreCopy)		.addReg(FuncInfo->SGPRForFPSaveRestoreCopy)
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
}		}

[286 lines not shown] context: MachineBasicBlock::iterator SIFrameLowering::eliminateCallFramePseudoInstr(
uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;		uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;

if (!hasReservedCallFrame(MF)) {		if (!hasReservedCallFrame(MF)) {
Amount = alignTo(Amount, getStackAlign());		Amount = alignTo(Amount, getStackAlign());
assert(isUInt<32>(Amount) && "exceeded stack address space size");		assert(isUInt<32>(Amount) && "exceeded stack address space size");
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
Register SPReg = MFI->getStackPtrOffsetReg();		Register SPReg = MFI->getStackPtrOffsetReg();

unsigned Op = IsDestroy ? AMDGPU::S_SUB_U32 : AMDGPU::S_ADD_U32;		Amount *= getScratchScaleFactor(ST);
BuildMI(MBB, I, DL, TII->get(Op), SPReg)		if (IsDestroy)
		Amount = -Amount;
		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_ADD_I32), SPReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(Amount * getScratchScaleFactor(ST));		.addImm(Amount);
[Inline comment by foad, done: `*=`?]
} else if (CalleePopAmount != 0) {		} else if (CalleePopAmount != 0) {
llvm_unreachable("is this used?");		llvm_unreachable("is this used?");
}		}

return MBB.erase(I);		return MBB.erase(I);
}		}

/// Returns true if the frame will require a reference to the stack pointer.		/// Returns true if the frame will require a reference to the stack pointer.
[57 lines not shown]
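
In the emitEpilogue hunk above, the new immediate is written as `-static_cast<int64_t>(RoundedSize * getScratchScaleFactor(ST))`, with the cast applied before the negation. The sketch below illustrates why the order matters, assuming the scaled size is a 32-bit unsigned value (the declaration of `RoundedSize` is not part of the lines shown here). `negateWidened` and `negateThenWiden` are hypothetical names for the two possible orders.

```cpp
#include <cstdint>

// Widen to 64 bits first, then negate: yields the intended negative value.
constexpr int64_t negateWidened(uint32_t Size) {
  return -static_cast<int64_t>(Size);
}

// Negate in 32-bit unsigned arithmetic, then widen: the negation wraps
// around, so the 64-bit result is a large positive number.
constexpr int64_t negateThenWiden(uint32_t Size) {
  return static_cast<int64_t>(-Size);
}

static_assert(negateWidened(0x400) == -0x400, "");
static_assert(negateThenWiden(0x400) == 0xfffffc00LL, "");

// The low 32 bits agree, so a 32-bit add would compute the same result, but
// only the first form is a small negative 64-bit immediate.
static_assert(static_cast<uint32_t>(negateWidened(0x400)) ==
                  static_cast<uint32_t>(negateThenWiden(0x400)),
              "");
```

Only the widened form looks like a small negative number to any later check that asks whether the immediate fits a 16-bit signed field.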

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

[697 lines not shown] context: Register FIReg = MRI.createVirtualRegister(
: &AMDGPU::VGPR_32RegClass);		: &AMDGPU::VGPR_32RegClass);

BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)		BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, Ins, DL, TII->get(MovOpc), FIReg)		BuildMI(*MBB, Ins, DL, TII->get(MovOpc), FIReg)
.addFrameIndex(FrameIdx);		.addFrameIndex(FrameIdx);

if (ST.enableFlatScratch() ) {		if (ST.enableFlatScratch() ) {
BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_ADD_U32), BaseReg)		BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_ADD_I32), BaseReg)
.addReg(OffsetReg, RegState::Kill)		.addReg(OffsetReg, RegState::Kill)
.addReg(FIReg);		.addReg(FIReg);
return BaseReg;		return BaseReg;
}		}

TII->getAddNoCarry(*MBB, Ins, DL, BaseReg)		TII->getAddNoCarry(*MBB, Ins, DL, BaseReg)
.addReg(OffsetReg, RegState::Kill)		.addReg(OffsetReg, RegState::Kill)
.addReg(FIReg)		.addReg(FIReg)
[393 lines not shown] context: if (!IsOffsetLegal || (IsFlat && !SOffset && !ST.hasFlatScratchSTMode())) {
}		}

if (!SOffset)		if (!SOffset)
report_fatal_error("could not scavenge SGPR to spill in entry function");		report_fatal_error("could not scavenge SGPR to spill in entry function");

if (ScratchOffsetReg == AMDGPU::NoRegister) {		if (ScratchOffsetReg == AMDGPU::NoRegister) {
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), SOffset).addImm(Offset);		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), SOffset).addImm(Offset);
} else {		} else {
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), SOffset)
.addReg(ScratchOffsetReg)		.addReg(ScratchOffsetReg)
.addImm(Offset);		.addImm(Offset);
}		}

Offset = 0;		Offset = 0;
}		}

if (IsFlat && SOffset == AMDGPU::NoRegister) {		if (IsFlat && SOffset == AMDGPU::NoRegister) {
[132 lines not shown] context: for (unsigned i = 0, e = NumSubRegs + NumRemSubRegs, RegOffset = 0; i != e;
}		}

if (NeedSuperRegImpOperand)		if (NeedSuperRegImpOperand)
MIB.addReg(ValueReg, RegState::Implicit | SrcDstRegState);		MIB.addReg(ValueReg, RegState::Implicit | SrcDstRegState);
}		}

if (ScratchOffsetRegDelta != 0) {		if (ScratchOffsetRegDelta != 0) {
// Subtract the offset we added to the ScratchOffset register.		// Subtract the offset we added to the ScratchOffset register.
BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), SOffset)		BuildMI(MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), SOffset)
.addReg(SOffset)		.addReg(SOffset)
.addImm(ScratchOffsetRegDelta);		.addImm(-ScratchOffsetRegDelta);
}		}
}		}

void SIRegisterInfo::buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index,		void SIRegisterInfo::buildVGPRSpillLoadStore(SGPRSpillBuilder &SB, int Index,
int Offset, bool IsLoad,		int Offset, bool IsLoad,
bool IsKill) const {		bool IsKill) const {
// Load/store VGPR		// Load/store VGPR
MachineFrameInfo &FrameInfo = SB.MF.getFrameInfo();		MachineFrameInfo &FrameInfo = SB.MF.getFrameInfo();
[426 lines not shown] context: default: {

if (!TmpSReg) {		if (!TmpSReg) {
// Use frame register and restore it after.		// Use frame register and restore it after.
TmpSReg = FrameReg;		TmpSReg = FrameReg;
FIOp.setReg(FrameReg);		FIOp.setReg(FrameReg);
FIOp.setIsKill(false);		FIOp.setIsKill(false);
}		}

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), TmpSReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), TmpSReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(Offset);		.addImm(Offset);

if (!UseSGPR)		if (!UseSGPR)
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)
.addReg(TmpSReg, RegState::Kill);		.addReg(TmpSReg, RegState::Kill);

if (TmpSReg == FrameReg) {		if (TmpSReg == FrameReg) {
// Undo frame register modification.		// Undo frame register modification.
BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_SUB_U32),		BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_ADD_I32),
FrameReg)		FrameReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(Offset);		.addImm(-Offset);
}		}

return;		return;
}		}

bool IsMUBUF = TII->isMUBUF(*MI);		bool IsMUBUF = TII->isMUBUF(*MI);

if (!IsMUBUF && !MFI->isEntryFunction()) {		if (!IsMUBUF && !MFI->isEntryFunction()) {
[57 lines not shown] context: default: {
// unavailable. Only one additional mov is needed.		// unavailable. Only one additional mov is needed.
Register TmpScaledReg =		Register TmpScaledReg =
RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);		RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0, false);
Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;		Register ScaledReg = TmpScaledReg.isValid() ? TmpScaledReg : FrameReg;

BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHR_B32), ScaledReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::COPY), ResultReg)
.addReg(ScaledReg, RegState::Kill);		.addReg(ScaledReg, RegState::Kill);

// If there were truly no free SGPRs, we need to undo everything.		// If there were truly no free SGPRs, we need to undo everything.
if (!TmpScaledReg.isValid()) {		if (!TmpScaledReg.isValid()) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_SUB_U32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_I32), ScaledReg)
.addReg(ScaledReg, RegState::Kill)		.addReg(ScaledReg, RegState::Kill)
.addImm(Offset);		.addImm(-Offset);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_LSHL_B32), ScaledReg)
.addReg(FrameReg)		.addReg(FrameReg)
.addImm(ST.getWavefrontSizeLog2());		.addImm(ST.getWavefrontSizeLog2());
}		}
}		}
}		}

// Don't introduce an extra copy if we're just materializing in a mov.		// Don't introduce an extra copy if we're just materializing in a mov.
[678 lines not shown]
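
The SIRegisterInfo.cpp hunks above replace each S_SUB_U32 with S_ADD_I32 and a negated immediate. The following sketch models why the 32-bit result is unchanged and only the SCC output can differ. The structs and functions are hypothetical models written for this note: they assume s_add_u32 sets SCC on unsigned carry-out, s_sub_u32 on unsigned borrow, and s_add_i32 on signed overflow.

```cpp
#include <cstdint>

struct AddResult {
  uint32_t Value;
  bool SCC;
};

// Model of s_add_u32: SCC is the unsigned carry-out.
constexpr AddResult sAddU32(uint32_t A, uint32_t B) {
  return {A + B, static_cast<uint64_t>(A) + B > 0xffffffffu};
}

// Model of s_sub_u32: SCC is the unsigned borrow.
constexpr AddResult sSubU32(uint32_t A, uint32_t B) {
  return {A - B, B > A};
}

// Model of s_add_i32: SCC is signed overflow, which happens only when both
// operands have the same sign and the sum has the opposite sign.
constexpr AddResult sAddI32(uint32_t A, uint32_t B) {
  return {A + B, ((~(A ^ B) & (A ^ (A + B))) >> 31) != 0};
}

// Same wrapped 32-bit sum for the signed and unsigned add.
static_assert(sAddU32(0x1000, 0x400).Value == sAddI32(0x1000, 0x400).Value, "");

// Subtracting 0x400 equals adding -0x400 (0xfffffc00 as a 32-bit pattern).
static_assert(sSubU32(0x1400, 0x400).Value == 0x1000, "");
static_assert(sAddI32(0x1400, 0xfffffc00u).Value == 0x1000, "");

// Only the flag differs: the unsigned add reports a carry here, while the
// signed add does not overflow.
static_assert(sAddU32(0x1400, 0xfffffc00u).SCC, "");
static_assert(!sAddI32(0x1400, 0xfffffc00u).SCC, "");
```

Under these assumptions the substitution is value-preserving, and it is behaviour-preserving as long as nothing reads SCC after these address computations.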

llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-uniform.ll

[49 lines not shown]
}		}

define void @func_dynamic_stackalloc_sgpr_align4() {		define void @func_dynamic_stackalloc_sgpr_align4() {
; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align4:		; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align4:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s6, s33		; GFX9-NEXT: s_mov_b32 s6, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_add_u32 s32, s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: s_mov_b32 s33, s6		; GFX9-NEXT: s_mov_b32 s33, s6
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX9-NEXT: s_and_b32 s4, s4, -16		; GFX9-NEXT: s_and_b32 s4, s4, -16
; GFX9-NEXT: s_lshl_b32 s4, s4, 6		; GFX9-NEXT: s_lshl_b32 s4, s4, 6
; GFX9-NEXT: s_add_u32 s4, s32, s4		; GFX9-NEXT: s_add_u32 s4, s32, s4
; GFX9-NEXT: v_mov_b32_e32 v1, s4		; GFX9-NEXT: v_mov_b32_e32 v1, s4
; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX9-NEXT: s_sub_u32 s32, s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0xfc00
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align4:		; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align4:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s6, s33		; GFX10-NEXT: s_mov_b32 s6, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_add_u32 s32, s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX10-NEXT: s_mov_b32 s33, s6		; GFX10-NEXT: s_mov_b32 s33, s6
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX10-NEXT: s_and_b32 s4, s4, -16		; GFX10-NEXT: s_and_b32 s4, s4, -16
; GFX10-NEXT: s_lshl_b32 s4, s4, 5		; GFX10-NEXT: s_lshl_b32 s4, s4, 5
; GFX10-NEXT: s_add_u32 s4, s32, s4		; GFX10-NEXT: s_add_u32 s4, s32, s4
; GFX10-NEXT: s_sub_u32 s32, s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0xfe00
; GFX10-NEXT: v_mov_b32_e32 v1, s4		; GFX10-NEXT: v_mov_b32_e32 v1, s4
; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%n = load i32, i32 addrspace(4)* @gv, align 4		%n = load i32, i32 addrspace(4)* @gv, align 4
%alloca = alloca i32, i32 %n, addrspace(5)		%alloca = alloca i32, i32 %n, addrspace(5)
store i32 0, i32 addrspace(5)* %alloca		store i32 0, i32 addrspace(5)* %alloca
ret void		ret void
[45 lines not shown]
}		}

define void @func_dynamic_stackalloc_sgpr_align16() {		define void @func_dynamic_stackalloc_sgpr_align16() {
; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align16:		; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align16:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s6, s33		; GFX9-NEXT: s_mov_b32 s6, s33
; GFX9-NEXT: s_mov_b32 s33, s32		; GFX9-NEXT: s_mov_b32 s33, s32
; GFX9-NEXT: s_add_u32 s32, s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0x400
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: s_mov_b32 s33, s6		; GFX9-NEXT: s_mov_b32 s33, s6
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX9-NEXT: s_and_b32 s4, s4, -16		; GFX9-NEXT: s_and_b32 s4, s4, -16
; GFX9-NEXT: s_lshl_b32 s4, s4, 6		; GFX9-NEXT: s_lshl_b32 s4, s4, 6
; GFX9-NEXT: s_add_u32 s4, s32, s4		; GFX9-NEXT: s_add_u32 s4, s32, s4
; GFX9-NEXT: v_mov_b32_e32 v1, s4		; GFX9-NEXT: v_mov_b32_e32 v1, s4
; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX9-NEXT: s_sub_u32 s32, s32, 0x400		; GFX9-NEXT: s_addk_i32 s32, 0xfc00
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align16:		; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align16:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s6, s33		; GFX10-NEXT: s_mov_b32 s6, s33
; GFX10-NEXT: s_mov_b32 s33, s32		; GFX10-NEXT: s_mov_b32 s33, s32
; GFX10-NEXT: s_add_u32 s32, s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0x200
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX10-NEXT: s_mov_b32 s33, s6		; GFX10-NEXT: s_mov_b32 s33, s6
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX10-NEXT: s_and_b32 s4, s4, -16		; GFX10-NEXT: s_and_b32 s4, s4, -16
; GFX10-NEXT: s_lshl_b32 s4, s4, 5		; GFX10-NEXT: s_lshl_b32 s4, s4, 5
; GFX10-NEXT: s_add_u32 s4, s32, s4		; GFX10-NEXT: s_add_u32 s4, s32, s4
; GFX10-NEXT: s_sub_u32 s32, s32, 0x200		; GFX10-NEXT: s_addk_i32 s32, 0xfe00
; GFX10-NEXT: v_mov_b32_e32 v1, s4		; GFX10-NEXT: v_mov_b32_e32 v1, s4
; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%n = load i32, i32 addrspace(4)* @gv, align 16		%n = load i32, i32 addrspace(4)* @gv, align 16
%alloca = alloca i32, i32 %n, addrspace(5)		%alloca = alloca i32, i32 %n, addrspace(5)
store i32 0, i32 addrspace(5)* %alloca		store i32 0, i32 addrspace(5)* %alloca
ret void		ret void
[46 lines not shown] context: ; GFX10-NEXT: s_endpgm
ret void		ret void
}		}

define void @func_dynamic_stackalloc_sgpr_align32(i32 addrspace(1)* %out) {		define void @func_dynamic_stackalloc_sgpr_align32(i32 addrspace(1)* %out) {
; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align32:		; GFX9-LABEL: func_dynamic_stackalloc_sgpr_align32:
; GFX9: ; %bb.0:		; GFX9: ; %bb.0:
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s6, s33		; GFX9-NEXT: s_mov_b32 s6, s33
; GFX9-NEXT: s_add_u32 s33, s32, 0x7c0		; GFX9-NEXT: s_add_i32 s33, s32, 0x7c0
; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800		; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800
; GFX9-NEXT: s_add_u32 s32, s32, 0x1000		; GFX9-NEXT: s_addk_i32 s32, 0x1000
; GFX9-NEXT: s_getpc_b64 s[4:5]		; GFX9-NEXT: s_getpc_b64 s[4:5]
; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX9-NEXT: v_mov_b32_e32 v0, 0		; GFX9-NEXT: v_mov_b32_e32 v0, 0
; GFX9-NEXT: s_mov_b32 s33, s6		; GFX9-NEXT: s_mov_b32 s33, s6
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX9-NEXT: s_and_b32 s4, s4, -16		; GFX9-NEXT: s_and_b32 s4, s4, -16
; GFX9-NEXT: s_lshl_b32 s4, s4, 6		; GFX9-NEXT: s_lshl_b32 s4, s4, 6
; GFX9-NEXT: s_add_u32 s4, s32, s4		; GFX9-NEXT: s_add_u32 s4, s32, s4
; GFX9-NEXT: s_and_b32 s4, s4, 0xfffff800		; GFX9-NEXT: s_and_b32 s4, s4, 0xfffff800
; GFX9-NEXT: v_mov_b32_e32 v1, s4		; GFX9-NEXT: v_mov_b32_e32 v1, s4
; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX9-NEXT: s_sub_u32 s32, s32, 0x1000		; GFX9-NEXT: s_addk_i32 s32, 0xf000
; GFX9-NEXT: s_waitcnt vmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]		; GFX9-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align32:		; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align32:
; GFX10: ; %bb.0:		; GFX10: ; %bb.0:
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s6, s33		; GFX10-NEXT: s_mov_b32 s6, s33
; GFX10-NEXT: s_add_u32 s33, s32, 0x3e0		; GFX10-NEXT: s_add_i32 s33, s32, 0x3e0
; GFX10-NEXT: v_mov_b32_e32 v0, 0		; GFX10-NEXT: v_mov_b32_e32 v0, 0
; GFX10-NEXT: s_and_b32 s33, s33, 0xfffffc00		; GFX10-NEXT: s_and_b32 s33, s33, 0xfffffc00
; GFX10-NEXT: s_add_u32 s32, s32, 0x800		; GFX10-NEXT: s_addk_i32 s32, 0x800
; GFX10-NEXT: s_getpc_b64 s[4:5]		; GFX10-NEXT: s_getpc_b64 s[4:5]
; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4		; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12		; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
; GFX10-NEXT: s_mov_b32 s33, s6		; GFX10-NEXT: s_mov_b32 s33, s6
; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0		; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15		; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
; GFX10-NEXT: s_and_b32 s4, s4, -16		; GFX10-NEXT: s_and_b32 s4, s4, -16
; GFX10-NEXT: s_lshl_b32 s4, s4, 5		; GFX10-NEXT: s_lshl_b32 s4, s4, 5
; GFX10-NEXT: s_add_u32 s4, s32, s4		; GFX10-NEXT: s_add_u32 s4, s32, s4
; GFX10-NEXT: s_and_b32 s4, s4, 0xfffffc00		; GFX10-NEXT: s_and_b32 s4, s4, 0xfffffc00
; GFX10-NEXT: s_sub_u32 s32, s32, 0x800		; GFX10-NEXT: s_addk_i32 s32, 0xf800
; GFX10-NEXT: v_mov_b32_e32 v1, s4		; GFX10-NEXT: v_mov_b32_e32 v1, s4
; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen		; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_setpc_b64 s[30:31]		; GFX10-NEXT: s_setpc_b64 s[30:31]
%n = load i32, i32 addrspace(4)* @gv		%n = load i32, i32 addrspace(4)* @gv
%alloca = alloca i32, i32 %n, align 32, addrspace(5)		%alloca = alloca i32, i32 %n, align 32, addrspace(5)
store i32 0, i32 addrspace(5)* %alloca		store i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}
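Note on the realignment constants in `func_dynamic_stackalloc_sgpr_align32`: the `s_add_i32`/`s_and_b32` pair is the usual align-up idiom. The constants are consistent with the per-lane alignment (32 bytes) being scaled by the wave size. That scaling, and the wave sizes of 64 for the GFX9 checks and 32 for the GFX10 checks, are assumptions inferred from the numbers shown here, not taken from the patch. A small sketch with hypothetical helper names:

```python
def realign_constants(align_bytes: int, wave_size: int):
    """Return (addend, mask) for aligning a wave-scaled stack pointer.

    Assumption: the stack pointer counts bytes for the whole wave, so a
    per-lane alignment is multiplied by the wave size.
    """
    scaled = align_bytes * wave_size
    addend = (align_bytes - 1) * wave_size
    mask = (-scaled) & 0xFFFFFFFF
    return addend, mask


def align_up(sp: int, align_bytes: int, wave_size: int) -> int:
    """Round sp up to the scaled alignment.

    Assumes sp is already a multiple of wave_size; otherwise the addend
    would need to be scaled - 1 instead.
    """
    addend, mask = realign_constants(align_bytes, wave_size)
    return ((sp + addend) & 0xFFFFFFFF) & mask


for wave in (64, 32):
    addend, mask = realign_constants(32, wave)
    print(wave, hex(addend), hex(mask))
```

Under these assumptions the sketch reproduces `0x7c0`/`0xfffff800` for wave64 and `0x3e0`/`0xfffffc00` for wave32, which matches the check lines above.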

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -mattr=-xnack -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN %s		; RUN: llc -global-isel -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 -mattr=-xnack -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN %s

; Check lowering of some large extractelement that use the stack		; Check lowering of some large extractelement that use the stack
; instead of register indexing.		; instead of register indexing.

define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {		define i32 @v_extract_v64i32_varidx(<64 x i32> addrspace(1)* %ptr, i32 %idx) {
; GCN-LABEL: v_extract_v64i32_varidx:		; GCN-LABEL: v_extract_v64i32_varidx:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s6, s33
; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0		; GCN-NEXT: s_add_i32 s33, s32, 0x3fc0
; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000		; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0		; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
; ... 30 lines not shown ...
; GCN-NEXT: v_mov_b32_e32 v5, s4		; GCN-NEXT: v_mov_b32_e32 v5, s4
; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5		; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc		; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
; GCN-NEXT: v_and_b32_e32 v0, 63, v2		; GCN-NEXT: v_and_b32_e32 v0, 63, v2
; GCN-NEXT: v_lshrrev_b32_e64 v1, 6, s33		; GCN-NEXT: v_lshrrev_b32_e64 v1, 6, s33
; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0		; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
; GCN-NEXT: v_add_u32_e32 v1, 0x100, v1		; GCN-NEXT: v_add_u32_e32 v1, 0x100, v1
; GCN-NEXT: v_add_u32_e32 v0, v1, v0		; GCN-NEXT: v_add_u32_e32 v0, v1, v0
; GCN-NEXT: s_add_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0x10000
; GCN-NEXT: s_sub_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill
; ... 196 lines not shown ...
; GCN-NEXT: s_setpc_b64 s[30:31]
ret i32 %elt		ret i32 %elt
}		}
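Note on why the `0x10000` adjustment stays a full `s_add_i32` with the literal `0xffff0000`, while smaller adjustments elsewhere in this diff become `s_addk_i32`: `-0x10000` does not fit a signed 16-bit immediate. The sketch below is a toy model that happens to reproduce the check lines in this diff. It is not the LLVM implementation; the rule set (same destination and source, fits in 16 signed bits, not an inline constant in the range -16..64) and the function names are assumptions made for illustration.

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def fits_simm16(value: int) -> bool:
    return -0x8000 <= value <= 0x7FFF


def is_inline_int(value: int) -> bool:
    # Assumed inline-constant range for integer operands.
    return -16 <= value <= 64


def render_add(dst: str, src: str, imm: int) -> str:
    """Pick a textual form for dst = src + imm (toy model only)."""
    imm = to_signed32(imm)
    if dst == src and fits_simm16(imm) and not is_inline_int(imm):
        return f"s_addk_i32 {dst}, {imm & 0xFFFF:#x}"
    if is_inline_int(imm):
        return f"s_add_i32 {dst}, {src}, {imm}"
    return f"s_add_i32 {dst}, {src}, {imm & 0xFFFFFFFF:#x}"


print(render_add("s32", "s32", -0x10000))
print(render_add("s32", "s32", -0x200))
print(render_add("s33", "s32", 0x3FC0))
```

The third call shows the other reason a short form may not apply in this model: the destination differs from the source register.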

define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {		define i16 @v_extract_v128i16_varidx(<128 x i16> addrspace(1)* %ptr, i32 %idx) {
; GCN-LABEL: v_extract_v128i16_varidx:		; GCN-LABEL: v_extract_v128i16_varidx:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s6, s33
; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0		; GCN-NEXT: s_add_i32 s33, s32, 0x3fc0
; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000		; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0		; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
; ... 30 lines not shown ...
; GCN-NEXT: v_mov_b32_e32 v5, s4		; GCN-NEXT: v_mov_b32_e32 v5, s4
; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5		; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc		; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
; GCN-NEXT: v_lshrrev_b32_e32 v0, 1, v2		; GCN-NEXT: v_lshrrev_b32_e32 v0, 1, v2
; GCN-NEXT: v_and_b32_e32 v0, 63, v0		; GCN-NEXT: v_and_b32_e32 v0, 63, v0
; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0		; GCN-NEXT: v_lshlrev_b32_e32 v0, 2, v0
; GCN-NEXT: v_and_b32_e32 v1, 1, v2		; GCN-NEXT: v_and_b32_e32 v1, 1, v2
; GCN-NEXT: v_lshlrev_b32_e32 v1, 4, v1		; GCN-NEXT: v_lshlrev_b32_e32 v1, 4, v1
; GCN-NEXT: s_add_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0x10000
; GCN-NEXT: s_sub_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill
; ... 201 lines not shown ...
; GCN-NEXT: s_setpc_b64 s[30:31]
ret i16 %elt		ret i16 %elt
}		}

define i64 @v_extract_v32i64_varidx(<32 x i64> addrspace(1)* %ptr, i32 %idx) {		define i64 @v_extract_v32i64_varidx(<32 x i64> addrspace(1)* %ptr, i32 %idx) {
; GCN-LABEL: v_extract_v32i64_varidx:		; GCN-LABEL: v_extract_v32i64_varidx:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s6, s33		; GCN-NEXT: s_mov_b32 s6, s33
; GCN-NEXT: s_add_u32 s33, s32, 0x3fc0		; GCN-NEXT: s_add_i32 s33, s32, 0x3fc0
; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000		; GCN-NEXT: s_and_b32 s33, s33, 0xffffc000
; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0		; GCN-NEXT: v_add_co_u32_e32 v3, vcc, 64, v0
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:52 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:48 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:44 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:40 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:36 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:32 ; 4-byte Folded Spill
; ... 30 lines not shown ...
; GCN-NEXT: v_mov_b32_e32 v5, s4		; GCN-NEXT: v_mov_b32_e32 v5, s4
; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5		; GCN-NEXT: v_add_co_u32_e32 v60, vcc, v0, v5
; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc		; GCN-NEXT: v_addc_co_u32_e32 v61, vcc, v1, v6, vcc
; GCN-NEXT: v_and_b32_e32 v0, 31, v2		; GCN-NEXT: v_and_b32_e32 v0, 31, v2
; GCN-NEXT: v_lshrrev_b32_e64 v2, 6, s33		; GCN-NEXT: v_lshrrev_b32_e64 v2, 6, s33
; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0		; GCN-NEXT: v_lshlrev_b32_e32 v0, 3, v0
; GCN-NEXT: v_add_u32_e32 v2, 0x100, v2		; GCN-NEXT: v_add_u32_e32 v2, 0x100, v2
; GCN-NEXT: v_add_u32_e32 v1, v2, v0		; GCN-NEXT: v_add_u32_e32 v1, v2, v0
; GCN-NEXT: s_add_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0x10000
; GCN-NEXT: s_sub_u32 s32, s32, 0x10000		; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:640 ; 4-byte Folded Spill
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:644 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 offset:648 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s33 offset:652 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v44, off, s[0:3], s33 offset:656 ; 4-byte Folded Spill
; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v45, off, s[0:3], s33 offset:660 ; 4-byte Folded Spill
; ... 199 lines not shown ...

llvm/test/CodeGen/AMDGPU/GlobalISel/flat-scratch.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -global-isel -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -global-isel -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=gfx1030 -global-isel -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX10 %s			; RUN: llc -march=amdgcn -mcpu=gfx1030 -global-isel -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -check-prefix=GFX10 %s

	define amdgpu_kernel void @store_load_sindex_kernel(i32 %idx) {			define amdgpu_kernel void @store_load_sindex_kernel(i32 %idx) {
	; GFX9-LABEL: store_load_sindex_kernel:			; GFX9-LABEL: store_load_sindex_kernel:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_u32 s1, 4, s1			; GFX9-NEXT: s_add_i32 s1, s1, 4
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_kernel:			; GFX10-LABEL: store_load_sindex_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 4, s0			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: s_add_u32 s1, 4, s1			; GFX10-NEXT: s_add_i32 s1, s1, 4
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	; ... 134 lines not shown ...
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s1, 0x104, s1			; GFX9-NEXT: s_addk_i32 s1, 0x104
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_small_offset_kernel:			; GFX10-LABEL: store_load_sindex_small_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 0x104, s0			; GFX10-NEXT: s_addk_i32 s0, 0x104
	; GFX10-NEXT: s_add_u32 s1, 0x104, s1			; GFX10-NEXT: s_addk_i32 s1, 0x104
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	; ... 70 lines not shown ...
	}			}

	define void @store_load_vindex_small_offset_foo(i32 %idx) {			define void @store_load_vindex_small_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_small_offset_foo:			; GFX9-LABEL: store_load_vindex_small_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x100			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-NEXT: v_mov_b32_e32 v2, vcc_hi			; GFX9-NEXT: v_mov_b32_e32 v2, vcc_hi
	; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-NEXT: v_add_u32_e32 v1, v2, v1			; GFX9-NEXT: v_add_u32_e32 v1, v2, v1
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: scratch_store_dword v1, v3, off			; GFX9-NEXT: scratch_store_dword v1, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_add_u32_e32 v0, v2, v0			; GFX9-NEXT: v_add_u32_e32 v0, v2, v0
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_small_offset_foo:			; GFX10-LABEL: store_load_vindex_small_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x100			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-NEXT: v_mov_b32_e32 v3, 15			; GFX10-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX10-NEXT: v_add_nc_u32_e32 v0, v2, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, v2, v0
	; GFX10-NEXT: v_add_nc_u32_e32 v1, v2, v1			; GFX10-NEXT: v_add_nc_u32_e32 v1, v2, v1
	; GFX10-NEXT: scratch_load_dword v2, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v2, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; ... 26 lines not shown ...
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s1, 0x4004, s1			; GFX9-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_large_offset_kernel:			; GFX10-LABEL: store_load_sindex_large_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 0x4004, s0			; GFX10-NEXT: s_addk_i32 s0, 0x4004
	; GFX10-NEXT: s_add_u32 s1, 0x4004, s1			; GFX10-NEXT: s_addk_i32 s1, 0x4004
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	; ... 70 lines not shown ...
	}			}

	define void @store_load_vindex_large_offset_foo(i32 %idx) {			define void @store_load_vindex_large_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_large_offset_foo:			; GFX9-LABEL: store_load_vindex_large_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: v_lshlrev_b32_e32 v1, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v1, 2, v0
	; GFX9-NEXT: v_and_b32_e32 v0, 15, v0			; GFX9-NEXT: v_and_b32_e32 v0, 15, v0
	; GFX9-NEXT: v_mov_b32_e32 v2, vcc_hi			; GFX9-NEXT: v_mov_b32_e32 v2, vcc_hi
	; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX9-NEXT: v_add_u32_e32 v1, v2, v1			; GFX9-NEXT: v_add_u32_e32 v1, v2, v1
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: scratch_store_dword v1, v3, off			; GFX9-NEXT: scratch_store_dword v1, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_add_u32_e32 v0, v2, v0			; GFX9-NEXT: v_add_u32_e32 v0, v2, v0
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_large_offset_foo:			; GFX10-LABEL: store_load_vindex_large_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_and_b32_e32 v1, 15, v0			; GFX10-NEXT: v_and_b32_e32 v1, 15, v0
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0			; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
	; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-NEXT: v_mov_b32_e32 v3, 15			; GFX10-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1			; GFX10-NEXT: v_lshlrev_b32_e32 v1, 2, v1
	; GFX10-NEXT: v_add_nc_u32_e32 v0, v2, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, v2, v0
	; GFX10-NEXT: v_add_nc_u32_e32 v1, v2, v1			; GFX10-NEXT: v_add_nc_u32_e32 v1, v2, v1
	; GFX10-NEXT: scratch_load_dword v2, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v2, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	Show All 24 Lines
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_movk_i32 s0, 0x3e80			; GFX9-NEXT: s_movk_i32 s0, 0x3e80
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4			; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_kernel:			; GFX10-LABEL: store_load_large_imm_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3e80			; GFX10-NEXT: s_movk_i32 s0, 0x3e80
	; GFX10-NEXT: s_add_u32 s0, 4, s0			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: scratch_store_dword off, v0, off offset:4			; GFX10-NEXT: scratch_store_dword off, v0, off offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0			; GFX10-NEXT: scratch_store_dword off, v1, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	bb:			bb:
	Show All 11 Lines
	; GFX9-LABEL: store_load_large_imm_offset_foo:			; GFX9-LABEL: store_load_large_imm_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_movk_i32 s0, 0x3e80			; GFX9-NEXT: s_movk_i32 s0, 0x3e80
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: scratch_store_dword off, v0, s32			; GFX9-NEXT: scratch_store_dword off, v0, s32
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s0, s32, s0			; GFX9-NEXT: s_add_i32 s0, s0, s32
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_foo:			; GFX10-LABEL: store_load_large_imm_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3e80			; GFX10-NEXT: s_movk_i32 s0, 0x3e80
	; GFX10-NEXT: s_add_u32 s0, s32, s0			; GFX10-NEXT: s_add_i32 s0, s0, s32
	; GFX10-NEXT: scratch_store_dword off, v0, s32			; GFX10-NEXT: scratch_store_dword off, v0, s32
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0			; GFX10-NEXT: scratch_store_dword off, v1, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:

llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll


define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {		define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {
; GCN-LABEL: func_non_entry_block_static_alloca_align4:		; GCN-LABEL: func_non_entry_block_static_alloca_align4:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s7, s33		; GCN-NEXT: s_mov_b32 s7, s33
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0x400
; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GCN-NEXT: s_cbranch_execz BB2_3		; GCN-NEXT: s_cbranch_execz BB2_3
; GCN-NEXT: ; %bb.1: ; %bb.0		; GCN-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
; GCN-NEXT: s_and_b64 exec, exec, vcc		; GCN-NEXT: s_and_b64 exec, exec, vcc
; GCN-NEXT: s_cbranch_execz BB2_3		; GCN-NEXT: s_cbranch_execz BB2_3
; GCN-NEXT: ; %bb.2: ; %bb.1		; GCN-NEXT: ; %bb.2: ; %bb.1
; GCN-NEXT: s_add_u32 s6, s32, 0x1000		; GCN-NEXT: s_add_u32 s6, s32, 0x1000
Show All 9 Lines
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v2, v3		; GCN-NEXT: v_add_u32_e32 v2, v2, v3
; GCN-NEXT: global_store_dword v[0:1], v2, off		; GCN-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB2_3: ; %bb.2		; GCN-NEXT: BB2_3: ; %bb.2
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]		; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; GCN-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_sub_u32 s32, s32, 0x400		; GCN-NEXT: s_addk_i32 s32, 0xfc00
; GCN-NEXT: s_mov_b32 s33, s7		; GCN-NEXT: s_mov_b32 s33, s7
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]

entry:		entry:
%cond0 = icmp eq i32 %arg.cond0, 0		%cond0 = icmp eq i32 %arg.cond0, 0
br i1 %cond0, label %bb.0, label %bb.2		br i1 %cond0, label %bb.0, label %bb.2

bb.0:		bb.0:
Show All 19 Lines	bb.2:
ret void		ret void
}		}
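
The epilogue above replaces `s_sub_u32 s32, s32, 0x400` with `s_addk_i32 s32, 0xfc00`. A minimal Python sketch of why the two agree, assuming the `s_addk_i32` immediate is a 16-bit field that is sign-extended before the add. The helper names are hypothetical and not part of the patch.

```python
def sign_extend16(imm16: int) -> int:
    """Read a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def add_u32(a: int, b: int) -> int:
    """32-bit wrapping add, modelling the value left in a scalar register."""
    return (a + b) & 0xFFFFFFFF


def addk(reg: int, imm16: int) -> int:
    """Model of an add with a sign-extended 16-bit immediate (assumed semantics)."""
    return add_u32(reg, sign_extend16(imm16))


sp = 0x1000                                    # arbitrary example stack pointer
after_prologue = addk(sp, 0x400)               # s_addk_i32 s32, 0x400
after_epilogue = addk(after_prologue, 0xFC00)  # s_addk_i32 s32, 0xfc00
```

Under that assumption `0xfc00` reads back as `-0x400`, so the epilogue restores the stack pointer exactly as the old `s_sub_u32` did.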

define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {		define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {
; GCN-LABEL: func_non_entry_block_static_alloca_align64:		; GCN-LABEL: func_non_entry_block_static_alloca_align64:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s7, s33		; GCN-NEXT: s_mov_b32 s7, s33
; GCN-NEXT: s_add_u32 s33, s32, 0xfc0		; GCN-NEXT: s_add_i32 s33, s32, 0xfc0
; GCN-NEXT: s_and_b32 s33, s33, 0xfffff000		; GCN-NEXT: s_and_b32 s33, s33, 0xfffff000
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; GCN-NEXT: s_add_u32 s32, s32, 0x2000		; GCN-NEXT: s_addk_i32 s32, 0x2000
; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc		; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GCN-NEXT: s_cbranch_execz BB3_2		; GCN-NEXT: s_cbranch_execz BB3_2
; GCN-NEXT: ; %bb.1: ; %bb.0		; GCN-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: s_add_u32 s6, s32, 0x1000		; GCN-NEXT: s_add_u32 s6, s32, 0x1000
; GCN-NEXT: s_and_b32 s6, s6, 0xfffff000		; GCN-NEXT: s_and_b32 s6, s6, 0xfffff000
; GCN-NEXT: v_mov_b32_e32 v2, 0		; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v5, s6		; GCN-NEXT: v_mov_b32_e32 v5, s6
; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen		; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen
; GCN-NEXT: v_mov_b32_e32 v2, 1		; GCN-NEXT: v_mov_b32_e32 v2, 1
; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4		; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4
; GCN-NEXT: v_lshlrev_b32_e32 v2, 2, v3		; GCN-NEXT: v_lshlrev_b32_e32 v2, 2, v3
; GCN-NEXT: v_add_u32_e32 v2, s6, v2		; GCN-NEXT: v_add_u32_e32 v2, s6, v2
; GCN-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen		; GCN-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen
; GCN-NEXT: v_and_b32_e32 v3, 0x3ff, v4		; GCN-NEXT: v_and_b32_e32 v3, 0x3ff, v4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v2, v3		; GCN-NEXT: v_add_u32_e32 v2, v2, v3
; GCN-NEXT: global_store_dword v[0:1], v2, off		; GCN-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB3_2: ; %bb.1		; GCN-NEXT: BB3_2: ; %bb.1
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]		; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; GCN-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_sub_u32 s32, s32, 0x2000		; GCN-NEXT: s_addk_i32 s32, 0xe000
; GCN-NEXT: s_mov_b32 s33, s7		; GCN-NEXT: s_mov_b32 s33, s7
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%cond = icmp eq i32 %arg.cond, 0		%cond = icmp eq i32 %arg.cond, 0
br i1 %cond, label %bb.0, label %bb.1		br i1 %cond, label %bb.0, label %bb.1

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 64, addrspace(5)		%alloca = alloca [16 x i32], align 64, addrspace(5)

llvm/test/CodeGen/AMDGPU/addrspacecast.ll

	; %val = load i32, i32* %fptr, align 4			; %val = load i32, i32* %fptr, align 4
	; store i32 %val, i32 addrspace(1)* %out, align 4			; store i32 %val, i32 addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; Check for prologue initializing special SGPRs pointing to scratch.			; Check for prologue initializing special SGPRs pointing to scratch.
	; HSA-LABEL: {{^}}store_flat_scratch:			; HSA-LABEL: {{^}}store_flat_scratch:
	; CI-DAG: s_mov_b32 flat_scratch_lo, s9			; CI-DAG: s_mov_b32 flat_scratch_lo, s9
	; CI-DAG: s_add_u32 [[ADD:s[0-9]+]], s8, s11			; CI-DAG: s_add_i32 [[ADD:s[0-9]+]], s8, s11
	; CI-DAG: s_lshr_b32 flat_scratch_hi, [[ADD]], 8			; CI-DAG: s_lshr_b32 flat_scratch_hi, [[ADD]], 8

	; GFX9: s_add_u32 flat_scratch_lo, s6, s9			; GFX9: s_add_u32 flat_scratch_lo, s6, s9
	; GFX9: s_addc_u32 flat_scratch_hi, s7, 0			; GFX9: s_addc_u32 flat_scratch_hi, s7, 0

	; HSA: {{flat\|global}}_store_dword			; HSA: {{flat\|global}}_store_dword
	; HSA: s_barrier			; HSA: s_barrier
	; HSA: {{flat\|global}}_load_dword			; HSA: {{flat\|global}}_load_dword

llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll


	; HSA-ALLOCA: .amd_kernel_code_t			; HSA-ALLOCA: .amd_kernel_code_t
	; FIXME: Creating the emergency stack slots causes us to over-estimate scratch			; FIXME: Creating the emergency stack slots causes us to over-estimate scratch
	; by 4 bytes.			; by 4 bytes.
	; HSA-ALLOCA: workitem_private_segment_byte_size = 24			; HSA-ALLOCA: workitem_private_segment_byte_size = 24
	; HSA-ALLOCA: .end_amd_kernel_code_t			; HSA-ALLOCA: .end_amd_kernel_code_t

	; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7			; HSA-ALLOCA: s_mov_b32 flat_scratch_lo, s7
	; HSA-ALLOCA: s_add_u32 s6, s6, s9			; HSA-ALLOCA: s_add_i32 s6, s6, s9
	; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8			; HSA-ALLOCA: s_lshr_b32 flat_scratch_hi, s6, 8

	; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0			; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0
	; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0			; SI-ALLOCA: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}], 0 offen ; encoding: [0x00,0x10,0x70,0xe0


	; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()			; HSAOPT: [[DISPATCH_PTR:%[0-9]+]] = call noalias nonnull dereferenceable(64) i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()
	; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*			; HSAOPT: [[CAST_DISPATCH_PTR:%[0-9]+]] = bitcast i8 addrspace(4)* [[DISPATCH_PTR]] to i32 addrspace(4)*

llvm/test/CodeGen/AMDGPU/call-constant.ll

	; RUN: llc -global-isel=0 -amdgpu-fixed-function-abi=0 -mtriple=amdgcn-amd-amdhsa < %s \| FileCheck -check-prefixes=GCN,SDAG %s			; RUN: llc -global-isel=0 -amdgpu-fixed-function-abi=0 -mtriple=amdgcn-amd-amdhsa < %s \| FileCheck -check-prefixes=GCN,SDAG %s
	; RUN: llc -global-isel=1 -amdgpu-fixed-function-abi=1 -mtriple=amdgcn-amd-amdhsa < %s \| FileCheck -check-prefixes=GCN,GISEL %s			; RUN: llc -global-isel=1 -amdgpu-fixed-function-abi=1 -mtriple=amdgcn-amd-amdhsa < %s \| FileCheck -check-prefixes=GCN,GISEL %s

	; FIXME: Emitting unnecessary flat_scratch setup			; FIXME: Emitting unnecessary flat_scratch setup

	; GCN-LABEL: {{^}}test_call_undef:			; GCN-LABEL: {{^}}test_call_undef:
	; SDAG: s_mov_b32 flat_scratch_lo, s13			; SDAG: s_mov_b32 flat_scratch_lo, s13
	; SDAG: s_add_u32 s12, s12, s17			; SDAG: s_add_i32 s12, s12, s17
	; SDAG: s_lshr_b32			; SDAG: s_lshr_b32
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_undef() #0 {			define amdgpu_kernel void @test_call_undef() #0 {
	%val = call i32 undef(i32 1)			%val = call i32 undef(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* undef			store volatile i32 %op, i32 addrspace(1)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_tail_call_undef:			; GCN-LABEL: {{^}}test_tail_call_undef:
	; SDAG: s_waitcnt			; SDAG: s_waitcnt
	; SDAG-NEXT: .Lfunc_end			; SDAG-NEXT: .Lfunc_end

	; GISEL: s_setpc_b64 s{{\[[0-9]+:[0-9]+\]}}			; GISEL: s_setpc_b64 s{{\[[0-9]+:[0-9]+\]}}
	define i32 @test_tail_call_undef() #0 {			define i32 @test_tail_call_undef() #0 {
	%call = tail call i32 undef(i32 1)			%call = tail call i32 undef(i32 1)
	ret i32 %call			ret i32 %call
	}			}

	; GCN-LABEL: {{^}}test_call_null:			; GCN-LABEL: {{^}}test_call_null:
	; SDAG: s_mov_b32 flat_scratch_lo, s13			; SDAG: s_mov_b32 flat_scratch_lo, s13
	; SDAG: s_add_u32 s12, s12, s17			; SDAG: s_add_i32 s12, s12, s17
	; SDAG: s_lshr_b32			; SDAG: s_lshr_b32

	; GISEL: s_swappc_b64 s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}			; GISEL: s_swappc_b64 s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @test_call_null() #0 {			define amdgpu_kernel void @test_call_null() #0 {
	%val = call i32 null(i32 1)			%val = call i32 null(i32 1)
	%op = add i32 %val, 1			%op = add i32 %val, 1
	store volatile i32 %op, i32 addrspace(1)* null			store volatile i32 %op, i32 addrspace(1)* null

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:			; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:
	; MUBUF: buffer_store_dword v40			; MUBUF: buffer_store_dword v40
	; FLATSCR: scratch_store_dword off, v40			; FLATSCR: scratch_store_dword off, v40
	; GCN: v_writelane_b32 v40, s33, 4			; GCN: v_writelane_b32 v40, s33, 4

	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; MUBUF: s_add_u32 s32, s32, 0x400			; MUBUF: s_addk_i32 s32, 0x400
	; FLATSCR: s_add_u32 s32, s32, 16			; FLATSCR: s_add_i32 s32, s32, 16
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-NEXT: s_swappc_b64			; GCN-NEXT: s_swappc_b64

	; GCN: v_readlane_b32 s33, v40, 4			; GCN: v_readlane_b32 s33, v40, 4
	; MUBUF: buffer_load_dword v40			; MUBUF: buffer_load_dword v40
	; FLATSCR: scratch_load_dword v40			; FLATSCR: scratch_load_dword v40
	define void @test_func_call_external_void_funcx2() #0 {			define void @test_func_call_external_void_funcx2() #0 {
	call void @external_void_func_void()			call void @external_void_func_void()

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

; Can use free call clobbered register to preserve original FP value.		; Can use free call clobbered register to preserve original FP value.

; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_all:		; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_all:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33		; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33
; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33		; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; MUBUF-NEXT: s_add_u32 s32, s32, 0x200		; MUBUF-NEXT: s_addk_i32 s32, 0x200
; FLATSCR-NEXT: s_add_u32 s32, s32, 8		; FLATSCR-NEXT: s_add_i32 s32, s32, 8
; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}		; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}
; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4{{$}}		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4{{$}}
; FLATSCR-NEXT: scratch_store_dword off, v0, s33 offset:4{{$}}		; FLATSCR-NEXT: scratch_store_dword off, v0, s33 offset:4{{$}}
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x200		; MUBUF-NEXT: s_addk_i32 s32, 0xfe00
; FLATSCR-NEXT: s_sub_u32 s32, s32, 8		; FLATSCR-NEXT: s_add_i32 s32, s32, -8
; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]		; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_no_fp_elim_all() #1 {		define void @callee_with_stack_no_fp_elim_all() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

Show All 15 Lines
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; MUBUF-DAG: s_add_u32 s32, s32, 0x400{{$}}		; MUBUF-DAG: s_addk_i32 s32, 0x400{{$}}
; FLATSCR-DAG: s_add_u32 s32, s32, 16{{$}}		; FLATSCR-DAG: s_add_i32 s32, s32, 16{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,

; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; MUBUF-DAG: v_readlane_b32 s5, [[CSR_VGPR]]		; MUBUF-DAG: v_readlane_b32 s5, [[CSR_VGPR]]
; MUBUF-DAG: v_readlane_b32 s4, [[CSR_VGPR]]		; MUBUF-DAG: v_readlane_b32 s4, [[CSR_VGPR]]
; FLATSCR-DAG: v_readlane_b32 s0, [[CSR_VGPR]]		; FLATSCR-DAG: v_readlane_b32 s0, [[CSR_VGPR]]
; FLATSCR-DAG: v_readlane_b32 s1, [[CSR_VGPR]]		; FLATSCR-DAG: v_readlane_b32 s1, [[CSR_VGPR]]

; MUBUF: s_sub_u32 s32, s32, 0x400{{$}}		; MUBUF: s_addk_i32 s32, 0xfc00{{$}}
; FLATSCR: s_sub_u32 s32, s32, 16{{$}}		; FLATSCR: s_add_i32 s32, s32, -16{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)

; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
Show All 11 Lines
; spilling CSR SGPRs.		; spilling CSR SGPRs.

; GCN-LABEL: {{^}}callee_no_stack_with_call:		; GCN-LABEL: {{^}}callee_no_stack_with_call:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; MUBUF-DAG: s_add_u32 s32, s32, 0x400		; MUBUF-DAG: s_addk_i32 s32, 0x400
; FLATSCR-DAG: s_add_u32 s32, s32, 16		; FLATSCR-DAG: s_add_i32 s32, s32, 16
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN: s_swappc_b64		; GCN: s_swappc_b64

; MUBUF-DAG: v_readlane_b32 s4, v40, 0		; MUBUF-DAG: v_readlane_b32 s4, v40, 0
; MUBUF-DAG: v_readlane_b32 s5, v40, 1		; MUBUF-DAG: v_readlane_b32 s5, v40, 1
; FLATSCR-DAG: v_readlane_b32 s0, v40, 0		; FLATSCR-DAG: v_readlane_b32 s0, v40, 0
; FLATSCR-DAG: v_readlane_b32 s1, v40, 1		; FLATSCR-DAG: v_readlane_b32 s1, v40, 1

; MUBUF: s_sub_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_sub_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_no_stack_with_call() #0 {		define void @callee_no_stack_with_call() #0 {
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines
; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33 offset:8		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33 offset:8

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; clobber v41		; GCN-NEXT: ; clobber v41
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload		; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
; MUBUF: s_add_u32 s32, s32, 0x300		; MUBUF: s_addk_i32 s32, 0x300
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300		; MUBUF-NEXT: s_addk_i32 s32, 0xfd00
; MUBUF-NEXT: s_mov_b32 s33, s4		; MUBUF-NEXT: s_mov_b32 s33, s4
; FLATSCR: s_add_u32 s32, s32, 12		; FLATSCR: s_add_i32 s32, s32, 12
; FLATSCR-NEXT: s_sub_u32 s32, s32, 12		; FLATSCR-NEXT: s_add_i32 s32, s32, -12
; FLATSCR-NEXT: s_mov_b32 s33, s0		; FLATSCR-NEXT: s_mov_b32 s33, s0
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {		define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void asm sideeffect "; clobber v41", "~{v41}"()		call void asm sideeffect "; clobber v41", "~{v41}"()
ret void		ret void
Show All 12 Lines
; GCN-COUNT-2: v_writelane_b32 v1		; GCN-COUNT-2: v_writelane_b32 v1
; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:8		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:8
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_writelane_b32 v1		; GCN: v_writelane_b32 v1

; MUBUF: s_add_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; MUBUF: s_sub_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_add_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; FLATSCR: s_sub_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN-NEXT: v_readlane_b32 s33, v1, 63		; GCN-NEXT: v_readlane_b32 s33, v1, 63
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @last_lane_vgpr_for_fp_csr() #1 {		define void @last_lane_vgpr_for_fp_csr() #1 {
Show All 26 Lines
; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; MUBUF: buffer_store_dword		; MUBUF: buffer_store_dword
; FLATSCR: scratch_store_dword		; FLATSCR: scratch_store_dword
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_writelane_b32 v1,		; GCN: v_writelane_b32 v1,
; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload		; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
; MUBUF: s_add_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v1		; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v1
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x400		; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
; FLATSCR-NEXT: s_sub_u32 s32, s32, 16		; FLATSCR-NEXT: s_add_i32 s32, s32, -16
; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]		; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:12 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @no_new_vgpr_for_fp_csr() #1 {		define void @no_new_vgpr_for_fp_csr() #1 {
Show All 11 Lines	define void @no_new_vgpr_for_fp_csr() #1 {

ret void		ret void
}		}

; GCN-LABEL: {{^}}realign_stack_no_fp_elim:		; GCN-LABEL: {{^}}realign_stack_no_fp_elim:
; GCN: s_waitcnt		; GCN: s_waitcnt
; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33		; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33
; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33		; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33
; MUBUF-NEXT: s_add_u32 s33, s32, 0x7ffc0		; MUBUF-NEXT: s_add_i32 s33, s32, 0x7ffc0
; FLATSCR-NEXT: s_add_u32 s33, s32, 0x1fff		; FLATSCR-NEXT: s_add_i32 s33, s32, 0x1fff
; MUBUF-NEXT: s_and_b32 s33, s33, 0xfff80000		; MUBUF-NEXT: s_and_b32 s33, s33, 0xfff80000
; FLATSCR-NEXT: s_and_b32 s33, s33, 0xffffe000		; FLATSCR-NEXT: s_and_b32 s33, s33, 0xffffe000
; MUBUF-NEXT: s_add_u32 s32, s32, 0x100000		; MUBUF-NEXT: s_add_i32 s32, s32, 0x100000
; FLATSCR-NEXT: s_add_u32 s32, s32, 0x4000		; FLATSCR-NEXT: s_addk_i32 s32, 0x4000
; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; MUBUF-NEXT: buffer_store_dword [[ZERO]], off, s[0:3], s33		; MUBUF-NEXT: buffer_store_dword [[ZERO]], off, s[0:3], s33
; FLATSCR-NEXT: scratch_store_dword off, [[ZERO]], s33		; FLATSCR-NEXT: scratch_store_dword off, [[ZERO]], s33
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x100000		; MUBUF-NEXT: s_add_i32 s32, s32, 0xfff00000
; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x4000		; FLATSCR-NEXT: s_addk_i32 s32, 0xc000
; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]		; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @realign_stack_no_fp_elim() #1 {		define void @realign_stack_no_fp_elim() #1 {
%alloca = alloca i32, align 8192, addrspace(5)		%alloca = alloca i32, align 8192, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 v1, s33, 2		; GCN-NEXT: v_writelane_b32 v1, s33, 2
; GCN-NEXT: v_writelane_b32 v1, s30, 0		; GCN-NEXT: v_writelane_b32 v1, s30, 0
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; GCN: v_writelane_b32 v1, s31, 1		; GCN: v_writelane_b32 v1, s31, 1
; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4		; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4
; FLATSCR: scratch_store_dword off, [[ZERO]], s33 offset:4		; FLATSCR: scratch_store_dword off, [[ZERO]], s33 offset:4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: s_add_u32 s32, s32, 0x300		; MUBUF: s_addk_i32 s32, 0x300
; MUBUF-NEXT: v_readlane_b32 s4, v1, 0		; MUBUF-NEXT: v_readlane_b32 s4, v1, 0
; MUBUF-NEXT: v_readlane_b32 s5, v1, 1		; MUBUF-NEXT: v_readlane_b32 s5, v1, 1
; FLATSCR: s_add_u32 s32, s32, 12		; FLATSCR: s_add_i32 s32, s32, 12
; FLATSCR-NEXT: v_readlane_b32 s0, v1, 0		; FLATSCR-NEXT: v_readlane_b32 s0, v1, 0
; FLATSCR-NEXT: v_readlane_b32 s1, v1, 1		; FLATSCR-NEXT: v_readlane_b32 s1, v1, 1
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300		; MUBUF-NEXT: s_addk_i32 s32, 0xfd00
; FLATSCR-NEXT: s_sub_u32 s32, s32, 12		; FLATSCR-NEXT: s_add_i32 s32, s32, -12
; GCN-NEXT: v_readlane_b32 s33, v1, 2		; GCN-NEXT: v_readlane_b32 s33, v1, 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_setpc_b64 s[4:5]		; MUBUF-NEXT: s_setpc_b64 s[4:5]
; FLATSCR-NEXT: s_setpc_b64 s[0:1]		; FLATSCR-NEXT: s_setpc_b64 s[0:1]
[20 lines not shown]
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword
; MUBUF: s_add_u32 s32, s32, 0x300{{$}}		; MUBUF: s_addk_i32 s32, 0x300{{$}}
; FLATSCR: s_add_u32 s32, s32, 12{{$}}		; FLATSCR: s_add_i32 s32, s32, 12{{$}}

; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0		; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0
; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0		; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1		; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1
; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1		; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300{{$}}		; MUBUF-NEXT: s_addk_i32 s32, 0xfd00{{$}}
; FLATSCR-NEXT: s_sub_u32 s32, s32, 12{{$}}		; FLATSCR-NEXT: s_add_i32 s32, s32, -12{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {		define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
[16 lines not shown]	define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
ret void		ret void
}		}

; The byval argument exceeds the MUBUF constant offset, so a scratch		; The byval argument exceeds the MUBUF constant offset, so a scratch
; register is needed to access the CSR VGPR slot.		; register is needed to access the CSR VGPR slot.
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; MUBUF-DAG: s_add_u32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_i32 s32, s32, 0x40300{{$}}
; FLATSCR-DAG: s_add_u32 s32, s32, 0x100c{{$}}		; FLATSCR-DAG: s_addk_i32 s32, 0x100c{{$}}
; MUBUF-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; FLATSCR-DAG: scratch_store_dword		; FLATSCR-DAG: scratch_store_dword

; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0		; MUBUF: v_readlane_b32 s4, [[CSR_VGPR]], 0
; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0		; FLATSCR: v_readlane_b32 s0, [[CSR_VGPR]], 0
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1		; MUBUF: v_readlane_b32 s5, [[CSR_VGPR]], 1
; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1		; FLATSCR: v_readlane_b32 s1, [[CSR_VGPR]], 1
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}		; MUBUF-NEXT: s_add_i32 s32, s32, 0xfffbfd00{{$}}
; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x100c{{$}}		; FLATSCR-NEXT: s_addk_i32 s32, 0xeff4{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008		; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #1 {		define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

[21 lines not shown]	define internal void @local_empty_func() #0 {
ret void		ret void
}		}

; An FP is needed, despite not needing any spills		; An FP is needed, despite not needing any spills
; TODO: Could see callee does not use stack and omit FP.		; TODO: Could see callee does not use stack and omit FP.
; GCN-LABEL: {{^}}ipra_call_with_stack:		; GCN-LABEL: {{^}}ipra_call_with_stack:
; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; MUBUF: s_add_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0x400
; FLATSCR: s_add_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, 16
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; MUBUF: s_sub_u32 s32, s32, 0x400		; MUBUF: s_addk_i32 s32, 0xfc00
; FLATSCR: s_sub_u32 s32, s32, 16		; FLATSCR: s_add_i32 s32, s32, -16
; GCN: s_mov_b32 s33, [[FP_COPY:s[0-9]+]]		; GCN: s_mov_b32 s33, [[FP_COPY:s[0-9]+]]
define void @ipra_call_with_stack() #0 {		define void @ipra_call_with_stack() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @local_empty_func()		call void @local_empty_func()
ret void		ret void
}		}

[97 lines not shown]	call void asm sideeffect "; clobber all VGPRs except CSR v40",
,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()		,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38},~{v39}"()
ret void		ret void
}		}

; If the size of the offset exceeds the MUBUF offset field we need another		; If the size of the offset exceeds the MUBUF offset field we need another
; scratch VGPR to hold the offset.		; scratch VGPR to hold the offset.
; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset		; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset
; MUBUF: s_or_saveexec_b64 s[4:5], -1		; MUBUF: s_or_saveexec_b64 s[4:5], -1
; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40200
; MUBUF-NEXT: buffer_store_dword v39, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v39, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; MUBUF: v_mov_b32_e32 v0, s33		; MUBUF: v_mov_b32_e32 v0, s33
; GCN-NOT: v_mov_b32_e32 v0, 0x100c		; GCN-NOT: v_mov_b32_e32 v0, 0x100c
; MUBUF-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40300		; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x40300
; MUBUF: buffer_store_dword v0, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v0, off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; FLATSCR: s_add_u32 [[SOFF:s[0-9]+]], s33, 0x1004		; FLATSCR: s_add_i32 [[SOFF:s[0-9]+]], s33, 0x1004
; FLATSCR: v_mov_b32_e32 v0, 0		; FLATSCR: v_mov_b32_e32 v0, 0
; FLATSCR: scratch_store_dword off, v0, [[SOFF]]		; FLATSCR: scratch_store_dword off, v0, [[SOFF]]
define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {		define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #3 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
[22 lines not shown]
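
Note on the MUBUF and FLATSCR check lines in callee-frame-setup.ll: every MUBUF stack offset in the diff is 64 times the matching FLATSCR offset. The sketch below reproduces that relationship. It assumes the tests run in wave64 and that the MUBUF stack pointer counts bytes for the whole wave while the flat-scratch one counts bytes per lane. The 12-bit (4095) limit on the MUBUF immediate offset is also an assumption about the targets tested. The helper names `mubuf_offset` and `fits_mubuf_imm` are hypothetical and not part of LLVM.

```python
WAVE_SIZE = 64          # assumption: the tests are compiled for wave64
MUBUF_IMM_MAX = 0xFFF   # assumption: 12-bit unsigned immediate offset field


def mubuf_offset(per_lane_bytes: int, wave_size: int = WAVE_SIZE) -> int:
    """Scale a per-lane byte offset to the wave-wide offset seen in MUBUF checks."""
    return per_lane_bytes * wave_size


def fits_mubuf_imm(per_lane_bytes: int) -> bool:
    """True if the per-lane offset can be encoded directly in the instruction."""
    return 0 <= per_lane_bytes <= MUBUF_IMM_MAX


# (FLATSCR value, MUBUF value) pairs copied from the check lines above.
pairs = [
    (16, 0x400),
    (12, 0x300),
    (0x1FFF, 0x7FFC0),
    (0x4000, 0x100000),
    (0x1008, 0x40200),
    (0x100C, 0x40300),
]
for flat, mubuf in pairs:
    assert mubuf_offset(flat) == mubuf

# The CSR VGPR slot sits behind a 4096-byte byval argument, so its per-lane
# offset (0x1008) no longer fits the immediate field and an SGPR is needed.
print(fits_mubuf_imm(8), fits_mubuf_imm(0x1008))
```

Under these assumptions the same factor accounts for the alignment masks in realign_stack_no_fp_elim: 8192 bytes per lane corresponds to 0x80000 bytes per wave, hence 0xffffe000 for FLATSCR and 0xfff80000 for MUBUF.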

llvm/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

[516 lines not shown]	define hidden void @func_use_every_sgpr_input_call_use_workgroup_id_xyz() #1 {
call void asm sideeffect "; use $0", "s"(i32 %val6)		call void asm sideeffect "; use $0", "s"(i32 %val6)

call void @use_workgroup_id_xyz()		call void @use_workgroup_id_xyz()
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_use_every_sgpr_input_call_use_workgroup_id_xyz_spill:		; GCN-LABEL: {{^}}func_use_every_sgpr_input_call_use_workgroup_id_xyz_spill:
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-DAG: s_add_u32 s32, s32, 0x400		; GCN-DAG: s_addk_i32 s32, 0x400
; GCN-DAG: s_mov_b64 s{{\[}}[[LO_X:[0-9]+]]{{\:}}[[HI_X:[0-9]+]]{{\]}}, s[4:5]		; GCN-DAG: s_mov_b64 s{{\[}}[[LO_X:[0-9]+]]{{\:}}[[HI_X:[0-9]+]]{{\]}}, s[4:5]
; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Y:[0-9]+]]{{\:}}[[HI_Y:[0-9]+]]{{\]}}, s[6:7]		; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Y:[0-9]+]]{{\:}}[[HI_Y:[0-9]+]]{{\]}}, s[6:7]


; GCN: s_mov_b32 s4, s12		; GCN: s_mov_b32 s4, s12
; GCN: s_mov_b32 s5, s13		; GCN: s_mov_b32 s5, s13
; GCN: s_mov_b32 s6, s14		; GCN: s_mov_b32 s6, s14

[62 lines not shown]

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll

[397 lines not shown]	call void @too_many_args_use_workitem_id_x(
i32 210, i32 220, i32 230, i32 240,		i32 210, i32 220, i32 230, i32 240,
i32 250, i32 260, i32 270, i32 280,		i32 250, i32 260, i32 270, i32 280,
i32 290, i32 300, i32 310, i32 320)		i32 290, i32 300, i32 310, i32 320)
ret void		ret void
}		}

; Requires loading and storing to stack slot.		; Requires loading and storing to stack slot.
; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:
; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}		; GCN-DAG: s_addk_i32 s32, 0x400{{$}}
; GCN-DAG: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}		; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}

; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN: s_sub_u32 s32, s32, 0x400{{$}}		; GCN: s_addk_i32 s32, 0xfc00{{$}}
; GCN: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GCN: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @too_many_args_call_too_many_args_use_workitem_id_x(		define void @too_many_args_call_too_many_args_use_workitem_id_x(
i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,		i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,
i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,		i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,
i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,		i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,
i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {		i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {
call void @too_many_args_use_workitem_id_x(		call void @too_many_args_use_workitem_id_x(
[323 lines not shown]
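
Note on the new immediates: the old `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00` because the subtraction is now an addition of the negated offset, and 0xfc00 is -0x400 as a 16-bit two's complement value. The sketch below reproduces the forms seen in the check lines. The selection rule is inferred from this diff rather than taken from the LLVM sources. It assumes that `s_addk_i32` needs the destination to equal the source, that its immediate is a sign-extended 16-bit field, and that integers from -16 to 64 are inline constants that gain nothing from the short form. All function names are hypothetical.

```python
def simm16_field(value: int):
    """Return the 16-bit encoding of value, or None if it does not fit."""
    if -0x8000 <= value <= 0x7FFF:
        return value & 0xFFFF
    return None


def is_inline_int(value: int) -> bool:
    """Assumption: integer inline constants cover -16..64."""
    return -16 <= value <= 64


def select_add(dst: str, src: str, imm: int) -> str:
    """Pick the printed form of dst = src + imm (illustrative only)."""
    if is_inline_int(imm):
        return f"s_add_i32 {dst}, {src}, {imm}"
    field = simm16_field(imm)
    if dst == src and field is not None:
        return f"s_addk_i32 {dst}, {field:#x}"
    # Otherwise a full 32-bit literal is required.
    return f"s_add_i32 {dst}, {src}, {imm & 0xFFFFFFFF:#x}"


print(select_add("s32", "s32", 0x400))      # stack allocation
print(select_add("s32", "s32", -0x400))     # stack release, was s_sub_u32
print(select_add("s32", "s32", -16))        # inline constant, stays s_add_i32
print(select_add("s32", "s32", -0x100000))  # too large for 16 bits
```

The same rule accounts for `s_add_i32 s33, s32, 0x1fff` elsewhere in the diff: the value fits in 16 bits, but the destination differs from the source, so the short form does not apply.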

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

[503 lines not shown]	call void @too_many_args_use_workitem_id_x(
i32 210, i32 220, i32 230, i32 240,		i32 210, i32 220, i32 230, i32 240,
i32 250, i32 260, i32 270, i32 280,		i32 250, i32 260, i32 270, i32 280,
i32 290, i32 300, i32 310, i32 320)		i32 290, i32 300, i32 310, i32 320)
ret void		ret void
}		}

; Requires loading and storing to stack slot.		; Requires loading and storing to stack slot.
; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:		; GCN-LABEL: {{^}}too_many_args_call_too_many_args_use_workitem_id_x:
; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}		; GCN-DAG: s_addk_i32 s32, 0x400{{$}}
; GCN-DAG: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; GCN-DAG: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}		; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}

; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}		; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN: s_sub_u32 s32, s32, 0x400{{$}}		; GCN: s_addk_i32 s32, 0xfc00{{$}}
; GCN: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; GCN: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @too_many_args_call_too_many_args_use_workitem_id_x(		define void @too_many_args_call_too_many_args_use_workitem_id_x(
i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,		i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,
i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,		i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,
i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,		i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,
i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {		i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {
call void @too_many_args_use_workitem_id_x(		call void @too_many_args_use_workitem_id_x(
[362 lines not shown]

llvm/test/CodeGen/AMDGPU/cc-update.ll

[16 lines not shown]
; GFX1010-NEXT: s_endpgm		; GFX1010-NEXT: s_endpgm
entry:		entry:
ret void		ret void
}		}

define amdgpu_kernel void @test_kern_stack() local_unnamed_addr #0 {		define amdgpu_kernel void @test_kern_stack() local_unnamed_addr #0 {
; GFX803-LABEL: test_kern_stack:		; GFX803-LABEL: test_kern_stack:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: v_mov_b32_e32 v0, 0		; GFX803-NEXT: v_mov_b32_e32 v0, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4		; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_endpgm		; GFX803-NEXT: s_endpgm
[25 lines not shown]	entry:
%x = alloca i32, align 4, addrspace(5)		%x = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %x, align 4		store volatile i32 0, i32 addrspace(5)* %x, align 4
ret void		ret void
}		}

define amdgpu_kernel void @test_kern_call() local_unnamed_addr #0 {		define amdgpu_kernel void @test_kern_call() local_unnamed_addr #0 {
; GFX803-LABEL: test_kern_call:		; GFX803-LABEL: test_kern_call:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: s_getpc_b64 s[4:5]		; GFX803-NEXT: s_getpc_b64 s[4:5]
; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4		; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12		; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12
; GFX803-NEXT: s_mov_b32 s32, 0		; GFX803-NEXT: s_mov_b32 s32, 0
[30 lines not shown]
entry:		entry:
tail call void @ex() #0		tail call void @ex() #0
ret void		ret void
}		}

define amdgpu_kernel void @test_kern_stack_and_call() local_unnamed_addr #0 {		define amdgpu_kernel void @test_kern_stack_and_call() local_unnamed_addr #0 {
; GFX803-LABEL: test_kern_stack_and_call:		; GFX803-LABEL: test_kern_stack_and_call:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: v_mov_b32_e32 v0, 0		; GFX803-NEXT: v_mov_b32_e32 v0, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: s_getpc_b64 s[4:5]		; GFX803-NEXT: s_getpc_b64 s[4:5]
; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4		; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12		; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12
[60 lines not shown]
; GFX1010-NEXT: s_endpgm		; GFX1010-NEXT: s_endpgm
entry:		entry:
ret void		ret void
}		}

define amdgpu_kernel void @test_force_fp_kern_stack() local_unnamed_addr #2 {		define amdgpu_kernel void @test_force_fp_kern_stack() local_unnamed_addr #2 {
; GFX803-LABEL: test_force_fp_kern_stack:		; GFX803-LABEL: test_force_fp_kern_stack:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_mov_b32 s33, 0		; GFX803-NEXT: s_mov_b32 s33, 0
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: v_mov_b32_e32 v0, 0		; GFX803-NEXT: v_mov_b32_e32 v0, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4		; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
[28 lines not shown]	entry:
%x = alloca i32, align 4, addrspace(5)		%x = alloca i32, align 4, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %x, align 4		store volatile i32 0, i32 addrspace(5)* %x, align 4
ret void		ret void
}		}

define amdgpu_kernel void @test_force_fp_kern_call() local_unnamed_addr #2 {		define amdgpu_kernel void @test_force_fp_kern_call() local_unnamed_addr #2 {
; GFX803-LABEL: test_force_fp_kern_call:		; GFX803-LABEL: test_force_fp_kern_call:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: s_getpc_b64 s[4:5]		; GFX803-NEXT: s_getpc_b64 s[4:5]
; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4		; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12		; GFX803-NEXT: s_addc_u32 s5, s5, ex@rel32@hi+12
; GFX803-NEXT: s_mov_b32 s32, 0		; GFX803-NEXT: s_mov_b32 s32, 0
[33 lines not shown]
entry:		entry:
tail call void @ex() #2		tail call void @ex() #2
ret void		ret void
}		}

define amdgpu_kernel void @test_force_fp_kern_stack_and_call() local_unnamed_addr #2 {		define amdgpu_kernel void @test_force_fp_kern_stack_and_call() local_unnamed_addr #2 {
; GFX803-LABEL: test_force_fp_kern_stack_and_call:		; GFX803-LABEL: test_force_fp_kern_stack_and_call:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_mov_b32 s33, 0		; GFX803-NEXT: s_mov_b32 s33, 0
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: v_mov_b32_e32 v0, 0		; GFX803-NEXT: v_mov_b32_e32 v0, 0
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: s_getpc_b64 s[4:5]		; GFX803-NEXT: s_getpc_b64 s[4:5]
; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4		; GFX803-NEXT: s_add_u32 s4, s4, ex@rel32@lo+4
[44 lines not shown]	entry:
store volatile i32 0, i32 addrspace(5)* %x, align 4		store volatile i32 0, i32 addrspace(5)* %x, align 4
tail call void @ex() #2		tail call void @ex() #2
ret void		ret void
}		}

define amdgpu_kernel void @test_sgpr_offset_kernel() #1 {		define amdgpu_kernel void @test_sgpr_offset_kernel() #1 {
; GFX803-LABEL: test_sgpr_offset_kernel:		; GFX803-LABEL: test_sgpr_offset_kernel:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_add_u32 s4, s4, s7		; GFX803-NEXT: s_add_i32 s4, s4, s7
; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; GFX803-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; GFX803-NEXT: s_add_u32 s0, s0, s7		; GFX803-NEXT: s_add_u32 s0, s0, s7
; GFX803-NEXT: s_addc_u32 s1, s1, 0		; GFX803-NEXT: s_addc_u32 s1, s1, 0
; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc		; GFX803-NEXT: buffer_load_dword v0, off, s[0:3], 0 offset:8 glc
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_mov_b32 s4, 0x40000		; GFX803-NEXT: s_mov_b32 s4, 0x40000
; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5		; GFX803-NEXT: s_mov_b32 flat_scratch_lo, s5
; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill		; GFX803-NEXT: buffer_store_dword v0, off, s[0:3], s4 ; 4-byte Folded Spill
[75 lines not shown]
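
Note on cc-update.ll: only the standalone `s_add_u32 s4, s4, s7` becomes `s_add_i32`, while each `s_add_u32` that is followed by `s_addc_u32` is unchanged. The sketch below models why. Both adds produce the same 32-bit result and differ only in what they write to SCC (unsigned carry-out versus signed overflow), and `s_addc_u32` consumes the carry. The Python function names mirror the instruction mnemonics for readability, and `add64` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def s_add_u32(a: int, b: int):
    """32-bit add; SCC is the unsigned carry-out."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, int(total > MASK32)


def s_add_i32(a: int, b: int):
    """32-bit add; SCC is the signed-overflow flag."""
    result = ((a & MASK32) + (b & MASK32)) & MASK32
    sign_a, sign_b, sign_r = (a >> 31) & 1, (b >> 31) & 1, (result >> 31) & 1
    return result, int(sign_a == sign_b and sign_r != sign_a)


def s_addc_u32(a: int, b: int, scc: int):
    """32-bit add with carry-in taken from SCC."""
    total = (a & MASK32) + (b & MASK32) + scc
    return total & MASK32, int(total > MASK32)


def add64(lo: int, hi: int, offset: int):
    """64-bit base + 32-bit offset, as in the s_add_u32/s_addc_u32 pairs."""
    new_lo, scc = s_add_u32(lo, offset)
    new_hi, _ = s_addc_u32(hi, 0, scc)
    return new_lo, new_hi


# Same result bits, different SCC: safe to swap only when SCC is unused.
print(s_add_u32(0xFFFFFF00, 0x200), s_add_i32(0xFFFFFF00, 0x200))
# The carry must reach the high half, so this pair has to stay unsigned.
print(add64(0xFFFFFF00, 0x1, 0x200))
```

Swapping the first instruction of a pair for `s_add_i32` would feed a signed-overflow bit into `s_addc_u32` and could corrupt the high half, which is consistent with those pairs being untouched in the diff.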

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	[27 lines not shown]
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v2f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, func_v2f32@rel32@hi+12
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_readlane_b32 s4, v40, 0			; GCN-NEXT: v_readlane_b32 s4, v40, 0
	; GCN-NEXT: v_readlane_b32 s5, v40, 1			; GCN-NEXT: v_readlane_b32 s5, v40, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <2 x float> @func_v2f32()			%split.ret.type = call <2 x float> @func_v2f32()
	[9 lines not shown]
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v3f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, func_v3f32@rel32@hi+12
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_readlane_b32 s4, v40, 0			; GCN-NEXT: v_readlane_b32 s4, v40, 0
	; GCN-NEXT: v_readlane_b32 s5, v40, 1			; GCN-NEXT: v_readlane_b32 s5, v40, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <3 x float> @func_v3f32()			%split.ret.type = call <3 x float> @func_v3f32()
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v4f16@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, func_v4f16@rel32@hi+12
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_readlane_b32 s4, v40, 0			; GCN-NEXT: v_readlane_b32 s4, v40, 0
	; GCN-NEXT: v_readlane_b32 s5, v40, 1			; GCN-NEXT: v_readlane_b32 s5, v40, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <4 x half> @func_v4f16()			%split.ret.type = call <4 x half> @func_v4f16()
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_struct@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, func_struct@rel32@hi+12
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_readlane_b32 s4, v40, 0			; GCN-NEXT: v_readlane_b32 s4, v40, 0
	; GCN-NEXT: v_mov_b32_e32 v1, v4			; GCN-NEXT: v_mov_b32_e32 v1, v4
	; GCN-NEXT: v_readlane_b32 s5, v40, 1			; GCN-NEXT: v_readlane_b32 s5, v40, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()			%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()
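In the epilogues above, `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00`. The two forms give the same stack pointer because `s_addk_i32` sign-extends its 16-bit immediate, so `0xfc00` is read as `-0x400`. The following is a minimal sketch of that arithmetic in Python. The helper names are hypothetical, and it models only the 32-bit result, not the SCC flag.

```python
MASK32 = 0xFFFFFFFF

def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def s_sub_u32(a, b):
    """32-bit result of a - b with wraparound (SCC not modeled)."""
    return (a - b) & MASK32

def s_addk_i32(d, simm16):
    """32-bit result of d + signext(simm16) with wraparound (SCC not modeled)."""
    return (d + sext16(simm16)) & MASK32

# 0xfc00 as a signed 16-bit value is -0x400.
assert sext16(0xFC00) == -0x400

for sp in (0x0, 0x400, 0x1000, 0xFFFFFC00):
    # The epilogue form matches the old subtraction.
    assert s_addk_i32(sp, 0xFC00) == s_sub_u32(sp, 0x400)
    # The prologue add followed by the epilogue add restores the stack pointer.
    assert s_addk_i32(s_addk_i32(sp, 0x400), 0xFC00) == sp
```

The sample stack-pointer values are arbitrary; any 32-bit value should satisfy the same identities.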

llvm/test/CodeGen/AMDGPU/flat-scratch.ll

	; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_u32 s1, 4, s1			; GFX9-NEXT: s_add_i32 s1, s1, 4
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_kernel:			; GFX10-LABEL: store_load_sindex_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 4, s0			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: s_add_u32 s1, 4, s1			; GFX10-NEXT: s_add_i32 s1, s1, 4
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_kernel:			; GFX9-PAL-LABEL: store_load_sindex_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: s_add_u32 s1, 4, s1			; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_sindex_kernel:			; GFX10-PAL-LABEL: store_load_sindex_kernel:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX10-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX10-PAL-NEXT: s_mov_b32 s4, s0			; GFX10-PAL-NEXT: s_mov_b32 s4, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX10-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX10-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX10-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX10-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX10-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX10-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX10-PAL-NEXT: s_add_u32 s1, 4, s1			; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx			%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
	%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*			%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
	store volatile i32 15, i32 addrspace(5)* %i8, align 4			store volatile i32 15, i32 addrspace(5)* %i8, align 4
	%i9 = and i32 %idx, 15			%i9 = and i32 %idx, 15
	%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9			%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
	%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*			%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
	%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4			%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
	ret void			ret void
	}			}
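The checks for this function replace `s_add_u32 sN, 4, sN` with `s_add_i32 sN, sN, 4`. For address computation only the 32-bit sum matters, and that sum is the same for both opcodes. As far as the ISA documentation describes them, they differ only in how SCC is set: unsigned carry-out for `s_add_u32`, signed overflow for `s_add_i32`. The sketch below models that difference in Python. The function names mirror the mnemonics but are illustrative models, not an existing API.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def s_add_u32(a, b):
    """Return (result, scc), where scc is the unsigned carry-out."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, int(total > MASK32)

def s_add_i32(a, b):
    """Return (result, scc), where scc is the signed-overflow flag."""
    total = to_signed(a) + to_signed(b)
    overflow = not (-(1 << 31) <= total < (1 << 31))
    return total & MASK32, int(overflow)

# The 32-bit result is identical for every input pair tried here.
for a in (0x0, 0x3C, 0x7FFFFFFC, 0xFFFFFFFC):
    assert s_add_u32(a, 4)[0] == s_add_i32(a, 4)[0]

# Only the SCC flag differs, and only at the wraparound boundaries.
assert s_add_u32(0xFFFFFFFC, 4) == (0x0, 1)
assert s_add_i32(0xFFFFFFFC, 4) == (0x0, 0)
assert s_add_u32(0x7FFFFFFC, 4) == (0x80000000, 0)
assert s_add_i32(0x7FFFFFFC, 4) == (0x80000000, 1)
```

Since the scratch address consumers in these tests read only the register result, the change in SCC semantics does not affect the checked stores and loads.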

	define amdgpu_ps void @store_load_sindex_foo(i32 inreg %idx) {			define amdgpu_ps void @store_load_sindex_foo(i32 inreg %idx) {
	; GFX9-LABEL: store_load_sindex_foo:			; GFX9-LABEL: store_load_sindex_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_foo:			; GFX10-LABEL: store_load_sindex_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_add_u32 s1, 4, s1			; GFX10-NEXT: s_add_i32 s1, s1, 4
	; GFX10-NEXT: s_add_u32 s0, 4, s0			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_foo:			; GFX9-PAL-LABEL: store_load_sindex_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: s_add_u32 s1, 4, s1			; GFX9-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX10-PAL-LABEL: store_load_sindex_foo:			; GFX10-PAL-LABEL: store_load_sindex_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX10-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX10-PAL-NEXT: s_mov_b32 s2, s0			; GFX10-PAL-NEXT: s_mov_b32 s2, s0
	; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX10-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX10-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX10-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX10-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX10-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX10-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX10-PAL-NEXT: s_add_u32 s1, 4, s1			; GFX10-PAL-NEXT: s_add_i32 s1, s1, 4
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_endpgm			; GFX10-PAL-NEXT: s_endpgm
	bb:			bb:
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*			%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s1, 0x104, s1			; GFX9-NEXT: s_addk_i32 s1, 0x104
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_small_offset_kernel:			; GFX10-LABEL: store_load_sindex_small_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 0x104, s0			; GFX10-NEXT: s_addk_i32 s0, 0x104
	; GFX10-NEXT: s_add_u32 s1, 0x104, s1			; GFX10-NEXT: s_addk_i32 s1, 0x104
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX9-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX1010-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1010-PAL-NEXT: s_mov_b32 s4, s0			; GFX1010-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1010-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_small_offset_kernel:			; GFX1030-PAL-LABEL: store_load_sindex_small_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1030-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
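The `0x104` immediate in the checks above appears consistent with the two allocas: `%padding` occupies 64 four-byte elements, and `%i` follows it. The sketch below works through that arithmetic and checks that the result fits a signed 16-bit immediate, which `s_addk_i32` requires. The base offset of 4 is an assumption taken from the `s_add_i32 sN, sN, 4` lines in the variants without padding, and `fits_simm16` is a hypothetical helper.

```python
ELEM_SIZE = 4      # i32 and float are both 4 bytes wide
BASE_OFFSET = 4    # assumption: offset of the first stack object in these tests

padding_bytes = 64 * ELEM_SIZE          # %padding = alloca [64 x i32]
i_offset = BASE_OFFSET + padding_bytes  # where %i would start under that assumption

def fits_simm16(value):
    """True if value is representable as a signed 16-bit immediate."""
    return -0x8000 <= value <= 0x7FFF

assert i_offset == 0x104
assert fits_simm16(i_offset)       # small enough for the s_addk_i32 form
assert not fits_simm16(0x8000)     # would need a full 32-bit literal instead
```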
	; GFX9-LABEL: store_load_sindex_small_offset_foo:			; GFX9-LABEL: store_load_sindex_small_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-NEXT: s_addk_i32 s0, 0x104
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_small_offset_foo:			; GFX10-LABEL: store_load_sindex_small_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_add_u32 s1, 0x104, s1			; GFX10-NEXT: s_addk_i32 s1, 0x104
	; GFX10-NEXT: s_add_u32 s0, 0x104, s0			; GFX10-NEXT: s_addk_i32 s0, 0x104
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX9-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX1010-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1010-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_small_offset_foo:			; GFX1030-PAL-LABEL: store_load_sindex_small_offset_foo:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_add_u32 s0, 0x104, s0			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x104
	; GFX1030-PAL-NEXT: s_add_u32 s1, 0x104, s1			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x104
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [64 x i32], align 4, addrspace(5)			%padding = alloca [64 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	[138 lines not shown]
	}			}

	define void @store_load_vindex_small_offset_foo(i32 %idx) {			define void @store_load_vindex_small_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_small_offset_foo:			; GFX9-LABEL: store_load_vindex_small_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x100			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-NEXT: v_and_b32_e32 v0, v0, v3			; GFX9-NEXT: v_and_b32_e32 v0, v0, v3
	; GFX9-NEXT: scratch_store_dword v2, v3, off			; GFX9-NEXT: scratch_store_dword v2, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_small_offset_foo:			; GFX10-LABEL: store_load_vindex_small_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x100			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-NEXT: v_and_b32_e32 v3, v0, v1			; GFX10-NEXT: v_and_b32_e32 v3, v0, v1
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2			; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2
	; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2			; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2
	; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: scratch_store_dword v0, v1, off			; GFX10-NEXT: scratch_store_dword v0, v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v2, off glc dlc			; GFX10-NEXT: scratch_load_dword v0, v2, off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_small_offset_foo:			; GFX9-PAL-LABEL: store_load_vindex_small_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x100			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x100
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-PAL-NEXT: v_and_b32_e32 v0, v0, v3			; GFX9-PAL-NEXT: v_and_b32_e32 v0, v0, v3
	; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off			; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_small_offset_foo:			; GFX10-PAL-LABEL: store_load_vindex_small_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x100			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x100
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-PAL-NEXT: v_and_b32_e32 v3, v0, v1			; GFX10-PAL-NEXT: v_and_b32_e32 v3, v0, v1
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v2			; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v2
	; GFX10-PAL-NEXT: v_lshl_add_u32 v2, v3, 2, v2			; GFX10-PAL-NEXT: v_lshl_add_u32 v2, v3, 2, v2
	; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off			; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	[181 lines not shown]
	; GFX9-NEXT: s_mov_b32 s0, 0			; GFX9-NEXT: s_mov_b32 s0, 0
	; GFX9-NEXT: s_mov_b32 s1, s0			; GFX9-NEXT: s_mov_b32 s1, s0
	; GFX9-NEXT: s_mov_b32 s2, s0			; GFX9-NEXT: s_mov_b32 s2, s0
	; GFX9-NEXT: s_mov_b32 s3, s0			; GFX9-NEXT: s_mov_b32 s3, s0
	; GFX9-NEXT: v_mov_b32_e32 v0, s0			; GFX9-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-NEXT: v_mov_b32_e32 v1, s1			; GFX9-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-NEXT: v_mov_b32_e32 v2, s2			; GFX9-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-NEXT: v_mov_b32_e32 v3, s3			; GFX9-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:16			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:16
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:32			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:32
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48			; GFX9-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: zero_init_large_offset_foo:			; GFX10-LABEL: zero_init_large_offset_foo:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_mov_b32 s0, 0			; GFX10-NEXT: s_mov_b32 s0, 0
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: s_mov_b32 s1, s0			; GFX10-NEXT: s_mov_b32 s1, s0
	; GFX10-NEXT: s_mov_b32 s2, s0			; GFX10-NEXT: s_mov_b32 s2, s0
	; GFX10-NEXT: s_mov_b32 s3, s0			; GFX10-NEXT: s_mov_b32 s3, s0
	; GFX10-NEXT: v_mov_b32_e32 v0, s0			; GFX10-NEXT: v_mov_b32_e32 v0, s0
	; GFX10-NEXT: v_mov_b32_e32 v1, s1			; GFX10-NEXT: v_mov_b32_e32 v1, s1
	; GFX10-NEXT: v_mov_b32_e32 v2, s2			; GFX10-NEXT: v_mov_b32_e32 v2, s2
	; GFX10-NEXT: v_mov_b32_e32 v3, s3			; GFX10-NEXT: v_mov_b32_e32 v3, s3
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX10-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: zero_init_large_offset_foo:			; GFX9-PAL-LABEL: zero_init_large_offset_foo:
	; GFX9-PAL: ; %bb.0:			; GFX9-PAL: ; %bb.0:
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s32 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_mov_b32 s0, 0			; GFX9-PAL-NEXT: s_mov_b32 s0, 0
	; GFX9-PAL-NEXT: s_mov_b32 s1, s0			; GFX9-PAL-NEXT: s_mov_b32 s1, s0
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_mov_b32 s3, s0			; GFX9-PAL-NEXT: s_mov_b32 s3, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX9-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:16			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:16
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:32			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:32
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48			; GFX9-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_hi offset:48
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX1010-PAL-LABEL: zero_init_large_offset_foo:			; GFX1010-PAL-LABEL: zero_init_large_offset_foo:
	; GFX1010-PAL: ; %bb.0:			; GFX1010-PAL: ; %bb.0:
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_mov_b32 s0, 0			; GFX1010-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1010-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1010-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1010-PAL-NEXT: s_mov_b32 s1, s0			; GFX1010-PAL-NEXT: s_mov_b32 s1, s0
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_mov_b32 s3, s0			; GFX1010-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1010-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1010-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo
	; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3			; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1010-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1010-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16
	; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3			; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1010-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1010-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3			; GFX1010-PAL-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1010-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1010-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX1010-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX1010-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX1030-PAL-LABEL: zero_init_large_offset_foo:			; GFX1030-PAL-LABEL: zero_init_large_offset_foo:
	; GFX1030-PAL: ; %bb.0:			; GFX1030-PAL: ; %bb.0:
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s32 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_mov_b32 s0, 0			; GFX1030-PAL-NEXT: s_mov_b32 s0, 0
	; GFX1030-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1030-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1030-PAL-NEXT: s_mov_b32 s1, s0			; GFX1030-PAL-NEXT: s_mov_b32 s1, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_mov_b32 s3, s0			; GFX1030-PAL-NEXT: s_mov_b32 s3, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, s0
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, s1
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2			; GFX1030-PAL-NEXT: v_mov_b32_e32 v2, s2
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3			; GFX1030-PAL-NEXT: v_mov_b32_e32 v3, s3
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo
	; GFX1030-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1030-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:16
	; GFX1030-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1030-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:32
	; GFX1030-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX1030-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48			; GFX1030-PAL-NEXT: scratch_store_dwordx4 off, v[0:3], vcc_lo offset:48
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX1030-PAL-NEXT: s_setpc_b64 s[30:31]
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%alloca = alloca [32 x i16], align 2, addrspace(5)			%alloca = alloca [32 x i16], align 2, addrspace(5)
	%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef			%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
	%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4			%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
	%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
	[9 lines not shown]
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-NEXT: s_and_b32 s0, s0, 15			; GFX9-NEXT: s_and_b32 s0, s0, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: s_add_u32 s1, 0x4004, s1			; GFX9-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-NEXT: scratch_store_dword off, v0, s1			; GFX9-NEXT: scratch_store_dword off, v0, s1
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_large_offset_kernel:			; GFX10-LABEL: store_load_sindex_large_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s2, s2, s5			; GFX10-NEXT: s_add_u32 s2, s2, s5
	; GFX10-NEXT: s_addc_u32 s3, s3, 0			; GFX10-NEXT: s_addc_u32 s3, s3, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_and_b32 s1, s0, 15			; GFX10-NEXT: s_and_b32 s1, s0, 15
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_lshl_b32 s1, s1, 2			; GFX10-NEXT: s_lshl_b32 s1, s1, 2
	; GFX10-NEXT: s_add_u32 s0, 0x4004, s0			; GFX10-NEXT: s_addk_i32 s0, 0x4004
	; GFX10-NEXT: s_add_u32 s1, 0x4004, s1			; GFX10-NEXT: s_addk_i32 s1, 0x4004
	; GFX10-NEXT: scratch_store_dword off, v0, s0			; GFX10-NEXT: scratch_store_dword off, v0, s0
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX9-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX9-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX9-PAL-NEXT: s_mov_b32 s4, s0			; GFX9-PAL-NEXT: s_mov_b32 s4, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX9-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s4, s3
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s5, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX1010-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]			; GFX1010-PAL-NEXT: s_getpc_b64 s[4:5]
	; GFX1010-PAL-NEXT: s_mov_b32 s4, s0			; GFX1010-PAL-NEXT: s_mov_b32 s4, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s5, s5, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3			; GFX1010-PAL-NEXT: s_add_u32 s4, s4, s3
	; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0			; GFX1010-PAL-NEXT: s_addc_u32 s5, s5, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s4
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s5
	; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX1010-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1010-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_large_offset_kernel:			; GFX1030-PAL-LABEL: store_load_sindex_large_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	[9 lines not shown]
	; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x24			; GFX1030-PAL-NEXT: s_load_dword s0, s[0:1], 0x24
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1030-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	[14 lines not shown]
	; GFX9-LABEL: store_load_sindex_large_offset_foo:			; GFX9-LABEL: store_load_sindex_large_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: s_lshl_b32 s0, s2, 2			; GFX9-NEXT: s_lshl_b32 s0, s2, 2
	; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0			; GFX9-NEXT: scratch_store_dword off, v0, s0
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_and_b32 s0, s2, 15			; GFX9-NEXT: s_and_b32 s0, s2, 15
	; GFX9-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_sindex_large_offset_foo:			; GFX10-LABEL: store_load_sindex_large_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_and_b32 s0, s2, 15			; GFX10-NEXT: s_and_b32 s0, s2, 15
	; GFX10-NEXT: v_mov_b32_e32 v0, 15			; GFX10-NEXT: v_mov_b32_e32 v0, 15
	; GFX10-NEXT: s_lshl_b32 s1, s2, 2			; GFX10-NEXT: s_lshl_b32 s1, s2, 2
	; GFX10-NEXT: s_lshl_b32 s0, s0, 2			; GFX10-NEXT: s_lshl_b32 s0, s0, 2
	; GFX10-NEXT: s_add_u32 s1, 0x4004, s1			; GFX10-NEXT: s_addk_i32 s1, 0x4004
	; GFX10-NEXT: s_add_u32 s0, 0x4004, s0			; GFX10-NEXT: s_addk_i32 s0, 0x4004
	; GFX10-NEXT: scratch_store_dword off, v0, s1			; GFX10-NEXT: scratch_store_dword off, v0, s1
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX9-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s1, s0, 2
	; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15			; GFX9-PAL-NEXT: s_and_b32 s0, s0, 15
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, vcc_hi offset:4 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX9-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX9-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX9-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s1
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX9-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX1010-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1010-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, vcc_lo offset:4 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1010-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1010-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX1010-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1010-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX1010-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_sindex_large_offset_foo:			; GFX1030-PAL-LABEL: store_load_sindex_large_offset_foo:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, off offset:4 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15			; GFX1030-PAL-NEXT: s_and_b32 s1, s0, 15
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s0, s0, 2
	; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2			; GFX1030-PAL-NEXT: s_lshl_b32 s1, s1, 2
	; GFX1030-PAL-NEXT: s_add_u32 s0, 0x4004, s0			; GFX1030-PAL-NEXT: s_addk_i32 s0, 0x4004
	; GFX1030-PAL-NEXT: s_add_u32 s1, 0x4004, s1			; GFX1030-PAL-NEXT: s_addk_i32 s1, 0x4004
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, s0
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s1 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	bb:			bb:
	%padding = alloca [4096 x i32], align 4, addrspace(5)			%padding = alloca [4096 x i32], align 4, addrspace(5)
	%i = alloca [32 x float], align 4, addrspace(5)			%i = alloca [32 x float], align 4, addrspace(5)
	[138 lines not shown]
	}			}

	define void @store_load_vindex_large_offset_foo(i32 %idx) {			define void @store_load_vindex_large_offset_foo(i32 %idx) {
	; GFX9-LABEL: store_load_vindex_large_offset_foo:			; GFX9-LABEL: store_load_vindex_large_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-NEXT: v_mov_b32_e32 v3, 15			; GFX9-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-NEXT: v_and_b32_e32 v0, v0, v3			; GFX9-NEXT: v_and_b32_e32 v0, v0, v3
	; GFX9-NEXT: scratch_store_dword v2, v3, off			; GFX9-NEXT: scratch_store_dword v2, v3, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_vindex_large_offset_foo:			; GFX10-LABEL: store_load_vindex_large_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-NEXT: v_and_b32_e32 v3, v0, v1			; GFX10-NEXT: v_and_b32_e32 v3, v0, v1
	; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2			; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2
	; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2			; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2
	; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: scratch_store_dword v0, v1, off			; GFX10-NEXT: scratch_store_dword v0, v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, v2, off glc dlc			; GFX10-NEXT: scratch_load_dword v0, v2, off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_vindex_large_offset_foo:			; GFX9-PAL-LABEL: store_load_vindex_large_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc			; GFX9-PAL-NEXT: scratch_load_dword v1, off, s32 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 vcc_hi, s32, 0x4000			; GFX9-PAL-NEXT: s_add_i32 vcc_hi, s32, 0x4000
	; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi			; GFX9-PAL-NEXT: v_mov_b32_e32 v1, vcc_hi
	; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v3, 15
	; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v2, v0, 2, v1
	; GFX9-PAL-NEXT: v_and_b32_e32 v0, v0, v3			; GFX9-PAL-NEXT: v_and_b32_e32 v0, v0, v3
	; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off			; GFX9-PAL-NEXT: scratch_store_dword v2, v3, off
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1			; GFX9-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v1
	; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc			; GFX9-PAL-NEXT: scratch_load_dword v0, v0, off glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_vindex_large_offset_foo:			; GFX10-PAL-LABEL: store_load_vindex_large_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-PAL-NEXT: s_add_u32 vcc_lo, s32, 0x4000			; GFX10-PAL-NEXT: s_add_i32 vcc_lo, s32, 0x4000
	; GFX10-PAL-NEXT: v_mov_b32_e32 v2, vcc_lo			; GFX10-PAL-NEXT: v_mov_b32_e32 v2, vcc_lo
	; GFX10-PAL-NEXT: v_and_b32_e32 v3, v0, v1			; GFX10-PAL-NEXT: v_and_b32_e32 v3, v0, v1
	; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v2			; GFX10-PAL-NEXT: v_lshl_add_u32 v0, v0, 2, v2
	; GFX10-PAL-NEXT: v_lshl_add_u32 v2, v3, 2, v2			; GFX10-PAL-NEXT: v_lshl_add_u32 v2, v3, 2, v2
	; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v3, off, s32 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off			; GFX10-PAL-NEXT: scratch_store_dword v0, v1, off
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	[21 lines not shown]
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3			; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
	; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0			; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
	; GFX9-NEXT: s_movk_i32 s0, 0x3000			; GFX9-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4			; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, 4, s0			; GFX9-NEXT: s_add_i32 s0, s0, 4
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_endpgm			; GFX9-NEXT: s_endpgm
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_kernel:			; GFX10-LABEL: store_load_large_imm_offset_kernel:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_add_u32 s0, s0, s3			; GFX10-NEXT: s_add_u32 s0, s0, s3
	; GFX10-NEXT: s_addc_u32 s1, s1, 0			; GFX10-NEXT: s_addc_u32 s1, s1, 0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
	; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1			; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3800			; GFX10-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-NEXT: s_add_u32 s0, 4, s0			; GFX10-NEXT: s_add_i32 s0, s0, 4
	; GFX10-NEXT: scratch_store_dword off, v0, off offset:4			; GFX10-NEXT: scratch_store_dword off, v0, off offset:4
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_endpgm			; GFX10-NEXT: s_endpgm
	;			;
	; GFX9-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX9-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX9-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX9-PAL-NEXT: s_mov_b32 s2, s0			; GFX9-PAL-NEXT: s_mov_b32 s2, s0
	; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX9-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0			; GFX9-PAL-NEXT: s_mov_b32 vcc_hi, 0
	; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000			; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX9-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1			; GFX9-PAL-NEXT: s_add_u32 flat_scratch_lo, s2, s1
	; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0			; GFX9-PAL-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, vcc_hi offset:4			; GFX9-PAL-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX9-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_endpgm			; GFX9-PAL-NEXT: s_endpgm
	;			;
	; GFX1010-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX1010-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX1010-PAL: ; %bb.0: ; %bb			; GFX1010-PAL: ; %bb.0: ; %bb
	; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1010-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1010-PAL-NEXT: s_mov_b32 s2, s0			; GFX1010-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1010-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1010-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1010-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1010-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1010-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX1010-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX1010-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX1010-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0			; GFX1010-PAL-NEXT: s_mov_b32 vcc_lo, 0
	; GFX1010-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX1010-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX1010-PAL-NEXT: scratch_store_dword off, v0, vcc_lo offset:4			; GFX1010-PAL-NEXT: scratch_store_dword off, v0, vcc_lo offset:4
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX1010-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1010-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX1010-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1010-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1010-PAL-NEXT: s_endpgm			; GFX1010-PAL-NEXT: s_endpgm
	;			;
	; GFX1030-PAL-LABEL: store_load_large_imm_offset_kernel:			; GFX1030-PAL-LABEL: store_load_large_imm_offset_kernel:
	; GFX1030-PAL: ; %bb.0: ; %bb			; GFX1030-PAL: ; %bb.0: ; %bb
	; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]			; GFX1030-PAL-NEXT: s_getpc_b64 s[2:3]
	; GFX1030-PAL-NEXT: s_mov_b32 s2, s0			; GFX1030-PAL-NEXT: s_mov_b32 s2, s0
	; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0			; GFX1030-PAL-NEXT: s_load_dwordx2 s[2:3], s[2:3], 0x0
	; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt lgkmcnt(0)
	; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff			; GFX1030-PAL-NEXT: s_and_b32 s3, s3, 0xffff
	; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1			; GFX1030-PAL-NEXT: s_add_u32 s2, s2, s1
	; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0			; GFX1030-PAL-NEXT: s_addc_u32 s3, s3, 0
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
	; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3			; GFX1030-PAL-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX1030-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX1030-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX1030-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX1030-PAL-NEXT: s_add_u32 s0, 4, s0			; GFX1030-PAL-NEXT: s_add_i32 s0, s0, 4
	; GFX1030-PAL-NEXT: scratch_store_dword off, v0, off offset:4			; GFX1030-PAL-NEXT: scratch_store_dword off, v0, off offset:4
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX1030-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX1030-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX1030-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX1030-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX1030-PAL-NEXT: s_endpgm			; GFX1030-PAL-NEXT: s_endpgm
	bb:			bb:
	[10 lines not shown]
	define void @store_load_large_imm_offset_foo() {			define void @store_load_large_imm_offset_foo() {
	; GFX9-LABEL: store_load_large_imm_offset_foo:			; GFX9-LABEL: store_load_large_imm_offset_foo:
	; GFX9: ; %bb.0: ; %bb			; GFX9: ; %bb.0: ; %bb
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_movk_i32 s0, 0x3000			; GFX9-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: scratch_store_dword off, v0, s32			; GFX9-NEXT: scratch_store_dword off, v0, s32
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_add_u32 s0, s32, s0			; GFX9-NEXT: s_add_i32 s0, s0, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: store_load_large_imm_offset_foo:			; GFX10-LABEL: store_load_large_imm_offset_foo:
	; GFX10: ; %bb.0: ; %bb			; GFX10: ; %bb.0: ; %bb
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 13			; GFX10-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-NEXT: v_mov_b32_e32 v1, 15			; GFX10-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-NEXT: s_movk_i32 s0, 0x3800			; GFX10-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-NEXT: s_add_u32 s0, s32, s0			; GFX10-NEXT: s_add_i32 s0, s0, s32
	; GFX10-NEXT: scratch_store_dword off, v0, s32			; GFX10-NEXT: scratch_store_dword off, v0, s32
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-PAL-LABEL: store_load_large_imm_offset_foo:			; GFX9-PAL-LABEL: store_load_large_imm_offset_foo:
	; GFX9-PAL: ; %bb.0: ; %bb			; GFX9-PAL: ; %bb.0: ; %bb
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000			; GFX9-PAL-NEXT: s_movk_i32 s0, 0x3000
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s32			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s32
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_add_u32 s0, s32, s0			; GFX9-PAL-NEXT: s_add_i32 s0, s0, s32
	; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15			; GFX9-PAL-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712			; GFX9-PAL-NEXT: scratch_store_dword off, v0, s0 offset:3712
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc			; GFX9-PAL-NEXT: scratch_load_dword v0, off, s0 offset:3712 glc
	; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX9-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX9-PAL-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-PAL-LABEL: store_load_large_imm_offset_foo:			; GFX10-PAL-LABEL: store_load_large_imm_offset_foo:
	; GFX10-PAL: ; %bb.0: ; %bb			; GFX10-PAL: ; %bb.0: ; %bb
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 13			; GFX10-PAL-NEXT: v_mov_b32_e32 v0, 13
	; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15			; GFX10-PAL-NEXT: v_mov_b32_e32 v1, 15
	; GFX10-PAL-NEXT: s_movk_i32 s0, 0x3800			; GFX10-PAL-NEXT: s_movk_i32 s0, 0x3800
	; GFX10-PAL-NEXT: s_add_u32 s0, s32, s0			; GFX10-PAL-NEXT: s_add_i32 s0, s0, s32
	; GFX10-PAL-NEXT: scratch_store_dword off, v0, s32			; GFX10-PAL-NEXT: scratch_store_dword off, v0, s32
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664			; GFX10-PAL-NEXT: scratch_store_dword off, v1, s0 offset:1664
	; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-PAL-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc			; GFX10-PAL-NEXT: scratch_load_dword v0, off, s0 offset:1664 glc dlc
	; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)			; GFX10-PAL-NEXT: s_waitcnt vmcnt(0)
	; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]			; GFX10-PAL-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	[446 lines not shown]

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

	[31 lines not shown]
	; CI-NOT: v_mov			; CI-NOT: v_mov
	; CI: ds_write_b32 v0, v0			; CI: ds_write_b32 v0, v0
	; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI-NEXT: v_add_i32_e{{32\|64}} v0, {{s\[[0-9]+:[0-9]+\]\|vcc}}, 4, [[SCALED]]			; CI-NEXT: v_add_i32_e{{32\|64}} v0, {{s\[[0-9]+:[0-9]+\]\|vcc}}, 4, [[SCALED]]
	; CI-NEXT: ds_write_b32 v0, v0			; CI-NEXT: ds_write_b32 v0, v0

	; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 v0, 6, s32			; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 v0, 6, s32
	; GFX9-FLATSCR: v_mov_b32_e32 v0, s32			; GFX9-FLATSCR: v_mov_b32_e32 v0, s32
	; GFX9-FLATSCR: s_add_u32 [[ADD:[^,]+]], s32, 4			; GFX9-FLATSCR: s_add_i32 [[ADD:[^,]+]], s32, 4
	; GFX9-NEXT: ds_write_b32 v0, v0			; GFX9-NEXT: ds_write_b32 v0, v0
	; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9-MUBUF-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]			; GFX9-MUBUF-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]
	; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v0, [[ADD]]			; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v0, [[ADD]]
	; GFX9-NEXT: ds_write_b32 v0, v0			; GFX9-NEXT: ds_write_b32 v0, v0
	define void @func_mov_fi_i32_offset() #0 {			define void @func_mov_fi_i32_offset() #0 {
	%alloca0 = alloca i32, addrspace(5)			%alloca0 = alloca i32, addrspace(5)
	%alloca1 = alloca i32, addrspace(5)			%alloca1 = alloca i32, addrspace(5)

	; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200			; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200
	; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]			; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]

	; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]			; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

	; GFX9-FLATSCR-DAG: s_add_u32 [[SZ:[^,]+]], s32, 0x200			; GFX9-FLATSCR-DAG: s_add_i32 [[SZ:[^,]+]], s32, 0x200
	; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]			; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]

	; GCN: v_mul_lo_u32 [[VZ]], [[VZ]], 9			; GCN: v_mul_lo_u32 [[VZ]], [[VZ]], 9
	; GCN: ds_write_b32 v0, [[VZ]]			; GCN: ds_write_b32 v0, [[VZ]]
	define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {			define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {
	%alloca0 = alloca [128 x i32], align 4, addrspace(5)			%alloca0 = alloca [128 x i32], align 4, addrspace(5)
	%alloca1 = alloca [8 x i32], align 4, addrspace(5)			%alloca1 = alloca [8 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65			%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65

	; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200			; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200
	; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]			; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]

	; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]			; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

	; GFX9-FLATSCR-DAG: s_add_u32 [[SZ:[^,]+]], s32, 0x200			; GFX9-FLATSCR-DAG: s_add_i32 [[SZ:[^,]+]], s32, 0x200
	; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]			; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]

	; GCN: v_mul_lo_u32 [[VZ]], [[VZ]], 9			; GCN: v_mul_lo_u32 [[VZ]], [[VZ]], 9
	; GCN: ds_write_b32 v0, [[VZ]]			; GCN: ds_write_b32 v0, [[VZ]]
	define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {			define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {
	%alloca0 = alloca [128 x i32], align 4, addrspace(5)			%alloca0 = alloca [128 x i32], align 4, addrspace(5)
	%alloca1 = alloca [8 x i32], align 4, addrspace(5)			%alloca1 = alloca [8 x i32], align 4, addrspace(5)
	%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()			%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()
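
The check-line changes above swap s_add_u32 for s_add_i32 without changing any operand or any later use of the result. A minimal sketch of why that is safe for address arithmetic, assuming the usual description of the two scalar adds: both write the same 32-bit wrapped sum and differ only in what they report in SCC (unsigned carry-out versus signed overflow). The function names below are hypothetical models written for this note, not LLVM or hardware APIs.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def s_add_u32(a, b):
    """Model: 32-bit add, SCC = unsigned carry-out."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, int(total > MASK32)


def s_add_i32(a, b):
    """Model: 32-bit add, SCC = signed overflow."""
    total = to_signed32(a) + to_signed32(b)
    overflow = int(not (-(1 << 31) <= total < (1 << 31)))
    return total & MASK32, overflow


# Offsets taken from the check lines above; the stack pointer value is made up.
sp = 0x1000
for offset in (4, 0x200, 0x3800):
    # The destination register receives the same bits either way.
    assert s_add_u32(sp, offset)[0] == s_add_i32(sp, offset)[0]
```

Only code that reads SCC after the add could tell the two apart, which is why the frame-index and stack-pointer additions in these tests can switch opcodes freely.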

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

	; SPILL-TO-VGPR: ; %bb.0:			; SPILL-TO-VGPR: ; %bb.0:
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[4:5], -1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; SPILL-TO-VGPR-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s33, 2			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s33, 2
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s30, 0
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32			; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32
	; SPILL-TO-VGPR-NEXT: s_add_u32 s32, s32, 0x400			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x400
	; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0			; SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, 0
	; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]			; SPILL-TO-VGPR-NEXT: s_getpc_b64 s[4:5]
	; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; SPILL-TO-VGPR-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1			; SPILL-TO-VGPR-NEXT: v_writelane_b32 v40, s31, 1
	; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33			; SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]			; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v40, 0			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v40, 0
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s5, v40, 1			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s5, v40, 1
	; SPILL-TO-VGPR-NEXT: s_sub_u32 s32, s32, 0x400			; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v40, 2			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s33, v40, 2
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[6:7], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[6:7], -1
	; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[4:5]			; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[4:5]
	;			;
	; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; NO-SPILL-TO-VGPR: ; %bb.0:			; NO-SPILL-TO-VGPR: ; %bb.0:
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33			; NO-SPILL-TO-VGPR-NEXT: v_mov_b32_e32 v0, s33
	; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32			; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s32
	; NO-SPILL-TO-VGPR-NEXT: s_add_u32 s32, s32, 0x800			; NO-SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0x800
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 s[6:7], exec			; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 s[6:7], exec
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, 3			; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, 3
	; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: v_writelane_b32 v1, s30, 0			; NO-SPILL-TO-VGPR-NEXT: v_writelane_b32 v1, s30, 0
	; NO-SPILL-TO-VGPR-NEXT: v_writelane_b32 v1, s31, 1			; NO-SPILL-TO-VGPR-NEXT: v_writelane_b32 v1, s31, 1
	; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v1, 0			; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v1, 0
	; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s5, v1, 1			; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s5, v1, 1
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]			; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]
	; NO-SPILL-TO-VGPR-NEXT: s_sub_u32 s32, s32, 0x800			; NO-SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xf800
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_readfirstlane_b32 s33, v0			; NO-SPILL-TO-VGPR-NEXT: v_readfirstlane_b32 s33, v0
	; NO-SPILL-TO-VGPR-NEXT: s_setpc_b64 s[4:5]			; NO-SPILL-TO-VGPR-NEXT: s_setpc_b64 s[4:5]
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
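
In the epilogues above, s_sub_u32 s32, s32, 0x400 becomes s_addk_i32 s32, 0xfc00, and s_sub_u32 s32, s32, 0x800 becomes s_addk_i32 s32, 0xf800. The sketch below illustrates the arithmetic, assuming s_addk_i32 adds a sign-extended 16-bit immediate to its destination and that the assembly printer shows that immediate as an unsigned 16-bit hex value. The helper names are hypothetical and exist only for this illustration.

```python
def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def encode_simm16(value):
    """Return the 16-bit field that encodes a signed value, or raise if it does not fit."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in a signed 16-bit immediate")
    return value & 0xFFFF


def s_addk_i32(dst, simm16):
    """Model: dst = dst + sext(simm16), wrapped to 32 bits."""
    return (dst + sext16(simm16)) & 0xFFFFFFFF


# Prologue bump followed by the matching epilogue restore; 0x2000 is a made-up SP.
sp = 0x2000
sp = s_addk_i32(sp, 0x400)                   # s_addk_i32 s32, 0x400
sp = s_addk_i32(sp, encode_simm16(-0x400))   # s_addk_i32 s32, 0xfc00
assert sp == 0x2000
```

Under the same assumption, subtracting 0x800 is encoded as 0xf800, which matches the NO-SPILL-TO-VGPR epilogue.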

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll


	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm:			; GFX10-LABEL: test_call_external_void_func_i1_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i1(i1 true)			call amdgpu_gfx void @external_void_func_i1(i1 true)
	ret void			ret void
	}			}
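
The three check prefixes above bump s32 by different amounts for the same function: 0x400 (GFX9), 0x200 (GFX10), and 16 (GFX10-SCRATCH). A rough sketch of the relationship, assuming the MUBUF configurations keep the stack pointer scaled by the wavefront size (wave64 for the GFX9 run, wave32 for the GFX10 run) while the flat-scratch configuration keeps it in per-lane bytes, and assuming the per-lane frame is the 16 bytes seen in the GFX10-SCRATCH lines. The function name is hypothetical.

```python
def sp_increment(per_lane_bytes, wave_size, flat_scratch):
    """Stack-pointer bump for a frame of per_lane_bytes under the stated assumptions."""
    if flat_scratch:
        # Unscaled: the stack pointer counts bytes for a single lane.
        return per_lane_bytes
    # Scaled: the stack pointer counts bytes for the whole wave.
    return per_lane_bytes * wave_size


assert sp_increment(16, 64, flat_scratch=False) == 0x400  # GFX9
assert sp_increment(16, 32, flat_scratch=False) == 0x200  # GFX10
assert sp_increment(16, 32, flat_scratch=True) == 16      # GFX10-SCRATCH
```

This would also be consistent with the opcode split in the new output: the scaled values are 16-bit literals and appear with s_addk_i32, whereas 16 and -16 stay as s_add_i32 operands, presumably because they fit the inline-constant range and gain nothing from the shorter form.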

	define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_signext:			; GFX9-LABEL: test_call_external_void_func_i1_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_signext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_signext:			; GFX10-LABEL: test_call_external_void_func_i1_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i1, i1 addrspace(1)* undef			%var = load volatile i1, i1 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)			call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i1_zeroext:			; GFX9-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_zeroext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: v_and_b32_e32 v0, 1, v0			; GFX9-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0			; GFX10-SCRATCH-NEXT: v_and_b32_e32 v0, 1, v0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i1, i1 addrspace(1)* undef			%var = load volatile i1, i1 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)			call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)
	ret void			ret void
	}			}
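
Note on the check lines above: the old `s_sub_u32 s32, s32, 0x400` and the new `s_addk_i32 s32, 0xfc00` restore the stack pointer to the same value, because `s_addk_i32` adds its 16-bit immediate after sign extension, so `0xfc00` stands for -0x400 (and `0xfe00` for -0x200). The sketch below is a rough Python model of that arithmetic only. The helper names `sext16`, `s_addk_i32` and `s_sub_u32` are hypothetical stand-ins chosen for this illustration, not LLVM APIs, and the model deliberately ignores the SCC flag that the real instructions also write.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def s_addk_i32(dst: int, simm16: int) -> int:
    """Simplified model: dst + sign-extended 16-bit immediate, wrapped to 32 bits.

    Assumption: the SCC (overflow) side effect is not modelled.
    """
    return (dst + sext16(simm16)) & MASK32


def s_sub_u32(a: int, b: int) -> int:
    """Simplified model: a - b wrapped to 32 bits (SCC borrow not modelled)."""
    return (a - b) & MASK32


if __name__ == "__main__":
    sp = 0x1000  # arbitrary example stack pointer value
    # GFX9 epilogue: old and new forms give the same stack pointer.
    print(hex(s_sub_u32(sp, 0x400)), hex(s_addk_i32(sp, 0xFC00)))
    # GFX10 epilogue.
    print(hex(s_sub_u32(sp, 0x200)), hex(s_addk_i32(sp, 0xFE00)))
```

The GFX10-SCRATCH lines follow the same pattern without the short-immediate encoding: `s_sub_u32 s32, s32, 16` becomes `s_add_i32 s32, s32, -16`.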

	define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm:			; GFX9-LABEL: test_call_external_void_func_i8_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm:			; GFX10-LABEL: test_call_external_void_func_i8_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i8(i8 123)			call amdgpu_gfx void @external_void_func_i8(i8 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_signext:			; GFX9-LABEL: test_call_external_void_func_i8_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_sbyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8_signext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_signext:			; GFX10-LABEL: test_call_external_void_func_i8_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_sbyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i8, i8 addrspace(1)* undef			%var = load volatile i8, i8 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)			call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_zeroext:			; GFX9-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc			; GFX9-NEXT: global_load_ubyte v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i8_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i8_zeroext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i8, i8 addrspace(1)* undef			%var = load volatile i8, i8 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)			call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm:			; GFX9-LABEL: test_call_external_void_func_i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm:			; GFX10-LABEL: test_call_external_void_func_i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i16(i16 123)			call amdgpu_gfx void @external_void_func_i16(i16 123)
	ret void			ret void
	}			}
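
Note on the immediates in the new check lines: the right-hand column replaces `s_sub_u32 s32, s32, 0x400` with `s_addk_i32 s32, 0xfc00` (and `0x200` with `0xfe00`). This is consistent with `s_addk_i32` sign-extending its 16-bit immediate, so `0xfc00` stands for -0x400. The Python sketch below only checks that arithmetic; the helper names `simm16` and `apply_s_addk_i32` are illustrative and are not part of LLVM, and the model ignores the SCC flag.

```python
def simm16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def apply_s_addk_i32(reg: int, imm16: int) -> int:
    """Hypothetical model of the s_addk_i32 result: reg + sign_extend(imm16),
    wrapped to 32 bits. The SCC (overflow) output is not modeled."""
    return (reg + simm16(imm16)) & 0xFFFFFFFF


# Stack pointer value chosen arbitrarily for illustration.
sp = 0x1000
after_prologue = apply_s_addk_i32(sp, 0x400)               # s_addk_i32 s32, 0x400
after_epilogue = apply_s_addk_i32(after_prologue, 0xFC00)  # s_addk_i32 s32, 0xfc00
```

Under this model the epilogue add restores the original stack pointer, which is the same effect as the `s_sub_u32` it replaces in the left-hand column. The GFX10-SCRATCH lines instead show `s_add_i32 s32, s32, -16`, where the negated offset appears directly as the operand.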

	define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_signext:			; GFX9-LABEL: test_call_external_void_func_i16_signext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16_signext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16_signext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16_signext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_signext:			; GFX10-LABEL: test_call_external_void_func_i16_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16_signext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16_signext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16_signext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16_signext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_signext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_signext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i16, i16 addrspace(1)* undef			%var = load volatile i16, i16 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)			call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_zeroext(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_zeroext:			; GFX9-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc			; GFX9-NEXT: global_load_ushort v0, v[0:1], off glc
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16_zeroext@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i16_zeroext@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i16_zeroext@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc			; GFX10-SCRATCH-NEXT: global_load_ushort v0, v[0:1], off glc dlc
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_zeroext@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_zeroext@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%var = load volatile i16, i16 addrspace(1)* undef			%var = load volatile i16, i16 addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)			call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm:			; GFX9-LABEL: test_call_external_void_func_i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm:			; GFX10-LABEL: test_call_external_void_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i32(i32 42)			call amdgpu_gfx void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm:			; GFX9-LABEL: test_call_external_void_func_i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX9-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm:			; GFX10-LABEL: test_call_external_void_func_i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x7b
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i64(i64 123)			call amdgpu_gfx void @external_void_func_i64(i64 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64:			; GFX9-LABEL: test_call_external_void_func_v2i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64:			; GFX10-LABEL: test_call_external_void_func_v2i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i64>, <2 x i64> addrspace(1)* null			%val = load <2 x i64>, <2 x i64> addrspace(1)* null
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64:			; GFX9-LABEL: test_call_external_void_func_v3i64:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64:			; GFX10-LABEL: test_call_external_void_func_v3i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%load = load <2 x i64>, <2 x i64> addrspace(1)* null			%load = load <2 x i64>, <2 x i64> addrspace(1)* null
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0			; GFX9-NEXT: v_mov_b32_e32 v1, 0
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v4, 1			; GFX9-NEXT: v_mov_b32_e32 v4, 1
	; GFX9-NEXT: v_mov_b32_e32 v5, 2			; GFX9-NEXT: v_mov_b32_e32 v5, 2
	; GFX9-NEXT: v_mov_b32_e32 v6, 3			; GFX9-NEXT: v_mov_b32_e32 v6, 3
	; GFX9-NEXT: v_mov_b32_e32 v7, 4			; GFX9-NEXT: v_mov_b32_e32 v7, 4
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64:			; GFX10-LABEL: test_call_external_void_func_v4i64:
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v4, 1			; GFX10-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-NEXT: v_mov_b32_e32 v5, 2			; GFX10-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-NEXT: v_mov_b32_e32 v6, 3			; GFX10-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v7, 4			; GFX10-NEXT: v_mov_b32_e32 v7, 4
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%load = load <2 x i64>, <2 x i64> addrspace(1)* null			%load = load <2 x i64>, <2 x i64> addrspace(1)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)			call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm:			; GFX9-LABEL: test_call_external_void_func_f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v0, 0x4400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm:			; GFX10-LABEL: test_call_external_void_func_f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v0, 0x4400
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x4400
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f16(half 4.0)			call amdgpu_gfx void @external_void_func_f16(half 4.0)
	ret void			ret void
	}			}
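
	Note on the immediates in the updated checks: the old `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00`, and `s_sub_u32 s32, s32, 0x200` becomes `s_addk_i32 s32, 0xfe00`. `s_addk_i32` adds a sign-extended 16-bit immediate, so 0xfc00 reads as -0x400 and 0xfe00 as -0x200. A minimal sketch of that arithmetic follows; the helper names are hypothetical and are not part of the patch or of LLVM.

```python
def simm16_encode(value: int) -> int:
    """Return the 16-bit two's-complement pattern for a signed value."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in a signed 16-bit immediate")
    return value & 0xFFFF


def simm16_decode(bits: int) -> int:
    """Sign-extend a 16-bit pattern back to a signed integer."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def add_i32(reg: int, imm: int) -> int:
    """Model a 32-bit wrapping add of a register value and an immediate."""
    return (reg + imm) & 0xFFFFFFFF


# Stack pointer bump and restore, as in the GFX9 checks (0x400 / 0xfc00).
sp = 0x1000  # arbitrary starting value, chosen only for illustration
sp_after_call_setup = add_i32(sp, simm16_decode(0x0400))
sp_restored = add_i32(sp_after_call_setup, simm16_decode(0xFC00))
print(hex(simm16_encode(-0x400)), hex(simm16_encode(-0x200)), hex(sp_restored))
```

	The GFX10-SCRATCH checks use `s_add_i32 s32, s32, -16` instead, where the negative constant is written directly as an operand rather than as a 16-bit pattern.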

	define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm:			; GFX9-LABEL: test_call_external_void_func_f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 4.0			; GFX9-NEXT: v_mov_b32_e32 v0, 4.0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm:			; GFX10-LABEL: test_call_external_void_func_f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-NEXT: v_mov_b32_e32 v0, 4.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 4.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f32(float 4.0)			call amdgpu_gfx void @external_void_func_f32(float 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1.0			; GFX9-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 4.0			; GFX9-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX9-NEXT: v_mov_b32_e32 v3, -1.0			; GFX9-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX9-NEXT: v_mov_b32_e32 v4, 0.5			; GFX9-NEXT: v_mov_b32_e32 v4, 0.5
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v5f32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-NEXT: v_mov_b32_e32 v4, 0.5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v5f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 4.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0.5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm:			; GFX9-LABEL: test_call_external_void_func_f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40100000
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_f64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm:			; GFX10-LABEL: test_call_external_void_func_f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40100000
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40100000
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f64(double 4.0)			call amdgpu_gfx void @external_void_func_f64(double 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, 2.0			; GFX9-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX9-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX9-NEXT: v_mov_b32_e32 v4, 0			; GFX9-NEXT: v_mov_b32_e32 v4, 0
	; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX9-NEXT: v_mov_b32_e32 v5, 0x40200000
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f64@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f64@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f64@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 0			; GFX10-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-NEXT: v_mov_b32_e32 v5, 0x40200000
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f64@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f64@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f64@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f64@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2.0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x40200000
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16:			; GFX9-LABEL: test_call_external_void_func_v2i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16:			; GFX10-LABEL: test_call_external_void_func_v2i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i16>, <2 x i16> addrspace(1)* undef			%val = load <2 x i16>, <2 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)			call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16:			; GFX9-LABEL: test_call_external_void_func_v3i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16:			; GFX10-LABEL: test_call_external_void_func_v3i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <3 x i16>, <3 x i16> addrspace(1)* undef			%val = load <3 x i16>, <3 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16:			; GFX9-LABEL: test_call_external_void_func_v3f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16:			; GFX10-LABEL: test_call_external_void_func_v3f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <3 x half>, <3 x half> addrspace(1)* undef			%val = load <3 x half>, <3 x half> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)
	ret void			ret void
	}			}
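
Note on the immediates in the right-hand column: the old `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00`, and `0x200` becomes `0xfe00`. `s_addk_i32` adds a sign-extended 16-bit immediate, so `0xfc00` stands for -0x400 and `0xfe00` for -0x200. The sketch below is only an arithmetic model to check that equivalence. The helper names `sext16`, `s_addk_i32` and `s_sub_u32` are made up for illustration and are not LLVM APIs, and the model ignores SCC, which the signed and unsigned add variants set differently.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def s_addk_i32(reg: int, imm16: int) -> int:
    """Hypothetical model: reg + sext(imm16), wrapped to 32 bits (SCC ignored)."""
    return (reg + sext16(imm16)) & MASK32


def s_sub_u32(reg: int, val: int) -> int:
    """Hypothetical model: reg - val, wrapped to 32 bits (SCC ignored)."""
    return (reg - val) & MASK32


if __name__ == "__main__":
    sp = 0x1000  # arbitrary example stack pointer value
    # Prologue bump followed by epilogue restore, as in the GFX9 checks.
    bumped = s_addk_i32(sp, 0x400)
    restored = s_addk_i32(bumped, 0xFC00)
    print(hex(sext16(0xFC00)), hex(sext16(0xFE00)), hex(restored))
```

In this model, adding `0xfc00` through `s_addk_i32` gives the same 32-bit result as the old subtraction of `0x400`, which is why the epilogue can restore `s32` with an add.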

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 3			; GFX9-NEXT: v_mov_b32_e32 v1, 3
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 3			; GFX10-NEXT: v_mov_b32_e32 v1, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX9-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX9-NEXT: v_mov_b32_e32 v1, 0x4400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-NEXT: v_mov_b32_e32 v1, 0x4400
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x40003c00
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x4400
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16:			; GFX9-LABEL: test_call_external_void_func_v4i16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16:			; GFX10-LABEL: test_call_external_void_func_v4i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <4 x i16>, <4 x i16> addrspace(1)* undef			%val = load <4 x i16>, <4 x i16> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX9-NEXT: v_mov_b32_e32 v1, 0x40003
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-NEXT: v_mov_b32_e32 v1, 0x40003
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x40003
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16:			; GFX9-LABEL: test_call_external_void_func_v2f16:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dword v0, v[0:1], off			; GFX9-NEXT: global_load_dword v0, v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f16@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2f16@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2f16@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16:			; GFX10-LABEL: test_call_external_void_func_v2f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dword v0, v[0:1], off			; GFX10-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f16@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2f16@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f16@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2f16@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v0, v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x half>, <2 x half> addrspace(1)* undef			%val = load <2 x half>, <2 x half> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)			call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32:			; GFX9-LABEL: test_call_external_void_func_v2i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX9-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32:			; GFX10-LABEL: test_call_external_void_func_v2i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx2 v[0:1], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i32>, <2 x i32> addrspace(1)* undef			%val = load <2 x i32>, <2 x i32> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v2i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v2i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, 4			; GFX9-NEXT: v_mov_b32_e32 v1, 4
	; GFX9-NEXT: v_mov_b32_e32 v2, 5			; GFX9-NEXT: v_mov_b32_e32 v2, 5
	; GFX9-NEXT: v_mov_b32_e32 v3, 6			; GFX9-NEXT: v_mov_b32_e32 v3, 6
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v3i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 4			; GFX10-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-NEXT: v_mov_b32_e32 v2, 5			; GFX10-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-NEXT: v_mov_b32_e32 v3, 6			; GFX10-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)			call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32:			; GFX9-LABEL: test_call_external_void_func_v4i32:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX9-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32:			; GFX10-LABEL: test_call_external_void_func_v4i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v[0:1], off
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <4 x i32>, <4 x i32> addrspace(1)* undef			%val = load <4 x i32>, <4 x i32> addrspace(1)* undef
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v4i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v4i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v5i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: v_mov_b32_e32 v1, 2			; GFX10-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v5i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v8, 0			; GFX9-NEXT: v_mov_b32_e32 v8, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[4:5]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v8, s[4:5]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[4:5] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v8, s[4:5] offset:16
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32:			; GFX10-LABEL: test_call_external_void_func_v8i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v8, 0			; GFX10-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[4:5]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v8, s[4:5]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[4:5] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v8, s[4:5] offset:16
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v8, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v8, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v8, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <8 x i32> addrspace(1)*, <8 x i32> addrspace(1)* addrspace(4)* undef			%ptr = load <8 x i32> addrspace(1)*, <8 x i32> addrspace(1)* addrspace(4)* undef
	%val = load <8 x i32>, <8 x i32> addrspace(1)* %ptr			%val = load <8 x i32>, <8 x i32> addrspace(1)* %ptr
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: v_mov_b32_e32 v1, 2			; GFX9-NEXT: v_mov_b32_e32 v1, 2
	; GFX9-NEXT: v_mov_b32_e32 v2, 3			; GFX9-NEXT: v_mov_b32_e32 v2, 3
	; GFX9-NEXT: v_mov_b32_e32 v3, 4			; GFX9-NEXT: v_mov_b32_e32 v3, 4
	; GFX9-NEXT: v_mov_b32_e32 v4, 5			; GFX9-NEXT: v_mov_b32_e32 v4, 5
	; GFX9-NEXT: v_mov_b32_e32 v5, 6			; GFX9-NEXT: v_mov_b32_e32 v5, 6
	; GFX9-NEXT: v_mov_b32_e32 v6, 7			; GFX9-NEXT: v_mov_b32_e32 v6, 7
	; GFX9-NEXT: v_mov_b32_e32 v7, 8			; GFX9-NEXT: v_mov_b32_e32 v7, 8
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10-NEXT: v_mov_b32_e32 v2, 3			; GFX10-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-NEXT: v_mov_b32_e32 v3, 4			; GFX10-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v4, 5			; GFX10-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-NEXT: v_mov_b32_e32 v5, 6			; GFX10-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-NEXT: v_mov_b32_e32 v6, 7			; GFX10-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-NEXT: v_mov_b32_e32 v7, 8			; GFX10-NEXT: v_mov_b32_e32 v7, 8
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v8i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v8i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 5
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 6
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 7
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 8
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[4:5]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v16, s[4:5]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[4:5] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v16, s[4:5] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[4:5] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v16, s[4:5] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[4:5] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v16, s[4:5] offset:48
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v16i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v16i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32:			; GFX10-LABEL: test_call_external_void_func_v16i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v16, 0			; GFX10-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x3			; GFX10-NEXT: s_clause 0x3
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[4:5]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v16, s[4:5]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[4:5] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v16, s[4:5] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[4:5] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v16, s[4:5] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[4:5] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v16, s[4:5] offset:48
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v16i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v16i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v16, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x3			; GFX10-SCRATCH-NEXT: s_clause 0x3
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v16, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v16, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v16, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v16, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <16 x i32> addrspace(1), <16 x i32> addrspace(1) addrspace(4)* undef			%ptr = load <16 x i32> addrspace(1), <16 x i32> addrspace(1) addrspace(4)* undef
	[... 19 lines not shown ...]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[4:5] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[4:5] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[4:5] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[4:5] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[4:5] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[4:5] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[4:5] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[4:5] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[4:5] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[4:5] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[4:5] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[4:5] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[4:5] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[4:5] offset:112
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v32i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v32i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32:			; GFX10-LABEL: test_call_external_void_func_v32i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[4:5]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[4:5]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[4:5] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[4:5] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[4:5] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[4:5] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[4:5] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[4:5] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[4:5] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[4:5] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[4:5] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[4:5] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[4:5] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[4:5] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[4:5] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[4:5] offset:112
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v32i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v32i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef			%ptr = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef
	[... 19 lines not shown ...]
	; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[4:5] offset:16			; GFX9-NEXT: global_load_dwordx4 v[4:7], v28, s[4:5] offset:16
	; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[4:5] offset:32			; GFX9-NEXT: global_load_dwordx4 v[8:11], v28, s[4:5] offset:32
	; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[4:5] offset:48			; GFX9-NEXT: global_load_dwordx4 v[12:15], v28, s[4:5] offset:48
	; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[4:5] offset:64			; GFX9-NEXT: global_load_dwordx4 v[16:19], v28, s[4:5] offset:64
	; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[4:5] offset:80			; GFX9-NEXT: global_load_dwordx4 v[20:23], v28, s[4:5] offset:80
	; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[4:5] offset:96			; GFX9-NEXT: global_load_dwordx4 v[24:27], v28, s[4:5] offset:96
	; GFX9-NEXT: s_nop 0			; GFX9-NEXT: s_nop 0
	; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[4:5] offset:112			; GFX9-NEXT: global_load_dwordx4 v[28:31], v28, s[4:5] offset:112
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v32i32_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v32i32_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_waitcnt vmcnt(7)			; GFX9-NEXT: s_waitcnt vmcnt(7)
	; GFX9-NEXT: global_load_dword v32, v[0:1], off			; GFX9-NEXT: global_load_dword v32, v[0:1], off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v32, 0			; GFX10-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: global_load_dword v33, v[0:1], off			; GFX10-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x7			; GFX10-NEXT: s_clause 0x7
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[4:5]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v32, s[4:5]
	; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[4:5] offset:16			; GFX10-NEXT: global_load_dwordx4 v[4:7], v32, s[4:5] offset:16
	; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[4:5] offset:32			; GFX10-NEXT: global_load_dwordx4 v[8:11], v32, s[4:5] offset:32
	; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[4:5] offset:48			; GFX10-NEXT: global_load_dwordx4 v[12:15], v32, s[4:5] offset:48
	; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[4:5] offset:64			; GFX10-NEXT: global_load_dwordx4 v[16:19], v32, s[4:5] offset:64
	; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[4:5] offset:80			; GFX10-NEXT: global_load_dwordx4 v[20:23], v32, s[4:5] offset:80
	; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[4:5] offset:96			; GFX10-NEXT: global_load_dwordx4 v[24:27], v32, s[4:5] offset:96
	; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[4:5] offset:112			; GFX10-NEXT: global_load_dwordx4 v[28:31], v32, s[4:5] offset:112
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-NEXT: s_waitcnt vmcnt(8)			; GFX10-NEXT: s_waitcnt vmcnt(8)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v32, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off			; GFX10-SCRATCH-NEXT: global_load_dword v33, v[0:1], off
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x7			; GFX10-SCRATCH-NEXT: s_clause 0x7
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v32, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[4:7], v32, s[0:1] offset:16
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[8:11], v32, s[0:1] offset:32
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[12:15], v32, s[0:1] offset:48
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[16:19], v32, s[0:1] offset:64
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[20:23], v32, s[0:1] offset:80
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[24:27], v32, s[0:1] offset:96
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[28:31], v32, s[0:1] offset:112
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v32i32_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v32i32_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(8)
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v33, s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr0 = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef			%ptr0 = load <32 x i32> addrspace(1), <32 x i32> addrspace(1) addrspace(4)* undef
	[... 10 lines not shown ...]
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v42, s33, 2			; GFX9-NEXT: v_writelane_b32 v42, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_writelane_b32 v42, s30, 0			; GFX9-NEXT: v_writelane_b32 v42, s30, 0
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v40, v0			; GFX9-NEXT: v_mov_b32_e32 v40, v0
	; GFX9-NEXT: v_mov_b32_e32 v0, 42			; GFX9-NEXT: v_mov_b32_e32 v0, 42
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v42, s31, 1			; GFX9-NEXT: v_writelane_b32 v42, s31, 1
	; GFX9-NEXT: v_mov_b32_e32 v41, v1			; GFX9-NEXT: v_mov_b32_e32 v41, v1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: global_store_dword v[40:41], v0, off			; GFX9-NEXT: global_store_dword v[40:41], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s4, v42, 0			; GFX9-NEXT: v_readlane_b32 s4, v42, 0
	; GFX9-NEXT: v_readlane_b32 s5, v42, 1			; GFX9-NEXT: v_readlane_b32 s5, v42, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v42, 2			; GFX9-NEXT: v_readlane_b32 s33, v42, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v42, s33, 2			; GFX10-NEXT: v_writelane_b32 v42, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: v_mov_b32_e32 v40, v0			; GFX10-NEXT: v_mov_b32_e32 v40, v0
	; GFX10-NEXT: v_writelane_b32 v42, s30, 0			; GFX10-NEXT: v_writelane_b32 v42, s30, 0
	; GFX10-NEXT: v_mov_b32_e32 v0, 42			; GFX10-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_i32_func_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_i32_func_i32@rel32@hi+12
	; GFX10-NEXT: v_mov_b32_e32 v41, v1			; GFX10-NEXT: v_mov_b32_e32 v41, v1
	; GFX10-NEXT: v_writelane_b32 v42, s31, 1			; GFX10-NEXT: v_writelane_b32 v42, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: global_store_dword v[40:41], v0, off			; GFX10-NEXT: global_store_dword v[40:41], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_readlane_b32 s4, v42, 0			; GFX10-NEXT: v_readlane_b32 s4, v42, 0
	; GFX10-NEXT: v_readlane_b32 s5, v42, 1			; GFX10-NEXT: v_readlane_b32 s5, v42, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v42, 2			; GFX10-NEXT: v_readlane_b32 s33, v42, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v42, s32 offset:8 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:4 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s33 offset:4 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v40, v0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v40, v0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 42
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_i32_func_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_i32_func_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v41, v1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v42, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: global_store_dword v[40:41], v0, off			; GFX10-SCRATCH-NEXT: global_store_dword v[40:41], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v42, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v42, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v42, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v42, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v42, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v42, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v42, off, s32 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)			%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
	[11 lines not shown]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v2, 0			; GFX9-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dword v1, v2, s[4:5] offset:4			; GFX9-NEXT: global_load_dword v1, v2, s[4:5] offset:4
	; GFX9-NEXT: global_load_ubyte v0, v2, s[4:5]			; GFX9-NEXT: global_load_ubyte v0, v2, s[4:5]
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v2, 0			; GFX10-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: global_load_ubyte v0, v2, s[4:5]			; GFX10-NEXT: global_load_ubyte v0, v2, s[4:5]
	; GFX10-NEXT: global_load_dword v1, v2, s[4:5] offset:4			; GFX10-NEXT: global_load_dword v1, v2, s[4:5] offset:4
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_ubyte v0, v2, s[0:1]
	; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4			; GFX10-SCRATCH-NEXT: global_load_dword v1, v2, s[0:1] offset:4
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr0 = load { i8, i32 } addrspace(1), { i8, i32 } addrspace(1) addrspace(4)* undef			%ptr0 = load { i8, i32 } addrspace(1), { i8, i32 } addrspace(1) addrspace(4)* undef
	[10 lines not shown]
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, s33
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = alloca { i8, i32 }, align 4, addrspace(5)			%val = alloca { i8, i32 }, align 4, addrspace(5)
	[15 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_add_u32 s32, s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_add_u32_e32 v0, 8, v0			; GFX9-NEXT: v_add_u32_e32 v0, 8, v0
	; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33			; GFX9-NEXT: v_lshrrev_b32_e64 v1, 6, s33
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX9-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_byte v[0:1], v0, off			; GFX9-NEXT: global_store_byte v[0:1], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dword v[0:1], v1, off			; GFX9-NEXT: global_store_dword v[0:1], v1, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v0, 3			; GFX10-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-NEXT: v_mov_b32_e32 v1, 8			; GFX10-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0x400
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s33
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33			; GFX10-NEXT: v_lshrrev_b32_e64 v1, 5, s33
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0			; GFX10-NEXT: v_add_nc_u32_e32 v0, 8, v0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8			; GFX10-NEXT: buffer_load_ubyte v0, off, s[0:3], s33 offset:8
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x400			; GFX10-NEXT: s_addk_i32 s32, 0xfc00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: global_store_byte v[0:1], v0, off			; GFX10-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: global_store_dword v[0:1], v1, off			; GFX10-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:16 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:16 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 32			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 8
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_sret_struct_i8_i32_byval_struct_i8_i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_add_u32 vcc_lo, s33, 8			; GFX10-SCRATCH-NEXT: s_add_i32 vcc_lo, s33, 8
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s33
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v1, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, vcc_lo
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, s33
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: s_clause 0x1			; GFX10-SCRATCH-NEXT: s_clause 0x1
	; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8			; GFX10-SCRATCH-NEXT: scratch_load_ubyte v0, off, s33 offset:8
	; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12			; GFX10-SCRATCH-NEXT: scratch_load_dword v1, off, s33 offset:12
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 32			; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off			; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off			; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[4:5]			; GFX9-NEXT: global_load_dwordx4 v[0:3], v0, s[4:5]
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v16i8@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_v16i8@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v16i8@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_v16i8@rel32@hi+12
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshrrev_b32_e32 v16, 8, v0			; GFX9-NEXT: v_lshrrev_b32_e32 v16, 8, v0
	; GFX9-NEXT: v_lshrrev_b32_e32 v17, 16, v0			; GFX9-NEXT: v_lshrrev_b32_e32 v17, 16, v0
	; GFX9-NEXT: v_lshrrev_b32_e32 v18, 24, v0			; GFX9-NEXT: v_lshrrev_b32_e32 v18, 24, v0
	; GFX9-NEXT: v_lshrrev_b32_e32 v15, 24, v3			; GFX9-NEXT: v_lshrrev_b32_e32 v15, 24, v3
	; GFX9-NEXT: v_mov_b32_e32 v12, v3			; GFX9-NEXT: v_mov_b32_e32 v12, v3
	; GFX9-NEXT: v_mov_b32_e32 v1, v16			; GFX9-NEXT: v_mov_b32_e32 v1, v16
	; GFX9-NEXT: v_mov_b32_e32 v2, v17			; GFX9-NEXT: v_mov_b32_e32 v2, v17
	; GFX9-NEXT: v_mov_b32_e32 v3, v18			; GFX9-NEXT: v_mov_b32_e32 v3, v18
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i8:			; GFX10-LABEL: test_call_external_void_func_v16i8:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[4:5]			; GFX10-NEXT: global_load_dwordx4 v[0:3], v0, s[4:5]
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v16i8@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_v16i8@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v16i8@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_v16i8@rel32@hi+12
	; GFX10-NEXT: v_lshrrev_b32_e32 v15, 24, v3			; GFX10-NEXT: v_lshrrev_b32_e32 v15, 24, v3
	; GFX10-NEXT: v_mov_b32_e32 v12, v3			; GFX10-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-NEXT: v_mov_b32_e32 v1, v16			; GFX10-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-NEXT: v_mov_b32_e32 v2, v17			; GFX10-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-NEXT: v_mov_b32_e32 v3, v18			; GFX10-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]			; GFX10-SCRATCH-NEXT: global_load_dwordx4 v[0:3], v0, s[0:1]
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i8@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i8@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_lshrrev_b32_e32 v15, 24, v3			; GFX10-SCRATCH-NEXT: v_lshrrev_b32_e32 v15, 24, v3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v12, v3
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, v16
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, v17
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, v18
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <16 x i8> addrspace(1), <16 x i8> addrspace(1) addrspace(4)* undef			%ptr = load <16 x i8> addrspace(1), <16 x i8> addrspace(1) addrspace(4)* undef
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_i1_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_i1_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 1			; GFX10-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i1_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i1_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32			; GFX10-SCRATCH-NEXT: scratch_store_byte off, v0, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)			call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)
	ret void			ret void
	}			}
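
A note on the changed check lines above: the left column restores the stack pointer with `s_sub_u32 s32, s32, 0x400`, while the right column uses `s_addk_i32 s32, 0xfc00`. Both leave the same value in `s32`, because `s_addk_i32` sign-extends its 16-bit immediate, so `0xfc00` is read as -0x400. The sketch below is a simplified Python model, not compiler code: the helper names are hypothetical, and the SCC flag that each instruction sets is ignored. It only checks the register result for the immediates that appear in these tests.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a signed Python integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def s_sub_u32(reg: int, imm: int) -> int:
    """Simplified model: 32-bit wrapping subtraction (SCC ignored)."""
    return (reg - imm) & MASK32


def s_addk_i32(reg: int, simm16: int) -> int:
    """Simplified model: add the sign-extended 16-bit immediate (SCC ignored)."""
    return (reg + sext16(simm16)) & MASK32


# Pairs taken from the check lines: (old s_sub_u32 operand, new s_addk_i32 operand).
PAIRS = [(0x400, 0xFC00), (0x200, 0xFE00), (32, 0xFFE0)]

if __name__ == "__main__":
    sp = 0x1400  # arbitrary example stack pointer value (assumption)
    for sub_imm, addk_imm in PAIRS:
        assert s_sub_u32(sp, sub_imm) == s_addk_i32(sp, addk_imm)
        print(f"sub {sub_imm:#x} == addk {addk_imm:#x} (sext = {sext16(addk_imm)})")
```

The equivalence holds for any 32-bit `sp`, since both operations wrap modulo 2^32.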

	define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i8_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i8_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i8_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i8_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_movk_i32 s4, 0x7b			; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i8_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i8_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b			; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i8_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i8_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)			call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)
	ret void			ret void
	}			}
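
The right-hand column uses two different forms for the stack pointer update: `s_add_i32 s32, s32, 16`, `s_add_i32 s32, s32, 32` and `s_add_i32 s32, s32, -16` in the GFX10-SCRATCH output, but `s_addk_i32 s32, 0x400`, `s_addk_i32 s32, 0xfc00` and `s_addk_i32 s32, 0xffe0` elsewhere. This pattern is consistent with the AMDGPU integer inline-constant range of -16 to 64: an offset in that range needs no literal, while other offsets that fit in a signed 16-bit value can use the compact `s_addk_i32` form. The function below is a hypothetical Python sketch of that rule, written only to reproduce the strings seen in these check lines. It is not the code from this patch, and how the compiler actually makes the choice is not shown in this excerpt.

```python
INLINE_MIN, INLINE_MAX = -16, 64      # integer inline-constant range
SIMM16_MIN, SIMM16_MAX = -0x8000, 0x7FFF


def pick_sp_add(offset: int) -> str:
    """Return the assembly string this sketch expects for `s32 += offset`.

    Hypothetical helper: models only the destination == source case seen
    in the stack pointer updates of these tests.
    """
    if INLINE_MIN <= offset <= INLINE_MAX:
        # Inline constant: no extra literal dword needed.
        return f"s_add_i32 s32, s32, {offset}"
    if SIMM16_MIN <= offset <= SIMM16_MAX:
        # Fits a signed 16-bit immediate, printed as its 16-bit pattern.
        return f"s_addk_i32 s32, {offset & 0xFFFF:#x}"
    # Fallback: full 32-bit literal operand.
    return f"s_add_i32 s32, s32, {offset & 0xFFFFFFFF:#x}"


if __name__ == "__main__":
    for off in (16, -16, 32, -32, 0x200, -0x200, 0x400, -0x400):
        print(f"{off:>6}: {pick_sp_add(off)}")
```

Note that -32 falls outside the inline range, which matches the `s_addk_i32 s32, 0xffe0` line in the GFX10-SCRATCH output, whereas -16 stays as a plain `s_add_i32`.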

	define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_movk_i32 s4, 0x7b			; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b			; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)			call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 42			; GFX9-NEXT: s_mov_b32 s4, 42
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 42			; GFX10-NEXT: s_mov_b32 s4, 42
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 42
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)			call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_movk_i32 s4, 0x7b			; GFX9-NEXT: s_movk_i32 s4, 0x7b
	; GFX9-NEXT: s_mov_b32 s5, 0			; GFX9-NEXT: s_mov_b32 s5, 0
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_i64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_movk_i32 s4, 0x7b			; GFX10-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-NEXT: s_mov_b32 s5, 0			; GFX10-NEXT: s_mov_b32 s5, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b			; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x7b
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)			call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_mov_b64 s[4:5], 0			; GFX9-NEXT: s_mov_b64 s[4:5], 0
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_mov_b64 s[4:5], 0			; GFX10-NEXT: s_mov_b64 s[4:5], 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i64>, <2 x i64> addrspace(4)* null			%val = load <2 x i64>, <2 x i64> addrspace(4)* null
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1			; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: s_mov_b32 s5, 2			; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_mov_b64 s[4:5], 0			; GFX9-NEXT: s_mov_b64 s[4:5], 0
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: s_getpc_b64 s[10:11]			; GFX9-NEXT: s_getpc_b64 s[10:11]
	; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v3i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v3i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_mov_b64 s[4:5], 0			; GFX10-NEXT: s_mov_b64 s[4:5], 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s8, 1			; GFX10-NEXT: s_mov_b32 s8, 1
	; GFX10-NEXT: s_mov_b32 s9, 2			; GFX10-NEXT: s_mov_b32 s9, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[10:11]			; GFX10-NEXT: s_getpc_b64 s[10:11]
	; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0			; GFX10-SCRATCH-NEXT: s_mov_b64 s[0:1], 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%load = load <2 x i64>, <2 x i64> addrspace(4)* null			%load = load <2 x i64>, <2 x i64> addrspace(4)* null
	[10 lines collapsed in the original diff view]
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_mov_b64 s[4:5], 0			; GFX9-NEXT: s_mov_b64 s[4:5], 0
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s8, 1			; GFX9-NEXT: s_mov_b32 s8, 1
	; GFX9-NEXT: s_mov_b32 s9, 2			; GFX9-NEXT: s_mov_b32 s9, 2
	; GFX9-NEXT: s_mov_b32 s10, 3			; GFX9-NEXT: s_mov_b32 s10, 3
	; GFX9-NEXT: s_mov_b32 s11, 4			; GFX9-NEXT: s_mov_b32 s11, 4
	; GFX9-NEXT: s_getpc_b64 s[12:13]			; GFX9-NEXT: s_getpc_b64 s[12:13]
	; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v4i64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v4i64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_mov_b64 s[4:5], 0			; GFX10-NEXT: s_mov_b64 s[4:5], 0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s8, 1			; GFX10-NEXT: s_mov_b32 s8, 1
	; GFX10-NEXT: s_mov_b32 s9, 2			; GFX10-NEXT: s_mov_b32 s9, 2
	; GFX10-NEXT: s_mov_b32 s10, 3			; GFX10-NEXT: s_mov_b32 s10, 3
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s11, 4			; GFX10-NEXT: s_mov_b32 s11, 4
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[12:13]			; GFX10-NEXT: s_getpc_b64 s[12:13]
	; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	[9 lines collapsed in the original diff view]
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s10, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s10, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s11, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s11, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%load = load <2 x i64>, <2 x i64> addrspace(4)* null			%load = load <2 x i64>, <2 x i64> addrspace(4)* null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_movk_i32 s4, 0x4400			; GFX9-NEXT: s_movk_i32 s4, 0x4400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_movk_i32 s4, 0x4400			; GFX10-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400			; GFX10-SCRATCH-NEXT: s_movk_i32 s4, 0x4400
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)			call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 4.0			; GFX9-NEXT: s_mov_b32 s4, 4.0
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 4.0			; GFX10-NEXT: s_mov_b32 s4, 4.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 4.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)			call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1.0			; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: s_mov_b32 s5, 2.0			; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1.0			; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: s_mov_b32 s5, 2.0			; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1.0			; GFX9-NEXT: s_mov_b32 s4, 1.0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 4.0			; GFX9-NEXT: s_mov_b32 s6, 4.0
	; GFX9-NEXT: s_mov_b32 s7, -1.0			; GFX9-NEXT: s_mov_b32 s7, -1.0
	; GFX9-NEXT: s_mov_b32 s8, 0.5			; GFX9-NEXT: s_mov_b32 s8, 0.5
	; GFX9-NEXT: s_getpc_b64 s[10:11]			; GFX9-NEXT: s_getpc_b64 s[10:11]
	; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v5f32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v5f32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1.0			; GFX10-NEXT: s_mov_b32 s4, 1.0
	; GFX10-NEXT: s_mov_b32 s5, 2.0			; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_mov_b32 s6, 4.0			; GFX10-NEXT: s_mov_b32 s6, 4.0
	; GFX10-NEXT: s_mov_b32 s7, -1.0			; GFX10-NEXT: s_mov_b32 s7, -1.0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s8, 0.5			; GFX10-NEXT: s_mov_b32 s8, 0.5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[10:11]			; GFX10-NEXT: s_getpc_b64 s[10:11]
	; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 4.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, -1.0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0.5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5f32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5f32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 0x40100000			; GFX9-NEXT: s_mov_b32 s5, 0x40100000
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_f64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0			; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: s_mov_b32 s5, 0x40100000			; GFX10-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40100000
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)			call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0			; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: s_mov_b32 s5, 2.0			; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0			; GFX9-NEXT: s_mov_b32 s4, 0
	; GFX9-NEXT: s_mov_b32 s5, 2.0			; GFX9-NEXT: s_mov_b32 s5, 2.0
	; GFX9-NEXT: s_mov_b32 s6, 0			; GFX9-NEXT: s_mov_b32 s6, 0
	; GFX9-NEXT: s_mov_b32 s7, 0x40100000			; GFX9-NEXT: s_mov_b32 s7, 0x40100000
	; GFX9-NEXT: s_mov_b32 s8, 0			; GFX9-NEXT: s_mov_b32 s8, 0
	; GFX9-NEXT: s_mov_b32 s9, 0x40200000			; GFX9-NEXT: s_mov_b32 s9, 0x40200000
	; GFX9-NEXT: s_getpc_b64 s[10:11]			; GFX9-NEXT: s_getpc_b64 s[10:11]
	; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v3f64_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v3f64_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0			; GFX10-NEXT: s_mov_b32 s4, 0
	; GFX10-NEXT: s_mov_b32 s5, 2.0			; GFX10-NEXT: s_mov_b32 s5, 2.0
	; GFX10-NEXT: s_mov_b32 s6, 0			; GFX10-NEXT: s_mov_b32 s6, 0
	; GFX10-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s8, 0			; GFX10-NEXT: s_mov_b32 s8, 0
	; GFX10-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[10:11]			; GFX10-NEXT: s_getpc_b64 s[10:11]
	; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	[9 lines collapsed in the diff view]
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2.0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 0x40100000
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 0x40200000
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f64_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f64_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i16>, <2 x i16> addrspace(4)* undef			%val = load <2 x i16>, <2 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <3 x i16>, <3 x i16> addrspace(4)* undef			%val = load <3 x i16>, <3 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <3 x half>, <3 x half> addrspace(4)* undef			%val = load <3 x half>, <3 x half> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 3			; GFX9-NEXT: s_mov_b32 s5, 3
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0x20001			; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: s_mov_b32 s5, 3			; GFX10-NEXT: s_mov_b32 s5, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0x40003c00			; GFX9-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX9-NEXT: s_movk_i32 s5, 0x4400			; GFX9-NEXT: s_movk_i32 s5, 0x4400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0x40003c00			; GFX10-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-NEXT: s_movk_i32 s5, 0x4400			; GFX10-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x40003c00
	; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400			; GFX10-SCRATCH-NEXT: s_movk_i32 s5, 0x4400
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <4 x i16>, <4 x i16> addrspace(4)* undef			%val = load <4 x i16>, <4 x i16> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 0x20001			; GFX9-NEXT: s_mov_b32 s4, 0x20001
	; GFX9-NEXT: s_mov_b32 s5, 0x40003			; GFX9-NEXT: s_mov_b32 s5, 0x40003
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 0x20001			; GFX10-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-NEXT: s_mov_b32 s5, 0x40003			; GFX10-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 0x20001
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 0x40003
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX9-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2f16_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2f16_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dword s4, s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2f16_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2f16_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x half>, <2 x half> addrspace(4)* undef			%val = load <2 x half>, <2 x half> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <2 x i32>, <2 x i32> addrspace(4)* undef			%val = load <2 x i32>, <2 x i32> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_getpc_b64 s[6:7]			; GFX9-NEXT: s_getpc_b64 s[6:7]
	; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1			; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: s_mov_b32 s5, 2			; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[6:7]			; GFX10-NEXT: s_getpc_b64 s[6:7]
	; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s6, s6, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s7, s7, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v2i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v2i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 3			; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: s_mov_b32 s5, 4			; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {
	; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 3			; GFX9-NEXT: s_mov_b32 s4, 3
	; GFX9-NEXT: s_mov_b32 s5, 4			; GFX9-NEXT: s_mov_b32 s5, 4
	; GFX9-NEXT: s_mov_b32 s6, 5			; GFX9-NEXT: s_mov_b32 s6, 5
	; GFX9-NEXT: s_mov_b32 s7, 6			; GFX9-NEXT: s_mov_b32 s7, 6
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 3			; GFX10-NEXT: s_mov_b32 s4, 3
	; GFX10-NEXT: s_mov_b32 s5, 4			; GFX10-NEXT: s_mov_b32 s5, 4
	; GFX10-NEXT: s_mov_b32 s6, 5			; GFX10-NEXT: s_mov_b32 s6, 5
	; GFX10-NEXT: s_mov_b32 s7, 6			; GFX10-NEXT: s_mov_b32 s7, 6
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 4
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v3i32_i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v3i32_i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)			call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%val = load <4 x i32>, <4 x i32> addrspace(4)* undef			%val = load <4 x i32>, <4 x i32> addrspace(4)* undef
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_getpc_b64 s[8:9]			; GFX9-NEXT: s_getpc_b64 s[8:9]
	; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1			; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: s_mov_b32 s5, 2			; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[8:9]			; GFX10-NEXT: s_getpc_b64 s[8:9]
	; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s8, s8, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s9, s9, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[8:9]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v4i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v4i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	; GFX9-NEXT: s_getpc_b64 s[10:11]			; GFX9-NEXT: s_getpc_b64 s[10:11]
	; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v5i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s10, s10, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v5i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s11, s11, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s4, 1			; GFX10-NEXT: s_mov_b32 s4, 1
	; GFX10-NEXT: s_mov_b32 s5, 2			; GFX10-NEXT: s_mov_b32 s5, 2
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[10:11]			; GFX10-NEXT: s_getpc_b64 s[10:11]
	; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s10, s10, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s11, s11, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[10:11]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, 1
	; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2			; GFX10-SCRATCH-NEXT: s_mov_b32 s5, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v5i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v5i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx8 s[4:11], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx8 s[4:11], s[4:5], 0x0
	; GFX9-NEXT: s_getpc_b64 s[12:13]			; GFX9-NEXT: s_getpc_b64 s[12:13]
	; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[12:13]			; GFX10-NEXT: s_getpc_b64 s[12:13]
	; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_load_dwordx8 s[4:11], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx8 s[4:11], s[4:5], 0x0
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx8 s[4:11], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <8 x i32> addrspace(4)*, <8 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <8 x i32> addrspace(4)*, <8 x i32> addrspace(4)* addrspace(4)* undef
	%val = load <8 x i32>, <8 x i32> addrspace(4)* %ptr			%val = load <8 x i32>, <8 x i32> addrspace(4)* %ptr
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)
	ret void			ret void
	}			}
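
Note on reading the epilogue changes above: the old `s_sub_u32 s32, s32, 0x400` and the new `s_addk_i32 s32, 0xfc00` are meant to leave the same value in `s32`, because `s_addk_i32` takes a 16-bit immediate that is sign-extended before the add. The following is a minimal Python sketch of that arithmetic only. The helper names `sext16`, `addk_i32` and `sub_u32` are illustrative and not part of LLVM, and the model ignores the SCC flag that the real instructions also write.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addk_i32(reg: int, simm16: int) -> int:
    """Model of s_addk_i32: reg + sext(simm16), wrapped to 32 bits (SCC ignored)."""
    return (reg + sext16(simm16)) & MASK32


def sub_u32(reg: int, imm: int) -> int:
    """Model of s_sub_u32: reg - imm, wrapped to 32 bits (SCC ignored)."""
    return (reg - imm) & MASK32


# Stack pointer values chosen arbitrarily for illustration.
for sp in (0x400, 0x1000, 0x12345600):
    # GFX9 epilogue: s_sub_u32 s32, s32, 0x400 vs. s_addk_i32 s32, 0xfc00
    assert addk_i32(sp, 0xFC00) == sub_u32(sp, 0x400)
    # GFX10 epilogue: s_sub_u32 s32, s32, 0x200 vs. s_addk_i32 s32, 0xfe00
    assert addk_i32(sp, 0xFE00) == sub_u32(sp, 0x200)
```

The GFX10-SCRATCH checks use `s_add_i32 s32, s32, -16` rather than the `k` form; the sketch does not attempt to model why the compiler picks one encoding over the other.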

	define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_imm_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX9-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_mov_b32 s4, 1			; GFX9-NEXT: s_mov_b32 s4, 1
	; GFX9-NEXT: s_mov_b32 s5, 2			; GFX9-NEXT: s_mov_b32 s5, 2
	; GFX9-NEXT: s_mov_b32 s6, 3			; GFX9-NEXT: s_mov_b32 s6, 3
	; GFX9-NEXT: s_mov_b32 s7, 4			; GFX9-NEXT: s_mov_b32 s7, 4
	; GFX9-NEXT: s_mov_b32 s8, 5			; GFX9-NEXT: s_mov_b32 s8, 5
	; GFX9-NEXT: s_mov_b32 s9, 6			; GFX9-NEXT: s_mov_b32 s9, 6
	; GFX9-NEXT: s_mov_b32 s10, 7			; GFX9-NEXT: s_mov_b32 s10, 7
	; GFX9-NEXT: s_mov_b32 s11, 8			; GFX9-NEXT: s_mov_b32 s11, 8
	; GFX9-NEXT: s_getpc_b64 s[12:13]			; GFX9-NEXT: s_getpc_b64 s[12:13]
	; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	[... 10 lines of context not shown ...]
	; GFX10-NEXT: s_mov_b32 s6, 3			; GFX10-NEXT: s_mov_b32 s6, 3
	; GFX10-NEXT: s_mov_b32 s7, 4			; GFX10-NEXT: s_mov_b32 s7, 4
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: s_mov_b32 s8, 5			; GFX10-NEXT: s_mov_b32 s8, 5
	; GFX10-NEXT: s_mov_b32 s9, 6			; GFX10-NEXT: s_mov_b32 s9, 6
	; GFX10-NEXT: s_mov_b32 s10, 7			; GFX10-NEXT: s_mov_b32 s10, 7
	; GFX10-NEXT: s_mov_b32 s11, 8			; GFX10-NEXT: s_mov_b32 s11, 8
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[12:13]			; GFX10-NEXT: s_getpc_b64 s[12:13]
	; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s12, s12, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s13, s13, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	[... 11 lines of context not shown ...]
	; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3			; GFX10-SCRATCH-NEXT: s_mov_b32 s6, 3
	; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4			; GFX10-SCRATCH-NEXT: s_mov_b32 s7, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5			; GFX10-SCRATCH-NEXT: s_mov_b32 s8, 5
	; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6			; GFX10-SCRATCH-NEXT: s_mov_b32 s9, 6
	; GFX10-SCRATCH-NEXT: s_mov_b32 s10, 7			; GFX10-SCRATCH-NEXT: s_mov_b32 s10, 7
	; GFX10-SCRATCH-NEXT: s_mov_b32 s11, 8			; GFX10-SCRATCH-NEXT: s_mov_b32 s11, 8
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v8i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v8i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {
	; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX9-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[4:5], 0x0
	; GFX9-NEXT: s_getpc_b64 s[20:21]			; GFX9-NEXT: s_getpc_b64 s[20:21]
	; GFX9-NEXT: s_add_u32 s20, s20, external_void_func_v16i32_inreg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s20, s20, external_void_func_v16i32_inreg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s21, s21, external_void_func_v16i32_inreg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s21, s21, external_void_func_v16i32_inreg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[20:21]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[20:21]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[20:21]			; GFX10-NEXT: s_getpc_b64 s[20:21]
	; GFX10-NEXT: s_add_u32 s20, s20, external_void_func_v16i32_inreg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s20, s20, external_void_func_v16i32_inreg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s21, s21, external_void_func_v16i32_inreg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s21, s21, external_void_func_v16i32_inreg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_load_dwordx16 s[4:19], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx16 s[4:19], s[4:5], 0x0
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[20:21]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[20:21]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_load_dwordx16 s[4:19], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx16 s[4:19], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_inreg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_v16i32_inreg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_inreg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_inreg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <16 x i32> addrspace(4)*, <16 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <16 x i32> addrspace(4)*, <16 x i32> addrspace(4)* addrspace(4)* undef
	[... 26 lines of context not shown ...]
	; GFX9-NEXT: v_writelane_b32 v40, s48, 12			; GFX9-NEXT: v_writelane_b32 v40, s48, 12
	; GFX9-NEXT: v_writelane_b32 v40, s49, 13			; GFX9-NEXT: v_writelane_b32 v40, s49, 13
	; GFX9-NEXT: v_writelane_b32 v40, s50, 14			; GFX9-NEXT: v_writelane_b32 v40, s50, 14
	; GFX9-NEXT: v_writelane_b32 v40, s51, 15			; GFX9-NEXT: v_writelane_b32 v40, s51, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[20:21], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[20:21], 0x0
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[20:21], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[20:21], 0x40
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s30, 16			; GFX9-NEXT: v_writelane_b32 v40, s30, 16
	; GFX9-NEXT: v_writelane_b32 v40, s31, 17			; GFX9-NEXT: v_writelane_b32 v40, s31, 17
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	; GFX9-NEXT: v_mov_b32_e32 v1, s47			; GFX9-NEXT: v_mov_b32_e32 v1, s47
	; GFX9-NEXT: v_mov_b32_e32 v2, s48			; GFX9-NEXT: v_mov_b32_e32 v2, s48
	; GFX9-NEXT: v_mov_b32_e32 v3, s49			; GFX9-NEXT: v_mov_b32_e32 v3, s49
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	[... 31 lines of context not shown ...]
	; GFX9-NEXT: v_readlane_b32 s43, v40, 7			; GFX9-NEXT: v_readlane_b32 s43, v40, 7
	; GFX9-NEXT: v_readlane_b32 s42, v40, 6			; GFX9-NEXT: v_readlane_b32 s42, v40, 6
	; GFX9-NEXT: v_readlane_b32 s41, v40, 5			; GFX9-NEXT: v_readlane_b32 s41, v40, 5
	; GFX9-NEXT: v_readlane_b32 s40, v40, 4			; GFX9-NEXT: v_readlane_b32 s40, v40, 4
	; GFX9-NEXT: v_readlane_b32 s39, v40, 3			; GFX9-NEXT: v_readlane_b32 s39, v40, 3
	; GFX9-NEXT: v_readlane_b32 s38, v40, 2			; GFX9-NEXT: v_readlane_b32 s38, v40, 2
	; GFX9-NEXT: v_readlane_b32 s37, v40, 1			; GFX9-NEXT: v_readlane_b32 s37, v40, 1
	; GFX9-NEXT: v_readlane_b32 s36, v40, 0			; GFX9-NEXT: v_readlane_b32 s36, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 18			; GFX9-NEXT: v_readlane_b32 s33, v40, 18
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-NEXT: v_writelane_b32 v40, s33, 18
	; GFX10-NEXT: s_load_dwordx2 s[20:21], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[20:21], s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s36, 0			; GFX10-NEXT: v_writelane_b32 v40, s36, 0
	; GFX10-NEXT: v_writelane_b32 v40, s37, 1			; GFX10-NEXT: v_writelane_b32 v40, s37, 1
	; GFX10-NEXT: v_writelane_b32 v40, s38, 2			; GFX10-NEXT: v_writelane_b32 v40, s38, 2
	; GFX10-NEXT: v_writelane_b32 v40, s39, 3			; GFX10-NEXT: v_writelane_b32 v40, s39, 3
	; GFX10-NEXT: v_writelane_b32 v40, s40, 4			; GFX10-NEXT: v_writelane_b32 v40, s40, 4
	; GFX10-NEXT: v_writelane_b32 v40, s41, 5			; GFX10-NEXT: v_writelane_b32 v40, s41, 5
	; GFX10-NEXT: v_writelane_b32 v40, s42, 6			; GFX10-NEXT: v_writelane_b32 v40, s42, 6
	; GFX10-NEXT: v_writelane_b32 v40, s43, 7			; GFX10-NEXT: v_writelane_b32 v40, s43, 7
	[... 51 lines of context not shown ...]
	; GFX10-NEXT: v_readlane_b32 s43, v40, 7			; GFX10-NEXT: v_readlane_b32 s43, v40, 7
	; GFX10-NEXT: v_readlane_b32 s42, v40, 6			; GFX10-NEXT: v_readlane_b32 s42, v40, 6
	; GFX10-NEXT: v_readlane_b32 s41, v40, 5			; GFX10-NEXT: v_readlane_b32 s41, v40, 5
	; GFX10-NEXT: v_readlane_b32 s40, v40, 4			; GFX10-NEXT: v_readlane_b32 s40, v40, 4
	; GFX10-NEXT: v_readlane_b32 s39, v40, 3			; GFX10-NEXT: v_readlane_b32 s39, v40, 3
	; GFX10-NEXT: v_readlane_b32 s38, v40, 2			; GFX10-NEXT: v_readlane_b32 s38, v40, 2
	; GFX10-NEXT: v_readlane_b32 s37, v40, 1			; GFX10-NEXT: v_readlane_b32 s37, v40, 1
	; GFX10-NEXT: v_readlane_b32 s36, v40, 0			; GFX10-NEXT: v_readlane_b32 s36, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-NEXT: v_readlane_b32 s33, v40, 18
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s38, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s38, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s39, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s39, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s40, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s40, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s41, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s41, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s42, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s42, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s43, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s43, 7
	[... 47 lines of context not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s43, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s43, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s42, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s42, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s41, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s41, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s40, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s40, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s39, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s39, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s38, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s38, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s37, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s37, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s36, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s36, v40, 0
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef			%ptr = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef
	[... 28 lines of context not shown ...]
	; GFX9-NEXT: v_writelane_b32 v40, s49, 13			; GFX9-NEXT: v_writelane_b32 v40, s49, 13
	; GFX9-NEXT: v_writelane_b32 v40, s50, 14			; GFX9-NEXT: v_writelane_b32 v40, s50, 14
	; GFX9-NEXT: v_writelane_b32 v40, s51, 15			; GFX9-NEXT: v_writelane_b32 v40, s51, 15
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dwordx16 s[4:19], s[20:21], 0x0			; GFX9-NEXT: s_load_dwordx16 s[4:19], s[20:21], 0x0
	; GFX9-NEXT: s_load_dwordx16 s[36:51], s[20:21], 0x40			; GFX9-NEXT: s_load_dwordx16 s[36:51], s[20:21], 0x40
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_mov_b32_e32 v0, s22			; GFX9-NEXT: v_mov_b32_e32 v0, s22
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:24
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, s46			; GFX9-NEXT: v_mov_b32_e32 v0, s46
	; GFX9-NEXT: v_mov_b32_e32 v1, s47			; GFX9-NEXT: v_mov_b32_e32 v1, s47
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, s48			; GFX9-NEXT: v_mov_b32_e32 v0, s48
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	(32 lines not shown)
	; GFX9-NEXT: v_readlane_b32 s43, v40, 7			; GFX9-NEXT: v_readlane_b32 s43, v40, 7
	; GFX9-NEXT: v_readlane_b32 s42, v40, 6			; GFX9-NEXT: v_readlane_b32 s42, v40, 6
	; GFX9-NEXT: v_readlane_b32 s41, v40, 5			; GFX9-NEXT: v_readlane_b32 s41, v40, 5
	; GFX9-NEXT: v_readlane_b32 s40, v40, 4			; GFX9-NEXT: v_readlane_b32 s40, v40, 4
	; GFX9-NEXT: v_readlane_b32 s39, v40, 3			; GFX9-NEXT: v_readlane_b32 s39, v40, 3
	; GFX9-NEXT: v_readlane_b32 s38, v40, 2			; GFX9-NEXT: v_readlane_b32 s38, v40, 2
	; GFX9-NEXT: v_readlane_b32 s37, v40, 1			; GFX9-NEXT: v_readlane_b32 s37, v40, 1
	; GFX9-NEXT: v_readlane_b32 s36, v40, 0			; GFX9-NEXT: v_readlane_b32 s36, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 18			; GFX9-NEXT: v_readlane_b32 s33, v40, 18
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-NEXT: v_writelane_b32 v40, s33, 18
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: s_load_dwordx2 s[20:21], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[20:21], s[4:5], 0x0
	; GFX10-NEXT: s_load_dword s22, s[4:5], 0x0			; GFX10-NEXT: s_load_dword s22, s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s36, 0			; GFX10-NEXT: v_writelane_b32 v40, s36, 0
	; GFX10-NEXT: v_writelane_b32 v40, s37, 1			; GFX10-NEXT: v_writelane_b32 v40, s37, 1
	; GFX10-NEXT: v_writelane_b32 v40, s38, 2			; GFX10-NEXT: v_writelane_b32 v40, s38, 2
	; GFX10-NEXT: v_writelane_b32 v40, s39, 3			; GFX10-NEXT: v_writelane_b32 v40, s39, 3
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: v_mov_b32_e32 v0, s22			; GFX10-NEXT: v_mov_b32_e32 v0, s22
	; GFX10-NEXT: v_writelane_b32 v40, s40, 4			; GFX10-NEXT: v_writelane_b32 v40, s40, 4
	; GFX10-NEXT: v_writelane_b32 v40, s41, 5			; GFX10-NEXT: v_writelane_b32 v40, s41, 5
	(53 lines not shown)
	; GFX10-NEXT: v_readlane_b32 s43, v40, 7			; GFX10-NEXT: v_readlane_b32 s43, v40, 7
	; GFX10-NEXT: v_readlane_b32 s42, v40, 6			; GFX10-NEXT: v_readlane_b32 s42, v40, 6
	; GFX10-NEXT: v_readlane_b32 s41, v40, 5			; GFX10-NEXT: v_readlane_b32 s41, v40, 5
	; GFX10-NEXT: v_readlane_b32 s40, v40, 4			; GFX10-NEXT: v_readlane_b32 s40, v40, 4
	; GFX10-NEXT: v_readlane_b32 s39, v40, 3			; GFX10-NEXT: v_readlane_b32 s39, v40, 3
	; GFX10-NEXT: v_readlane_b32 s38, v40, 2			; GFX10-NEXT: v_readlane_b32 s38, v40, 2
	; GFX10-NEXT: v_readlane_b32 s37, v40, 1			; GFX10-NEXT: v_readlane_b32 s37, v40, 1
	; GFX10-NEXT: v_readlane_b32 s36, v40, 0			; GFX10-NEXT: v_readlane_b32 s36, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-NEXT: v_readlane_b32 s33, v40, 18
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 18
	; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0			; GFX10-SCRATCH-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s36, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s37, 1
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s38, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s38, 2
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s39, 3			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s39, 3
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s40, 4			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s40, 4
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s41, 5			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s41, 5
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s42, 6			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s42, 6
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s43, 7			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s43, 7
	(52 lines not shown)
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s43, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s43, v40, 7
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s42, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s42, v40, 6
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s41, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s41, v40, 5
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s40, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s40, v40, 4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s39, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s39, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s38, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s38, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s37, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s37, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s36, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s36, v40, 0
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 18
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	%ptr0 = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef			%ptr0 = load <32 x i32> addrspace(4)*, <32 x i32> addrspace(4)* addrspace(4)* undef
	(10 lines not shown)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX9-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, stack_passed_f64_arg@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, stack_passed_f64_arg@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, stack_passed_f64_arg@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, stack_passed_f64_arg@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_waitcnt vmcnt(1)			; GFX9-NEXT: s_waitcnt vmcnt(1)
	; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_clause 0x1			; GFX10-NEXT: s_clause 0x1
	; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33			; GFX10-NEXT: buffer_load_dword v32, off, s[0:3], s33
	; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:4
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, stack_passed_f64_arg@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, stack_passed_f64_arg@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, stack_passed_f64_arg@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, stack_passed_f64_arg@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_waitcnt vmcnt(1)			; GFX10-NEXT: s_waitcnt vmcnt(1)
	; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 offset:8 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33			; GFX10-SCRATCH-NEXT: scratch_load_dwordx2 v[32:33], off, s33
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, stack_passed_f64_arg@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, stack_passed_f64_arg@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx2 off, v[32:33], s32
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	entry:			entry:
	call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)			call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)
	ret void			ret void
	}			}
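
Note on the immediates in the right-hand column: the old `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00` because `s_addk_i32` adds a sign-extended 16-bit immediate to its destination, and 0xfc00 sign-extends to -0x400 (likewise 0xfe00 is -0x200). The following is only a minimal sketch of that arithmetic. The helper names are hypothetical and not part of LLVM, and the model covers the destination register only: it ignores the SCC bit, which the two instructions set differently (unsigned carry/borrow versus signed overflow).

```python
def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def s_addk_i32(reg: int, simm16: int) -> int:
    """Hypothetical model: reg + sign-extended imm16, wrapped to 32 bits."""
    return (reg + sext16(simm16)) & 0xFFFFFFFF


def s_sub_u32(reg: int, imm: int) -> int:
    """Hypothetical model: reg - imm, wrapped to 32 bits."""
    return (reg - imm) & 0xFFFFFFFF


# Stack pointer restore as it appears in the GFX9 and GFX10 check lines.
sp = 0x1000
assert s_addk_i32(sp, 0xFC00) == s_sub_u32(sp, 0x400)
assert s_addk_i32(sp, 0xFE00) == s_sub_u32(sp, 0x200)
```

In the GFX10-SCRATCH lines the offset is 16, which is already an inline constant, so the diff keeps the three-operand form (`s_add_i32 s32, s32, -16`) there.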

	define amdgpu_gfx void @stack_12xv3i32() #0 {			define amdgpu_gfx void @stack_12xv3i32() #0 {
	; GFX9-LABEL: stack_12xv3i32:			; GFX9-LABEL: stack_12xv3i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 12			; GFX9-NEXT: v_mov_b32_e32 v0, 12
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 13			; GFX9-NEXT: v_mov_b32_e32 v0, 13
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 14			; GFX9-NEXT: v_mov_b32_e32 v0, 14
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; GFX9-NEXT: v_mov_b32_e32 v0, 15			; GFX9-NEXT: v_mov_b32_e32 v0, 15
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	(32 lines not shown)
	; GFX9-NEXT: v_mov_b32_e32 v31, 11			; GFX9-NEXT: v_mov_b32_e32 v31, 11
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_12xv3i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_12xv3i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: stack_12xv3i32:			; GFX10-LABEL: stack_12xv3i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: v_mov_b32_e32 v0, 12			; GFX10-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-NEXT: v_mov_b32_e32 v1, 13			; GFX10-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-NEXT: v_mov_b32_e32 v2, 14			; GFX10-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_mov_b32_e32 v3, 15			; GFX10-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12			; GFX10-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: v_mov_b32_e32 v1, 0			; GFX10-NEXT: v_mov_b32_e32 v1, 0
	(29 lines not shown)
	; GFX10-NEXT: v_mov_b32_e32 v31, 11			; GFX10-NEXT: v_mov_b32_e32 v31, 11
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_12xv3i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-SCRATCH-LABEL: stack_12xv3i32:			; GFX10-SCRATCH-LABEL: stack_12xv3i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill			; GFX10-SCRATCH-NEXT: scratch_store_dword off, v40, s32 ; 4-byte Folded Spill
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 12
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 1
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 1
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 1
	(25 lines not shown)
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 11			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 11
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_12xv3i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_12xv3i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	entry:			entry:
	(17 lines not shown)
	; GFX9-LABEL: stack_8xv5i32:			; GFX9-LABEL: stack_8xv5i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 8			; GFX9-NEXT: v_mov_b32_e32 v0, 8
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 9			; GFX9-NEXT: v_mov_b32_e32 v0, 9
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 10			; GFX9-NEXT: v_mov_b32_e32 v0, 10
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; GFX9-NEXT: v_mov_b32_e32 v0, 11			; GFX9-NEXT: v_mov_b32_e32 v0, 11
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
	(40 lines not shown)
	; GFX9-NEXT: v_mov_b32_e32 v31, 7			; GFX9-NEXT: v_mov_b32_e32 v31, 7
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_8xv5i32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_8xv5i32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: stack_8xv5i32:			; GFX10-LABEL: stack_8xv5i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v0, 8			; GFX10-NEXT: v_mov_b32_e32 v0, 8
	; GFX10-NEXT: v_mov_b32_e32 v1, 9			; GFX10-NEXT: v_mov_b32_e32 v1, 9
	; GFX10-NEXT: v_mov_b32_e32 v2, 10			; GFX10-NEXT: v_mov_b32_e32 v2, 10
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_mov_b32_e32 v3, 14			; GFX10-NEXT: v_mov_b32_e32 v3, 14
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 11			; GFX10-NEXT: v_mov_b32_e32 v0, 11
	; GFX10-NEXT: v_mov_b32_e32 v1, 12			; GFX10-NEXT: v_mov_b32_e32 v1, 12
	; GFX10-NEXT: v_mov_b32_e32 v2, 13			; GFX10-NEXT: v_mov_b32_e32 v2, 13
	; GFX10-NEXT: v_mov_b32_e32 v4, 15			; GFX10-NEXT: v_mov_b32_e32 v4, 15
	Show All 37 Lines
	; GFX10-NEXT: v_mov_b32_e32 v31, 7			; GFX10-NEXT: v_mov_b32_e32 v31, 7
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_8xv5i32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	Show All 10 Lines
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 13
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 14
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 15
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 8
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 9
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 10
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 11
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
	Show All 26 Lines
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 7			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 7
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5i32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5i32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	entry:			entry:
	Show All 13 Lines
	; GFX9-LABEL: stack_8xv5f32:			; GFX9-LABEL: stack_8xv5f32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41100000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:4
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41200000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:8
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x41300000			; GFX9-NEXT: v_mov_b32_e32 v0, 0x41300000
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:12
	Show All 40 Lines
	; GFX9-NEXT: v_mov_b32_e32 v31, 0x40e00000			; GFX9-NEXT: v_mov_b32_e32 v31, 0x40e00000
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_8xv5f32@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_8xv5f32@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: stack_8xv5f32:			; GFX10-LABEL: stack_8xv5f32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41000000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41100000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41200000
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_mov_b32_e32 v3, 0x41600000			; GFX10-NEXT: v_mov_b32_e32 v3, 0x41600000
	; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32			; GFX10-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; GFX10-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; GFX10-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000			; GFX10-NEXT: v_mov_b32_e32 v0, 0x41300000
	; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000			; GFX10-NEXT: v_mov_b32_e32 v1, 0x41400000
	; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000			; GFX10-NEXT: v_mov_b32_e32 v2, 0x41500000
	; GFX10-NEXT: v_mov_b32_e32 v4, 0x41700000			; GFX10-NEXT: v_mov_b32_e32 v4, 0x41700000
	Show All 37 Lines
	; GFX10-NEXT: v_mov_b32_e32 v31, 0x40e00000			; GFX10-NEXT: v_mov_b32_e32 v31, 0x40e00000
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_8xv5f32@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	;			;
	Show All 10 Lines
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0x41500000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0x41600000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0x41700000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0x41000000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v5, 0x41100000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v6, 0x41200000
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v7, 0x41300000
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s32
	; GFX10-SCRATCH-NEXT: s_add_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, 16
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[0:3], s32 offset:16
	; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32			; GFX10-SCRATCH-NEXT: scratch_store_dwordx4 off, v[4:7], s32
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v1, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v2, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v3, 0
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v4, 0
	Show All 26 Lines
	; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 0x40e00000			; GFX10-SCRATCH-NEXT: v_mov_b32_e32 v31, 0x40e00000
	; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_getpc_b64 s[0:1]
	; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4			; GFX10-SCRATCH-NEXT: s_add_u32 s0, s0, external_void_func_8xv5f32@rel32@lo+4
	; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12			; GFX10-SCRATCH-NEXT: s_addc_u32 s1, s1, external_void_func_8xv5f32@rel32@hi+12
	; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-SCRATCH-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s1, v40, 1
	; GFX10-SCRATCH-NEXT: s_sub_u32 s32, s32, 16			; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s2, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s32 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s2
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[0:1]
	entry:			entry:
	Show All 24 Lines
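
Note on the epilogue immediates in the hunks above: `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00`, and `s_sub_u32 s32, s32, 0x200` becomes `s_addk_i32 s32, 0xfe00`. `s_addk_i32` adds a sign-extended 16-bit immediate, so 0xfc00 and 0xfe00 encode -0x400 and -0x200. The sketch below only models the 32-bit result and ignores SCC. The helper names are hypothetical and are not taken from the patch. The GFX10-SCRATCH checks keep the three-operand `s_add_i32 s32, s32, -16`, presumably because 16 and -16 are inline constants, so the short form would save nothing. That reading is an assumption and is not stated in the diff.

```python
MASK32 = 0xFFFFFFFF

def sext16(imm):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def s_addk_i32(dst, simm16):
    """32-bit result of dst + sign-extended simm16 (SCC not modeled)."""
    return (dst + sext16(simm16)) & MASK32

def s_sub_u32(a, b):
    """32-bit result of a - b (SCC not modeled)."""
    return (a - b) & MASK32

def fits_simm16(value):
    """True if value can be encoded as a signed 16-bit immediate."""
    return -0x8000 <= value <= 0x7FFF

if __name__ == "__main__":
    sp = 0x1400  # arbitrary example stack pointer value
    # Old and new epilogue forms restore the same stack pointer.
    print(hex(s_sub_u32(sp, 0x400)), hex(s_addk_i32(sp, 0xFC00)))
    print(hex(s_sub_u32(sp, 0x200)), hex(s_addk_i32(sp, 0xFE00)))
```

Since only the low 32 bits of the result are written to `s32`, the signed and unsigned forms differ only in how SCC is set, which these prologue and epilogue sequences do not read.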

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

	Show All 10 Lines
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 4			; GFX9-NEXT: v_writelane_b32 v40, s33, 4
	; GFX9-NEXT: v_writelane_b32 v40, s34, 0			; GFX9-NEXT: v_writelane_b32 v40, s34, 0
	; GFX9-NEXT: v_writelane_b32 v40, s35, 1			; GFX9-NEXT: v_writelane_b32 v40, s35, 1
	; GFX9-NEXT: v_writelane_b32 v40, s30, 2			; GFX9-NEXT: v_writelane_b32 v40, s30, 2
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[34:35]			; GFX9-NEXT: s_getpc_b64 s[34:35]
	; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 3			; GFX9-NEXT: v_writelane_b32 v40, s31, 3
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 3			; GFX9-NEXT: v_readlane_b32 s5, v40, 3
	; GFX9-NEXT: v_readlane_b32 s35, v40, 1			; GFX9-NEXT: v_readlane_b32 s35, v40, 1
	; GFX9-NEXT: v_readlane_b32 s34, v40, 0			; GFX9-NEXT: v_readlane_b32 s34, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 4			; GFX9-NEXT: v_readlane_b32 s33, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 4			; GFX10-NEXT: v_writelane_b32 v40, s33, 4
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: v_writelane_b32 v40, s34, 0			; GFX10-NEXT: v_writelane_b32 v40, s34, 0
	; GFX10-NEXT: v_writelane_b32 v40, s35, 1			; GFX10-NEXT: v_writelane_b32 v40, s35, 1
	; GFX10-NEXT: s_getpc_b64 s[34:35]			; GFX10-NEXT: s_getpc_b64 s[34:35]
	; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s34, s34, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s35, s35, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 2			; GFX10-NEXT: v_writelane_b32 v40, s30, 2
	; GFX10-NEXT: v_writelane_b32 v40, s31, 3			; GFX10-NEXT: v_writelane_b32 v40, s31, 3
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 2			; GFX10-NEXT: v_readlane_b32 s4, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 3			; GFX10-NEXT: v_readlane_b32 s5, v40, 3
	; GFX10-NEXT: v_readlane_b32 s35, v40, 1			; GFX10-NEXT: v_readlane_b32 s35, v40, 1
	; GFX10-NEXT: v_readlane_b32 s34, v40, 0			; GFX10-NEXT: v_readlane_b32 s34, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 4			; GFX10-NEXT: v_readlane_b32 s33, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	Show All 31 Lines
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3			; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s34, 0			; GFX9-NEXT: v_writelane_b32 v40, s34, 0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s31			; GFX9-NEXT: ; def s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: s_mov_b32 s34, s31			; GFX9-NEXT: s_mov_b32 s34, s31
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 1			; GFX9-NEXT: v_readlane_b32 s4, v40, 1
	; GFX9-NEXT: s_mov_b32 s31, s34			; GFX9-NEXT: s_mov_b32 s31, s34
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s31			; GFX9-NEXT: ; use s31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s5, v40, 2			; GFX9-NEXT: v_readlane_b32 s5, v40, 2
	; GFX9-NEXT: v_readlane_b32 s34, v40, 0			; GFX9-NEXT: v_readlane_b32 s34, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s34, 0			; GFX10-NEXT: v_writelane_b32 v40, s34, 0
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s31			; GFX10-NEXT: ; def s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: s_mov_b32 s34, s31			; GFX10-NEXT: s_mov_b32 s34, s31
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 1			; GFX10-NEXT: v_readlane_b32 s4, v40, 1
	; GFX10-NEXT: s_mov_b32 s31, s34			; GFX10-NEXT: s_mov_b32 s31, s34
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s31			; GFX10-NEXT: ; use s31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s5, v40, 2			; GFX10-NEXT: v_readlane_b32 s5, v40, 2
	; GFX10-NEXT: v_readlane_b32 s34, v40, 0			; GFX10-NEXT: v_readlane_b32 s34, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%s31 = call i32 asm sideeffect "; def $0", "={s31}"()			%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s31}"(i32 %s31)			call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
	ret void			ret void
	}			}
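
Note on the frame sizes seen in these checks: the stack pointer moves by 0x400 under the GFX9 prefix, 0x200 under GFX10, and 16 under GFX10-SCRATCH. These values are consistent with a 16-byte per-lane frame that is scaled by the wavefront size when scratch is accessed through buffer instructions and left unscaled with flat scratch. The wave sizes (64 for GFX9, 32 for the GFX10 runs) are an assumption here, since the RUN lines are not part of this excerpt. The function name is hypothetical.

```python
def sp_increment(bytes_per_lane, wave_size, flat_scratch):
    """Stack pointer adjustment for one frame.

    With buffer scratch the per-lane size is multiplied by the wave size;
    with flat scratch the per-lane size is used directly.
    """
    if flat_scratch:
        return bytes_per_lane
    return bytes_per_lane * wave_size

if __name__ == "__main__":
    print(hex(sp_increment(16, 64, False)))  # GFX9 checks
    print(hex(sp_increment(16, 32, False)))  # GFX10 checks
    print(sp_increment(16, 32, True))        # GFX10-SCRATCH checks
```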

	define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_mayclobber_v31(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX9-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v41, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 2
	; GFX9-NEXT: v_writelane_b32 v41, s30, 0			; GFX9-NEXT: v_writelane_b32 v41, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v31			; GFX9-NEXT: ; def v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v41, s31, 1			; GFX9-NEXT: v_writelane_b32 v41, s31, 1
	; GFX9-NEXT: v_mov_b32_e32 v40, v31			; GFX9-NEXT: v_mov_b32_e32 v40, v31
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_mov_b32_e32 v31, v40			; GFX9-NEXT: v_mov_b32_e32 v31, v40
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v31			; GFX9-NEXT: ; use v31
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s4, v41, 0			; GFX9-NEXT: v_readlane_b32 s4, v41, 0
	; GFX9-NEXT: v_readlane_b32 s5, v41, 1			; GFX9-NEXT: v_readlane_b32 s5, v41, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v41, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v41, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v31			; GFX10-NEXT: ; def v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v41, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s30, 0
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_mov_b32_e32 v40, v31			; GFX10-NEXT: v_mov_b32_e32 v40, v31
	; GFX10-NEXT: v_writelane_b32 v41, s31, 1			; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_mov_b32_e32 v31, v40			; GFX10-NEXT: v_mov_b32_e32 v31, v40
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v31			; GFX10-NEXT: ; use v31
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s4, v41, 0			; GFX10-NEXT: v_readlane_b32 s4, v41, 0
	; GFX10-NEXT: v_readlane_b32 s5, v41, 1			; GFX10-NEXT: v_readlane_b32 s5, v41, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v41, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%v31 = call i32 asm sideeffect "; def $0", "={v31}"()			%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
	[9 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3			; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: v_writelane_b32 v40, s33, 0			; GFX9-NEXT: v_writelane_b32 v40, s33, 0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s33			; GFX9-NEXT: ; def s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s33			; GFX9-NEXT: ; use s33
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s4, v40, 1			; GFX9-NEXT: v_readlane_b32 s4, v40, 1
	; GFX9-NEXT: v_readlane_b32 s33, v40, 0			; GFX9-NEXT: v_readlane_b32 s33, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 2			; GFX9-NEXT: v_readlane_b32 s5, v40, 2
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s33:			; GFX10-LABEL: test_call_void_func_void_preserves_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s33, 0			; GFX10-NEXT: v_writelane_b32 v40, s33, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s33			; GFX10-NEXT: ; def s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s33			; GFX10-NEXT: ; use s33
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s4, v40, 1			; GFX10-NEXT: v_readlane_b32 s4, v40, 1
	; GFX10-NEXT: v_readlane_b32 s33, v40, 0			; GFX10-NEXT: v_readlane_b32 s33, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 2			; GFX10-NEXT: v_readlane_b32 s5, v40, 2
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%s33 = call i32 asm sideeffect "; def $0", "={s33}"()			%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s33}"(i32 %s33)			call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_s34(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_s34:			; GFX9-LABEL: test_call_void_func_void_preserves_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3			; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s34, 0			; GFX9-NEXT: v_writelane_b32 v40, s34, 0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s34			; GFX9-NEXT: ; def s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 1			; GFX9-NEXT: v_readlane_b32 s4, v40, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s34			; GFX9-NEXT: ; use s34
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s5, v40, 2			; GFX9-NEXT: v_readlane_b32 s5, v40, 2
	; GFX9-NEXT: v_readlane_b32 s34, v40, 0			; GFX9-NEXT: v_readlane_b32 s34, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s34:			; GFX10-LABEL: test_call_void_func_void_preserves_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s34, 0			; GFX10-NEXT: v_writelane_b32 v40, s34, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s34			; GFX10-NEXT: ; def s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 1			; GFX10-NEXT: v_readlane_b32 s4, v40, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s34			; GFX10-NEXT: ; use s34
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s5, v40, 2			; GFX10-NEXT: v_readlane_b32 s5, v40, 2
	; GFX10-NEXT: v_readlane_b32 s34, v40, 0			; GFX10-NEXT: v_readlane_b32 s34, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%s34 = call i32 asm sideeffect "; def $0", "={s34}"()			%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s34}"(i32 %s34)			call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {			define amdgpu_gfx void @test_call_void_func_void_preserves_v40(i32 addrspace(1)* %out) #0 {
	; GFX9-LABEL: test_call_void_func_void_preserves_v40:			; GFX9-LABEL: test_call_void_func_void_preserves_v40:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v41, s33, 2			; GFX9-NEXT: v_writelane_b32 v41, s33, 2
	; GFX9-NEXT: v_writelane_b32 v41, s30, 0			; GFX9-NEXT: v_writelane_b32 v41, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v41, s31, 1			; GFX9-NEXT: v_writelane_b32 v41, s31, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v40			; GFX9-NEXT: ; def v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v40			; GFX9-NEXT: ; use v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s4, v41, 0			; GFX9-NEXT: v_readlane_b32 s4, v41, 0
	; GFX9-NEXT: v_readlane_b32 s5, v41, 1			; GFX9-NEXT: v_readlane_b32 s5, v41, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v41, 2			; GFX9-NEXT: v_readlane_b32 s33, v41, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_v40:			; GFX10-LABEL: test_call_void_func_void_preserves_v40:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v41, s33, 2			; GFX10-NEXT: v_writelane_b32 v41, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v41, s30, 0			; GFX10-NEXT: v_writelane_b32 v41, s30, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def v40			; GFX10-NEXT: ; def v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v41, s31, 1			; GFX10-NEXT: v_writelane_b32 v41, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v40			; GFX10-NEXT: ; use v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s4, v41, 0			; GFX10-NEXT: v_readlane_b32 s4, v41, 0
	; GFX10-NEXT: v_readlane_b32 s5, v41, 1			; GFX10-NEXT: v_readlane_b32 s5, v41, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v41, 2			; GFX10-NEXT: v_readlane_b32 s33, v41, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%v40 = call i32 asm sideeffect "; def $0", "={v40}"()			%v40 = call i32 asm sideeffect "; def $0", "={v40}"()
	[91 lines not shown]
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s33@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s33@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s33@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s33@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s33:			; GFX10-LABEL: test_call_void_func_void_clobber_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s33@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s33@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s33@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s33@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	call amdgpu_gfx void @void_func_void_clobber_s33()			call amdgpu_gfx void @void_func_void_clobber_s33()
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {
	; GFX9-LABEL: test_call_void_func_void_clobber_s34:			; GFX9-LABEL: test_call_void_func_void_clobber_s34:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 2			; GFX9-NEXT: v_writelane_b32 v40, s33, 2
	; GFX9-NEXT: v_writelane_b32 v40, s30, 0			; GFX9-NEXT: v_writelane_b32 v40, s30, 0
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s34@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s34@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s34@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s34@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 2			; GFX9-NEXT: v_readlane_b32 s33, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s34:			; GFX10-LABEL: test_call_void_func_void_clobber_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 2			; GFX10-NEXT: v_writelane_b32 v40, s33, 2
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s34@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, void_func_void_clobber_s34@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s34@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, void_func_void_clobber_s34@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s30, 0			; GFX10-NEXT: v_writelane_b32 v40, s30, 0
	; GFX10-NEXT: v_writelane_b32 v40, s31, 1			; GFX10-NEXT: v_writelane_b32 v40, s31, 1
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 2			; GFX10-NEXT: v_readlane_b32 s33, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	call amdgpu_gfx void @void_func_void_clobber_s34()			call amdgpu_gfx void @void_func_void_clobber_s34()
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v40, s33, 3			; GFX9-NEXT: v_writelane_b32 v40, s33, 3
	; GFX9-NEXT: v_writelane_b32 v40, s40, 0			; GFX9-NEXT: v_writelane_b32 v40, s40, 0
	; GFX9-NEXT: v_writelane_b32 v40, s30, 1			; GFX9-NEXT: v_writelane_b32 v40, s30, 1
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v40, s31, 2			; GFX9-NEXT: v_writelane_b32 v40, s31, 2
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s4, v40, 1			; GFX9-NEXT: v_readlane_b32 s4, v40, 1
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s40			; GFX9-NEXT: ; use s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s5, v40, 2			; GFX9-NEXT: v_readlane_b32 s5, v40, 2
	; GFX9-NEXT: v_readlane_b32 s40, v40, 0			; GFX9-NEXT: v_readlane_b32 s40, v40, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v40, 3			; GFX9-NEXT: v_readlane_b32 s33, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v40, s33, 3			; GFX10-NEXT: v_writelane_b32 v40, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: v_writelane_b32 v40, s40, 0			; GFX10-NEXT: v_writelane_b32 v40, s40, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_writelane_b32 v40, s30, 1			; GFX10-NEXT: v_writelane_b32 v40, s30, 1
	; GFX10-NEXT: v_writelane_b32 v40, s31, 2			; GFX10-NEXT: v_writelane_b32 v40, s31, 2
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX10-NEXT: v_readlane_b32 s4, v40, 1			; GFX10-NEXT: v_readlane_b32 s4, v40, 1
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use s40			; GFX10-NEXT: ; use s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: v_readlane_b32 s5, v40, 2			; GFX10-NEXT: v_readlane_b32 s5, v40, 2
	; GFX10-NEXT: v_readlane_b32 s40, v40, 0			; GFX10-NEXT: v_readlane_b32 s40, v40, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v40, 3			; GFX10-NEXT: v_readlane_b32 s33, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_vgpr_kernel() #1 {
	; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX9-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v41, s33, 3			; GFX9-NEXT: v_writelane_b32 v41, s33, 3
	; GFX9-NEXT: v_writelane_b32 v41, s40, 0			; GFX9-NEXT: v_writelane_b32 v41, s40, 0
	; GFX9-NEXT: v_writelane_b32 v41, s30, 1			; GFX9-NEXT: v_writelane_b32 v41, s30, 1
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def s40			; GFX9-NEXT: ; def s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; def v32			; GFX9-NEXT: ; def v32
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v41, s31, 2			; GFX9-NEXT: v_writelane_b32 v41, s31, 2
	; GFX9-NEXT: v_mov_b32_e32 v40, v32			; GFX9-NEXT: v_mov_b32_e32 v40, v32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use s40			; GFX9-NEXT: ; use s40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; use v40			; GFX9-NEXT: ; use v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s4, v41, 1			; GFX9-NEXT: v_readlane_b32 s4, v41, 1
	; GFX9-NEXT: v_readlane_b32 s5, v41, 2			; GFX9-NEXT: v_readlane_b32 s5, v41, 2
	; GFX9-NEXT: v_readlane_b32 s40, v41, 0			; GFX9-NEXT: v_readlane_b32 s40, v41, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-NEXT: v_readlane_b32 s33, v41, 3			; GFX9-NEXT: v_readlane_b32 s33, v41, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: v_writelane_b32 v41, s33, 3			; GFX10-NEXT: v_writelane_b32 v41, s33, 3
	; GFX10-NEXT: s_mov_b32 s33, s32			; GFX10-NEXT: s_mov_b32 s33, s32
	; GFX10-NEXT: s_add_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0x200
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, external_void_func_void@rel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; GFX10-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; GFX10-NEXT: v_writelane_b32 v41, s40, 0			; GFX10-NEXT: v_writelane_b32 v41, s40, 0
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; def s40			; GFX10-NEXT: ; def s40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	[9 lines elided]
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: ;;#ASMSTART			; GFX10-NEXT: ;;#ASMSTART
	; GFX10-NEXT: ; use v40			; GFX10-NEXT: ; use v40
	; GFX10-NEXT: ;;#ASMEND			; GFX10-NEXT: ;;#ASMEND
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s4, v41, 1			; GFX10-NEXT: v_readlane_b32 s4, v41, 1
	; GFX10-NEXT: v_readlane_b32 s5, v41, 2			; GFX10-NEXT: v_readlane_b32 s5, v41, 2
	; GFX10-NEXT: v_readlane_b32 s40, v41, 0			; GFX10-NEXT: v_readlane_b32 s40, v41, 0
	; GFX10-NEXT: s_sub_u32 s32, s32, 0x200			; GFX10-NEXT: s_addk_i32 s32, 0xfe00
	; GFX10-NEXT: v_readlane_b32 s33, v41, 3			; GFX10-NEXT: v_readlane_b32 s33, v41, 3
	; GFX10-NEXT: s_or_saveexec_b32 s6, -1			; GFX10-NEXT: s_or_saveexec_b32 s6, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s6			; GFX10-NEXT: s_mov_b32 exec_lo, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[4:5]			; GFX10-NEXT: s_setpc_b64 s[4:5]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	[9 lines elided]
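
Editorial sketch (not part of the patch): the epilogue changes in this file replace `s_sub_u32 s32, s32, 0x200` with `s_addk_i32 s32, 0xfe00`. The two are equivalent because `s_addk_i32` adds a sign-extended 16-bit immediate, and `0xfe00` is the 16-bit two's-complement encoding of `-0x200`. The Python below models only the 32-bit register result; the SCC flag is not modeled, and the helper names (`sext16`, `addk_imm_for_sub`) are hypothetical, chosen for illustration.

```python
MASK32 = 0xFFFFFFFF

def sext16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def s_sub_u32(a, b):
    """32-bit wrap-around subtraction (result only, no SCC)."""
    return (a - b) & MASK32

def s_addk_i32(a, imm16):
    """Add a sign-extended 16-bit immediate (result only, no SCC)."""
    return (a + sext16(imm16)) & MASK32

def addk_imm_for_sub(amount):
    """Return the 16-bit immediate that encodes -amount,
    or None if -amount does not fit in a signed 16-bit field."""
    if not (-0x8000 <= -amount <= 0x7FFF):
        return None
    return (-amount) & 0xFFFF
```

With these definitions, `addk_imm_for_sub(0x200)` gives `0xfe00`, `addk_imm_for_sub(0x400)` gives `0xfc00`, and `addk_imm_for_sub(0x800)` gives `0xf800`, matching the immediates in the updated checks. A decrement such as `0x60000` does not fit in 16 bits, which is consistent with the larger frames in this diff using `s_add_i32` with a 32-bit literal instead.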

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

[1,236 lines elided]
entry:		entry:
ret <512 x i32> zeroinitializer		ret <512 x i32> zeroinitializer
}		}

define amdgpu_gfx void @call_512xi32() #0 {		define amdgpu_gfx void @call_512xi32() #0 {
; GFX9-LABEL: call_512xi32:		; GFX9-LABEL: call_512xi32:
; GFX9: ; %bb.0: ; %entry		; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_mov_b32 s8, s33		; GFX9-NEXT: s_mov_b32 s8, s33
; GFX9-NEXT: s_add_u32 s33, s32, 0x1ffc0		; GFX9-NEXT: s_add_i32 s33, s32, 0x1ffc0
; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000		; GFX9-NEXT: s_and_b32 s33, s33, 0xfffe0000
; GFX9-NEXT: s_add_u32 s32, s32, 0x60000		; GFX9-NEXT: s_add_i32 s32, s32, 0x60000
; GFX9-NEXT: s_getpc_b64 s[6:7]		; GFX9-NEXT: s_getpc_b64 s[6:7]
; GFX9-NEXT: s_add_u32 s6, s6, return_512xi32@gotpcrel32@lo+4		; GFX9-NEXT: s_add_u32 s6, s6, return_512xi32@gotpcrel32@lo+4
; GFX9-NEXT: s_addc_u32 s7, s7, return_512xi32@gotpcrel32@hi+12		; GFX9-NEXT: s_addc_u32 s7, s7, return_512xi32@gotpcrel32@hi+12
; GFX9-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0		; GFX9-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0
; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33		; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s33
; GFX9-NEXT: s_mov_b64 s[4:5], s[30:31]		; GFX9-NEXT: s_mov_b64 s[4:5], s[30:31]
; GFX9-NEXT: s_waitcnt lgkmcnt(0)		; GFX9-NEXT: s_waitcnt lgkmcnt(0)
; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]		; GFX9-NEXT: s_swappc_b64 s[30:31], s[6:7]
; GFX9-NEXT: s_sub_u32 s32, s32, 0x60000		; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000
; GFX9-NEXT: s_mov_b32 s33, s8		; GFX9-NEXT: s_mov_b32 s33, s8
; GFX9-NEXT: s_setpc_b64 s[4:5]		; GFX9-NEXT: s_setpc_b64 s[4:5]
;		;
; GFX10-LABEL: call_512xi32:		; GFX10-LABEL: call_512xi32:
; GFX10: ; %bb.0: ; %entry		; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX10-NEXT: s_waitcnt_vscnt null, 0x0		; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
; GFX10-NEXT: s_mov_b32 s8, s33		; GFX10-NEXT: s_mov_b32 s8, s33
; GFX10-NEXT: s_add_u32 s33, s32, 0xffe0		; GFX10-NEXT: s_add_i32 s33, s32, 0xffe0
; GFX10-NEXT: s_mov_b64 s[4:5], s[30:31]		; GFX10-NEXT: s_mov_b64 s[4:5], s[30:31]
; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000		; GFX10-NEXT: s_and_b32 s33, s33, 0xffff0000
; GFX10-NEXT: s_add_u32 s32, s32, 0x30000		; GFX10-NEXT: s_add_i32 s32, s32, 0x30000
; GFX10-NEXT: s_getpc_b64 s[6:7]		; GFX10-NEXT: s_getpc_b64 s[6:7]
; GFX10-NEXT: s_add_u32 s6, s6, return_512xi32@gotpcrel32@lo+4		; GFX10-NEXT: s_add_u32 s6, s6, return_512xi32@gotpcrel32@lo+4
; GFX10-NEXT: s_addc_u32 s7, s7, return_512xi32@gotpcrel32@hi+12		; GFX10-NEXT: s_addc_u32 s7, s7, return_512xi32@gotpcrel32@hi+12
; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33		; GFX10-NEXT: v_lshrrev_b32_e64 v0, 5, s33
; GFX10-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0		; GFX10-NEXT: s_load_dwordx2 s[6:7], s[6:7], 0x0
; GFX10-NEXT: s_waitcnt lgkmcnt(0)		; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]		; GFX10-NEXT: s_swappc_b64 s[30:31], s[6:7]
; GFX10-NEXT: s_sub_u32 s32, s32, 0x30000		; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000
; GFX10-NEXT: s_mov_b32 s33, s8		; GFX10-NEXT: s_mov_b32 s33, s8
; GFX10-NEXT: s_setpc_b64 s[4:5]		; GFX10-NEXT: s_setpc_b64 s[4:5]
entry:		entry:
call amdgpu_gfx <512 x i32> @return_512xi32()		call amdgpu_gfx <512 x i32> @return_512xi32()
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
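
Editorial sketch (not part of the patch): in `call_512xi32` the stack adjustment is too large for a 16-bit immediate, so the subtraction becomes `s_add_i32` with a 32-bit literal equal to the negated amount modulo 2^32. The realignment constants also appear to follow a simple pattern, assuming a 2048-byte per-lane alignment scaled by the wavefront size (64 for the GFX9 checks, 32 for the GFX10 checks). That interpretation is inferred from the numbers in the checks, not stated in the diff, and the function names below are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def neg_literal32(amount):
    """32-bit literal that, when added, subtracts `amount` (mod 2**32)."""
    return (-amount) & MASK32

def realign_constants(align_bytes, wave_size):
    """Return (addend, mask) for rounding a wave-scaled stack pointer up.

    Assumes offsets are scaled by the wavefront size, so the addend is
    (align_bytes - 1) * wave_size and the mask clears the low bits of
    align_bytes * wave_size.
    """
    scaled = align_bytes * wave_size
    addend = scaled - wave_size
    mask = ~(scaled - 1) & MASK32
    return addend, mask
```

Under that assumption, `neg_literal32(0x60000)` is `0xfffa0000` and `neg_literal32(0x30000)` is `0xfffd0000`, while `realign_constants(2048, 64)` yields `(0x1ffc0, 0xfffe0000)` and `realign_constants(2048, 32)` yields `(0xffe0, 0xffff0000)`, matching the operands in the checks above.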

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	[71 lines elided]
	; GCN-NEXT: private_segment_alignment = 4			; GCN-NEXT: private_segment_alignment = 4
	; GCN-NEXT: wavefront_size = 6			; GCN-NEXT: wavefront_size = 6
	; GCN-NEXT: call_convention = -1			; GCN-NEXT: call_convention = -1
	; GCN-NEXT: runtime_loader_kernel_symbol = 0			; GCN-NEXT: runtime_loader_kernel_symbol = 0
	; GCN-NEXT: .end_amd_kernel_code_t			; GCN-NEXT: .end_amd_kernel_code_t
	; GCN-NEXT: ; %bb.0:			; GCN-NEXT: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s32, 0			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13			; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13
	; GCN-NEXT: s_add_u32 s12, s12, s17			; GCN-NEXT: s_add_i32 s12, s12, s17
	; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8			; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
	; GCN-NEXT: s_add_u32 s0, s0, s17			; GCN-NEXT: s_add_u32 s0, s0, s17
	; GCN-NEXT: s_addc_u32 s1, s1, 0			; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_mov_b32 s13, s15			; GCN-NEXT: s_mov_b32 s13, s15
	; GCN-NEXT: s_mov_b32 s12, s14			; GCN-NEXT: s_mov_b32 s12, s14
	; GCN-NEXT: s_getpc_b64 s[14:15]			; GCN-NEXT: s_getpc_b64 s[14:15]
	; GCN-NEXT: s_add_u32 s14, s14, gv.fptr0@rel32@lo+4			; GCN-NEXT: s_add_u32 s14, s14, gv.fptr0@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr0@rel32@hi+12			; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr0@rel32@hi+12
	[79 lines elided]
	; GCN-NEXT: private_segment_alignment = 4			; GCN-NEXT: private_segment_alignment = 4
	; GCN-NEXT: wavefront_size = 6			; GCN-NEXT: wavefront_size = 6
	; GCN-NEXT: call_convention = -1			; GCN-NEXT: call_convention = -1
	; GCN-NEXT: runtime_loader_kernel_symbol = 0			; GCN-NEXT: runtime_loader_kernel_symbol = 0
	; GCN-NEXT: .end_amd_kernel_code_t			; GCN-NEXT: .end_amd_kernel_code_t
	; GCN-NEXT: ; %bb.0:			; GCN-NEXT: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s32, 0			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13			; GCN-NEXT: s_mov_b32 flat_scratch_lo, s13
	; GCN-NEXT: s_add_u32 s12, s12, s17			; GCN-NEXT: s_add_i32 s12, s12, s17
	; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8			; GCN-NEXT: s_lshr_b32 flat_scratch_hi, s12, 8
	; GCN-NEXT: s_add_u32 s0, s0, s17			; GCN-NEXT: s_add_u32 s0, s0, s17
	; GCN-NEXT: s_addc_u32 s1, s1, 0			; GCN-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: s_mov_b32 s13, s15			; GCN-NEXT: s_mov_b32 s13, s15
	; GCN-NEXT: s_mov_b32 s12, s14			; GCN-NEXT: s_mov_b32 s12, s14
	; GCN-NEXT: s_getpc_b64 s[14:15]			; GCN-NEXT: s_getpc_b64 s[14:15]
	; GCN-NEXT: s_add_u32 s14, s14, gv.fptr1@rel32@lo+4			; GCN-NEXT: s_add_u32 s14, s14, gv.fptr1@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr1@rel32@hi+12			; GCN-NEXT: s_addc_u32 s15, s15, gv.fptr1@rel32@hi+12
	[16 lines elided]
	; GCN-LABEL: test_indirect_call_vgpr_ptr:			; GCN-LABEL: test_indirect_call_vgpr_ptr:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v43, s33, 17			; GCN-NEXT: v_writelane_b32 v43, s33, 17
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v43, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v43, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v43, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s38, 3			; GCN-NEXT: v_writelane_b32 v43, s38, 3
	; GCN-NEXT: v_writelane_b32 v43, s39, 4			; GCN-NEXT: v_writelane_b32 v43, s39, 4
	[53 lines elided]
	; GCN-NEXT: v_readlane_b32 s39, v43, 4			; GCN-NEXT: v_readlane_b32 s39, v43, 4
	; GCN-NEXT: v_readlane_b32 s38, v43, 3			; GCN-NEXT: v_readlane_b32 s38, v43, 3
	; GCN-NEXT: v_readlane_b32 s36, v43, 2			; GCN-NEXT: v_readlane_b32 s36, v43, 2
	; GCN-NEXT: v_readlane_b32 s35, v43, 1			; GCN-NEXT: v_readlane_b32 s35, v43, 1
	; GCN-NEXT: v_readlane_b32 s34, v43, 0			; GCN-NEXT: v_readlane_b32 s34, v43, 0
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_sub_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0xf800
	; GCN-NEXT: v_readlane_b32 s33, v43, 17			; GCN-NEXT: v_readlane_b32 s33, v43, 17
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	call void %fptr()			call void %fptr()
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {			define void @test_indirect_call_vgpr_ptr_arg(void(i32)* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v43, s33, 17			; GCN-NEXT: v_writelane_b32 v43, s33, 17
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v43, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v43, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v43, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s38, 3			; GCN-NEXT: v_writelane_b32 v43, s38, 3
	; GCN-NEXT: v_writelane_b32 v43, s39, 4			; GCN-NEXT: v_writelane_b32 v43, s39, 4
	[54 lines elided]
	; GCN-NEXT: v_readlane_b32 s39, v43, 4			; GCN-NEXT: v_readlane_b32 s39, v43, 4
	; GCN-NEXT: v_readlane_b32 s38, v43, 3			; GCN-NEXT: v_readlane_b32 s38, v43, 3
	; GCN-NEXT: v_readlane_b32 s36, v43, 2			; GCN-NEXT: v_readlane_b32 s36, v43, 2
	; GCN-NEXT: v_readlane_b32 s35, v43, 1			; GCN-NEXT: v_readlane_b32 s35, v43, 1
	; GCN-NEXT: v_readlane_b32 s34, v43, 0			; GCN-NEXT: v_readlane_b32 s34, v43, 0
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_sub_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0xf800
	; GCN-NEXT: v_readlane_b32 s33, v43, 17			; GCN-NEXT: v_readlane_b32 s33, v43, 17
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	call void %fptr(i32 123)			call void %fptr(i32 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {			define i32 @test_indirect_call_vgpr_ptr_ret(i32()* %fptr) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:			; GCN-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v43, s33, 17			; GCN-NEXT: v_writelane_b32 v43, s33, 17
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v43, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v43, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v43, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s38, 3			; GCN-NEXT: v_writelane_b32 v43, s38, 3
	; GCN-NEXT: v_writelane_b32 v43, s39, 4			; GCN-NEXT: v_writelane_b32 v43, s39, 4
	[54 lines elided]
	; GCN-NEXT: v_readlane_b32 s39, v43, 4			; GCN-NEXT: v_readlane_b32 s39, v43, 4
	; GCN-NEXT: v_readlane_b32 s38, v43, 3			; GCN-NEXT: v_readlane_b32 s38, v43, 3
	; GCN-NEXT: v_readlane_b32 s36, v43, 2			; GCN-NEXT: v_readlane_b32 s36, v43, 2
	; GCN-NEXT: v_readlane_b32 s35, v43, 1			; GCN-NEXT: v_readlane_b32 s35, v43, 1
	; GCN-NEXT: v_readlane_b32 s34, v43, 0			; GCN-NEXT: v_readlane_b32 s34, v43, 0
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_sub_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0xf800
	; GCN-NEXT: v_readlane_b32 s33, v43, 17			; GCN-NEXT: v_readlane_b32 s33, v43, 17
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	%a = call i32 %fptr()			%a = call i32 %fptr()
	%b = add i32 %a, 1			%b = add i32 %a, 1
	ret i32 %b			ret i32 %b
	}			}

	define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {			define void @test_indirect_call_vgpr_ptr_in_branch(void()* %fptr, i1 %cond) {
	; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GCN-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1			; GCN-NEXT: s_or_saveexec_b64 s[16:17], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[16:17]			; GCN-NEXT: s_mov_b64 exec, s[16:17]
	; GCN-NEXT: v_writelane_b32 v43, s33, 19			; GCN-NEXT: v_writelane_b32 v43, s33, 19
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v43, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v43, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v43, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s38, 3			; GCN-NEXT: v_writelane_b32 v43, s38, 3
	; GCN-NEXT: v_writelane_b32 v43, s39, 4			; GCN-NEXT: v_writelane_b32 v43, s39, 4
	[64 lines elided]
	; GCN-NEXT: v_readlane_b32 s39, v43, 4			; GCN-NEXT: v_readlane_b32 s39, v43, 4
	; GCN-NEXT: v_readlane_b32 s38, v43, 3			; GCN-NEXT: v_readlane_b32 s38, v43, 3
	; GCN-NEXT: v_readlane_b32 s36, v43, 2			; GCN-NEXT: v_readlane_b32 s36, v43, 2
	; GCN-NEXT: v_readlane_b32 s35, v43, 1			; GCN-NEXT: v_readlane_b32 s35, v43, 1
	; GCN-NEXT: v_readlane_b32 s34, v43, 0			; GCN-NEXT: v_readlane_b32 s34, v43, 0
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_sub_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0xf800
	; GCN-NEXT: v_readlane_b32 s33, v43, 19			; GCN-NEXT: v_readlane_b32 s33, v43, 19
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	br i1 %cond, label %bb1, label %bb2			br i1 %cond, label %bb1, label %bb2
	; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GCN-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v42, s33, 6			; GCN-NEXT: v_writelane_b32 v42, s33, 6
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v42, s34, 0			; GCN-NEXT: v_writelane_b32 v42, s34, 0
	; GCN-NEXT: v_writelane_b32 v42, s35, 1			; GCN-NEXT: v_writelane_b32 v42, s35, 1
	; GCN-NEXT: v_writelane_b32 v42, s36, 2			; GCN-NEXT: v_writelane_b32 v42, s36, 2
	; GCN-NEXT: v_writelane_b32 v42, s37, 3			; GCN-NEXT: v_writelane_b32 v42, s37, 3
	; GCN-NEXT: v_writelane_b32 v42, s30, 4			; GCN-NEXT: v_writelane_b32 v42, s30, 4
	; GCN-NEXT: v_writelane_b32 v42, s31, 5			; GCN-NEXT: v_writelane_b32 v42, s31, 5
	; GCN-NEXT: v_readlane_b32 s4, v42, 4			; GCN-NEXT: v_readlane_b32 s4, v42, 4
	; GCN-NEXT: v_readlane_b32 s5, v42, 5			; GCN-NEXT: v_readlane_b32 s5, v42, 5
	; GCN-NEXT: v_readlane_b32 s37, v42, 3			; GCN-NEXT: v_readlane_b32 s37, v42, 3
	; GCN-NEXT: v_readlane_b32 s36, v42, 2			; GCN-NEXT: v_readlane_b32 s36, v42, 2
	; GCN-NEXT: v_readlane_b32 s35, v42, 1			; GCN-NEXT: v_readlane_b32 s35, v42, 1
	; GCN-NEXT: v_readlane_b32 s34, v42, 0			; GCN-NEXT: v_readlane_b32 s34, v42, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v42, 6			; GCN-NEXT: v_readlane_b32 s33, v42, 6
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	call amdgpu_gfx void %fptr(i32 inreg 123)			call amdgpu_gfx void %fptr(i32 inreg 123)
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0		; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
; FLATSCR-NEXT: v_mov_b32_e32 v0, 0		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
; FLATSCR-NEXT: s_movk_i32 vcc_hi, 0x2000		; FLATSCR-NEXT: s_movk_i32 vcc_hi, 0x2000
; FLATSCR-NEXT: s_mov_b32 s2, 0		; FLATSCR-NEXT: s_mov_b32 s2, 0
; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi		; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: BB0_1: ; %loadstoreloop		; FLATSCR-NEXT: BB0_1: ; %loadstoreloop
; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1		; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1
; FLATSCR-NEXT: s_add_u32 s3, 0x3000, s2		; FLATSCR-NEXT: s_add_i32 s3, s2, 0x3000
; FLATSCR-NEXT: s_add_i32 s2, s2, 1		; FLATSCR-NEXT: s_add_i32 s2, s2, 1
; FLATSCR-NEXT: s_cmpk_lt_u32 s2, 0x2120		; FLATSCR-NEXT: s_cmpk_lt_u32 s2, 0x2120
; FLATSCR-NEXT: scratch_store_byte off, v0, s3		; FLATSCR-NEXT: scratch_store_byte off, v0, s3
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_cbranch_scc1 BB0_1		; FLATSCR-NEXT: s_cbranch_scc1 BB0_1
; FLATSCR-NEXT: ; %bb.2: ; %split		; FLATSCR-NEXT: ; %bb.2: ; %split
; FLATSCR-NEXT: s_movk_i32 s2, 0x2000		; FLATSCR-NEXT: s_movk_i32 s2, 0x2000
; FLATSCR-NEXT: s_add_u32 s2, 0x3000, s2		; FLATSCR-NEXT: s_addk_i32 s2, 0x3000
; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s2 offset:208 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[0:1], off, s2 offset:208 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_movk_i32 s2, 0x3000		; FLATSCR-NEXT: s_movk_i32 s2, 0x3000
; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s2 offset:64 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s2 offset:64 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2		; FLATSCR-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2
; FLATSCR-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc		; FLATSCR-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v3, vcc
; FLATSCR-NEXT: v_mov_b32_e32 v2, 0		; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
entry:		entry:
ret void		ret void
}		}

define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out) {		define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out) {
; MUBUF-LABEL: func_local_stack_offset_uses_sp:		; MUBUF-LABEL: func_local_stack_offset_uses_sp:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-NEXT: s_mov_b32 s5, s33		; MUBUF-NEXT: s_mov_b32 s5, s33
; MUBUF-NEXT: s_add_u32 s33, s32, 0x7ffc0		; MUBUF-NEXT: s_add_i32 s33, s32, 0x7ffc0
; MUBUF-NEXT: s_and_b32 s33, s33, 0xfff80000		; MUBUF-NEXT: s_and_b32 s33, s33, 0xfff80000
; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33		; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33
; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3		; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3
; MUBUF-NEXT: v_mov_b32_e32 v4, 0		; MUBUF-NEXT: v_mov_b32_e32 v4, 0
; MUBUF-NEXT: v_add_u32_e32 v2, 64, v3		; MUBUF-NEXT: v_add_u32_e32 v2, 64, v3
; MUBUF-NEXT: s_mov_b32 s4, 0		; MUBUF-NEXT: s_mov_b32 s4, 0
; MUBUF-NEXT: s_add_u32 s32, s32, 0x180000		; MUBUF-NEXT: s_add_i32 s32, s32, 0x180000
; MUBUF-NEXT: buffer_store_dword v4, off, s[0:3], s33		; MUBUF-NEXT: buffer_store_dword v4, off, s[0:3], s33
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: BB1_1: ; %loadstoreloop		; MUBUF-NEXT: BB1_1: ; %loadstoreloop
; MUBUF-NEXT: ; =>This Inner Loop Header: Depth=1		; MUBUF-NEXT: ; =>This Inner Loop Header: Depth=1
; MUBUF-NEXT: v_add_u32_e32 v5, s4, v3		; MUBUF-NEXT: v_add_u32_e32 v5, s4, v3
; MUBUF-NEXT: s_add_i32 s4, s4, 1		; MUBUF-NEXT: s_add_i32 s4, s4, 1
; MUBUF-NEXT: s_cmpk_lt_u32 s4, 0x2120		; MUBUF-NEXT: s_cmpk_lt_u32 s4, 0x2120
; MUBUF-NEXT: buffer_store_byte v4, v5, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_byte v4, v5, s[0:3], 0 offen
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_cbranch_scc1 BB1_1		; MUBUF-NEXT: s_cbranch_scc1 BB1_1
; MUBUF-NEXT: ; %bb.2: ; %split		; MUBUF-NEXT: ; %bb.2: ; %split
; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33		; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33
; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3		; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3
; MUBUF-NEXT: v_add_u32_e32 v3, 0x20d0, v3		; MUBUF-NEXT: v_add_u32_e32 v3, 0x20d0, v3
; MUBUF-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen glc		; MUBUF-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: buffer_load_dword v5, v3, s[0:3], 0 offen offset:4 glc		; MUBUF-NEXT: buffer_load_dword v5, v3, s[0:3], 0 offen offset:4 glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen glc		; MUBUF-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: buffer_load_dword v7, v2, s[0:3], 0 offen offset:4 glc		; MUBUF-NEXT: buffer_load_dword v7, v2, s[0:3], 0 offen offset:4 glc
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x180000		; MUBUF-NEXT: s_add_i32 s32, s32, 0xffe80000
; MUBUF-NEXT: s_mov_b32 s33, s5		; MUBUF-NEXT: s_mov_b32 s33, s5
; MUBUF-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6		; MUBUF-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
; MUBUF-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc		; MUBUF-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
; MUBUF-NEXT: global_store_dwordx2 v[0:1], v[2:3], off		; MUBUF-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; FLATSCR-LABEL: func_local_stack_offset_uses_sp:		; FLATSCR-LABEL: func_local_stack_offset_uses_sp:
; FLATSCR: ; %bb.0: ; %entry		; FLATSCR: ; %bb.0: ; %entry
; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; FLATSCR-NEXT: s_mov_b32 s2, s33		; FLATSCR-NEXT: s_mov_b32 s2, s33
; FLATSCR-NEXT: s_add_u32 s33, s32, 0x1fff		; FLATSCR-NEXT: s_add_i32 s33, s32, 0x1fff
; FLATSCR-NEXT: s_and_b32 s33, s33, 0xffffe000		; FLATSCR-NEXT: s_and_b32 s33, s33, 0xffffe000
; FLATSCR-NEXT: v_mov_b32_e32 v2, 0		; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
; FLATSCR-NEXT: s_mov_b32 s0, 0		; FLATSCR-NEXT: s_mov_b32 s0, 0
; FLATSCR-NEXT: s_add_u32 s32, s32, 0x6000		; FLATSCR-NEXT: s_addk_i32 s32, 0x6000
; FLATSCR-NEXT: scratch_store_dword off, v2, s33		; FLATSCR-NEXT: scratch_store_dword off, v2, s33
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: BB1_1: ; %loadstoreloop		; FLATSCR-NEXT: BB1_1: ; %loadstoreloop
; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1		; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1
; FLATSCR-NEXT: s_add_u32 vcc_hi, s33, 0x1000		; FLATSCR-NEXT: s_add_i32 vcc_hi, s33, 0x1000
; FLATSCR-NEXT: s_add_u32 s1, vcc_hi, s0		; FLATSCR-NEXT: s_add_i32 s1, s0, vcc_hi
; FLATSCR-NEXT: s_add_i32 s0, s0, 1		; FLATSCR-NEXT: s_add_i32 s0, s0, 1
; FLATSCR-NEXT: s_cmpk_lt_u32 s0, 0x2120		; FLATSCR-NEXT: s_cmpk_lt_u32 s0, 0x2120
; FLATSCR-NEXT: scratch_store_byte off, v2, s1		; FLATSCR-NEXT: scratch_store_byte off, v2, s1
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_cbranch_scc1 BB1_1		; FLATSCR-NEXT: s_cbranch_scc1 BB1_1
; FLATSCR-NEXT: ; %bb.2: ; %split		; FLATSCR-NEXT: ; %bb.2: ; %split
; FLATSCR-NEXT: s_movk_i32 s0, 0x2000		; FLATSCR-NEXT: s_movk_i32 s0, 0x2000
; FLATSCR-NEXT: s_add_u32 s1, s33, 0x1000		; FLATSCR-NEXT: s_add_i32 s1, s33, 0x1000
; FLATSCR-NEXT: s_add_u32 s0, s1, s0		; FLATSCR-NEXT: s_add_i32 s0, s0, s1
; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s0 offset:208 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s0 offset:208 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_add_u32 s0, s33, 0x1000		; FLATSCR-NEXT: s_add_i32 s0, s33, 0x1000
; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s0 offset:64 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s0 offset:64 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x6000		; FLATSCR-NEXT: s_addk_i32 s32, 0xa000
; FLATSCR-NEXT: s_mov_b32 s33, s2		; FLATSCR-NEXT: s_mov_b32 s33, s2
; FLATSCR-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4		; FLATSCR-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4
; FLATSCR-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v5, vcc		; FLATSCR-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v5, vcc
; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off		; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_setpc_b64 s[30:31]		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%pin.low = alloca i32, align 8192, addrspace(5)		%pin.low = alloca i32, align 8192, addrspace(5)
; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0		; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
; FLATSCR-NEXT: v_mov_b32_e32 v0, 0		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
; FLATSCR-NEXT: s_mov_b32 s2, 0		; FLATSCR-NEXT: s_mov_b32 s2, 0
; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:1024		; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi offset:1024
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: BB2_1: ; %loadstoreloop		; FLATSCR-NEXT: BB2_1: ; %loadstoreloop
; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1		; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1
; FLATSCR-NEXT: s_add_u32 s3, 0x2000, s2		; FLATSCR-NEXT: s_add_i32 s3, s2, 0x2000
; FLATSCR-NEXT: s_add_i32 s2, s2, 1		; FLATSCR-NEXT: s_add_i32 s2, s2, 1
; FLATSCR-NEXT: s_cmpk_lt_u32 s2, 0x2120		; FLATSCR-NEXT: s_cmpk_lt_u32 s2, 0x2120
; FLATSCR-NEXT: scratch_store_byte off, v0, s3		; FLATSCR-NEXT: scratch_store_byte off, v0, s3
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_cbranch_scc1 BB2_1		; FLATSCR-NEXT: s_cbranch_scc1 BB2_1
; FLATSCR-NEXT: ; %bb.2: ; %split		; FLATSCR-NEXT: ; %bb.2: ; %split
; FLATSCR-NEXT: s_movk_i32 s2, 0x1000		; FLATSCR-NEXT: s_movk_i32 s2, 0x1000
; FLATSCR-NEXT: s_add_u32 s2, 0x2000, s2		; FLATSCR-NEXT: s_addk_i32 s2, 0x2000
; FLATSCR-NEXT: scratch_load_dwordx2 v[8:9], off, s2 offset:720 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[8:9], off, s2 offset:720 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: scratch_load_dwordx4 v[0:3], off, s2 offset:704 glc		; FLATSCR-NEXT: scratch_load_dwordx4 v[0:3], off, s2 offset:704 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_movk_i32 s2, 0x2000		; FLATSCR-NEXT: s_movk_i32 s2, 0x2000
; FLATSCR-NEXT: scratch_load_dwordx2 v[10:11], off, s2 offset:16 glc		; FLATSCR-NEXT: scratch_load_dwordx2 v[10:11], off, s2 offset:16 glc
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_movk_i32 s2, 0x2000		; FLATSCR-NEXT: s_movk_i32 s2, 0x2000

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

	; GFX9-LABEL: slsr1_1:			; GFX9-LABEL: slsr1_1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: v_writelane_b32 v43, s33, 4			; GFX9-NEXT: v_writelane_b32 v43, s33, 4
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_add_u32 s32, s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0x800
	; GFX9-NEXT: v_writelane_b32 v43, s34, 0			; GFX9-NEXT: v_writelane_b32 v43, s34, 0
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, foo@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, foo@gotpcrel32@hi+12
	; GFX9-NEXT: v_writelane_b32 v43, s35, 1			; GFX9-NEXT: v_writelane_b32 v43, s35, 1
	; GFX9-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[34:35], s[4:5], 0x0
	; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GFX9-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s4, v43, 2			; GFX9-NEXT: v_readlane_b32 s4, v43, 2
	; GFX9-NEXT: v_readlane_b32 s5, v43, 3			; GFX9-NEXT: v_readlane_b32 s5, v43, 3
	; GFX9-NEXT: v_readlane_b32 s35, v43, 1			; GFX9-NEXT: v_readlane_b32 s35, v43, 1
	; GFX9-NEXT: v_readlane_b32 s34, v43, 0			; GFX9-NEXT: v_readlane_b32 s34, v43, 0
	; GFX9-NEXT: s_sub_u32 s32, s32, 0x800			; GFX9-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-NEXT: v_readlane_b32 s33, v43, 4			; GFX9-NEXT: v_readlane_b32 s33, v43, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[4:5]			; GFX9-NEXT: s_setpc_b64 s[4:5]
	%b = and i32 %b.arg, 16777215			%b = and i32 %b.arg, 16777215
	%s = and i32 %s.arg, 16777215			%s = and i32 %s.arg, 16777215

llvm/test/CodeGen/AMDGPU/need-fp-from-csr-vgpr-spill.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s

	; FP is in CSR range, modified.			; FP is in CSR range, modified.
	define hidden fastcc void @callee_has_fp() #1 {			define hidden fastcc void @callee_has_fp() #1 {
	; CHECK-LABEL: callee_has_fp:			; CHECK-LABEL: callee_has_fp:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s4, s33			; CHECK-NEXT: s_mov_b32 s4, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_u32 s32, s32, 0x200			; CHECK-NEXT: s_addk_i32 s32, 0x200
	; CHECK-NEXT: v_mov_b32_e32 v0, 1			; CHECK-NEXT: v_mov_b32_e32 v0, 1
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_sub_u32 s32, s32, 0x200			; CHECK-NEXT: s_addk_i32 s32, 0xfe00
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 1, i32 addrspace(5)* %alloca			store volatile i32 1, i32 addrspace(5)* %alloca
	ret void			ret void
	}			}

	; Has no stack objects, but introduces them due to the CSR spill. We			; Has no stack objects, but introduces them due to the CSR spill. We
	; see the FP modified in the callee with IPRA. We should not have			; see the FP modified in the callee with IPRA. We should not have
	; redundant spills of s33 or assert.			; redundant spills of s33 or assert.
	define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {			define internal fastcc void @csr_vgpr_spill_fp_callee() #0 {
	; CHECK-LABEL: csr_vgpr_spill_fp_callee:			; CHECK-LABEL: csr_vgpr_spill_fp_callee:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s8, s33			; CHECK-NEXT: s_mov_b32 s8, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_u32 s32, s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0x400
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, callee_has_fp@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, callee_has_fp@rel32@hi+12
	; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill			; CHECK-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
	; CHECK-NEXT: s_mov_b64 s[6:7], s[30:31]			; CHECK-NEXT: s_mov_b64 s[6:7], s[30:31]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_sub_u32 s32, s32, 0x400			; CHECK-NEXT: s_addk_i32 s32, 0xfc00
	; CHECK-NEXT: s_mov_b32 s33, s8			; CHECK-NEXT: s_mov_b32 s33, s8
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[6:7]			; CHECK-NEXT: s_setpc_b64 s[6:7]
	bb:			bb:
	call fastcc void @callee_has_fp()			call fastcc void @callee_has_fp()
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/nested-calls.ll

	; GCN: s_waitcnt			; GCN: s_waitcnt

	; Spill CSR VGPR used for SGPR spilling			; Spill CSR VGPR used for SGPR spilling
	; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-DAG: v_writelane_b32 v40, s33, 2			; GCN-DAG: v_writelane_b32 v40, s33, 2
	; GCN-DAG: s_mov_b32 s33, s32			; GCN-DAG: s_mov_b32 s33, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_addk_i32 s32, 0x400
	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_readlane_b32 s4, v40, 0			; GCN: v_readlane_b32 s4, v40, 0
	; GCN: v_readlane_b32 s5, v40, 1			; GCN: v_readlane_b32 s5, v40, 1

	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	define void @test_func_call_external_void_func_i32_imm() #0 {			define void @test_func_call_external_void_func_i32_imm() #0 {
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:			; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x1400{{$}}			; GCN-DAG: s_addk_i32 s32, 0x1400{{$}}
	; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:			; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: s_sub_u32 s32, s32, 0x1400{{$}}			; GCN: s_addk_i32 s32, 0xec00{{$}}
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @test_func_call_external_void_func_i32_imm_stack_use() #0 {			define void @test_func_call_external_void_func_i32_imm_stack_use() #0 {
	%alloca = alloca [16 x i32], align 4, addrspace(5)			%alloca = alloca [16 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0			%gep0 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0
	%gep15 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 16			%gep15 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 16
	store volatile i32 0, i32 addrspace(5)* %gep0			store volatile i32 0, i32 addrspace(5)* %gep0
	store volatile i32 0, i32 addrspace(5)* %gep15			store volatile i32 0, i32 addrspace(5)* %gep15
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }
	attributes #2 = { nounwind noinline }			attributes #2 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll

; FLATSCR-NEXT: s_mov_b32 s33, 0		; FLATSCR-NEXT: s_mov_b32 s33, 0
; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
; FLATSCR-NEXT: s_cmp_lg_u32 s4, 0		; FLATSCR-NEXT: s_cmp_lg_u32 s4, 0
; FLATSCR-NEXT: s_cbranch_scc1 BB0_3		; FLATSCR-NEXT: s_cbranch_scc1 BB0_3
; FLATSCR-NEXT: ; %bb.1: ; %bb.0		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
; FLATSCR-NEXT: s_cmp_lg_u32 s5, 0		; FLATSCR-NEXT: s_cmp_lg_u32 s5, 0
; FLATSCR-NEXT: s_cbranch_scc1 BB0_3		; FLATSCR-NEXT: s_cbranch_scc1 BB0_3
; FLATSCR-NEXT: ; %bb.2: ; %bb.1		; FLATSCR-NEXT: ; %bb.2: ; %bb.1
; FLATSCR-NEXT: s_mov_b32 s2, s32		; FLATSCR-NEXT: s_add_i32 s2, s32, 0x1000
; FLATSCR-NEXT: s_add_i32 s3, s2, 0x1000
; FLATSCR-NEXT: v_mov_b32_e32 v1, 0		; FLATSCR-NEXT: v_mov_b32_e32 v1, 0
; FLATSCR-NEXT: s_add_u32 s2, s2, 0x1000
; FLATSCR-NEXT: v_mov_b32_e32 v2, 1		; FLATSCR-NEXT: v_mov_b32_e32 v2, 1
		; FLATSCR-NEXT: s_lshl_b32 s3, s6, 2
		; FLATSCR-NEXT: s_mov_b32 s32, s2
; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s2		; FLATSCR-NEXT: scratch_store_dwordx2 off, v[1:2], s2
; FLATSCR-NEXT: s_lshl_b32 s2, s6, 2		; FLATSCR-NEXT: s_add_i32 s2, s2, s3
; FLATSCR-NEXT: s_mov_b32 s32, s3		; FLATSCR-NEXT: scratch_load_dword v2, off, s2
; FLATSCR-NEXT: s_add_i32 s3, s3, s2
; FLATSCR-NEXT: scratch_load_dword v2, off, s3
; FLATSCR-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0		; FLATSCR-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x0
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: v_add_u32_e32 v0, v2, v0		; FLATSCR-NEXT: v_add_u32_e32 v0, v2, v0
; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
; FLATSCR-NEXT: global_store_dword v1, v0, s[0:1]		; FLATSCR-NEXT: global_store_dword v1, v0, s[0:1]
; FLATSCR-NEXT: BB0_3: ; %bb.2		; FLATSCR-NEXT: BB0_3: ; %bb.2
; FLATSCR-NEXT: v_mov_b32_e32 v0, 0		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
; FLATSCR-NEXT: global_store_dword v[0:1], v0, off		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off

define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {		define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {
; MUBUF-LABEL: func_non_entry_block_static_alloca_align4:		; MUBUF-LABEL: func_non_entry_block_static_alloca_align4:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-NEXT: s_mov_b32 s7, s33		; MUBUF-NEXT: s_mov_b32 s7, s33
; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; MUBUF-NEXT: s_mov_b32 s33, s32		; MUBUF-NEXT: s_mov_b32 s33, s32
; MUBUF-NEXT: s_add_u32 s32, s32, 0x400		; MUBUF-NEXT: s_addk_i32 s32, 0x400
; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc		; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc
; MUBUF-NEXT: s_cbranch_execz BB2_3		; MUBUF-NEXT: s_cbranch_execz BB2_3
; MUBUF-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
; MUBUF-NEXT: s_and_b64 exec, exec, vcc		; MUBUF-NEXT: s_and_b64 exec, exec, vcc
; MUBUF-NEXT: s_cbranch_execz BB2_3		; MUBUF-NEXT: s_cbranch_execz BB2_3
; MUBUF-NEXT: ; %bb.2: ; %bb.1		; MUBUF-NEXT: ; %bb.2: ; %bb.1
; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3		; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
; MUBUF-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; MUBUF-NEXT: BB2_3: ; %bb.2		; MUBUF-NEXT: BB2_3: ; %bb.2
; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]		; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
; MUBUF-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; MUBUF-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x400		; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
; MUBUF-NEXT: s_mov_b32 s33, s7		; MUBUF-NEXT: s_mov_b32 s33, s7
; MUBUF-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; FLATSCR-LABEL: func_non_entry_block_static_alloca_align4:		; FLATSCR-LABEL: func_non_entry_block_static_alloca_align4:
; FLATSCR: ; %bb.0: ; %entry		; FLATSCR: ; %bb.0: ; %entry
; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; FLATSCR-NEXT: s_mov_b32 s4, s33		; FLATSCR-NEXT: s_mov_b32 s3, s33
; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; FLATSCR-NEXT: s_mov_b32 s33, s32		; FLATSCR-NEXT: s_mov_b32 s33, s32
; FLATSCR-NEXT: s_add_u32 s32, s32, 16		; FLATSCR-NEXT: s_add_i32 s32, s32, 16
; FLATSCR-NEXT: s_and_saveexec_b64 s[0:1], vcc		; FLATSCR-NEXT: s_and_saveexec_b64 s[0:1], vcc
; FLATSCR-NEXT: s_cbranch_execz BB2_3		; FLATSCR-NEXT: s_cbranch_execz BB2_3
; FLATSCR-NEXT: ; %bb.1: ; %bb.0		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
; FLATSCR-NEXT: s_and_b64 exec, exec, vcc		; FLATSCR-NEXT: s_and_b64 exec, exec, vcc
; FLATSCR-NEXT: s_cbranch_execz BB2_3		; FLATSCR-NEXT: s_cbranch_execz BB2_3
; FLATSCR-NEXT: ; %bb.2: ; %bb.1		; FLATSCR-NEXT: ; %bb.2: ; %bb.1
; FLATSCR-NEXT: s_mov_b32 s2, s32		; FLATSCR-NEXT: s_add_i32 s2, s32, 0x1000
; FLATSCR-NEXT: s_add_i32 s3, s2, 0x1000
; FLATSCR-NEXT: s_add_u32 s2, s2, 0x1000
; FLATSCR-NEXT: v_mov_b32_e32 v2, 0		; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
; FLATSCR-NEXT: v_mov_b32_e32 v3, 1		; FLATSCR-NEXT: v_mov_b32_e32 v3, 1
; FLATSCR-NEXT: scratch_store_dwordx2 off, v[2:3], s2		; FLATSCR-NEXT: scratch_store_dwordx2 off, v[2:3], s2
; FLATSCR-NEXT: v_lshl_add_u32 v2, v4, 2, s3		; FLATSCR-NEXT: v_lshl_add_u32 v2, v4, 2, s2
; FLATSCR-NEXT: scratch_load_dword v2, v2, off		; FLATSCR-NEXT: scratch_load_dword v2, v2, off
; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v5		; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v5
; FLATSCR-NEXT: s_mov_b32 s32, s3		; FLATSCR-NEXT: s_mov_b32 s32, s2
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3		; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
; FLATSCR-NEXT: global_store_dword v[0:1], v2, off		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
; FLATSCR-NEXT: BB2_3: ; %bb.2		; FLATSCR-NEXT: BB2_3: ; %bb.2
; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]		; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]
; FLATSCR-NEXT: v_mov_b32_e32 v0, 0		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
; FLATSCR-NEXT: global_store_dword v[0:1], v0, off		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_sub_u32 s32, s32, 16		; FLATSCR-NEXT: s_add_i32 s32, s32, -16
; FLATSCR-NEXT: s_mov_b32 s33, s4		; FLATSCR-NEXT: s_mov_b32 s33, s3
; FLATSCR-NEXT: s_setpc_b64 s[30:31]		; FLATSCR-NEXT: s_setpc_b64 s[30:31]

entry:		entry:
%cond0 = icmp eq i32 %arg.cond0, 0		%cond0 = icmp eq i32 %arg.cond0, 0
br i1 %cond0, label %bb.0, label %bb.2		br i1 %cond0, label %bb.0, label %bb.2

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 4, addrspace(5)		%alloca = alloca [16 x i32], align 4, addrspace(5)
Show All 18 Lines	bb.2:
ret void		ret void
}		}

define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {		define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {
; MUBUF-LABEL: func_non_entry_block_static_alloca_align64:		; MUBUF-LABEL: func_non_entry_block_static_alloca_align64:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-NEXT: s_mov_b32 s7, s33		; MUBUF-NEXT: s_mov_b32 s7, s33
; MUBUF-NEXT: s_add_u32 s33, s32, 0xfc0		; MUBUF-NEXT: s_add_i32 s33, s32, 0xfc0
; MUBUF-NEXT: s_and_b32 s33, s33, 0xfffff000		; MUBUF-NEXT: s_and_b32 s33, s33, 0xfffff000
; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; MUBUF-NEXT: s_add_u32 s32, s32, 0x2000		; MUBUF-NEXT: s_addk_i32 s32, 0x2000
; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc		; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc
; MUBUF-NEXT: s_cbranch_execz BB3_2		; MUBUF-NEXT: s_cbranch_execz BB3_2
; MUBUF-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; MUBUF-NEXT: s_and_b32 s6, s6, 0xfffff000		; MUBUF-NEXT: s_and_b32 s6, s6, 0xfffff000
; MUBUF-NEXT: v_mov_b32_e32 v2, 0		; MUBUF-NEXT: v_mov_b32_e32 v2, 0
; MUBUF-NEXT: v_mov_b32_e32 v5, s6		; MUBUF-NEXT: v_mov_b32_e32 v5, s6
; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen
; MUBUF-NEXT: v_mov_b32_e32 v2, 1		; MUBUF-NEXT: v_mov_b32_e32 v2, 1
; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4		; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4
; MUBUF-NEXT: v_lshl_add_u32 v2, v3, 2, s6		; MUBUF-NEXT: v_lshl_add_u32 v2, v3, 2, s6
; MUBUF-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen		; MUBUF-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen
; MUBUF-NEXT: v_and_b32_e32 v3, 0x3ff, v4		; MUBUF-NEXT: v_and_b32_e32 v3, 0x3ff, v4
; MUBUF-NEXT: s_mov_b32 s32, s6		; MUBUF-NEXT: s_mov_b32 s32, s6
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3		; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
; MUBUF-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; MUBUF-NEXT: BB3_2: ; %bb.1		; MUBUF-NEXT: BB3_2: ; %bb.1
; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]		; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
; MUBUF-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; MUBUF-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; MUBUF-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; MUBUF-NEXT: s_sub_u32 s32, s32, 0x2000		; MUBUF-NEXT: s_addk_i32 s32, 0xe000
; MUBUF-NEXT: s_mov_b32 s33, s7		; MUBUF-NEXT: s_mov_b32 s33, s7
; MUBUF-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; FLATSCR-LABEL: func_non_entry_block_static_alloca_align64:		; FLATSCR-LABEL: func_non_entry_block_static_alloca_align64:
; FLATSCR: ; %bb.0: ; %entry		; FLATSCR: ; %bb.0: ; %entry
; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; FLATSCR-NEXT: s_mov_b32 s3, s33		; FLATSCR-NEXT: s_mov_b32 s3, s33
; FLATSCR-NEXT: s_add_u32 s33, s32, 63		; FLATSCR-NEXT: s_add_i32 s33, s32, 63
; FLATSCR-NEXT: s_andn2_b32 s33, s33, 63		; FLATSCR-NEXT: s_andn2_b32 s33, s33, 63
; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; FLATSCR-NEXT: s_add_u32 s32, s32, 0x80		; FLATSCR-NEXT: s_addk_i32 s32, 0x80
; FLATSCR-NEXT: s_and_saveexec_b64 s[0:1], vcc		; FLATSCR-NEXT: s_and_saveexec_b64 s[0:1], vcc
; FLATSCR-NEXT: s_cbranch_execz BB3_2		; FLATSCR-NEXT: s_cbranch_execz BB3_2
; FLATSCR-NEXT: ; %bb.1: ; %bb.0		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
; FLATSCR-NEXT: s_add_i32 s2, s32, 0x1000		; FLATSCR-NEXT: s_add_i32 s2, s32, 0x1000
; FLATSCR-NEXT: s_and_b32 s2, s2, 0xfffff000		; FLATSCR-NEXT: s_and_b32 s2, s2, 0xfffff000
; FLATSCR-NEXT: v_mov_b32_e32 v5, 0		; FLATSCR-NEXT: v_mov_b32_e32 v5, 0
; FLATSCR-NEXT: v_mov_b32_e32 v6, 1		; FLATSCR-NEXT: v_mov_b32_e32 v6, 1
; FLATSCR-NEXT: v_lshl_add_u32 v2, v3, 2, s2		; FLATSCR-NEXT: v_lshl_add_u32 v2, v3, 2, s2
; FLATSCR-NEXT: scratch_store_dwordx2 off, v[5:6], s2		; FLATSCR-NEXT: scratch_store_dwordx2 off, v[5:6], s2
; FLATSCR-NEXT: scratch_load_dword v2, v2, off		; FLATSCR-NEXT: scratch_load_dword v2, v2, off
; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v4		; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v4
; FLATSCR-NEXT: s_mov_b32 s32, s2		; FLATSCR-NEXT: s_mov_b32 s32, s2
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3		; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
; FLATSCR-NEXT: global_store_dword v[0:1], v2, off		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
; FLATSCR-NEXT: BB3_2: ; %bb.1		; FLATSCR-NEXT: BB3_2: ; %bb.1
; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]		; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]
; FLATSCR-NEXT: v_mov_b32_e32 v0, 0		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
; FLATSCR-NEXT: global_store_dword v[0:1], v0, off		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
; FLATSCR-NEXT: s_waitcnt vmcnt(0)		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x80		; FLATSCR-NEXT: s_addk_i32 s32, 0xff80
; FLATSCR-NEXT: s_mov_b32 s33, s3		; FLATSCR-NEXT: s_mov_b32 s33, s3
; FLATSCR-NEXT: s_setpc_b64 s[30:31]		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%cond = icmp eq i32 %arg.cond, 0		%cond = icmp eq i32 %arg.cond, 0
br i1 %cond, label %bb.0, label %bb.1		br i1 %cond, label %bb.0, label %bb.1

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 64, addrspace(5)		%alloca = alloca [16 x i32], align 64, addrspace(5)
Show All 19 Lines
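
Note on the immediates in the non-entry-alloca.ll checks above: the stack-pointer restores change from s_sub_u32 with a positive literal (0x400, 0x2000, 0x80) to s_addk_i32 with a 16-bit immediate (0xfc00, 0xe000, 0xff80). The sketch below checks that each new immediate, once sign-extended from 16 bits, is the negation of the old literal. It is a minimal model, not the backend's code: the helper names are hypothetical and the SCC result is ignored (s_add_u32 and s_sub_u32 report carry or borrow, while s_add_i32 and s_addk_i32 report signed overflow).

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def addk_i32(reg: int, simm16: int) -> int:
    """Model of s_addk_i32: add a sign-extended 16-bit immediate, wrapping at 32 bits."""
    return (reg + sext16(simm16)) & MASK32


def sub_u32(reg: int, imm: int) -> int:
    """Model of s_sub_u32: 32-bit subtraction with wraparound (SCC is not modelled)."""
    return (reg - imm) & MASK32


# (old s_sub_u32 literal, new s_addk_i32 immediate) pairs taken from the checks above.
PAIRS = [(0x400, 0xFC00), (0x2000, 0xE000), (0x80, 0xFF80)]

for old, new in PAIRS:
    # Arbitrary sample stack-pointer values, chosen only for illustration.
    for sp in (0x0, 0x5000, 0xFFFFF000):
        assert addk_i32(sp, new) == sub_u32(sp, old)
```

Only the register value is compared here; whether any later instruction depends on SCC is outside what this sketch covers.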

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir


	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $vgpr1, $vgpr2			; CHECK: liveins: $vgpr1, $vgpr2
	; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; CHECK: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; CHECK: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; CHECK: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; CHECK: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; CHECK: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr33 = S_ADD_I32 killed $sgpr33, 8192, implicit-def $scc
	; CHECK: $vgpr3 = COPY killed $sgpr33			; CHECK: $vgpr3 = COPY killed $sgpr33
	; CHECK: $sgpr33 = S_SUB_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr33 = S_ADD_I32 killed $sgpr33, -8192, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; CHECK: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; CHECK: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; CHECK: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; CHECK: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; CHECK: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; CHECK: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr
	; CHECK: liveins: $sgpr29, $vgpr1			; CHECK: liveins: $sgpr29, $vgpr1
	; CHECK: $sgpr29 = frame-setup COPY $sgpr33			; CHECK: $sgpr29 = frame-setup COPY $sgpr33
	; CHECK: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
	; CHECK: $sgpr33 = S_ADD_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr33 = S_ADD_I32 killed $sgpr33, 8192, implicit-def $scc
	; CHECK: $vgpr2 = COPY killed $sgpr33			; CHECK: $vgpr2 = COPY killed $sgpr33
	; CHECK: $sgpr33 = S_SUB_U32 killed $sgpr33, 8192, implicit-def $scc			; CHECK: $sgpr33 = S_ADD_I32 killed $sgpr33, -8192, implicit-def $scc
	; CHECK: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-destroy COPY $sgpr29			; CHECK: $sgpr33 = frame-destroy COPY $sgpr29
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...


	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64			; CHECK-LABEL: name: scavenge_sgpr_pei_one_sgpr_64
	; CHECK: liveins: $sgpr28, $vgpr1			; CHECK: liveins: $sgpr28, $vgpr1
	; CHECK: $sgpr28 = frame-setup COPY $sgpr33			; CHECK: $sgpr28 = frame-setup COPY $sgpr33
	; CHECK: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc			; CHECK: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
	; CHECK: $sgpr29 = S_ADD_U32 killed $sgpr29, 8192, implicit-def $scc			; CHECK: $sgpr29 = S_ADD_I32 killed $sgpr29, 8192, implicit-def $scc
	; CHECK: $vgpr2 = COPY killed $sgpr29			; CHECK: $vgpr2 = COPY killed $sgpr29
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-destroy COPY $sgpr28			; CHECK: $sgpr33 = frame-destroy COPY $sgpr28
	; CHECK: S_ENDPGM 0, implicit $vcc			; CHECK: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...


	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc			; CHECK-LABEL: name: scavenge_sgpr_pei_prefer_vcc
	; CHECK: liveins: $sgpr28, $vgpr1			; CHECK: liveins: $sgpr28, $vgpr1
	; CHECK: $sgpr28 = frame-setup COPY $sgpr33			; CHECK: $sgpr28 = frame-setup COPY $sgpr33
	; CHECK: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31			; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31
	; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $vcc_lo = S_MOV_B32 8192			; CHECK: $vcc_lo = S_MOV_B32 8192
	; CHECK: $vgpr2, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr2, 0, implicit $exec			; CHECK: $vgpr2, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr2, 0, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31			; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; CHECK: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-destroy COPY $sgpr28			; CHECK: $sgpr33 = frame-destroy COPY $sgpr28
	; CHECK: S_ENDPGM 0			; CHECK: S_ENDPGM 0
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
	S_ENDPGM 0			S_ENDPGM 0
	...			...

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; MUBUF-LABEL: name: scavenge_sgpr_pei_no_sgprs			; MUBUF-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; MUBUF: liveins: $vgpr1, $vgpr2			; MUBUF: liveins: $vgpr1, $vgpr2
	; MUBUF: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; MUBUF: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; MUBUF: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; MUBUF: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; MUBUF: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; MUBUF: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; MUBUF: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; MUBUF: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; MUBUF: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; MUBUF: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; MUBUF: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; MUBUF: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; MUBUF: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; MUBUF: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; MUBUF: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; MUBUF: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; MUBUF: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; MUBUF: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; MUBUF: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; MUBUF: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; MUBUF: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec			; MUBUF: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
	; MUBUF: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; MUBUF: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; MUBUF: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; MUBUF: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; MUBUF: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; MUBUF: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; MUBUF: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; MUBUF: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; MUBUF: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; MUBUF: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; MUBUF: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; MUBUF: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; MUBUF: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; MUBUF: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; MUBUF: S_ENDPGM 0, implicit $vcc			; MUBUF: S_ENDPGM 0, implicit $vcc
	; FLATSCR-LABEL: name: scavenge_sgpr_pei_no_sgprs			; FLATSCR-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; FLATSCR: liveins: $vgpr1, $vgpr2			; FLATSCR: liveins: $vgpr1, $vgpr2
	; FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; FLATSCR: $sgpr6 = S_ADD_I32 $sgpr32, 8196, implicit-def $scc
	; FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)			; FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)
	; FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; FLATSCR: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc			; FLATSCR: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
	; FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc			; FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc
	; FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc			; FLATSCR: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 24576, implicit-def $scc
	; FLATSCR: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; FLATSCR: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec			; FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec
	; FLATSCR: $sgpr33 = S_ADD_U32 $sgpr33, 8192, implicit-def $scc			; FLATSCR: $sgpr33 = S_ADD_I32 $sgpr33, 8192, implicit-def $scc
	; FLATSCR: $vgpr0 = V_OR_B32_e32 $sgpr33, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; FLATSCR: $vgpr0 = V_OR_B32_e32 $sgpr33, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; FLATSCR: $sgpr33 = S_SUB_U32 $sgpr33, 8192, implicit-def $scc			; FLATSCR: $sgpr33 = S_ADD_I32 $sgpr33, -8192, implicit-def $scc
	; FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc			; FLATSCR: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -24576, implicit-def $scc
	; FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; FLATSCR: $sgpr6 = S_ADD_I32 $sgpr32, 8196, implicit-def $scc
	; FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)			; FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)
	; FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; FLATSCR: S_ENDPGM 0, implicit $vcc			; FLATSCR: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr.mir

bb.0:		bb.0:
liveins: $vgpr1		liveins: $vgpr1

; CHECK-LABEL: name: scavenge_sgpr_pei		; CHECK-LABEL: name: scavenge_sgpr_pei
; CHECK: liveins: $vgpr1, $vgpr2		; CHECK: liveins: $vgpr1, $vgpr2
; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec		; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
; CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, implicit $exec :: (store 4 into %stack.2, addrspace 5)		; CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, implicit $exec :: (store 4 into %stack.2, addrspace 5)
; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; CHECK: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2		; CHECK: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
; CHECK: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 262080, implicit-def $scc		; CHECK: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 262080, implicit-def $scc
; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294705152, implicit-def $scc		; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294705152, implicit-def $scc
; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 524288, implicit-def $scc		; CHECK: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 524288, implicit-def $scc
; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; CHECK: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 524288, implicit-def $scc		; CHECK: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -524288, implicit-def $scc
; CHECK: $sgpr33 = V_READLANE_B32 $vgpr2, 0		; CHECK: $sgpr33 = V_READLANE_B32 $vgpr2, 0
; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec		; CHECK: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
; CHECK: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, implicit $exec :: (load 4 from %stack.2, addrspace 5)		; CHECK: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, 0, implicit $exec :: (load 4 from %stack.2, addrspace 5)
; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5		; CHECK: $exec = S_MOV_B64 killed $sgpr4_sgpr5
; CHECK: S_ENDPGM 0, implicit $vcc		; CHECK: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255			liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255

	; GFX8-LABEL: name: pei_scavenge_vgpr_spill			; GFX8-LABEL: name: pei_scavenge_vgpr_spill
	; GFX8: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2			; GFX8: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX8: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; GFX8: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX8: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX8: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX8: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; GFX8: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; GFX8: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; GFX8: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; GFX8: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; GFX8: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; GFX8: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX8: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX8: $sgpr7 = S_ADD_U32 $sgpr33, 524800, implicit-def $scc			; GFX8: $sgpr7 = S_ADD_I32 $sgpr33, 524800, implicit-def $scc
	; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr7, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)			; GFX8: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr7, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
	; GFX8: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX8: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX8: $vcc_lo = S_MOV_B32 8192			; GFX8: $vcc_lo = S_MOV_B32 8192
	; GFX8: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec			; GFX8: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
	; GFX8: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec			; GFX8: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
	; GFX8: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; GFX8: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; GFX8: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX8: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX8: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX8: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; GFX8: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; GFX8: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; GFX8: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX8: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX8: $sgpr4 = S_ADD_U32 $sgpr33, 524800, implicit-def $scc			; GFX8: $sgpr4 = S_ADD_I32 $sgpr33, 524800, implicit-def $scc
	; GFX8: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)			; GFX8: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
	; GFX8: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX8: S_ENDPGM 0, csr_amdgpu_allvgprs
	; GFX9-LABEL: name: pei_scavenge_vgpr_spill			; GFX9-LABEL: name: pei_scavenge_vgpr_spill
	; GFX9: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2		
	; GFX9: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; GFX9: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.3, addrspace 5)
	; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX9: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX9: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; GFX9: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
	; GFX9: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc			; GFX9: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def $scc
	; GFX9: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; GFX9: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 1572864, implicit-def $scc
	; GFX9: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX9: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX9: $sgpr7 = S_ADD_U32 $sgpr33, 524800, implicit-def $scc			; GFX9: $sgpr7 = S_ADD_I32 $sgpr33, 524800, implicit-def $scc
	; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr7, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)			; GFX9: BUFFER_STORE_DWORD_OFFSET killed $vgpr3, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr7, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.4, addrspace 5)
	; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec			; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
	; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec			; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
	; GFX9: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; GFX9: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -1572864, implicit-def $scc
	; GFX9: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX9: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9: $sgpr6 = S_ADD_U32 $sgpr32, 524544, implicit-def $scc			; GFX9: $sgpr6 = S_ADD_I32 $sgpr32, 524544, implicit-def $scc
	; GFX9: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)			; GFX9: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr6, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
	; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9: $sgpr4 = S_ADD_U32 $sgpr33, 524800, implicit-def $scc			; GFX9: $sgpr4 = S_ADD_I32 $sgpr33, 524800, implicit-def $scc
	; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)			; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.4, addrspace 5)
	; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs
	; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill			; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
	; GFX9-FLATSCR: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2			; GFX9-FLATSCR: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
	; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9-FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX9-FLATSCR: $sgpr6 = S_ADD_I32 $sgpr32, 8196, implicit-def $scc
	; GFX9-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)			; GFX9-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr2, killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.3, addrspace 5)
	; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9-FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2			; GFX9-FLATSCR: $vgpr2 = V_WRITELANE_B32 $sgpr33, 0, undef $vgpr2
	; GFX9-FLATSCR: $sgpr33 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc			; GFX9-FLATSCR: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
	; GFX9-FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc			; GFX9-FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def $scc
	; GFX9-FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc			; GFX9-FLATSCR: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 24576, implicit-def $scc
	; GFX9-FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec			; GFX9-FLATSCR: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec
	; GFX9-FLATSCR: $vcc_hi = S_ADD_U32 $sgpr33, 8192, implicit-def $scc			; GFX9-FLATSCR: $vcc_hi = S_ADD_I32 $sgpr33, 8192, implicit-def $scc
	; GFX9-FLATSCR: $vgpr0 = V_OR_B32_e32 killed $vcc_hi, $vgpr1, implicit $exec			; GFX9-FLATSCR: $vgpr0 = V_OR_B32_e32 killed $vcc_hi, $vgpr1, implicit $exec
	; GFX9-FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc			; GFX9-FLATSCR: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -24576, implicit-def $scc
	; GFX9-FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0			; GFX9-FLATSCR: $sgpr33 = V_READLANE_B32 $vgpr2, 0
	; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec			; GFX9-FLATSCR: $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def $scc, implicit $exec
	; GFX9-FLATSCR: $sgpr6 = S_ADD_U32 $sgpr32, 8196, implicit-def $scc			; GFX9-FLATSCR: $sgpr6 = S_ADD_I32 $sgpr32, 8196, implicit-def $scc
	; GFX9-FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)			; GFX9-FLATSCR: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr6, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.3, addrspace 5)
	; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5			; GFX9-FLATSCR: $exec = S_MOV_B64 killed $sgpr4_sgpr5
	; GFX9-FLATSCR: S_ENDPGM 0, csr_amdgpu_allvgprs			; GFX9-FLATSCR: S_ENDPGM 0, csr_amdgpu_allvgprs
	$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec			$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec
	S_ENDPGM 0, csr_amdgpu_allvgprs			S_ENDPGM 0, csr_amdgpu_allvgprs
	...			...
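
Note on the frame-destroy lines above: each `S_SUB_U32 $sgpr32, N` check becomes `S_ADD_I32 $sgpr32, -N`. The following is a minimal sketch, not taken from the patch, of why the destination value is unchanged under 32-bit wraparound arithmetic. The helper names `s_sub_u32` and `s_add_i32` are hypothetical Python models that compute only the result register; they deliberately ignore SCC, which the two opcodes define differently (unsigned carry/borrow versus signed overflow).

```python
MASK32 = 0xFFFFFFFF  # results are truncated to a 32-bit register


def s_sub_u32(a: int, b: int) -> int:
    """Hypothetical model: 32-bit wraparound subtraction (destination only, SCC ignored)."""
    return (a - b) & MASK32


def s_add_i32(a: int, imm: int) -> int:
    """Hypothetical model: 32-bit wraparound addition; imm may be negative (SCC ignored)."""
    return (a + imm) & MASK32


# Frame sizes taken from the frame-destroy checks in this test (MUBUF and FLATSCR).
for frame_size in (1572864, 24576):
    # A few sample stack pointer values, including ones that wrap around zero.
    for sp in (0, frame_size, 0x00400000, MASK32):
        assert s_sub_u32(sp, frame_size) == s_add_i32(sp, -frame_size)
```

The loop checks only a handful of sample stack pointer values; the equality holds for every 32-bit input because both forms reduce to `(sp - N) mod 2**32`.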

llvm/test/CodeGen/AMDGPU/sgpr-spill.mir

[596 lines hidden]	bb.0:
; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 160, 0, 0, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)		; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 160, 0, 0, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0
; GCN64-MUBUF: renamable $sgpr12 = IMPLICIT_DEF		; GCN64-MUBUF: renamable $sgpr12 = IMPLICIT_DEF
; GCN64-MUBUF: $sgpr0_sgpr1 = S_MOV_B64 $exec		; GCN64-MUBUF: $sgpr0_sgpr1 = S_MOV_B64 $exec
; GCN64-MUBUF: $exec = S_MOV_B64 1, implicit-def $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 1, implicit-def $vgpr0
; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0		; GCN64-MUBUF: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0
; GCN64-MUBUF: $sgpr2 = S_ADD_U32 $sgpr33, 262144, implicit-def $scc		; GCN64-MUBUF: $sgpr2 = S_ADD_I32 $sgpr33, 262144, implicit-def $scc
; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, killed $sgpr2, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.8, align 4096, addrspace 5)		; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, killed $sgpr2, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.8, align 4096, addrspace 5)
; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0
; GCN32-MUBUF-LABEL: name: check_spill		; GCN32-MUBUF-LABEL: name: check_spill
; GCN32-MUBUF: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11		; GCN32-MUBUF: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11
; GCN32-MUBUF: $sgpr33 = S_MOV_B32 0		; GCN32-MUBUF: $sgpr33 = S_MOV_B32 0
; GCN32-MUBUF: $sgpr96 = S_MOV_B32 &SCRATCH_RSRC_DWORD0, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99		; GCN32-MUBUF: $sgpr96 = S_MOV_B32 &SCRATCH_RSRC_DWORD0, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99
; GCN32-MUBUF: $sgpr97 = S_MOV_B32 &SCRATCH_RSRC_DWORD1, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99		; GCN32-MUBUF: $sgpr97 = S_MOV_B32 &SCRATCH_RSRC_DWORD1, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99
[145 lines hidden]	bb.0:
; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 160, 0, 0, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)		; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 160, 0, 0, 0, implicit $exec :: (store 4 into %stack.7, addrspace 5)
; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0
; GCN32-MUBUF: renamable $sgpr12 = IMPLICIT_DEF		; GCN32-MUBUF: renamable $sgpr12 = IMPLICIT_DEF
; GCN32-MUBUF: $sgpr0 = S_MOV_B32 $exec_lo		; GCN32-MUBUF: $sgpr0 = S_MOV_B32 $exec_lo
; GCN32-MUBUF: $exec_lo = S_MOV_B32 1, implicit-def $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 1, implicit-def $vgpr0
; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0		; GCN32-MUBUF: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0
; GCN32-MUBUF: $sgpr1 = S_ADD_U32 $sgpr33, 131072, implicit-def $scc		; GCN32-MUBUF: $sgpr1 = S_ADD_I32 $sgpr33, 131072, implicit-def $scc
; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, killed $sgpr1, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.8, align 4096, addrspace 5)		; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, killed $sgpr1, 0, 0, 0, 0, implicit $exec :: (store 4 into %stack.8, align 4096, addrspace 5)
; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0
; GCN64-FLATSCR-LABEL: name: check_spill		; GCN64-FLATSCR-LABEL: name: check_spill
; GCN64-FLATSCR: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11, $sgpr0_sgpr1		; GCN64-FLATSCR: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11, $sgpr0_sgpr1
; GCN64-FLATSCR: $sgpr33 = S_MOV_B32 0		; GCN64-FLATSCR: $sgpr33 = S_MOV_B32 0
; GCN64-FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc		; GCN64-FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc
; GCN64-FLATSCR: $flat_scr_hi = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc		; GCN64-FLATSCR: $flat_scr_hi = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc
[141 lines hidden]	bb.0:
; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 160, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.7, addrspace 5)		; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 160, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.7, addrspace 5)
; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0
; GCN64-FLATSCR: renamable $sgpr12 = IMPLICIT_DEF		; GCN64-FLATSCR: renamable $sgpr12 = IMPLICIT_DEF
; GCN64-FLATSCR: $sgpr2_sgpr3 = S_MOV_B64 $exec		; GCN64-FLATSCR: $sgpr2_sgpr3 = S_MOV_B64 $exec
; GCN64-FLATSCR: $exec = S_MOV_B64 1, implicit-def $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 1, implicit-def $vgpr0
; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0		; GCN64-FLATSCR: $vgpr0 = V_WRITELANE_B32 $sgpr12, 0, undef $vgpr0
; GCN64-FLATSCR: $sgpr9 = S_ADD_U32 $sgpr33, 4096, implicit-def $scc		; GCN64-FLATSCR: $sgpr9 = S_ADD_I32 $sgpr33, 4096, implicit-def $scc
; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, killed $sgpr9, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.8, align 4096, addrspace 5)		; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, killed $sgpr9, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %stack.8, align 4096, addrspace 5)
; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0
renamable $sgpr12 = IMPLICIT_DEF		renamable $sgpr12 = IMPLICIT_DEF
SI_SPILL_S32_SAVE killed $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S32_SAVE killed $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

renamable $sgpr12 = IMPLICIT_DEF		renamable $sgpr12 = IMPLICIT_DEF
SI_SPILL_S32_SAVE $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S32_SAVE $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32
[190 lines hidden]	bb.0:
; GCN64-MUBUF: $sgpr93 = V_READLANE_B32 $vgpr0, 29		; GCN64-MUBUF: $sgpr93 = V_READLANE_B32 $vgpr0, 29
; GCN64-MUBUF: $sgpr94 = V_READLANE_B32 $vgpr0, 30		; GCN64-MUBUF: $sgpr94 = V_READLANE_B32 $vgpr0, 30
; GCN64-MUBUF: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31		; GCN64-MUBUF: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31
; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0
; GCN64-MUBUF: $sgpr0_sgpr1 = S_MOV_B64 $exec		; GCN64-MUBUF: $sgpr0_sgpr1 = S_MOV_B64 $exec
; GCN64-MUBUF: $exec = S_MOV_B64 1, implicit-def $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 1, implicit-def $vgpr0
; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $sgpr2 = S_ADD_U32 $sgpr33, 262144, implicit-def $scc		; GCN64-MUBUF: $sgpr2 = S_ADD_I32 $sgpr33, 262144, implicit-def $scc
; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, killed $sgpr2, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.8, align 4096, addrspace 5)		; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, killed $sgpr2, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.8, align 4096, addrspace 5)
; GCN64-MUBUF: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0		; GCN64-MUBUF: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0
; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr28_sgpr29_sgpr30_sgpr31, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0		; GCN64-MUBUF: $exec = S_MOV_B64 killed $sgpr0_sgpr1, implicit killed $vgpr0
; GCN32-MUBUF-LABEL: name: check_reload		; GCN32-MUBUF-LABEL: name: check_reload
; GCN32-MUBUF: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11		; GCN32-MUBUF: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11
; GCN32-MUBUF: $sgpr33 = S_MOV_B32 0		; GCN32-MUBUF: $sgpr33 = S_MOV_B32 0
; GCN32-MUBUF: $sgpr96 = S_MOV_B32 &SCRATCH_RSRC_DWORD0, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99		; GCN32-MUBUF: $sgpr96 = S_MOV_B32 &SCRATCH_RSRC_DWORD0, implicit-def $sgpr96_sgpr97_sgpr98_sgpr99
[119 lines hidden]	bb.0:
; GCN32-MUBUF: $sgpr93 = V_READLANE_B32 $vgpr0, 29		; GCN32-MUBUF: $sgpr93 = V_READLANE_B32 $vgpr0, 29
; GCN32-MUBUF: $sgpr94 = V_READLANE_B32 $vgpr0, 30		; GCN32-MUBUF: $sgpr94 = V_READLANE_B32 $vgpr0, 30
; GCN32-MUBUF: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31		; GCN32-MUBUF: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31
; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0
; GCN32-MUBUF: $sgpr0 = S_MOV_B32 $exec_lo		; GCN32-MUBUF: $sgpr0 = S_MOV_B32 $exec_lo
; GCN32-MUBUF: $exec_lo = S_MOV_B32 1, implicit-def $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 1, implicit-def $vgpr0
; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr0, $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $sgpr1 = S_ADD_U32 $sgpr33, 131072, implicit-def $scc		; GCN32-MUBUF: $sgpr1 = S_ADD_I32 $sgpr33, 131072, implicit-def $scc
; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, killed $sgpr1, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.8, align 4096, addrspace 5)		; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, killed $sgpr1, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.8, align 4096, addrspace 5)
; GCN32-MUBUF: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0		; GCN32-MUBUF: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0
; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN32-MUBUF: $vgpr0 = BUFFER_LOAD_DWORD_OFFSET $sgpr96_sgpr97_sgpr98_sgpr99, $sgpr33, 0, 0, 0, 0, implicit $exec :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0		; GCN32-MUBUF: $exec_lo = S_MOV_B32 killed $sgpr0, implicit killed $vgpr0
; GCN64-FLATSCR-LABEL: name: check_reload		; GCN64-FLATSCR-LABEL: name: check_reload
; GCN64-FLATSCR: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11, $sgpr0_sgpr1		; GCN64-FLATSCR: liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7, $sgpr11, $sgpr0_sgpr1
; GCN64-FLATSCR: $sgpr33 = S_MOV_B32 0		; GCN64-FLATSCR: $sgpr33 = S_MOV_B32 0
; GCN64-FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc		; GCN64-FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	bb.0:
; GCN64-FLATSCR: $sgpr93 = V_READLANE_B32 $vgpr0, 29		; GCN64-FLATSCR: $sgpr93 = V_READLANE_B32 $vgpr0, 29
; GCN64-FLATSCR: $sgpr94 = V_READLANE_B32 $vgpr0, 30		; GCN64-FLATSCR: $sgpr94 = V_READLANE_B32 $vgpr0, 30
; GCN64-FLATSCR: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31		; GCN64-FLATSCR: $sgpr95 = V_READLANE_B32 killed $vgpr0, 31
; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0
; GCN64-FLATSCR: $sgpr2_sgpr3 = S_MOV_B64 $exec		; GCN64-FLATSCR: $sgpr2_sgpr3 = S_MOV_B64 $exec
; GCN64-FLATSCR: $exec = S_MOV_B64 1, implicit-def $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 1, implicit-def $vgpr0
; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr0, $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (store 4 into %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $sgpr9 = S_ADD_U32 $sgpr33, 4096, implicit-def $scc		; GCN64-FLATSCR: $sgpr9 = S_ADD_I32 $sgpr33, 4096, implicit-def $scc
; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr9, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.8, align 4096, addrspace 5)		; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr9, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %stack.8, align 4096, addrspace 5)
; GCN64-FLATSCR: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0		; GCN64-FLATSCR: $sgpr12 = V_READLANE_B32 killed $vgpr0, 0
; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)		; GCN64-FLATSCR: $vgpr0 = SCRATCH_LOAD_DWORD_SADDR $sgpr33, 0, 0, implicit $exec, implicit $flat_scr :: (load 4 from %fixed-stack.0, align 16, addrspace 5)
; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0		; GCN64-FLATSCR: $exec = S_MOV_B64 killed $sgpr2_sgpr3, implicit killed $vgpr0
renamable $sgpr12 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr12 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

renamable $sgpr12_sgpr13 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr12_sgpr13 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

Show All 14 Lines

llvm/test/CodeGen/AMDGPU/sibling-call.ll

	Show First 20 Lines • Show All 194 Lines • ▼ Show 20 Lines
	}			}

	; Have another non-tail in the function			; Have another non-tail in the function
	; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:			; GCN-LABEL: {{^}}sibling_call_i32_fastcc_i32_i32_other_call:
	; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1			; GCN: s_or_saveexec_b64 s{{\[[0-9]+:[0-9]+\]}}, -1
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec			; GCN-NEXT: s_mov_b64 exec
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_addk_i32 s32, 0x400

	; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-DAG: v_writelane_b32 v42, s34, 0			; GCN-DAG: v_writelane_b32 v42, s34, 0
	; GCN-DAG: v_writelane_b32 v42, s35, 1			; GCN-DAG: v_writelane_b32 v42, s35, 1

	; GCN-DAG: s_getpc_b64 s[4:5]			; GCN-DAG: s_getpc_b64 s[4:5]
	; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4			; GCN-DAG: s_add_u32 s4, s4, i32_fastcc_i32_i32@gotpcrel32@lo+4
	; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12			; GCN-DAG: s_addc_u32 s5, s5, i32_fastcc_i32_i32@gotpcrel32@hi+12


	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-DAG: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-DAG: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload

	; GCN: s_getpc_b64 s[4:5]			; GCN: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12

	; GCN-DAG: v_readlane_b32 s34, v42, 0			; GCN-DAG: v_readlane_b32 s34, v42, 0
	; GCN-DAG: v_readlane_b32 s35, v42, 1			; GCN-DAG: v_readlane_b32 s35, v42, 1

	; GCN: s_sub_u32 s32, s32, 0x400			; GCN: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33,			; GCN-NEXT: v_readlane_b32 s33,
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
	entry:			entry:
	%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)			%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
	▲ Show 20 Lines • Show All 233 Lines • Show Last 20 Lines
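
Note on the changed immediates: the updated checks replace each s_sub_u32 with an add of the negated constant, so 0x400 becomes 0xfc00 in the 16-bit s_addk_i32 form and 0x40000 becomes 0xfffc0000 in the 32-bit s_add_i32 form. The sketch below reproduces that arithmetic. It is only an illustration: the helper names are hypothetical, and the selection rule (same source and destination register, constant fits a signed 16-bit immediate) is an assumption inferred from the check lines, not a copy of LLVM's shrinking logic.

```python
def fits_simm16(value: int) -> bool:
    """True if value is representable as a signed 16-bit immediate."""
    return -0x8000 <= value <= 0x7FFF


def encode_add(dst: str, src: str, delta: int) -> str:
    """Hypothetical helper: render dst = src + delta as a scalar add.

    Assumption: the short s_addk_i32 form is chosen only when dst == src
    and delta fits in 16 signed bits; otherwise s_add_i32 is kept.
    Negative deltas are printed as two's complement hex.
    """
    if dst == src and fits_simm16(delta):
        return f"s_addk_i32 {dst}, {delta & 0xFFFF:#x}"
    return f"s_add_i32 {dst}, {src}, {delta & 0xFFFFFFFF:#x}"


if __name__ == "__main__":
    print(encode_add("s32", "s32", 0x400))     # stack pointer increment
    print(encode_add("s32", "s32", -0x400))    # matching decrement
    print(encode_add("s32", "s32", -0x40000))  # too large for 16 bits
```

With these assumptions, an increment of 0x3c0 into a different destination register stays an s_add_i32, which is consistent with the realignment checks in stack-realign.ll.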

llvm/test/CodeGen/AMDGPU/spill-offset-calculation.ll

Show First 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	entry:
%asm4.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 4		%asm4.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 4
%asm5.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 5		%asm5.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 5
%asm6.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 6		%asm6.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 6
%asm7.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 7		%asm7.0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm.0, 7

; 0x40000 / 64 = 4096 (for wave64)		; 0x40000 / 64 = 4096 (for wave64)
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; MUBUF: s_add_u32 s32, s32, 0x40000		; MUBUF: s_add_i32 s32, s32, 0x40000
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s32 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s32 ; 4-byte Folded Spill
; MUBUF: s_sub_u32 s32, s32, 0x40000		; MUBUF: s_add_i32 s32, s32, 0xfffc0000
; FLATSCR: s_add_u32 [[SOFF:s[0-9]+]], s32, 0x1000		; FLATSCR: s_add_i32 [[SOFF:s[0-9]+]], s32, 0x1000
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, [[SOFF]] ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, [[SOFF]] ; 4-byte Folded Spill
call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0.0, i32 %asm1.0, i32 %asm2.0, i32 %asm3.0, i32 %asm4.0, i32 %asm5.0, i32 %asm6.0, i32 %asm7.0, i32 %a)		call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0.0, i32 %asm1.0, i32 %asm2.0, i32 %asm3.0, i32 %asm4.0, i32 %asm5.0, i32 %asm6.0, i32 %asm7.0, i32 %a)

%asm = call { i32, i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "", "=s,=s,=s,=s,=s,=s,=s,=s"()		%asm = call { i32, i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "", "=s,=s,=s,=s,=s,=s,=s,=s"()
%asm0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 0		%asm0 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 0
%asm1 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 1		%asm1 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 1
%asm2 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 2		%asm2 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 2
%asm3 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 3		%asm3 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 3
%asm4 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 4		%asm4 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 4
%asm5 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 5		%asm5 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 5
%asm6 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 6		%asm6 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 6
%asm7 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 7		%asm7 = extractvalue { i32, i32, i32, i32, i32, i32, i32, i32 } %asm, 7

call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0

; MUBUF: s_add_u32 s32, s32, 0x40000		; MUBUF: s_add_i32 s32, s32, 0x40000
; MUBUF: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s32 ; 4-byte Folded Reload		; MUBUF: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s32 ; 4-byte Folded Reload
; MUBUF: s_sub_u32 s32, s32, 0x40000		; MUBUF: s_add_i32 s32, s32, 0xfffc0000
; FLATSCR: s_add_u32 [[SOFF:s[0-9]+]], s32, 0x1000		; FLATSCR: s_add_i32 [[SOFF:s[0-9]+]], s32, 0x1000
; FLATSCR: scratch_load_dword v{{[0-9]+}}, off, [[SOFF]] ; 4-byte Folded Reload		; FLATSCR: scratch_load_dword v{{[0-9]+}}, off, [[SOFF]] ; 4-byte Folded Reload

; Force %a to spill with no free SGPRs		; Force %a to spill with no free SGPRs
call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0, i32 %asm1, i32 %asm2, i32 %asm3, i32 %asm4, i32 %asm5, i32 %asm6, i32 %asm7, i32 %a)		call void asm sideeffect "", "s,s,s,s,s,s,s,s,v"(i32 %asm0, i32 %asm1, i32 %asm2, i32 %asm3, i32 %asm4, i32 %asm5, i32 %asm6, i32 %asm7, i32 %a)
ret void		ret void
}		}

; GCN-LABEL: test_sgpr_offset_subregs_kernel		; GCN-LABEL: test_sgpr_offset_subregs_kernel
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
entry:		entry:
; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not		; Occupy 4096 bytes of scratch, so the offset of the spill of %a does not
; fit in the instruction, and has to live in the SGPR offset.		; fit in the instruction, and has to live in the SGPR offset.
%alloca = alloca i8, i32 4096, align 4, addrspace(5)		%alloca = alloca i8, i32 4096, align 4, addrspace(5)
%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%buf = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*

%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%aptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
; 0x40000 / 64 = 4096 (for wave64)		; 0x40000 / 64 = 4096 (for wave64)
; MUBUF: s_add_u32 s4, s32, 0x40000		; MUBUF: s_add_i32 s4, s32, 0x40000
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 ; 4-byte Folded Spill
; FLATSCR: s_add_u32 s0, s32, 0x1000		; FLATSCR: s_add_i32 s0, s32, 0x1000
; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s0 ; 4-byte Folded Spill		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s0 ; 4-byte Folded Spill
%a = load volatile i32, i32 addrspace(5)* %aptr		%a = load volatile i32, i32 addrspace(5)* %aptr

; Force %a to spill		; Force %a to spill
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1		%outptr = getelementptr i32, i32 addrspace(5)* %buf, i32 1
store volatile i32 %a, i32 addrspace(5)* %outptr		store volatile i32 %a, i32 addrspace(5)* %outptr
Show All 36 Lines	entry:
; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a		; Occupy 4092 bytes of scratch, so that the spill of the last subreg of %a
; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live		; does not fit below offset 4096 (4092 + 8 - 4 = 4096), and has to live
; in the SGPR offset.		; in the SGPR offset.
%alloca = alloca i8, i32 4092, align 4, addrspace(5)		%alloca = alloca i8, i32 4092, align 4, addrspace(5)
%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*		%bufv1 = bitcast i8 addrspace(5)* %alloca to i32 addrspace(5)*
%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*		%bufv2 = bitcast i8 addrspace(5)* %alloca to <2 x i32> addrspace(5)*

; 0x3ff00 / 64 = 4092 (for wave64)		; 0x3ff00 / 64 = 4092 (for wave64)
; MUBUF: s_add_u32 s4, s32, 0x3ff00		; MUBUF: s_add_i32 s4, s32, 0x3ff00
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 ; 4-byte Folded Spill
; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 offset:4 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], s4 offset:4 ; 4-byte Folded Spill
; FLATSCR: scratch_store_dwordx2 off, v[{{[0-9:]+}}], s32 offset:4092 ; 8-byte Folded Spill		; FLATSCR: scratch_store_dwordx2 off, v[{{[0-9:]+}}], s32 offset:4092 ; 8-byte Folded Spill
%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1		%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %bufv2, i32 1
%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr		%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

; Force %a to spill.		; Force %a to spill.
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()
Show All 14 Lines
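
Note on the offsets: the test comments "0x40000 / 64 = 4096 (for wave64)" and "0x3ff00 / 64 = 4092 (for wave64)" relate the SGPR offset in the MUBUF checks to the per-lane scratch size, while the FLATSCR checks use the unscaled 0x1000. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the scaling rule is taken from the test comments rather than from the backend source.

```python
def mubuf_sgpr_offset(per_lane_bytes: int, wave_size: int) -> int:
    """Hypothetical helper: SGPR offset for a MUBUF scratch access.

    Assumption (from the test comments): the per-lane byte offset is
    multiplied by the wavefront size for MUBUF, whereas flat scratch
    uses the per-lane offset directly.
    """
    return per_lane_bytes * wave_size


if __name__ == "__main__":
    print(hex(mubuf_sgpr_offset(4096, 64)))  # wave64, 4096-byte alloca
    print(hex(mubuf_sgpr_offset(4092, 64)))  # wave64, 4092-byte alloca
    print(mubuf_sgpr_offset(4096, 32))       # wave32 MIR check above
```

The wave32 case matches the 131072 immediate in the GCN32-MUBUF check of sgpr-spill.mir, and the flat-scratch checks keep 4096 (0x1000) for the same stack object.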

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

Show All 39 Lines	; mark most VGPR registers as used to increase register pressure

%outptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %out, i32 %tid		%outptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %out, i32 %tid
store <1280 x i32> %a, <1280 x i32> addrspace(1)* %outptr		store <1280 x i32> %a, <1280 x i32> addrspace(1)* %outptr

ret void		ret void
}		}

; CHECK-LABEL: test_limited_sgpr		; CHECK-LABEL: test_limited_sgpr
; GFX6: s_add_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]		; GFX6: s_add_i32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
; GFX6: s_add_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]		; GFX6: s_add_i32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
		; GFX6: s_add_i32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
; GFX6-NEXT: s_waitcnt expcnt(0)		; GFX6-NEXT: s_waitcnt expcnt(0)
; GFX6-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9:]+}}], s32		; GFX6-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9:]+}}], s32
; GFX6-NEXT: s_sub_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]		; GFX6-NEXT: s_add_i32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
; GFX6: NumSgprs: 48		; GFX6: NumSgprs: 48
; GFX6: ScratchSize: 8608		; GFX6: ScratchSize: 8608

; FLATSCR: s_movk_i32 [[SOFF1:s[0-9]+]], 0x		; FLATSCR: s_movk_i32 [[SOFF1:s[0-9]+]], 0x
; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)		; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
; FLATSCR-NEXT: scratch_store_dwordx4 off, v[{{[0-9:]+}}], [[SOFF1]] ; 16-byte Folded Spill		; FLATSCR-NEXT: scratch_store_dwordx4 off, v[{{[0-9:]+}}], [[SOFF1]] ; 16-byte Folded Spill
; FLATSCR: s_movk_i32 [[SOFF2:s[0-9]+]], 0x		; FLATSCR: s_movk_i32 [[SOFF2:s[0-9]+]], 0x
; FLATSCR: scratch_load_dwordx4 v[{{[0-9:]+}}], off, [[SOFF2]] ; 16-byte Folded Reload		; FLATSCR: scratch_load_dwordx4 v[{{[0-9:]+}}], off, [[SOFF2]] ; 16-byte Folded Reload
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/stack-realign-kernel.ll

; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji --amdhsa-code-object-version=3 < %s \| FileCheck -check-prefix=VI %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji --amdhsa-code-object-version=3 < %s \| FileCheck -check-prefix=VI %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 --amdhsa-code-object-version=3 < %s \| FileCheck -check-prefix=GFX9 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 --amdhsa-code-object-version=3 < %s \| FileCheck -check-prefix=GFX9 %s

; Make sure the stack is never realigned for entry functions.		; Make sure the stack is never realigned for entry functions.

define amdgpu_kernel void @max_alignment_128() #0 {		define amdgpu_kernel void @max_alignment_128() #0 {
; VI-LABEL: max_alignment_128:		; VI-LABEL: max_alignment_128:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_i32 s4, s4, s7
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; VI-NEXT: s_add_u32 s0, s0, s7		; VI-NEXT: s_add_u32 s0, s0, s7
; VI-NEXT: s_addc_u32 s1, s1, 0		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:128
; VI-NEXT: s_waitcnt vmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0)
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	; GFX9-NEXT: .text
%alloca.align = alloca i32, align 128, addrspace(5)		%alloca.align = alloca i32, align 128, addrspace(5)
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128
ret void		ret void
}		}

define amdgpu_kernel void @stackrealign_attr() #1 {		define amdgpu_kernel void @stackrealign_attr() #1 {
; VI-LABEL: stackrealign_attr:		; VI-LABEL: stackrealign_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_i32 s4, s4, s7
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; VI-NEXT: s_add_u32 s0, s0, s7		; VI-NEXT: s_add_u32 s0, s0, s7
; VI-NEXT: s_addc_u32 s1, s1, 0		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: s_waitcnt vmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0)
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
▲ Show 20 Lines • Show All 87 Lines • ▼ Show 20 Lines	; GFX9-NEXT: .text
%alloca.align = alloca i32, align 4, addrspace(5)		%alloca.align = alloca i32, align 4, addrspace(5)
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 4
ret void		ret void
}		}

define amdgpu_kernel void @alignstack_attr() #2 {		define amdgpu_kernel void @alignstack_attr() #2 {
; VI-LABEL: alignstack_attr:		; VI-LABEL: alignstack_attr:
; VI: ; %bb.0:		; VI: ; %bb.0:
; VI-NEXT: s_add_u32 s4, s4, s7		; VI-NEXT: s_add_i32 s4, s4, s7
; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8		; VI-NEXT: s_lshr_b32 flat_scratch_hi, s4, 8
; VI-NEXT: s_add_u32 s0, s0, s7		; VI-NEXT: s_add_u32 s0, s0, s7
; VI-NEXT: s_addc_u32 s1, s1, 0		; VI-NEXT: s_addc_u32 s1, s1, 0
; VI-NEXT: v_mov_b32_e32 v0, 9		; VI-NEXT: v_mov_b32_e32 v0, 9
; VI-NEXT: s_mov_b32 flat_scratch_lo, s5		; VI-NEXT: s_mov_b32 flat_scratch_lo, s5
; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4		; VI-NEXT: buffer_store_dword v0, off, s[0:3], 0 offset:4
; VI-NEXT: s_waitcnt vmcnt(0)		; VI-NEXT: s_waitcnt vmcnt(0)
; VI-NEXT: s_endpgm		; VI-NEXT: s_endpgm
▲ Show 20 Lines • Show All 95 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/stack-realign.ll

Show All 26 Lines
define void @needs_align16_default_stack_align(i32 %idx) #0 {		define void @needs_align16_default_stack_align(i32 %idx) #0 {
%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)		%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx		%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16		store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
ret void		ret void
}		}

; GCN-LABEL: {{^}}needs_align16_stack_align4:		; GCN-LABEL: {{^}}needs_align16_stack_align4:
; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}		; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0x3c0{{$}}
; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffffc00		; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffffc00

; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: v_or_b32_e32 v{{[0-9]+}}, 12		; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
; GCN: s_add_u32 s32, s32, 0x2800{{$}}		; GCN: s_addk_i32 s32, 0x2800{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

; GCN: s_sub_u32 s32, s32, 0x2800		; GCN: s_addk_i32 s32, 0xd800

; GCN: ; ScratchSize: 160		; GCN: ; ScratchSize: 160
define void @needs_align16_stack_align4(i32 %idx) #2 {		define void @needs_align16_stack_align4(i32 %idx) #2 {
%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)		%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx		%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16		store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 16
ret void		ret void
}		}

; GCN-LABEL: {{^}}needs_align32:		; GCN-LABEL: {{^}}needs_align32:
; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}		; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}
; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffff800		; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffff800

; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: v_or_b32_e32 v{{[0-9]+}}, 12		; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
; GCN: s_add_u32 s32, s32, 0x3000{{$}}		; GCN: s_addk_i32 s32, 0x3000{{$}}
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

; GCN: s_sub_u32 s32, s32, 0x3000		; GCN: s_addk_i32 s32, 0xd000

; GCN: ; ScratchSize: 192		; GCN: ; ScratchSize: 192
define void @needs_align32(i32 %idx) #0 {		define void @needs_align32(i32 %idx) #0 {
%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)		%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)
%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx		%gep0 = getelementptr inbounds [8 x <4 x i32>], [8 x <4 x i32>] addrspace(5)* %alloca.align16, i32 0, i32 %idx
store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32		store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, <4 x i32> addrspace(5)* %gep0, align 32
ret void		ret void
}		}

; GCN-LABEL: {{^}}force_realign4:		; GCN-LABEL: {{^}}force_realign4:
; GCN: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}		; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}
; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffffff00		; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffffff00
; GCN: s_add_u32 s32, s32, 0xd00{{$}}		; GCN: s_addk_i32 s32, 0xd00{{$}}

; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen		; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
; GCN: s_sub_u32 s32, s32, 0xd00		; GCN: s_addk_i32 s32, 0xf300

; GCN: ; ScratchSize: 52		; GCN: ; ScratchSize: 52
define void @force_realign4(i32 %idx) #1 {		define void @force_realign4(i32 %idx) #1 {
%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)		%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)
%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx		%gep0 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca.align16, i32 0, i32 %idx
store volatile i32 3, i32 addrspace(5)* %gep0, align 4		store volatile i32 3, i32 addrspace(5)* %gep0, align 4
ret void		ret void
}		}
Show All 29 Lines	define amdgpu_kernel void @kernel_call_align4_from_5() {
store volatile i8 2, i8 addrspace(5)* %alloca0		store volatile i8 2, i8 addrspace(5)* %alloca0

call void @needs_align16_stack_align4(i32 1)		call void @needs_align16_stack_align4(i32 1)
ret void		ret void
}		}

; GCN-LABEL: {{^}}default_realign_align128:		; GCN-LABEL: {{^}}default_realign_align128:
; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN-NEXT: s_add_u32 s33, s32, 0x1fc0		; GCN-NEXT: s_add_i32 s33, s32, 0x1fc0
; GCN-NEXT: s_and_b32 s33, s33, 0xffffe000		; GCN-NEXT: s_and_b32 s33, s33, 0xffffe000
; GCN-NEXT: s_add_u32 s32, s32, 0x4000		; GCN-NEXT: s_addk_i32 s32, 0x4000
; GCN-NOT: s33		; GCN-NOT: s33
; GCN: buffer_store_dword v0, off, s[0:3], s33{{$}}		; GCN: buffer_store_dword v0, off, s[0:3], s33{{$}}
; GCN: s_sub_u32 s32, s32, 0x4000		; GCN: s_addk_i32 s32, 0xc000
; GCN: s_mov_b32 s33, [[FP_COPY]]		; GCN: s_mov_b32 s33, [[FP_COPY]]
define void @default_realign_align128(i32 %idx) #0 {		define void @default_realign_align128(i32 %idx) #0 {
%alloca.align = alloca i32, align 128, addrspace(5)		%alloca.align = alloca i32, align 128, addrspace(5)
store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128		store volatile i32 9, i32 addrspace(5)* %alloca.align, align 128
ret void		ret void
}		}

; GCN-LABEL: {{^}}disable_realign_align128:		; GCN-LABEL: {{^}}disable_realign_align128:
Show All 12 Lines
; since there is a local object with an alignment of 1024.		; since there is a local object with an alignment of 1024.
; Should use BP to access the incoming stack arguments.		; Should use BP to access the incoming stack arguments.
; The BP value is saved/restored with a VGPR spill.		; The BP value is saved/restored with a VGPR spill.

; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:		; GCN-LABEL: func_call_align1024_bp_gets_vgpr_spill:
; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill		; GCN: buffer_store_dword [[VGPR_REG:v[0-9]+]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: v_writelane_b32 [[VGPR_REG]], s33, 2		; GCN-NEXT: v_writelane_b32 [[VGPR_REG]], s33, 2
; GCN-DAG: s_add_u32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0		; GCN-DAG: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xffc0
; GCN-DAG: v_writelane_b32 [[VGPR_REG]], s34, 3		; GCN-DAG: v_writelane_b32 [[VGPR_REG]], s34, 3
; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000		; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffff0000
; GCN: s_mov_b32 s34, s32		; GCN: s_mov_b32 s34, s32
; GCN: v_mov_b32_e32 v32, 0		; GCN: v_mov_b32_e32 v32, 0
; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024		; GCN: buffer_store_dword v32, off, s[0:3], s33 offset:1024
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34		; GCN-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s34
; GCN-DAG: s_add_u32 s32, s32, 0x30000		; GCN-DAG: s_add_i32 s32, s32, 0x30000
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s32
; GCN: s_swappc_b64 s[30:31], s[4:5]		; GCN: s_swappc_b64 s[30:31], s[4:5]

; GCN: s_sub_u32 s32, s32, 0x30000		; GCN: s_add_i32 s32, s32, 0xfffd0000
; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG]], 2		; GCN-NEXT: v_readlane_b32 s33, [[VGPR_REG]], 2
; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3		; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3
; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1		; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload		; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s32 offset:1028 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, s[6:7]		; GCN-NEXT: s_mov_b64 exec, s[6:7]
%temp = alloca i32, align 1024, addrspace(5)		%temp = alloca i32, align 1024, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %temp, align 1024		store volatile i32 0, i32 addrspace(5)* %temp, align 1024
call void @extern_func(<32 x i32> %a, i32 %b)		call void @extern_func(<32 x i32> %a, i32 %b)
ret void		ret void
}		}

%struct.Data = type { [9 x i32] }		%struct.Data = type { [9 x i32] }
define i32 @needs_align1024_stack_args_used_inside_loop(%struct.Data addrspace(5)* nocapture readonly byval(%struct.Data) align 8 %arg) local_unnamed_addr #4 {		define i32 @needs_align1024_stack_args_used_inside_loop(%struct.Data addrspace(5)* nocapture readonly byval(%struct.Data) align 8 %arg) local_unnamed_addr #4 {
; The local object allocation needed an alignment of 1024.		; The local object allocation needed an alignment of 1024.
; Since the function argument is accessed in a loop with an		; Since the function argument is accessed in a loop with an
; index variable, the base pointer first get loaded into a VGPR		; index variable, the base pointer first get loaded into a VGPR
; and that value should be further referenced to load the incoming values.		; and that value should be further referenced to load the incoming values.
; The BP value will get saved/restored in an SGPR at the prolgoue/epilogue.		; The BP value will get saved/restored in an SGPR at the prolgoue/epilogue.

; GCN-LABEL: needs_align1024_stack_args_used_inside_loop:		; GCN-LABEL: needs_align1024_stack_args_used_inside_loop:
; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN-NEXT: s_add_u32 s33, s32, 0xffc0		; GCN-NEXT: s_add_i32 s33, s32, 0xffc0
; GCN-NEXT: s_mov_b32 [[BP_COPY:s[0-9]+]], s34		; GCN-NEXT: s_mov_b32 [[BP_COPY:s[0-9]+]], s34
; GCN-NEXT: s_mov_b32 s34, s32		; GCN-NEXT: s_mov_b32 s34, s32
; GCN-NEXT: s_and_b32 s33, s33, 0xffff0000		; GCN-NEXT: s_and_b32 s33, s33, 0xffff0000
; GCN-NEXT: v_mov_b32_e32 v{{[0-9]+}}, 0		; GCN-NEXT: v_mov_b32_e32 v{{[0-9]+}}, 0
; GCN-NEXT: v_lshrrev_b32_e64 [[VGPR_REG:v[0-9]+]], 6, s34		; GCN-NEXT: v_lshrrev_b32_e64 [[VGPR_REG:v[0-9]+]], 6, s34
; GCN: s_add_u32 s32, s32, 0x30000		; GCN: s_add_i32 s32, s32, 0x30000
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:1024		; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:1024
; GCN: buffer_load_dword v{{[0-9]+}}, [[VGPR_REG]], s[0:3], 0 offen		; GCN: buffer_load_dword v{{[0-9]+}}, [[VGPR_REG]], s[0:3], 0 offen
; GCN: v_add_u32_e32 [[VGPR_REG]], vcc, 4, [[VGPR_REG]]		; GCN: v_add_u32_e32 [[VGPR_REG]], vcc, 4, [[VGPR_REG]]
; GCN: s_sub_u32 s32, s32, 0x30000		; GCN: s_add_i32 s32, s32, 0xfffd0000
; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]		; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
; GCN-NEXT: s_mov_b32 s34, [[BP_COPY]]		; GCN-NEXT: s_mov_b32 s34, [[BP_COPY]]
; GCN-NEXT: s_setpc_b64 s[30:31]		; GCN-NEXT: s_setpc_b64 s[30:31]
begin:		begin:
%local_var = alloca i32, align 1024, addrspace(5)		%local_var = alloca i32, align 1024, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %local_var, align 1024		store volatile i32 0, i32 addrspace(5)* %local_var, align 1024
br label %loop_body		br label %loop_body

[70 lines not shown]
}		}
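
Note on the constants checked above: the pair `s_add_i32 s33, s32, 0xffc0` / `s_and_b32 s33, s33, 0xffff0000` is the usual round-up-to-alignment idiom applied to a stack pointer that is scaled by the wavefront size. The following is a minimal sketch of that arithmetic, assuming wave64 and MUBUF scratch (so one per-lane byte corresponds to 64 units of SP); the helper names are hypothetical and are not taken from the patch.

```python
WAVE_SIZE = 64  # assumption: wave64 with MUBUF scratch, SP is scaled by the wave size


def realign_constants(align_bytes, wave_size=WAVE_SIZE):
    """Return (add_immediate, and_mask) that round a scaled SP up to align_bytes per lane."""
    scaled_align = align_bytes * wave_size
    add_imm = (align_bytes - 1) * wave_size      # operand of the s_add_i32
    and_mask = ~(scaled_align - 1) & 0xFFFFFFFF  # operand of the s_and_b32
    return add_imm, and_mask


def realign(sp, align_bytes, wave_size=WAVE_SIZE):
    """Apply the add-then-mask sequence to a 32-bit scaled stack pointer."""
    add_imm, and_mask = realign_constants(align_bytes, wave_size)
    return ((sp + add_imm) & 0xFFFFFFFF) & and_mask


add_imm, and_mask = realign_constants(1024)
print(hex(add_imm), hex(and_mask))  # 0xffc0 0xffff0000
```

For `align 1024` this reproduces the two immediates in the CHECK lines. The add cannot sensibly carry out of 32 bits for a valid stack address, which is why the signedness of the add does not matter here.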

define void @spill_bp_to_memory_scratch_reg_needed_mubuf_offset(<32 x i32> %a, i32 %b, [4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #5 {		define void @spill_bp_to_memory_scratch_reg_needed_mubuf_offset(<32 x i32> %a, i32 %b, [4096 x i8] addrspace(5)* byval([4096 x i8]) align 4 %arg) #5 {
; If the size of the offset exceeds the MUBUF offset field we need another		; If the size of the offset exceeds the MUBUF offset field we need another
; scratch VGPR to hold the offset.		; scratch VGPR to hold the offset.

; GCN-LABEL: spill_bp_to_memory_scratch_reg_needed_mubuf_offset		; GCN-LABEL: spill_bp_to_memory_scratch_reg_needed_mubuf_offset
; GCN: s_or_saveexec_b64 s[4:5], -1		; GCN: s_or_saveexec_b64 s[4:5], -1
; GCN-NEXT: s_add_u32 s6, s32, 0x42100		; GCN-NEXT: s_add_i32 s6, s32, 0x42100
; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s6 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v39, off, s[0:3], s6 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[4:5]		; GCN-NEXT: s_mov_b64 exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, s33		; GCN-NEXT: v_mov_b32_e32 v0, s33
; GCN-NOT: v_mov_b32_e32 v0, 0x1088		; GCN-NOT: v_mov_b32_e32 v0, 0x1088
; GCN-NEXT: s_add_u32 s6, s32, 0x42200		; GCN-NEXT: s_add_i32 s6, s32, 0x42200
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
; GCN-NEXT: v_mov_b32_e32 v0, s34		; GCN-NEXT: v_mov_b32_e32 v0, s34
; GCN-NOT: v_mov_b32_e32 v0, 0x108c		; GCN-NOT: v_mov_b32_e32 v0, 0x108c
; GCN-NEXT: s_add_u32 s6, s32, 0x42300		; GCN-NEXT: s_add_i32 s6, s32, 0x42300
; GCN-NEXT: s_mov_b32 s34, s32		; GCN-NEXT: s_mov_b32 s34, s32
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill		; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
%local_val = alloca i32, align 128, addrspace(5)		%local_val = alloca i32, align 128, addrspace(5)
store volatile i32 %b, i32 addrspace(5)* %local_val, align 128		store volatile i32 %b, i32 addrspace(5)* %local_val, align 128

call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}		,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
[23 lines not shown]
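
Note on the offsets in this test: the three `s_add_i32 s6, s32, ...` immediates are per-lane offsets multiplied by the wave size, and they are materialized in an SGPR because the per-lane offset no longer fits the MUBUF immediate offset field. A rough sketch of that decision follows, assuming wave64 and a 12-bit unsigned offset field (maximum 4095); `spill_addressing` is a hypothetical helper for illustration only, not the logic in SIRegisterInfo.cpp.

```python
MUBUF_MAX_IMM_OFFSET = 4095  # assumption: 12-bit unsigned immediate offset field
WAVE_SIZE = 64               # assumption: wave64, SGPR offsets are scaled by the wave size


def spill_addressing(per_lane_offset, wave_size=WAVE_SIZE):
    """Return ("imm", offset) if the offset fits the MUBUF field,
    otherwise ("sgpr", scaled_offset) for an s_add_i32 into a scratch SGPR."""
    if 0 <= per_lane_offset <= MUBUF_MAX_IMM_OFFSET:
        return ("imm", per_lane_offset)
    return ("sgpr", per_lane_offset * wave_size)


for off in (1028, 0x1084, 0x1088, 0x108c):
    kind, value = spill_addressing(off)
    print(hex(off), kind, hex(value))
```

Under these assumptions, 0x1084, 0x1088 and 0x108c map to 0x42100, 0x42200 and 0x42300, which matches the SGPR adds above, and 0x1088 and 0x108c are the values the `GCN-NOT` lines check are not moved into a VGPR.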

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=GCN %s
	; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s			; RUN: opt -S -si-annotate-control-flow -mtriple=amdgcn-amdhsa -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI-OPT %s

	define hidden void @widget() {			define hidden void @widget() {
	; GCN-LABEL: widget:			; GCN-LABEL: widget:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0x400
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: flat_load_dword v0, v[0:1]			; GCN-NEXT: flat_load_dword v0, v[0:1]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 21, v0			; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 21, v0
	; GCN-NEXT: s_and_b64 vcc, exec, vcc			; GCN-NEXT: s_and_b64 vcc, exec, vcc
	; GCN-NEXT: v_writelane_b32 v40, s30, 0			; GCN-NEXT: v_writelane_b32 v40, s30, 0
	; GCN-NEXT: v_writelane_b32 v40, s31, 1			; GCN-NEXT: v_writelane_b32 v40, s31, 1
	[25 lines not shown]
	; GCN-NEXT: BB0_6: ; %bb12			; GCN-NEXT: BB0_6: ; %bb12
	; GCN-NEXT: v_mov_b32_e32 v2, 0			; GCN-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: flat_store_dword v[0:1], v2			; GCN-NEXT: flat_store_dword v[0:1], v2
	; GCN-NEXT: BB0_7: ; %UnifiedReturnBlock			; GCN-NEXT: BB0_7: ; %UnifiedReturnBlock
	; GCN-NEXT: v_readlane_b32 s4, v40, 0			; GCN-NEXT: v_readlane_b32 s4, v40, 0
	; GCN-NEXT: v_readlane_b32 s5, v40, 1			; GCN-NEXT: v_readlane_b32 s5, v40, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_addk_i32 s32, 0xfc00
	; GCN-NEXT: v_readlane_b32 s33, v40, 2			; GCN-NEXT: v_readlane_b32 s33, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	; SI-OPT-LABEL: @widget(			; SI-OPT-LABEL: @widget(
	; SI-OPT-NEXT: bb:			; SI-OPT-NEXT: bb:
	[121 lines not shown]
	; GCN-LABEL: blam:			; GCN-LABEL: blam:
	; GCN: ; %bb.0: ; %bb			; GCN: ; %bb.0: ; %bb
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v43, s33, 4			; GCN-NEXT: v_writelane_b32 v43, s33, 4
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x800			; GCN-NEXT: s_addk_i32 s32, 0x800
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v42, off, s[0:3], s33 ; 4-byte Folded Spill
	; GCN-NEXT: v_writelane_b32 v43, s34, 0			; GCN-NEXT: v_writelane_b32 v43, s34, 0
	; GCN-NEXT: v_writelane_b32 v43, s35, 1			; GCN-NEXT: v_writelane_b32 v43, s35, 1
	; GCN-NEXT: v_writelane_b32 v43, s36, 2			; GCN-NEXT: v_writelane_b32 v43, s36, 2
	; GCN-NEXT: v_writelane_b32 v43, s37, 3			; GCN-NEXT: v_writelane_b32 v43, s37, 3
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[4:5], 0
	[130 lines not shown]
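
Note on the epilogue changes in this file: `s_sub_u32 s32, s32, 0x400` becomes `s_addk_i32 s32, 0xfc00` because the subtraction is rewritten as an addition of the negated value, and `s_addk_i32` takes a 16-bit immediate that is sign-extended. When the negated value does not fit in 16 bits (as with 0x30000 in stack-realign.ll), the full 32-bit two's complement appears instead. The sketch below only illustrates that arithmetic; the function names are hypothetical and this is not the selection code from the patch.

```python
def fits_simm16(value):
    """True if value can be encoded as a sign-extended 16-bit immediate."""
    return -0x8000 <= value <= 0x7FFF


def negated_add_immediate(sub_amount):
    """Return (mnemonic, printed_immediate) for replacing 'sub X' with 'add -X'."""
    negated = -sub_amount
    if fits_simm16(negated):
        # 16-bit two's complement, printed as an unsigned hex field
        return ("s_addk_i32", negated & 0xFFFF)
    # otherwise a full 32-bit literal is needed
    return ("s_add_i32", negated & 0xFFFFFFFF)


for amount in (0x400, 0x800, 0xc00, 0x30000):
    mnemonic, imm = negated_add_immediate(amount)
    print(hex(amount), mnemonic, hex(imm))
```

The four inputs correspond to the immediates 0xfc00, 0xf800, 0xf400 and 0xfffd0000 seen in the updated CHECK lines of this revision.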

llvm/test/CodeGen/AMDGPU/wave32.ll

	[1,123 lines not shown]
	; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}			; GFX1032-NEXT: s_or_saveexec_b32 [[COPY_EXEC0:s[0-9]+]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]

	; GCN-NEXT: v_writelane_b32 v40, s33, 2			; GCN-NEXT: v_writelane_b32 v40, s33, 2
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GFX1064: s_add_u32 s32, s32, 0x400			; GFX1064: s_addk_i32 s32, 0x400
	; GFX1032: s_add_u32 s32, s32, 0x200			; GFX1032: s_addk_i32 s32, 0x200


	; GCN-DAG: v_writelane_b32 v40, s30, 0			; GCN-DAG: v_writelane_b32 v40, s30, 0
	; GCN-DAG: v_writelane_b32 v40, s31, 1			; GCN-DAG: v_writelane_b32 v40, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-DAG: v_readlane_b32 s4, v40, 0			; GCN-DAG: v_readlane_b32 s4, v40, 0
	; GCN-DAG: v_readlane_b32 s5, v40, 1			; GCN-DAG: v_readlane_b32 s5, v40, 1


	; GFX1064: s_sub_u32 s32, s32, 0x400			; GFX1064: s_addk_i32 s32, 0xfc00
	; GFX1032: s_sub_u32 s32, s32, 0x200			; GFX1032: s_addk_i32 s32, 0xfe00
	; GCN: v_readlane_b32 s33, v40, 2			; GCN: v_readlane_b32 s33, v40, 2
	; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}			; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GFX1064-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	[43 lines not shown]

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

	[348 lines not shown]
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-O0-NEXT: v_writelane_b32 v3, s33, 7			; GFX9-O0-NEXT: v_writelane_b32 v3, s33, 7
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-O0-NEXT: s_addk_i32 s32, 0x400
	; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v3, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v3, s31, 1
	; GFX9-O0-NEXT: v_writelane_b32 v3, s8, 2			; GFX9-O0-NEXT: v_writelane_b32 v3, s8, 2
	; GFX9-O0-NEXT: s_mov_b32 s8, s4			; GFX9-O0-NEXT: s_mov_b32 s8, s4
	; GFX9-O0-NEXT: v_readlane_b32 s4, v3, 2			; GFX9-O0-NEXT: v_readlane_b32 s4, v3, 2
	; GFX9-O0-NEXT: ; kill: def $sgpr8 killed $sgpr8 def $sgpr8_sgpr9_sgpr10_sgpr11			; GFX9-O0-NEXT: ; kill: def $sgpr8 killed $sgpr8 def $sgpr8_sgpr9_sgpr10_sgpr11
	; GFX9-O0-NEXT: s_mov_b32 s9, s5			; GFX9-O0-NEXT: s_mov_b32 s9, s5
	; GFX9-O0-NEXT: s_mov_b32 s10, s6			; GFX9-O0-NEXT: s_mov_b32 s10, s6
	[24 lines not shown]
	; GFX9-O0-NEXT: v_readlane_b32 s7, v3, 6			; GFX9-O0-NEXT: v_readlane_b32 s7, v3, 6
	; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O0-NEXT: v_add_u32_e32 v1, v1, v2			; GFX9-O0-NEXT: v_add_u32_e32 v1, v1, v2
	; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O0-NEXT: buffer_store_dword v0, off, s[4:7], s8 offset:4			; GFX9-O0-NEXT: buffer_store_dword v0, off, s[4:7], s8 offset:4
	; GFX9-O0-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-O0-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-O0-NEXT: v_readlane_b32 s33, v3, 7			; GFX9-O0-NEXT: v_readlane_b32 s33, v3, 7
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-O0-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: s_setpc_b64 s[30:31]			; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-O3-LABEL: strict_wwm_call:			; GFX9-O3-LABEL: strict_wwm_call:
	; GFX9-O3: ; %bb.0:			; GFX9-O3: ; %bb.0:
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[10:11], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[10:11], -1
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-O3-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-O3-NEXT: s_mov_b32 s14, s33			; GFX9-O3-NEXT: s_mov_b32 s14, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_add_u32 s32, s32, 0x400			; GFX9-O3-NEXT: s_addk_i32 s32, 0x400
	; GFX9-O3-NEXT: s_mov_b64 s[10:11], s[30:31]			; GFX9-O3-NEXT: s_mov_b64 s[10:11], s[30:31]
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8			; GFX9-O3-NEXT: v_mov_b32_e32 v2, s8
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v2, 0
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[8:9], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[8:9], -1
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: s_getpc_b64 s[12:13]			; GFX9-O3-NEXT: s_getpc_b64 s[12:13]
	; GFX9-O3-NEXT: s_add_u32 s12, s12, strict_wwm_called@rel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s12, s12, strict_wwm_called@rel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s13, s13, strict_wwm_called@rel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s13, s13, strict_wwm_called@rel32@hi+12
	; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v0
	; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2			; GFX9-O3-NEXT: v_add_u32_e32 v1, v1, v2
	; GFX9-O3-NEXT: s_mov_b64 exec, s[8:9]			; GFX9-O3-NEXT: s_mov_b64 exec, s[8:9]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v1
	; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: s_sub_u32 s32, s32, 0x400			; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00
	; GFX9-O3-NEXT: s_mov_b32 s33, s14			; GFX9-O3-NEXT: s_mov_b32 s33, s14
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-O3-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: s_setpc_b64 s[10:11]			; GFX9-O3-NEXT: s_setpc_b64 s[10:11]
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
	[107 lines not shown]
	; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:24 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:28 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:32 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill			; GFX9-O0-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:36 ; 4-byte Folded Spill
	; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-O0-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-O0-NEXT: v_writelane_b32 v11, s33, 9			; GFX9-O0-NEXT: v_writelane_b32 v11, s33, 9
	; GFX9-O0-NEXT: s_mov_b32 s33, s32			; GFX9-O0-NEXT: s_mov_b32 s33, s32
	; GFX9-O0-NEXT: s_add_u32 s32, s32, 0xc00			; GFX9-O0-NEXT: s_addk_i32 s32, 0xc00
	; GFX9-O0-NEXT: v_writelane_b32 v11, s30, 0			; GFX9-O0-NEXT: v_writelane_b32 v11, s30, 0
	; GFX9-O0-NEXT: v_writelane_b32 v11, s31, 1			; GFX9-O0-NEXT: v_writelane_b32 v11, s31, 1
	; GFX9-O0-NEXT: v_writelane_b32 v11, s9, 2			; GFX9-O0-NEXT: v_writelane_b32 v11, s9, 2
	; GFX9-O0-NEXT: v_writelane_b32 v11, s8, 3			; GFX9-O0-NEXT: v_writelane_b32 v11, s8, 3
	; GFX9-O0-NEXT: s_mov_b32 s8, s6			; GFX9-O0-NEXT: s_mov_b32 s8, s6
	; GFX9-O0-NEXT: v_readlane_b32 s6, v11, 3			; GFX9-O0-NEXT: v_readlane_b32 s6, v11, 3
	; GFX9-O0-NEXT: v_writelane_b32 v11, s8, 4			; GFX9-O0-NEXT: v_writelane_b32 v11, s8, 4
	; GFX9-O0-NEXT: s_mov_b32 s12, s5			; GFX9-O0-NEXT: s_mov_b32 s12, s5
	[49 lines not shown]
	; GFX9-O0-NEXT: v_mov_b32_e32 v5, v10			; GFX9-O0-NEXT: v_mov_b32_e32 v5, v10
	; GFX9-O0-NEXT: v_add_co_u32_e64 v2, s[10:11], v2, v4			; GFX9-O0-NEXT: v_add_co_u32_e64 v2, s[10:11], v2, v4
	; GFX9-O0-NEXT: v_addc_co_u32_e64 v3, s[10:11], v3, v5, s[10:11]			; GFX9-O0-NEXT: v_addc_co_u32_e64 v3, s[10:11], v3, v5, s[10:11]
	; GFX9-O0-NEXT: s_mov_b64 exec, s[8:9]			; GFX9-O0-NEXT: s_mov_b64 exec, s[8:9]
	; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O0-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O0-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O0-NEXT: s_mov_b32 s8, 0			; GFX9-O0-NEXT: s_mov_b32 s8, 0
	; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], s8 offset:4			; GFX9-O0-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], s8 offset:4
	; GFX9-O0-NEXT: s_sub_u32 s32, s32, 0xc00			; GFX9-O0-NEXT: s_addk_i32 s32, 0xf400
	; GFX9-O0-NEXT: v_readlane_b32 s33, v11, 9			; GFX9-O0-NEXT: v_readlane_b32 s33, v11, 9
	; GFX9-O0-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-O0-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-O0-NEXT: buffer_load_dword v11, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v11, off, s[0:3], s32 offset:40 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v9, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v10, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_nop 0			; GFX9-O0-NEXT: s_nop 0
	[25 lines not shown]
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:16 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill			; GFX9-O3-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:20 ; 4-byte Folded Spill
	; GFX9-O3-NEXT: s_mov_b64 exec, s[10:11]			; GFX9-O3-NEXT: s_mov_b64 exec, s[10:11]
	; GFX9-O3-NEXT: s_mov_b32 s14, s33			; GFX9-O3-NEXT: s_mov_b32 s14, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s32			; GFX9-O3-NEXT: s_mov_b32 s33, s32
	; GFX9-O3-NEXT: s_add_u32 s32, s32, 0x800			; GFX9-O3-NEXT: s_addk_i32 s32, 0x800
	; GFX9-O3-NEXT: s_mov_b64 s[10:11], s[30:31]			; GFX9-O3-NEXT: s_mov_b64 s[10:11], s[30:31]
	; GFX9-O3-NEXT: v_mov_b32_e32 v6, s8			; GFX9-O3-NEXT: v_mov_b32_e32 v6, s8
	; GFX9-O3-NEXT: v_mov_b32_e32 v7, s9			; GFX9-O3-NEXT: v_mov_b32_e32 v7, s9
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: v_mov_b32_e32 v6, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v6, 0
	; GFX9-O3-NEXT: v_mov_b32_e32 v7, 0			; GFX9-O3-NEXT: v_mov_b32_e32 v7, 0
	; GFX9-O3-NEXT: s_not_b64 exec, exec			; GFX9-O3-NEXT: s_not_b64 exec, exec
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[8:9], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[8:9], -1
	; GFX9-O3-NEXT: s_getpc_b64 s[12:13]			; GFX9-O3-NEXT: s_getpc_b64 s[12:13]
	; GFX9-O3-NEXT: s_add_u32 s12, s12, strict_wwm_called_i64@gotpcrel32@lo+4			; GFX9-O3-NEXT: s_add_u32 s12, s12, strict_wwm_called_i64@gotpcrel32@lo+4
	; GFX9-O3-NEXT: s_addc_u32 s13, s13, strict_wwm_called_i64@gotpcrel32@hi+12			; GFX9-O3-NEXT: s_addc_u32 s13, s13, strict_wwm_called_i64@gotpcrel32@hi+12
	; GFX9-O3-NEXT: s_load_dwordx2 s[12:13], s[12:13], 0x0			; GFX9-O3-NEXT: s_load_dwordx2 s[12:13], s[12:13], 0x0
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v6			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v6
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v7			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v7
	; GFX9-O3-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[12:13]			; GFX9-O3-NEXT: s_swappc_b64 s[30:31], s[12:13]
	; GFX9-O3-NEXT: v_mov_b32_e32 v2, v0			; GFX9-O3-NEXT: v_mov_b32_e32 v2, v0
	; GFX9-O3-NEXT: v_mov_b32_e32 v3, v1			; GFX9-O3-NEXT: v_mov_b32_e32 v3, v1
	; GFX9-O3-NEXT: v_add_co_u32_e32 v2, vcc, v2, v6			; GFX9-O3-NEXT: v_add_co_u32_e32 v2, vcc, v2, v6
	; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc			; GFX9-O3-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v7, vcc
	; GFX9-O3-NEXT: s_mov_b64 exec, s[8:9]			; GFX9-O3-NEXT: s_mov_b64 exec, s[8:9]
	; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2			; GFX9-O3-NEXT: v_mov_b32_e32 v0, v2
	; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3			; GFX9-O3-NEXT: v_mov_b32_e32 v1, v3
	; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: s_sub_u32 s32, s32, 0x800			; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800
	; GFX9-O3-NEXT: s_mov_b32 s33, s14			; GFX9-O3-NEXT: s_mov_b32 s33, s14
	; GFX9-O3-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-O3-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s32 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_nop 0			; GFX9-O3-NEXT: s_nop 0
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:12 ; 4-byte Folded Reload
	[185 lines not shown]

Diff 350278