This is an archive of the discontinued LLVM Phabricator instance.


Differential D158190

[wip] AMDGPU: Try to restore SP correctly in presence of dynamic stack adjustments
Needs Review · Public

Authored by arsenm on Aug 17 2023, 8:40 AM.


Details

Reviewers

scott.linder
cdevadas
yassingh
sebastian-ne

Group Reviewers

Restricted Project

Summary

Currently have no idea if this really works. We could probably try
restoring the SP from the base pointer if it's available. For some
reason we're currently inserting CSR spills before the stack is
realigned.

Diff Detail

Event Timeline

arsenm created this revision. Aug 17 2023, 8:40 AM

Herald added a project: Restricted Project. Aug 17 2023, 8:40 AM

Herald added subscribers: foad, kerbowa, hiraditya and 8 others.

arsenm requested review of this revision. Aug 17 2023, 8:40 AM

Herald added a project: Restricted Project. Aug 17 2023, 8:40 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B253231: Diff 551152. Aug 17 2023, 8:41 AM

Just to make sure I understand our current scheme correctly (before this patch).
If we re-align the stack, we do the following (ignoring the scale factor):

fp = sp + (alignment - 1)
fp &= -alignment
sp += frameSize + alignment

And in the epilogue:

sp -= frameSize + alignment

Due to the alignment of fp (but not sp), the allocated stack size sp - fp may be larger than needed, but it is restored correctly.
However, sp is not aligned, so maybe this causes problems when calling another function that expects the stack to be already aligned?

In case the stack is already aligned and does not need re-alignment, using sp = fp in the epilogue sounds ok to me.
For the re-alignment case, sp = fp - (alignment - 1) looks incorrect to me.

If we do not want the over-commitment, we need to enforce the presence of a base pointer and in the epilogue, restore the stack pointer from the base pointer (sp = bp).
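The scheme described in this comment can be sketched as plain integer arithmetic. This is only a model of the pseudocode above, not code from SIFrameLowering.cpp: the names `FrameRegs`, `realignPrologue` and `fixedOffsetEpilogue` are hypothetical, the scratch scale factor is ignored, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the pre-patch scheme; these names are illustrative
// and do not appear in SIFrameLowering.cpp.
struct FrameRegs {
  uint32_t sp; // stack pointer
  uint32_t fp; // frame pointer
};

// Prologue with realignment (scale factor ignored, alignment a power of two).
inline FrameRegs realignPrologue(uint32_t entrySP, uint32_t frameSize,
                                 uint32_t alignment) {
  FrameRegs regs;
  // fp = sp + (alignment - 1); fp &= -alignment
  regs.fp = (entrySP + (alignment - 1)) & ~(alignment - 1);
  // sp += frameSize + alignment
  regs.sp = entrySP + frameSize + alignment;
  return regs;
}

// Epilogue: subtract the same fixed amount that the prologue added.
inline uint32_t fixedOffsetEpilogue(const FrameRegs &regs, uint32_t frameSize,
                                    uint32_t alignment) {
  return regs.sp - (frameSize + alignment);
}
```

With entrySP = 0x10, frameSize = 0x10 and alignment = 0x20, this gives fp = 0x20 and sp = 0x40, so sp - fp is 0x20 although the frame only needs 0x10, and the epilogue returns to 0x10.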

In D158190#4595893, @sebastian-ne wrote:
> Just to make sure I understand our current scheme correctly (before this patch).
> If we re-align the stack, we do (ignoring the scale factor)
> fp = sp + (alignment - 1)
> fp &= -alignment
> sp += frameSize + alignment
> And in the epilogue:
> sp -= frameSize + alignment

Yes, this is essentially what was happening.

> Due to the alignment of fp (but not sp), the allocated stack size sp - fp may be larger than needed, but it is restored correctly.
> However, sp is not aligned, so maybe this causes problems when calling another function that expects the stack to be already aligned?

We currently assume 16-byte alignment on stack entry. Realignment is triggered by a stack object with a larger alignment requirement, or forced with the "stackrealign" attribute.

I don't think there was any pre-existing issue with stack realignment. We restored the realigned size in the epilog. The problem is if there were any dynamic stack adjustments, the restore using a fixed offset just assumed none happened.
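A minimal sketch of why the fixed-offset restore breaks once the function body moves SP. The function name `spAfterFixedOffsetRestore` and the `dynamicBytes` parameter are hypothetical and only model the arithmetic discussed here (scale factor ignored); this is not the backend's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not LLVM code: the SP value after an epilogue that
// subtracts the fixed amount the prologue added, with dynamicBytes allocated
// on the stack in between.
inline uint32_t spAfterFixedOffsetRestore(uint32_t entrySP, uint32_t frameSize,
                                          uint32_t alignment,
                                          uint32_t dynamicBytes) {
  uint32_t sp = entrySP;
  sp += frameSize + alignment; // prologue (realigned case)
  sp += dynamicBytes;          // e.g. a dynamic alloca in the function body
  sp -= frameSize + alignment; // epilogue assumes nothing else moved SP
  return sp;
}
```

Without a dynamic adjustment the function returns the entry SP. With any nonzero `dynamicBytes` the result is off by exactly that amount, which is the case the fixed offset cannot account for.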

> In case the stack is already aligned and does not need re-alignment, using sp = fp in the epilogue sounds ok to me.
> For the re-alignment case, sp = fp - (alignment - 1) looks incorrect to me.
>
> If we do not want the over-commitment, we need to enforce the presence of a base pointer and in the epilogue, restore the stack pointer from the base pointer (sp = bp).

The FP is set after the stack is realigned, so the original SP has the additional alignment padding offset.

> The problem is if there were any dynamic stack adjustments, the restore using a fixed offset just assumed none happened.

I see, that is problematic.

An example where I think the new code fails:

sp = 0x10
alignment = 0x20
frameSize = 0x10

// prologue
fp = sp + (alignment - 1) = 0x2f
fp &= -alignment = 0x20
sp += frameSize + alignment = 0x40
// some dynamic allocation happens
sp += 0x18 = 0x58
// everything ok so far

// epilogue
sp = fp - (alignment - 1) = 0x20 - 0x1f = 0x01
// but sp was 0x10 at the start
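The example above can be replayed as a small model that compares the patch's FP-based restore with the base-pointer restore suggested earlier in the thread. The `Regs` struct and the helper names are hypothetical, and treating `bp` as a copy of the incoming SP is an assumption made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register model. That bp holds the incoming SP is an
// assumption made for this sketch.
struct Regs {
  uint32_t sp, fp, bp;
};

// Prologue for the realigned case (scale factor ignored).
inline Regs prologue(uint32_t entrySP, uint32_t frameSize, uint32_t alignment) {
  Regs r;
  r.bp = entrySP;
  r.fp = (entrySP + (alignment - 1)) & ~(alignment - 1);
  r.sp = entrySP + frameSize + alignment;
  return r;
}

// Restore as in the realigned branch of the new epilogue:
// sp = fp - (alignment - 1).
inline uint32_t restoreFromFP(const Regs &r, uint32_t alignment) {
  return r.fp - (alignment - 1);
}

// Alternative from the discussion: sp = bp.
inline uint32_t restoreFromBP(const Regs &r) { return r.bp; }
```

For sp = 0x10, alignment = 0x20 and frameSize = 0x10, followed by a dynamic allocation of 0x18, `restoreFromFP` yields 0x01 while `restoreFromBP` yields the original 0x10.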

Revision Contents

Path (size)

llvm/
  lib/Target/AMDGPU/
    SIFrameLowering.cpp (22 lines)
  test/
    CodeGen/AMDGPU/
      GlobalISel/
        assert-align.ll (2 lines)
        call-outgoing-stack-args.ll (8 lines)
        dynamic-alloca-uniform.ll (36 lines)
        extractelement-stack-lower.ll (6 lines)
        localizer.ll (2 lines)
        non-entry-alloca.ll (7 lines)
      abi-attribute-hints-undefined-behavior.ll (2 lines)
      bf16.ll (60 lines)
      call-argument-types.ll (40 lines)
      callee-frame-setup.ll (45 lines)
      callee-special-input-vgprs-packed.ll (2 lines)
      callee-special-input-vgprs.ll (2 lines)
      cross-block-use-is-not-abi-copy.ll (8 lines)
      dwarf-multi-register-use-crash.ll (2 lines)
      dynamic_stackalloc.ll (79 lines)
      fix-frame-reg-in-custom-csr-spills.ll (2 lines)
      frame-setup-without-sgpr-to-vgpr-spills.ll (4 lines)
      gfx-call-non-gfx-func.ll (4 lines)
      gfx-callable-argument-types.ll (720 lines)
      gfx-callable-preserved-registers.ll (60 lines)
      gfx-callable-return-types.ll (42 lines)
      indirect-call.ll (32 lines)
      insert-delay-alu-bug.ll (2 lines)
      local-stack-alloc-block-sp-reference.ll (4 lines)
      mul24-pass-ordering.ll (2 lines)
      need-fp-from-vgpr-spills.ll (9 lines)
      nested-calls.ll (4 lines)
      no-source-locations-in-prologue.ll (2 lines)
      non-entry-alloca.ll (8 lines)
      pei-scavenge-sgpr-carry-out.mir (8 lines)
      pei-scavenge-sgpr-gfx9.mir (5 lines)
      pei-scavenge-sgpr.mir (2 lines)
      pei-scavenge-vgpr-spill.mir (8 lines)
      preserve-wwm-copy-dst-reg.ll (4 lines)
      sgpr-spills-split-regalloc.ll (6 lines)
      sibling-call.ll (2 lines)
      stack-realign.ll (12 lines)
      stacksave_stackrestore.ll (8 lines)
      unstructured-cfg-def-use-issue.ll (4 lines)
      use_restore_frame_reg.mir (5 lines)
      vgpr-tuple-allocation.ll (12 lines)
      wave32.ll (4 lines)
      whole-wave-register-copy.ll (2 lines)
      whole-wave-register-spill.ll (4 lines)
      wwm-reserved-spill.ll (8 lines)
    tools/UpdateTestChecks/update_llc_test_checks/Inputs/
      amdgpu_generated_funcs.ll.generated.expected (4 lines)

Diff 551152

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

@@ 1,151 lines elided @@ if (HasFP) {
     FuncInfo->setIsStackRealigned(true);
   } else if ((HasFP = hasFP(MF))) {
     BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)
         .addReg(StackPtrReg)
         .setMIFlag(MachineInstr::FrameSetup);
   }

   // If FP is used, emit the CSR spills with FP base register.
+  // FIXME: We should emit CSR spilling before stack realignment
   if (HasFP) {
     emitCSRSpillStores(MF, MBB, MBBI, DL, LiveRegs, FramePtrReg,
                        FramePtrRegScratchCopy);
     if (FramePtrRegScratchCopy)
       LiveRegs.removeReg(FramePtrRegScratchCopy);
   }

   // If we need a base pointer, set it up here. It's whatever the value of
@@ 51 lines elided @@ if (!MBB.empty()) {
     MBBI = MBB.getLastNonDebugInstr();
     if (MBBI != MBB.end())
       DL = MBBI->getDebugLoc();

     MBBI = MBB.getFirstTerminator();
   }

   const MachineFrameInfo &MFI = MF.getFrameInfo();
-  uint32_t NumBytes = MFI.getStackSize();
-  uint32_t RoundedSize = FuncInfo->isStackRealigned()
-                             ? NumBytes + MFI.getMaxAlign().value()
-                             : NumBytes;
   const Register StackPtrReg = FuncInfo->getStackPtrOffsetReg();
   Register FramePtrReg = FuncInfo->getFrameOffsetReg();
   bool FPSaved = FuncInfo->hasPrologEpilogSGPRSpillEntry(FramePtrReg);

   Register FramePtrRegScratchCopy;
   Register SGPRForFPSaveRestoreCopy =
       FuncInfo->getScratchSGPRCopyDstReg(FramePtrReg);
   if (FPSaved) {
@@ 12 lines elided @@ if (SGPRForFPSaveRestoreCopy) {

       LiveRegs.addReg(FramePtrRegScratchCopy);
     }

     emitCSRSpillRestores(MF, MBB, MBBI, DL, LiveRegs, FramePtrReg,
                          FramePtrRegScratchCopy);
   }

-  if (RoundedSize != 0 && hasFP(MF)) {
+  if (hasFP(MF)) {
+    if (FuncInfo->isStackRealigned()) {
+      int32_t RealignOffset = MFI.getMaxAlign().value() - 1;
       auto Add = BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_I32), StackPtrReg)
-                     .addReg(StackPtrReg)
-                     .addImm(-static_cast<int64_t>(RoundedSize * getScratchScaleFactor(ST)))
+                     .addReg(FramePtrReg)
+                     .addImm(-static_cast<int64_t>(RealignOffset * getScratchScaleFactor(ST)))
                      .setMIFlag(MachineInstr::FrameDestroy);
       Add->getOperand(3).setIsDead(); // Mark SCC as dead.
+    } else {
+      BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_MOV_B32), StackPtrReg)
+          .addReg(FramePtrReg)
+          .setMIFlag(MachineInstr::FrameDestroy);
+    }
   }

   if (FPSaved) {
     // Insert the copy to restore FP.
     Register SrcReg = SGPRForFPSaveRestoreCopy ? SGPRForFPSaveRestoreCopy
                                                : FramePtrRegScratchCopy;
     MachineInstrBuilder MIB =
         BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)
@@ 541 lines elided @@

llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll

@@ 25 lines elided @@
 ; CHECK-NEXT: global_store_dword v[0:1], v2, off
 ; CHECK-NEXT: s_waitcnt vmcnt(0)
 ; CHECK-NEXT: v_readlane_b32 s31, v40, 1
 ; CHECK-NEXT: v_readlane_b32 s30, v40, 0
 ; CHECK-NEXT: v_readlane_b32 s4, v40, 2
 ; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1
 ; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
 ; CHECK-NEXT: s_mov_b64 exec, s[6:7]
-; CHECK-NEXT: s_addk_i32 s32, 0xfc00
+; CHECK-NEXT: s_mov_b32 s32, s33
 ; CHECK-NEXT: s_mov_b32 s33, s4
 ; CHECK-NEXT: s_waitcnt vmcnt(0)
 ; CHECK-NEXT: s_setpc_b64 s[30:31]
 entry:
   %call = call align 4 ptr addrspace(1) @ext(ptr addrspace(1) null)
   store volatile i32 0, ptr addrspace(1) %call
   ret ptr addrspace(1) %call
 }
@@ 15 lines elided @@

llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll

@@ 245 lines elided @@
 ; MUBUF-NEXT: s_addc_u32 s5, s5, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
 ; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
 ; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
 ; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
 ; MUBUF-NEXT: v_readlane_b32 s4, v40, 2
 ; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1
 ; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
 ; MUBUF-NEXT: s_mov_b64 exec, s[6:7]
-; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
+; MUBUF-NEXT: s_mov_b32 s32, s33
 ; MUBUF-NEXT: s_mov_b32 s33, s4
 ; MUBUF-NEXT: s_waitcnt vmcnt(0)
 ; MUBUF-NEXT: s_setpc_b64 s[30:31]
 ;
 ; FLATSCR-LABEL: func_caller_stack:
 ; FLATSCR: ; %bb.0:
 ; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; FLATSCR-NEXT: s_mov_b32 s0, s33
@@ 22 lines elided @@
 ; FLATSCR-NEXT: s_addc_u32 s1, s1, external_void_func_v16i32_v16i32_v4i32@rel32@hi+12
 ; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
 ; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
 ; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
 ; FLATSCR-NEXT: v_readlane_b32 s0, v40, 2
 ; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1
 ; FLATSCR-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
 ; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]
-; FLATSCR-NEXT: s_add_i32 s32, s32, -16
+; FLATSCR-NEXT: s_mov_b32 s32, s33
 ; FLATSCR-NEXT: s_mov_b32 s33, s0
 ; FLATSCR-NEXT: s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT: s_setpc_b64 s[30:31]
   call void @external_void_func_v16i32_v16i32_v4i32(<16 x i32> undef, <16 x i32> undef, <4 x i32> <i32 9, i32 10, i32 11, i32 12>)
   ret void
 }

 define void @func_caller_byval(ptr addrspace(5) %argptr) {
@@ 69 lines elided @@
 ; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:60
 ; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
 ; MUBUF-NEXT: v_readlane_b32 s31, v40, 1
 ; MUBUF-NEXT: v_readlane_b32 s30, v40, 0
 ; MUBUF-NEXT: v_readlane_b32 s4, v40, 2
 ; MUBUF-NEXT: s_or_saveexec_b64 s[6:7], -1
 ; MUBUF-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
 ; MUBUF-NEXT: s_mov_b64 exec, s[6:7]
-; MUBUF-NEXT: s_addk_i32 s32, 0xfc00
+; MUBUF-NEXT: s_mov_b32 s32, s33
 ; MUBUF-NEXT: s_mov_b32 s33, s4
 ; MUBUF-NEXT: s_waitcnt vmcnt(0)
 ; MUBUF-NEXT: s_setpc_b64 s[30:31]
 ;
 ; FLATSCR-LABEL: func_caller_byval:
 ; FLATSCR: ; %bb.0:
 ; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; FLATSCR-NEXT: s_mov_b32 s0, s33
@@ 48 lines elided @@
 ; FLATSCR-NEXT: scratch_store_dwordx2 off, v[0:1], s2
 ; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[0:1]
 ; FLATSCR-NEXT: v_readlane_b32 s31, v40, 1
 ; FLATSCR-NEXT: v_readlane_b32 s30, v40, 0
 ; FLATSCR-NEXT: v_readlane_b32 s0, v40, 2
 ; FLATSCR-NEXT: s_or_saveexec_b64 s[2:3], -1
 ; FLATSCR-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
 ; FLATSCR-NEXT: s_mov_b64 exec, s[2:3]
-; FLATSCR-NEXT: s_add_i32 s32, s32, -16
+; FLATSCR-NEXT: s_mov_b32 s32, s33
 ; FLATSCR-NEXT: s_mov_b32 s33, s0
 ; FLATSCR-NEXT: s_waitcnt vmcnt(0)
 ; FLATSCR-NEXT: s_setpc_b64 s[30:31]
   call void @external_void_func_byval(ptr addrspace(5) byval([16 x i32]) %argptr)
   ret void
 }

 declare void @llvm.memset.p5.i32(ptr addrspace(5) nocapture writeonly, i8, i32, i1 immarg) #1

 attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }
 attributes #1 = { argmemonly nofree nounwind willreturn writeonly }

llvm/test/CodeGen/AMDGPU/GlobalISel/dynamic-alloca-uniform.ll

@@ 66 lines elided @@
 ; GFX9-NEXT: s_mov_b32 s6, s33
 ; GFX9-NEXT: s_mov_b32 s33, s32
 ; GFX9-NEXT: s_addk_i32 s32, 0x400
 ; GFX9-NEXT: s_getpc_b64 s[4:5]
 ; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
 ; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
 ; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
 ; GFX9-NEXT: v_mov_b32_e32 v0, 0
-; GFX9-NEXT: s_mov_b32 s33, s6
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
 ; GFX9-NEXT: s_and_b32 s4, s4, -16
 ; GFX9-NEXT: s_lshl_b32 s4, s4, 6
 ; GFX9-NEXT: s_add_u32 s4, s32, s4
 ; GFX9-NEXT: v_mov_b32_e32 v1, s4
 ; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
-; GFX9-NEXT: s_addk_i32 s32, 0xfc00
+; GFX9-NEXT: s_mov_b32 s32, s33
+; GFX9-NEXT: s_mov_b32 s33, s6
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: s_setpc_b64 s[30:31]
 ;
 ; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align4:
 ; GFX10: ; %bb.0:
 ; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT: s_mov_b32 s6, s33
 ; GFX10-NEXT: s_mov_b32 s33, s32
 ; GFX10-NEXT: s_addk_i32 s32, 0x200
 ; GFX10-NEXT: s_getpc_b64 s[4:5]
 ; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
 ; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
 ; GFX10-NEXT: v_mov_b32_e32 v0, 0
 ; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
-; GFX10-NEXT: s_mov_b32 s33, s6
 ; GFX10-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GFX10-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
 ; GFX10-NEXT: s_and_b32 s4, s4, -16
 ; GFX10-NEXT: s_lshl_b32 s4, s4, 5
 ; GFX10-NEXT: s_add_u32 s4, s32, s4
-; GFX10-NEXT: s_addk_i32 s32, 0xfe00
+; GFX10-NEXT: s_mov_b32 s32, s33
 ; GFX10-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-NEXT: s_mov_b32 s33, s6
 ; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
 ; GFX10-NEXT: s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: func_dynamic_stackalloc_sgpr_align4:
 ; GFX11: ; %bb.0:
 ; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT: s_mov_b32 s2, s33
 ; GFX11-NEXT: s_mov_b32 s33, s32
 ; GFX11-NEXT: s_add_i32 s32, s32, 16
 ; GFX11-NEXT: s_getpc_b64 s[0:1]
 ; GFX11-NEXT: s_add_u32 s0, s0, gv@gotpcrel32@lo+4
 ; GFX11-NEXT: s_addc_u32 s1, s1, gv@gotpcrel32@hi+12
 ; GFX11-NEXT: v_mov_b32_e32 v0, 0
 ; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
-; GFX11-NEXT: s_mov_b32 s33, s2
 ; GFX11-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0
 ; GFX11-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX11-NEXT: s_lshl2_add_u32 s0, s0, 15
 ; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
 ; GFX11-NEXT: s_and_b32 s0, s0, -16
 ; GFX11-NEXT: s_lshl_b32 s0, s0, 5
 ; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX11-NEXT: s_add_u32 s0, s32, s0
-; GFX11-NEXT: s_add_i32 s32, s32, -16
+; GFX11-NEXT: s_mov_b32 s32, s33
 ; GFX11-NEXT: scratch_store_b32 off, v0, s0
+; GFX11-NEXT: s_mov_b32 s33, s2
 ; GFX11-NEXT: s_setpc_b64 s[30:31]
   %n = load i32, ptr addrspace(4) @gv, align 4
   %alloca = alloca i32, i32 %n, addrspace(5)
   store i32 0, ptr addrspace(5) %alloca
   ret void
 }

 define amdgpu_kernel void @kernel_dynamic_stackalloc_sgpr_align16(i32 %n) {
@@ 58 lines elided @@
 ; GFX9-NEXT: s_mov_b32 s6, s33
 ; GFX9-NEXT: s_mov_b32 s33, s32
 ; GFX9-NEXT: s_addk_i32 s32, 0x400
 ; GFX9-NEXT: s_getpc_b64 s[4:5]
 ; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
 ; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
 ; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
 ; GFX9-NEXT: v_mov_b32_e32 v0, 0
-; GFX9-NEXT: s_mov_b32 s33, s6
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
 ; GFX9-NEXT: s_and_b32 s4, s4, -16
 ; GFX9-NEXT: s_lshl_b32 s4, s4, 6
 ; GFX9-NEXT: s_add_u32 s4, s32, s4
 ; GFX9-NEXT: v_mov_b32_e32 v1, s4
 ; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
-; GFX9-NEXT: s_addk_i32 s32, 0xfc00
+; GFX9-NEXT: s_mov_b32 s32, s33
+; GFX9-NEXT: s_mov_b32 s33, s6
 ; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: s_setpc_b64 s[30:31]
 ;
 ; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align16:
 ; GFX10: ; %bb.0:
 ; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT: s_mov_b32 s6, s33
 ; GFX10-NEXT: s_mov_b32 s33, s32
 ; GFX10-NEXT: s_addk_i32 s32, 0x200
 ; GFX10-NEXT: s_getpc_b64 s[4:5]
 ; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
 ; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
 ; GFX10-NEXT: v_mov_b32_e32 v0, 0
 ; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
-; GFX10-NEXT: s_mov_b32 s33, s6
 ; GFX10-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GFX10-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
 ; GFX10-NEXT: s_and_b32 s4, s4, -16
 ; GFX10-NEXT: s_lshl_b32 s4, s4, 5
 ; GFX10-NEXT: s_add_u32 s4, s32, s4
-; GFX10-NEXT: s_addk_i32 s32, 0xfe00
+; GFX10-NEXT: s_mov_b32 s32, s33
 ; GFX10-NEXT: v_mov_b32_e32 v1, s4
+; GFX10-NEXT: s_mov_b32 s33, s6
 ; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
 ; GFX10-NEXT: s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: func_dynamic_stackalloc_sgpr_align16:
 ; GFX11: ; %bb.0:
 ; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT: s_mov_b32 s2, s33
 ; GFX11-NEXT: s_mov_b32 s33, s32
 ; GFX11-NEXT: s_add_i32 s32, s32, 16
 ; GFX11-NEXT: s_getpc_b64 s[0:1]
 ; GFX11-NEXT: s_add_u32 s0, s0, gv@gotpcrel32@lo+4
 ; GFX11-NEXT: s_addc_u32 s1, s1, gv@gotpcrel32@hi+12
 ; GFX11-NEXT: v_mov_b32_e32 v0, 0
 ; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
-; GFX11-NEXT: s_mov_b32 s33, s2
 ; GFX11-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0
 ; GFX11-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX11-NEXT: s_lshl2_add_u32 s0, s0, 15
 ; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
 ; GFX11-NEXT: s_and_b32 s0, s0, -16
 ; GFX11-NEXT: s_lshl_b32 s0, s0, 5
 ; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX11-NEXT: s_add_u32 s0, s32, s0
-; GFX11-NEXT: s_add_i32 s32, s32, -16
+; GFX11-NEXT: s_mov_b32 s32, s33
 ; GFX11-NEXT: scratch_store_b32 off, v0, s0
+; GFX11-NEXT: s_mov_b32 s33, s2
 ; GFX11-NEXT: s_setpc_b64 s[30:31]
   %n = load i32, ptr addrspace(4) @gv, align 16
   %alloca = alloca i32, i32 %n, addrspace(5)
   store i32 0, ptr addrspace(5) %alloca
   ret void
 }

 define amdgpu_kernel void @kernel_dynamic_stackalloc_sgpr_align32(i32 %n) {
	[... 62 lines not shown ...]
	; GFX9-NEXT: s_add_i32 s33, s32, 0x7c0			; GFX9-NEXT: s_add_i32 s33, s32, 0x7c0
	; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800			; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800
	; GFX9-NEXT: s_addk_i32 s32, 0x1000			; GFX9-NEXT: s_addk_i32 s32, 0x1000
	; GFX9-NEXT: s_getpc_b64 s[4:5]			; GFX9-NEXT: s_getpc_b64 s[4:5]
	; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4			; GFX9-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
	; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
	; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX9-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15			; GFX9-NEXT: s_lshl2_add_u32 s4, s4, 15
	; GFX9-NEXT: s_and_b32 s4, s4, -16			; GFX9-NEXT: s_and_b32 s4, s4, -16
	; GFX9-NEXT: s_lshl_b32 s4, s4, 6			; GFX9-NEXT: s_lshl_b32 s4, s4, 6
	; GFX9-NEXT: s_add_u32 s4, s32, s4			; GFX9-NEXT: s_add_u32 s4, s32, s4
	; GFX9-NEXT: s_and_b32 s4, s4, 0xfffff800			; GFX9-NEXT: s_and_b32 s4, s4, 0xfffff800
	; GFX9-NEXT: v_mov_b32_e32 v1, s4			; GFX9-NEXT: v_mov_b32_e32 v1, s4
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_addk_i32 s32, 0xf000			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
				; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align32:			; GFX10-LABEL: func_dynamic_stackalloc_sgpr_align32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	; GFX10-NEXT: s_add_i32 s33, s32, 0x3e0			; GFX10-NEXT: s_add_i32 s33, s32, 0x3e0
	; GFX10-NEXT: s_addk_i32 s32, 0x800			; GFX10-NEXT: s_addk_i32 s32, 0x800
	; GFX10-NEXT: s_and_b32 s33, s33, 0xfffffc00			; GFX10-NEXT: s_and_b32 s33, s33, 0xfffffc00
	; GFX10-NEXT: s_getpc_b64 s[4:5]			; GFX10-NEXT: s_getpc_b64 s[4:5]
	; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4			; GFX10-NEXT: s_add_u32 s4, s4, gv@gotpcrel32@lo+4
	; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12			; GFX10-NEXT: s_addc_u32 s5, s5, gv@gotpcrel32@hi+12
	; GFX10-NEXT: v_mov_b32_e32 v0, 0			; GFX10-NEXT: v_mov_b32_e32 v0, 0
	; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; GFX10-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0			; GFX10-NEXT: s_load_dword s4, s[4:5], 0x0
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15			; GFX10-NEXT: s_lshl2_add_u32 s4, s4, 15
	; GFX10-NEXT: s_and_b32 s4, s4, -16			; GFX10-NEXT: s_and_b32 s4, s4, -16
	; GFX10-NEXT: s_lshl_b32 s4, s4, 5			; GFX10-NEXT: s_lshl_b32 s4, s4, 5
	; GFX10-NEXT: s_add_u32 s4, s32, s4			; GFX10-NEXT: s_add_u32 s4, s32, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xf800			; GFX10-NEXT: s_add_i32 s32, s33, 0xfffffc20
	; GFX10-NEXT: s_and_b32 s4, s4, 0xfffffc00			; GFX10-NEXT: s_and_b32 s4, s4, 0xfffffc00
				; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: v_mov_b32_e32 v1, s4			; GFX10-NEXT: v_mov_b32_e32 v1, s4
	; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: func_dynamic_stackalloc_sgpr_align32:			; GFX11-LABEL: func_dynamic_stackalloc_sgpr_align32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 31			; GFX11-NEXT: s_add_i32 s33, s32, 31
	; GFX11-NEXT: s_add_i32 s32, s32, 64			; GFX11-NEXT: s_add_i32 s32, s32, 64
	; GFX11-NEXT: s_and_not1_b32 s33, s33, 31			; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
	; GFX11-NEXT: s_getpc_b64 s[0:1]			; GFX11-NEXT: s_getpc_b64 s[0:1]
	; GFX11-NEXT: s_add_u32 s0, s0, gv@gotpcrel32@lo+4			; GFX11-NEXT: s_add_u32 s0, s0, gv@gotpcrel32@lo+4
	; GFX11-NEXT: s_addc_u32 s1, s1, gv@gotpcrel32@hi+12			; GFX11-NEXT: s_addc_u32 s1, s1, gv@gotpcrel32@hi+12
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0			; GFX11-NEXT: s_load_b64 s[0:1], s[0:1], 0x0
	; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0			; GFX11-NEXT: s_load_b32 s0, s[0:1], 0x0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_lshl2_add_u32 s0, s0, 15			; GFX11-NEXT: s_lshl2_add_u32 s0, s0, 15
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	; GFX11-NEXT: s_and_b32 s0, s0, -16			; GFX11-NEXT: s_and_b32 s0, s0, -16
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_add_u32 s0, s32, s0			; GFX11-NEXT: s_add_u32 s0, s32, s0
	; GFX11-NEXT: s_addk_i32 s32, 0xffc0			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00			; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00
				; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: scratch_store_b32 off, v0, s0			; GFX11-NEXT: scratch_store_b32 off, v0, s0
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%n = load i32, ptr addrspace(4) @gv			%n = load i32, ptr addrspace(4) @gv
	%alloca = alloca i32, i32 %n, align 32, addrspace(5)			%alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store i32 0, ptr addrspace(5) %alloca			store i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/GlobalISel/extractelement-stack-lower.ll

	[... 37 lines not shown ...]
	; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112			; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176			; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208			; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208
	; GCN-NEXT: s_add_i32 s32, s32, 0x10000			; GCN-NEXT: s_add_i32 s32, s32, 0x10000
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000			; GCN-NEXT: s_add_i32 s32, s33, 0xffffc040
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	[... 158 lines not shown ...]
	; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112			; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176			; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208			; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208
	; GCN-NEXT: s_add_i32 s32, s32, 0x10000			; GCN-NEXT: s_add_i32 s32, s32, 0x10000
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000			; GCN-NEXT: s_add_i32 s32, s33, 0xffffc040
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	[... 162 lines not shown ...]
	; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112			; GCN-NEXT: global_load_dwordx4 v[60:63], v[0:1], off offset:112
	; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128			; GCN-NEXT: global_load_dwordx4 v[36:39], v[0:1], off offset:128
	; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144			; GCN-NEXT: global_load_dwordx4 v[32:35], v[0:1], off offset:144
	; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160			; GCN-NEXT: global_load_dwordx4 v[28:31], v[0:1], off offset:160
	; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176			; GCN-NEXT: global_load_dwordx4 v[52:55], v[0:1], off offset:176
	; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192			; GCN-NEXT: global_load_dwordx4 v[24:27], v[0:1], off offset:192
	; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208			; GCN-NEXT: global_load_dwordx4 v[7:10], v[0:1], off offset:208
	; GCN-NEXT: s_add_i32 s32, s32, 0x10000			; GCN-NEXT: s_add_i32 s32, s32, 0x10000
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff0000			; GCN-NEXT: s_add_i32 s32, s33, 0xffffc040
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:512 ; 4-byte Folded Spill
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:516 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:520 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v6, off, s[0:3], s33 offset:524 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v7, off, s[0:3], s33 offset:528 ; 4-byte Folded Spill
	; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v8, off, s[0:3], s33 offset:532 ; 4-byte Folded Spill
	[... 127 lines not shown ...]

llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll

	[... 246 lines not shown ...]
	; GFX9-NEXT: v_writelane_b32 v40, s31, 1			; GFX9-NEXT: v_writelane_b32 v40, s31, 1
	; GFX9-NEXT: s_swappc_b64 s[30:31], 0			; GFX9-NEXT: s_swappc_b64 s[30:31], 0
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%load0 = load volatile i32, ptr addrspace(1) null, align 4			%load0 = load volatile i32, ptr addrspace(1) null, align 4
	br label %bb1			br label %bb1

	bb1:			bb1:
	call void null()			call void null()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/GlobalISel/non-entry-alloca.ll

	[... 171 lines not shown ...]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_add_u32_e32 v2, v2, v3			; GCN-NEXT: v_add_u32_e32 v2, v2, v3
	; GCN-NEXT: global_store_dword v[0:1], v2, off			; GCN-NEXT: global_store_dword v[0:1], v2, off
	; GCN-NEXT: .LBB2_3: ; %bb.2			; GCN-NEXT: .LBB2_3: ; %bb.2
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: global_store_dword v[0:1], v0, off			; GCN-NEXT: global_store_dword v[0:1], v0, off
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s7			; GCN-NEXT: s_mov_b32 s33, s7
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]

	entry:			entry:
	%cond0 = icmp eq i32 %arg.cond0, 0			%cond0 = icmp eq i32 %arg.cond0, 0
	br i1 %cond0, label %bb.0, label %bb.2			br i1 %cond0, label %bb.0, label %bb.2

	bb.0:			bb.0:
	[... 44 lines not shown ...]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_add_u32_e32 v2, v2, v3			; GCN-NEXT: v_add_u32_e32 v2, v2, v3
	; GCN-NEXT: global_store_dword v[0:1], v2, off			; GCN-NEXT: global_store_dword v[0:1], v2, off
	; GCN-NEXT: .LBB3_2: ; %bb.1			; GCN-NEXT: .LBB3_2: ; %bb.1
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: global_store_dword v[0:1], v0, off			; GCN-NEXT: global_store_dword v[0:1], v0, off
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_addk_i32 s32, 0xe000			; GCN-NEXT: s_add_i32 s32, s33, 0xfffff040
	; GCN-NEXT: s_mov_b32 s33, s7			; GCN-NEXT: s_mov_b32 s33, s7
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%cond = icmp eq i32 %arg.cond, 0			%cond = icmp eq i32 %arg.cond, 0
	br i1 %cond, label %bb.0, label %bb.1			br i1 %cond, label %bb.0, label %bb.1

	bb.0:			bb.0:
	%alloca = alloca [16 x i32], align 64, addrspace(5)			%alloca = alloca [16 x i32], align 64, addrspace(5)
	[... 10 lines not shown ...]
	bb.1:			bb.1:
	store volatile i32 0, ptr addrspace(1) undef			store volatile i32 0, ptr addrspace(1) undef
	ret void			ret void
	}			}

	declare i32 @llvm.amdgcn.workitem.id.x() #0			declare i32 @llvm.amdgcn.workitem.id.x() #0

	attributes #0 = { nounwind readnone speculatable }			attributes #0 = { nounwind readnone speculatable }
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; ASSUME1024: {{.*}}
				; DEFAULTSIZE: {{.*}}

llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll

	[... 30 lines not shown ...]
	; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12			; FIXEDABI-NEXT: s_addc_u32 s17, s17, requires_all_inputs@rel32@hi+12
	; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]			; FIXEDABI-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1			; FIXEDABI-NEXT: v_readlane_b32 s31, v40, 1
	; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0			; FIXEDABI-NEXT: v_readlane_b32 s30, v40, 0
	; FIXEDABI-NEXT: v_readlane_b32 s4, v40, 2			; FIXEDABI-NEXT: v_readlane_b32 s4, v40, 2
	; FIXEDABI-NEXT: s_or_saveexec_b64 s[6:7], -1			; FIXEDABI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; FIXEDABI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; FIXEDABI-NEXT: s_mov_b64 exec, s[6:7]			; FIXEDABI-NEXT: s_mov_b64 exec, s[6:7]
	; FIXEDABI-NEXT: s_addk_i32 s32, 0xfc00			; FIXEDABI-NEXT: s_mov_b32 s32, s33
	; FIXEDABI-NEXT: s_mov_b32 s33, s4			; FIXEDABI-NEXT: s_mov_b32 s33, s4
	; FIXEDABI-NEXT: s_waitcnt vmcnt(0)			; FIXEDABI-NEXT: s_waitcnt vmcnt(0)
	; FIXEDABI-NEXT: s_setpc_b64 s[30:31]			; FIXEDABI-NEXT: s_setpc_b64 s[30:31]
	call void @requires_all_inputs()			call void @requires_all_inputs()
	ret void			ret void
	}			}

	define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {			define amdgpu_kernel void @parent_kernel_missing_inputs() #0 {
	[... 345 lines not shown ...]

llvm/test/CodeGen/AMDGPU/bf16.ll

	[... 1,383 lines not shown ...]
	; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GCN-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GCN-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v2, 1			; GCN-NEXT: v_readlane_b32 s31, v2, 1
	; GCN-NEXT: v_readlane_b32 s30, v2, 0			; GCN-NEXT: v_readlane_b32 s30, v2, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call:			; GFX7-LABEL: test_call:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[... 13 lines not shown ...]
	; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX7-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX7-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v2, 1			; GFX7-NEXT: v_readlane_b32 s31, v2, 1
	; GFX7-NEXT: v_readlane_b32 s30, v2, 0			; GFX7-NEXT: v_readlane_b32 s30, v2, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call:			; GFX8-LABEL: test_call:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[... 13 lines not shown ...]
	; GFX8-NEXT: v_lshrrev_b32_e32 v0, 16, v0			; GFX8-NEXT: v_lshrrev_b32_e32 v0, 16, v0
	; GFX8-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v2, 1			; GFX8-NEXT: v_readlane_b32 s31, v2, 1
	; GFX8-NEXT: v_readlane_b32 s30, v2, 0			; GFX8-NEXT: v_readlane_b32 s30, v2, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call:			; GFX9-LABEL: test_call:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[... 12 lines not shown ...]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call:			; GFX10-LABEL: test_call:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[... 14 lines not shown ...]
	; GFX10-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short_d16_hi v0, v1, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call bfloat @test_arg_store(bfloat %in)			%result = call bfloat @test_arg_store(bfloat %in)
	store volatile bfloat %result, ptr addrspace(5) %out			store volatile bfloat %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[... 24 lines not shown ...]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v3, 1			; GCN-NEXT: v_readlane_b32 s31, v3, 1
	; GCN-NEXT: v_readlane_b32 s30, v3, 0			; GCN-NEXT: v_readlane_b32 s30, v3, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v2bf16:			; GFX7-LABEL: test_call_v2bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[... 17 lines not shown ...]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v3, 1			; GFX7-NEXT: v_readlane_b32 s31, v3, 1
	; GFX7-NEXT: v_readlane_b32 s30, v3, 0			; GFX7-NEXT: v_readlane_b32 s30, v3, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v2bf16:			; GFX8-LABEL: test_call_v2bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[... 12 lines not shown ...]
	; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX8-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX8-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v2, 1			; GFX8-NEXT: v_readlane_b32 s31, v2, 1
	; GFX8-NEXT: v_readlane_b32 s30, v2, 0			; GFX8-NEXT: v_readlane_b32 s30, v2, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v2bf16:			; GFX9-LABEL: test_call_v2bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[... 12 lines not shown ...]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v2bf16:			; GFX10-LABEL: test_call_v2bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[... 14 lines not shown ...]
	; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <2 x bfloat> @test_arg_store_v2bf16(<2 x bfloat> %in)			%result = call <2 x bfloat> @test_arg_store_v2bf16(<2 x bfloat> %in)
	store volatile <2 x bfloat> %result, ptr addrspace(5) %out			store volatile <2 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[25 lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen			; GCN-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v4, 1			; GCN-NEXT: v_readlane_b32 s31, v4, 1
	; GCN-NEXT: v_readlane_b32 s30, v4, 0			; GCN-NEXT: v_readlane_b32 s30, v4, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v4, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v3bf16:			; GFX7-LABEL: test_call_v3bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[18 lines not shown]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_dword v0, v3, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v4, 1			; GFX7-NEXT: v_readlane_b32 s31, v4, 1
	; GFX7-NEXT: v_readlane_b32 s30, v4, 0			; GFX7-NEXT: v_readlane_b32 s30, v4, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v4, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v4, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v3bf16:			; GFX8-LABEL: test_call_v3bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[16 lines not shown]
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v3, 1			; GFX8-NEXT: v_readlane_b32 s31, v3, 1
	; GFX8-NEXT: v_readlane_b32 s30, v3, 0			; GFX8-NEXT: v_readlane_b32 s30, v3, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v3bf16:			; GFX9-LABEL: test_call_v3bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[15 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v3bf16:			; GFX10-LABEL: test_call_v3bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[17 lines not shown]
	; GFX10-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_dword v0, v2, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v3, 1			; GFX10-NEXT: v_readlane_b32 s31, v3, 1
	; GFX10-NEXT: v_readlane_b32 s30, v3, 0			; GFX10-NEXT: v_readlane_b32 s30, v3, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <3 x bfloat> @test_arg_store_v2bf16(<3 x bfloat> %in)			%result = call <3 x bfloat> @test_arg_store_v2bf16(<3 x bfloat> %in)
	store volatile <3 x bfloat> %result, ptr addrspace(5) %out			store volatile <3 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[32 lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v5, 1			; GCN-NEXT: v_readlane_b32 s31, v5, 1
	; GCN-NEXT: v_readlane_b32 s30, v5, 0			; GCN-NEXT: v_readlane_b32 s30, v5, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v4bf16:			; GFX7-LABEL: test_call_v4bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[25 lines not shown]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v5, 1			; GFX7-NEXT: v_readlane_b32 s31, v5, 1
	; GFX7-NEXT: v_readlane_b32 s30, v5, 0			; GFX7-NEXT: v_readlane_b32 s30, v5, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v4bf16:			; GFX8-LABEL: test_call_v4bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[23 lines not shown]
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2
	; GFX8-NEXT: buffer_store_short v4, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v4, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v3, 1			; GFX8-NEXT: v_readlane_b32 s31, v3, 1
	; GFX8-NEXT: v_readlane_b32 s30, v3, 0			; GFX8-NEXT: v_readlane_b32 s30, v3, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v4bf16:			; GFX9-LABEL: test_call_v4bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[18 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v4bf16:			; GFX10-LABEL: test_call_v4bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[20 lines not shown]
	; GFX10-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v2, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v3, 1			; GFX10-NEXT: v_readlane_b32 s31, v3, 1
	; GFX10-NEXT: v_readlane_b32 s30, v3, 0			; GFX10-NEXT: v_readlane_b32 s30, v3, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <4 x bfloat> @test_arg_store_v2bf16(<4 x bfloat> %in)			%result = call <4 x bfloat> @test_arg_store_v2bf16(<4 x bfloat> %in)
	store volatile <4 x bfloat> %result, ptr addrspace(5) %out			store volatile <4 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[48 lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v9, 1			; GCN-NEXT: v_readlane_b32 s31, v9, 1
	; GCN-NEXT: v_readlane_b32 s30, v9, 0			; GCN-NEXT: v_readlane_b32 s30, v9, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v8bf16:			; GFX7-LABEL: test_call_v8bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[41 lines not shown]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v9, 1			; GFX7-NEXT: v_readlane_b32 s31, v9, 1
	; GFX7-NEXT: v_readlane_b32 s30, v9, 0			; GFX7-NEXT: v_readlane_b32 s30, v9, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v8bf16:			; GFX8-LABEL: test_call_v8bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[37 lines not shown]
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4
	; GFX8-NEXT: buffer_store_short v6, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v6, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v5, 1			; GFX8-NEXT: v_readlane_b32 s31, v5, 1
	; GFX8-NEXT: v_readlane_b32 s30, v5, 0			; GFX8-NEXT: v_readlane_b32 s30, v5, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v8bf16:			; GFX9-LABEL: test_call_v8bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[26 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v5, 1			; GFX9-NEXT: v_readlane_b32 s31, v5, 1
	; GFX9-NEXT: v_readlane_b32 s30, v5, 0			; GFX9-NEXT: v_readlane_b32 s30, v5, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v8bf16:			; GFX10-LABEL: test_call_v8bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[28 lines not shown]
	; GFX10-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v4, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v5, 1			; GFX10-NEXT: v_readlane_b32 s31, v5, 1
	; GFX10-NEXT: v_readlane_b32 s30, v5, 0			; GFX10-NEXT: v_readlane_b32 s30, v5, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v5, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <8 x bfloat> @test_arg_store_v2bf16(<8 x bfloat> %in)			%result = call <8 x bfloat> @test_arg_store_v2bf16(<8 x bfloat> %in)
	store volatile <8 x bfloat> %result, ptr addrspace(5) %out			store volatile <8 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[80 lines not shown]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen			; GCN-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_readlane_b32 s31, v17, 1			; GCN-NEXT: v_readlane_b32 s31, v17, 1
	; GCN-NEXT: v_readlane_b32 s30, v17, 0			; GCN-NEXT: v_readlane_b32 s30, v17, 0
	; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v17, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s8			; GCN-NEXT: s_mov_b32 s33, s8
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX7-LABEL: test_call_v16bf16:			; GFX7-LABEL: test_call_v16bf16:
	; GFX7: ; %bb.0: ; %entry			; GFX7: ; %bb.0: ; %entry
	; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-NEXT: s_mov_b32 s8, s33			; GFX7-NEXT: s_mov_b32 s8, s33
	[73 lines not shown]
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen			; GFX7-NEXT: buffer_store_short v0, v16, s[0:3], 0 offen
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: v_readlane_b32 s31, v17, 1			; GFX7-NEXT: v_readlane_b32 s31, v17, 1
	; GFX7-NEXT: v_readlane_b32 s30, v17, 0			; GFX7-NEXT: v_readlane_b32 s30, v17, 0
	; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX7-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX7-NEXT: buffer_load_dword v17, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX7-NEXT: buffer_load_dword v17, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX7-NEXT: s_mov_b64 exec, s[4:5]			; GFX7-NEXT: s_mov_b64 exec, s[4:5]
	; GFX7-NEXT: s_addk_i32 s32, 0xfc00			; GFX7-NEXT: s_mov_b32 s32, s33
	; GFX7-NEXT: s_mov_b32 s33, s8			; GFX7-NEXT: s_mov_b32 s33, s8
	; GFX7-NEXT: s_waitcnt vmcnt(0)			; GFX7-NEXT: s_waitcnt vmcnt(0)
	; GFX7-NEXT: s_setpc_b64 s[30:31]			; GFX7-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX8-LABEL: test_call_v16bf16:			; GFX8-LABEL: test_call_v16bf16:
	; GFX8: ; %bb.0: ; %entry			; GFX8: ; %bb.0: ; %entry
	; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX8-NEXT: s_mov_b32 s6, s33			; GFX8-NEXT: s_mov_b32 s6, s33
	[65 lines not shown]
	; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v8			; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v8
	; GFX8-NEXT: buffer_store_short v10, v0, s[0:3], 0 offen			; GFX8-NEXT: buffer_store_short v10, v0, s[0:3], 0 offen
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: v_readlane_b32 s31, v9, 1			; GFX8-NEXT: v_readlane_b32 s31, v9, 1
	; GFX8-NEXT: v_readlane_b32 s30, v9, 0			; GFX8-NEXT: v_readlane_b32 s30, v9, 0
	; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX8-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX8-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX8-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX8-NEXT: s_mov_b64 exec, s[4:5]			; GFX8-NEXT: s_mov_b64 exec, s[4:5]
	; GFX8-NEXT: s_addk_i32 s32, 0xfc00			; GFX8-NEXT: s_mov_b32 s32, s33
	; GFX8-NEXT: s_mov_b32 s33, s6			; GFX8-NEXT: s_mov_b32 s33, s6
	; GFX8-NEXT: s_waitcnt vmcnt(0)			; GFX8-NEXT: s_waitcnt vmcnt(0)
	; GFX8-NEXT: s_setpc_b64 s[30:31]			; GFX8-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: test_call_v16bf16:			; GFX9-LABEL: test_call_v16bf16:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s6, s33			; GFX9-NEXT: s_mov_b32 s6, s33
	[42 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_readlane_b32 s31, v9, 1			; GFX9-NEXT: v_readlane_b32 s31, v9, 1
	; GFX9-NEXT: v_readlane_b32 s30, v9, 0			; GFX9-NEXT: v_readlane_b32 s30, v9, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_v16bf16:			; GFX10-LABEL: test_call_v16bf16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	[44 lines not shown]
	; GFX10-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen			; GFX10-NEXT: buffer_store_short v0, v8, s[0:3], 0 offen
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: v_readlane_b32 s31, v9, 1			; GFX10-NEXT: v_readlane_b32 s31, v9, 1
	; GFX10-NEXT: v_readlane_b32 s30, v9, 0			; GFX10-NEXT: v_readlane_b32 s30, v9, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s4, -1			; GFX10-NEXT: s_xor_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v9, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%result = call <16 x bfloat> @test_arg_store_v2bf16(<16 x bfloat> %in)			%result = call <16 x bfloat> @test_arg_store_v2bf16(<16 x bfloat> %in)
	store volatile <16 x bfloat> %result, ptr addrspace(5) %out			store volatile <16 x bfloat> %result, ptr addrspace(5) %out
	ret void			ret void
	}			}
	[407 lines not shown]

llvm/test/CodeGen/AMDGPU/call-argument-types.ll

	[5,873 lines not shown]
	; VI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; VI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; VI-NEXT: v_readlane_b32 s31, v40, 1			; VI-NEXT: v_readlane_b32 s31, v40, 1
	; VI-NEXT: v_readlane_b32 s30, v40, 0			; VI-NEXT: v_readlane_b32 s30, v40, 0
	; VI-NEXT: v_readlane_b32 s4, v40, 2			; VI-NEXT: v_readlane_b32 s4, v40, 2
	; VI-NEXT: s_or_saveexec_b64 s[6:7], -1			; VI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; VI-NEXT: s_mov_b64 exec, s[6:7]			; VI-NEXT: s_mov_b64 exec, s[6:7]
	; VI-NEXT: s_addk_i32 s32, 0xfc00			; VI-NEXT: s_mov_b32 s32, s33
	; VI-NEXT: s_mov_b32 s33, s4			; VI-NEXT: s_mov_b32 s33, s4
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: s_setpc_b64 s[30:31]			; VI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CI-LABEL: stack_12xv3i32:			; CI-LABEL: stack_12xv3i32:
	; CI: ; %bb.0: ; %entry			; CI: ; %bb.0: ; %entry
	; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s4, s33			; CI-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; CI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; CI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CI-NEXT: v_readlane_b32 s31, v40, 1			; CI-NEXT: v_readlane_b32 s31, v40, 1
	; CI-NEXT: v_readlane_b32 s30, v40, 0			; CI-NEXT: v_readlane_b32 s30, v40, 0
	; CI-NEXT: v_readlane_b32 s4, v40, 2			; CI-NEXT: v_readlane_b32 s4, v40, 2
	; CI-NEXT: s_or_saveexec_b64 s[6:7], -1			; CI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CI-NEXT: s_mov_b64 exec, s[6:7]			; CI-NEXT: s_mov_b64 exec, s[6:7]
	; CI-NEXT: s_addk_i32 s32, 0xfc00			; CI-NEXT: s_mov_b32 s32, s33
	; CI-NEXT: s_mov_b32 s33, s4			; CI-NEXT: s_mov_b32 s33, s4
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: s_setpc_b64 s[30:31]			; CI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: stack_12xv3i32:			; GFX9-LABEL: stack_12xv3i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_12xv3i32:			; GFX11-LABEL: stack_12xv3i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[33 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; HSA-LABEL: stack_12xv3i32:			; HSA-LABEL: stack_12xv3i32:
	; HSA: ; %bb.0: ; %entry			; HSA: ; %bb.0: ; %entry
	; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; HSA-NEXT: s_mov_b32 s4, s33			; HSA-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12			; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3i32@rel32@hi+12
	; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]			; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; HSA-NEXT: v_readlane_b32 s31, v40, 1			; HSA-NEXT: v_readlane_b32 s31, v40, 1
	; HSA-NEXT: v_readlane_b32 s30, v40, 0			; HSA-NEXT: v_readlane_b32 s30, v40, 0
	; HSA-NEXT: v_readlane_b32 s4, v40, 2			; HSA-NEXT: v_readlane_b32 s4, v40, 2
	; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1			; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1
	; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; HSA-NEXT: s_mov_b64 exec, s[6:7]			; HSA-NEXT: s_mov_b64 exec, s[6:7]
	; HSA-NEXT: s_addk_i32 s32, 0xfc00			; HSA-NEXT: s_mov_b32 s32, s33
	; HSA-NEXT: s_mov_b32 s33, s4			; HSA-NEXT: s_mov_b32 s33, s4
	; HSA-NEXT: s_waitcnt vmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0)
	; HSA-NEXT: s_setpc_b64 s[30:31]			; HSA-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call void @external_void_func_12xv3i32(			call void @external_void_func_12xv3i32(
	<3 x i32> <i32 0, i32 0, i32 0>,			<3 x i32> <i32 0, i32 0, i32 0>,
	<3 x i32> <i32 1, i32 1, i32 1>,			<3 x i32> <i32 1, i32 1, i32 1>,
	<3 x i32> <i32 2, i32 2, i32 2>,			<3 x i32> <i32 2, i32 2, i32 2>,
	[68 lines not shown]
	; VI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12			; VI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12
	; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; VI-NEXT: v_readlane_b32 s31, v40, 1			; VI-NEXT: v_readlane_b32 s31, v40, 1
	; VI-NEXT: v_readlane_b32 s30, v40, 0			; VI-NEXT: v_readlane_b32 s30, v40, 0
	; VI-NEXT: v_readlane_b32 s4, v40, 2			; VI-NEXT: v_readlane_b32 s4, v40, 2
	; VI-NEXT: s_or_saveexec_b64 s[6:7], -1			; VI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; VI-NEXT: s_mov_b64 exec, s[6:7]			; VI-NEXT: s_mov_b64 exec, s[6:7]
	; VI-NEXT: s_addk_i32 s32, 0xfc00			; VI-NEXT: s_mov_b32 s32, s33
	; VI-NEXT: s_mov_b32 s33, s4			; VI-NEXT: s_mov_b32 s33, s4
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: s_setpc_b64 s[30:31]			; VI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CI-LABEL: stack_12xv3f32:			; CI-LABEL: stack_12xv3f32:
	; CI: ; %bb.0: ; %entry			; CI: ; %bb.0: ; %entry
	; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s4, s33			; CI-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; CI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12			; CI-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12
	; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CI-NEXT: v_readlane_b32 s31, v40, 1			; CI-NEXT: v_readlane_b32 s31, v40, 1
	; CI-NEXT: v_readlane_b32 s30, v40, 0			; CI-NEXT: v_readlane_b32 s30, v40, 0
	; CI-NEXT: v_readlane_b32 s4, v40, 2			; CI-NEXT: v_readlane_b32 s4, v40, 2
	; CI-NEXT: s_or_saveexec_b64 s[6:7], -1			; CI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CI-NEXT: s_mov_b64 exec, s[6:7]			; CI-NEXT: s_mov_b64 exec, s[6:7]
	; CI-NEXT: s_addk_i32 s32, 0xfc00			; CI-NEXT: s_mov_b32 s32, s33
	; CI-NEXT: s_mov_b32 s33, s4			; CI-NEXT: s_mov_b32 s33, s4
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: s_setpc_b64 s[30:31]			; CI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: stack_12xv3f32:			; GFX9-LABEL: stack_12xv3f32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_12xv3f32:			; GFX11-LABEL: stack_12xv3f32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[37 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; HSA-LABEL: stack_12xv3f32:			; HSA-LABEL: stack_12xv3f32:
	; HSA: ; %bb.0: ; %entry			; HSA: ; %bb.0: ; %entry
	; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; HSA-NEXT: s_mov_b32 s4, s33			; HSA-NEXT: s_mov_b32 s4, s33
	[51 lines not shown]
	; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12			; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_12xv3f32@rel32@hi+12
	; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]			; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; HSA-NEXT: v_readlane_b32 s31, v40, 1			; HSA-NEXT: v_readlane_b32 s31, v40, 1
	; HSA-NEXT: v_readlane_b32 s30, v40, 0			; HSA-NEXT: v_readlane_b32 s30, v40, 0
	; HSA-NEXT: v_readlane_b32 s4, v40, 2			; HSA-NEXT: v_readlane_b32 s4, v40, 2
	; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1			; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1
	; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; HSA-NEXT: s_mov_b64 exec, s[6:7]			; HSA-NEXT: s_mov_b64 exec, s[6:7]
	; HSA-NEXT: s_addk_i32 s32, 0xfc00			; HSA-NEXT: s_mov_b32 s32, s33
	; HSA-NEXT: s_mov_b32 s33, s4			; HSA-NEXT: s_mov_b32 s33, s4
	; HSA-NEXT: s_waitcnt vmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0)
	; HSA-NEXT: s_setpc_b64 s[30:31]			; HSA-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call void @external_void_func_12xv3f32(			call void @external_void_func_12xv3f32(
	<3 x float> <float 0.0, float 0.0, float 0.0>,			<3 x float> <float 0.0, float 0.0, float 0.0>,
	<3 x float> <float 1.0, float 1.0, float 1.0>,			<3 x float> <float 1.0, float 1.0, float 1.0>,
	<3 x float> <float 2.0, float 2.0, float 2.0>,			<3 x float> <float 2.0, float 2.0, float 2.0>,
	[76 lines not shown]
	; VI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; VI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; VI-NEXT: v_readlane_b32 s31, v40, 1			; VI-NEXT: v_readlane_b32 s31, v40, 1
	; VI-NEXT: v_readlane_b32 s30, v40, 0			; VI-NEXT: v_readlane_b32 s30, v40, 0
	; VI-NEXT: v_readlane_b32 s4, v40, 2			; VI-NEXT: v_readlane_b32 s4, v40, 2
	; VI-NEXT: s_or_saveexec_b64 s[6:7], -1			; VI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; VI-NEXT: s_mov_b64 exec, s[6:7]			; VI-NEXT: s_mov_b64 exec, s[6:7]
	; VI-NEXT: s_addk_i32 s32, 0xfc00			; VI-NEXT: s_mov_b32 s32, s33
	; VI-NEXT: s_mov_b32 s33, s4			; VI-NEXT: s_mov_b32 s33, s4
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: s_setpc_b64 s[30:31]			; VI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CI-LABEL: stack_8xv5i32:			; CI-LABEL: stack_8xv5i32:
	; CI: ; %bb.0: ; %entry			; CI: ; %bb.0: ; %entry
	; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s4, s33			; CI-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; CI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; CI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CI-NEXT: v_readlane_b32 s31, v40, 1			; CI-NEXT: v_readlane_b32 s31, v40, 1
	; CI-NEXT: v_readlane_b32 s30, v40, 0			; CI-NEXT: v_readlane_b32 s30, v40, 0
	; CI-NEXT: v_readlane_b32 s4, v40, 2			; CI-NEXT: v_readlane_b32 s4, v40, 2
	; CI-NEXT: s_or_saveexec_b64 s[6:7], -1			; CI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CI-NEXT: s_mov_b64 exec, s[6:7]			; CI-NEXT: s_mov_b64 exec, s[6:7]
	; CI-NEXT: s_addk_i32 s32, 0xfc00			; CI-NEXT: s_mov_b32 s32, s33
	; CI-NEXT: s_mov_b32 s33, s4			; CI-NEXT: s_mov_b32 s33, s4
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: s_setpc_b64 s[30:31]			; CI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: stack_8xv5i32:			; GFX9-LABEL: stack_8xv5i32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5i32:			; GFX11-LABEL: stack_8xv5i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[38 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; HSA-LABEL: stack_8xv5i32:			; HSA-LABEL: stack_8xv5i32:
	; HSA: ; %bb.0: ; %entry			; HSA: ; %bb.0: ; %entry
	; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; HSA-NEXT: s_mov_b32 s4, s33			; HSA-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12			; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5i32@rel32@hi+12
	; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]			; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; HSA-NEXT: v_readlane_b32 s31, v40, 1			; HSA-NEXT: v_readlane_b32 s31, v40, 1
	; HSA-NEXT: v_readlane_b32 s30, v40, 0			; HSA-NEXT: v_readlane_b32 s30, v40, 0
	; HSA-NEXT: v_readlane_b32 s4, v40, 2			; HSA-NEXT: v_readlane_b32 s4, v40, 2
	; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1			; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1
	; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; HSA-NEXT: s_mov_b64 exec, s[6:7]			; HSA-NEXT: s_mov_b64 exec, s[6:7]
	; HSA-NEXT: s_addk_i32 s32, 0xfc00			; HSA-NEXT: s_mov_b32 s32, s33
	; HSA-NEXT: s_mov_b32 s33, s4			; HSA-NEXT: s_mov_b32 s33, s4
	; HSA-NEXT: s_waitcnt vmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0)
	; HSA-NEXT: s_setpc_b64 s[30:31]			; HSA-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call void @external_void_func_8xv5i32(			call void @external_void_func_8xv5i32(
	<5 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0>,			<5 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0>,
	<5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1>,			<5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1>,
	<5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2>,			<5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2>,
	[72 lines not shown]
	; VI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; VI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; VI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; VI-NEXT: v_readlane_b32 s31, v40, 1			; VI-NEXT: v_readlane_b32 s31, v40, 1
	; VI-NEXT: v_readlane_b32 s30, v40, 0			; VI-NEXT: v_readlane_b32 s30, v40, 0
	; VI-NEXT: v_readlane_b32 s4, v40, 2			; VI-NEXT: v_readlane_b32 s4, v40, 2
	; VI-NEXT: s_or_saveexec_b64 s[6:7], -1			; VI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; VI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; VI-NEXT: s_mov_b64 exec, s[6:7]			; VI-NEXT: s_mov_b64 exec, s[6:7]
	; VI-NEXT: s_addk_i32 s32, 0xfc00			; VI-NEXT: s_mov_b32 s32, s33
	; VI-NEXT: s_mov_b32 s33, s4			; VI-NEXT: s_mov_b32 s33, s4
	; VI-NEXT: s_waitcnt vmcnt(0)			; VI-NEXT: s_waitcnt vmcnt(0)
	; VI-NEXT: s_setpc_b64 s[30:31]			; VI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CI-LABEL: stack_8xv5f32:			; CI-LABEL: stack_8xv5f32:
	; CI: ; %bb.0: ; %entry			; CI: ; %bb.0: ; %entry
	; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CI-NEXT: s_mov_b32 s4, s33			; CI-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; CI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; CI-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]			; CI-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; CI-NEXT: v_readlane_b32 s31, v40, 1			; CI-NEXT: v_readlane_b32 s31, v40, 1
	; CI-NEXT: v_readlane_b32 s30, v40, 0			; CI-NEXT: v_readlane_b32 s30, v40, 0
	; CI-NEXT: v_readlane_b32 s4, v40, 2			; CI-NEXT: v_readlane_b32 s4, v40, 2
	; CI-NEXT: s_or_saveexec_b64 s[6:7], -1			; CI-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CI-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CI-NEXT: s_mov_b64 exec, s[6:7]			; CI-NEXT: s_mov_b64 exec, s[6:7]
	; CI-NEXT: s_addk_i32 s32, 0xfc00			; CI-NEXT: s_mov_b32 s32, s33
	; CI-NEXT: s_mov_b32 s33, s4			; CI-NEXT: s_mov_b32 s33, s4
	; CI-NEXT: s_waitcnt vmcnt(0)			; CI-NEXT: s_waitcnt vmcnt(0)
	; CI-NEXT: s_setpc_b64 s[30:31]			; CI-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-LABEL: stack_8xv5f32:			; GFX9-LABEL: stack_8xv5f32:
	; GFX9: ; %bb.0: ; %entry			; GFX9: ; %bb.0: ; %entry
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: s_mov_b32 s4, s33			; GFX9-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5f32:			; GFX11-LABEL: stack_8xv5f32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[41 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; HSA-LABEL: stack_8xv5f32:			; HSA-LABEL: stack_8xv5f32:
	; HSA: ; %bb.0: ; %entry			; HSA: ; %bb.0: ; %entry
	; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; HSA-NEXT: s_mov_b32 s4, s33			; HSA-NEXT: s_mov_b32 s4, s33
	[59 lines not shown]
	; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12			; HSA-NEXT: s_addc_u32 s5, s5, external_void_func_8xv5f32@rel32@hi+12
	; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]			; HSA-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; HSA-NEXT: v_readlane_b32 s31, v40, 1			; HSA-NEXT: v_readlane_b32 s31, v40, 1
	; HSA-NEXT: v_readlane_b32 s30, v40, 0			; HSA-NEXT: v_readlane_b32 s30, v40, 0
	; HSA-NEXT: v_readlane_b32 s4, v40, 2			; HSA-NEXT: v_readlane_b32 s4, v40, 2
	; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1			; HSA-NEXT: s_or_saveexec_b64 s[6:7], -1
	; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; HSA-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; HSA-NEXT: s_mov_b64 exec, s[6:7]			; HSA-NEXT: s_mov_b64 exec, s[6:7]
	; HSA-NEXT: s_addk_i32 s32, 0xfc00			; HSA-NEXT: s_mov_b32 s32, s33
	; HSA-NEXT: s_mov_b32 s33, s4			; HSA-NEXT: s_mov_b32 s33, s4
	; HSA-NEXT: s_waitcnt vmcnt(0)			; HSA-NEXT: s_waitcnt vmcnt(0)
	; HSA-NEXT: s_setpc_b64 s[30:31]			; HSA-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call void @external_void_func_8xv5f32(			call void @external_void_func_8xv5f32(
	<5 x float> <float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,			<5 x float> <float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,
	<5 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,			<5 x float> <float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,
	<5 x float> <float 2.0, float 2.0, float 2.0, float 2.0, float 2.0>,			<5 x float> <float 2.0, float 2.0, float 2.0, float 2.0, float 2.0>,
	[22 lines not shown]

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

	[10 lines not shown]
	}			}

	; GCN-LABEL: {{^}}callee_no_stack_no_fp_elim_all:			; GCN-LABEL: {{^}}callee_no_stack_no_fp_elim_all:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt			; GCN-NEXT: s_waitcnt
	; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33			; MUBUF-NEXT: s_mov_b32 [[FP_COPY:s4]], s33
	; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33			; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
				; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @callee_no_stack_no_fp_elim_all() #1 {			define void @callee_no_stack_no_fp_elim_all() #1 {
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}callee_no_stack_no_fp_elim_nonleaf:			; GCN-LABEL: {{^}}callee_no_stack_no_fp_elim_nonleaf:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	[26 lines not shown]
	; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33			; FLATSCR-NEXT: s_mov_b32 [[FP_COPY:s0]], s33
	; GCN-NEXT: s_mov_b32 s33, s32			; GCN-NEXT: s_mov_b32 s33, s32
	; MUBUF-NEXT: s_addk_i32 s32, 0x200			; MUBUF-NEXT: s_addk_i32 s32, 0x200
	; FLATSCR-NEXT: s_add_i32 s32, s32, 8			; FLATSCR-NEXT: s_add_i32 s32, s32, 8
	; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}			; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}
	; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s33{{$}}			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s33{{$}}
	; FLATSCR-NEXT: scratch_store_dword off, v0, s33{{$}}			; FLATSCR-NEXT: scratch_store_dword off, v0, s33{{$}}
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_addk_i32 s32, 0xfe00			; MUBUF-NEXT: s_mov_b32 s32, s33
	; FLATSCR-NEXT: s_add_i32 s32, s32, -8			; FLATSCR-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @callee_with_stack_no_fp_elim_all() #1 {			define void @callee_with_stack_no_fp_elim_all() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	[35 lines not shown]
	; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]			; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]]
	; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]			; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]]

	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSR_VGPR]], 2			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSR_VGPR]], 2
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_addk_i32 s32, 0xfc00{{$}}			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -16{{$}}			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)

	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @callee_with_stack_and_call() #0 {			define void @callee_with_stack_and_call() #0 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void @external_void_func_void()			call void @external_void_func_void()
	[25 lines not shown]
	; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0			; GCN-DAG: v_readlane_b32 s30, [[CSR_VGPR]], 0
	; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1			; GCN-DAG: v_readlane_b32 s31, [[CSR_VGPR]], 1

	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSR_VGPR]], [[FP_SPILL_LANE]]			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSR_VGPR]], [[FP_SPILL_LANE]]
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_addk_i32 s32, 0xfc00			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -16			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @callee_no_stack_with_call() #0 {			define void @callee_no_stack_with_call() #0 {
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	[81 lines not shown]

	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN-NEXT: ; clobber v41			; GCN-NEXT: ; clobber v41
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND

	; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload			; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
	; MUBUF: s_addk_i32 s32, 0x300			; MUBUF: s_addk_i32 s32, 0x300
	; MUBUF-NEXT: s_addk_i32 s32, 0xfd00			; MUBUF-NEXT: s_mov_b32 s32, s33
	; MUBUF-NEXT: s_mov_b32 s33, s4			; MUBUF-NEXT: s_mov_b32 s33, s4
	; FLATSCR: s_add_i32 s32, s32, 12			; FLATSCR: s_add_i32 s32, s32, 12
	; FLATSCR-NEXT: s_add_i32 s32, s32, -12			; FLATSCR-NEXT: s_mov_b32 s32, s33
	; FLATSCR-NEXT: s_mov_b32 s33, s0			; FLATSCR-NEXT: s_mov_b32 s33, s0
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {			define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void asm sideeffect "; clobber v41", "~{v41}"()			call void asm sideeffect "; clobber v41", "~{v41}"()
	ret void			ret void
	[17 lines not shown]
	; GCN: v_writelane_b32 v0			; GCN: v_writelane_b32 v0

	; MUBUF: s_addk_i32 s32, 0x400			; MUBUF: s_addk_i32 s32, 0x400
	; FLATSCR: s_add_i32 s32, s32, 16			; FLATSCR: s_add_i32 s32, s32, 16
	; GCN: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:8 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_addk_i32 s32, 0xfc00			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -16			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[TMP_SGPR]]			; GCN-NEXT: s_mov_b32 s33, [[TMP_SGPR]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @last_lane_vgpr_for_fp_csr() #1 {			define void @last_lane_vgpr_for_fp_csr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void asm sideeffect "; clobber v41", "~{v41}"()			call void asm sideeffect "; clobber v41", "~{v41}"()
	call void asm sideeffect "",			call void asm sideeffect "",
	Show All 29 Lines
	; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload			; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
	; MUBUF: s_addk_i32 s32, 0x400			; MUBUF: s_addk_i32 s32, 0x400
	; FLATSCR: s_add_i32 s32, s32, 16			; FLATSCR: s_add_i32 s32, s32, 16
	; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v0			; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v0
	; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:8 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_mov_b32 s32, s33
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @no_new_vgpr_for_fp_csr() #1 {			define void @no_new_vgpr_for_fp_csr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void asm sideeffect "; clobber v41", "~{v41}"()			call void asm sideeffect "; clobber v41", "~{v41}"()
	call void asm sideeffect "",			call void asm sideeffect "",
	Show All 19 Lines
	; MUBUF-NEXT: s_add_i32 s32, s32, 0x180000			; MUBUF-NEXT: s_add_i32 s32, s32, 0x180000
	; FLATSCR-NEXT: s_addk_i32 s32, 0x6000			; FLATSCR-NEXT: s_addk_i32 s32, 0x6000
	; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0			; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
	; MUBUF-NEXT: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x2000{{$}}			; MUBUF-NEXT: v_mov_b32_e32 [[OFFSET:v[0-9]+]], 0x2000{{$}}
	; MUBUF-NEXT: buffer_store_dword [[ZERO]], [[OFFSET]], s[0:3], s33 offen{{$}}			; MUBUF-NEXT: buffer_store_dword [[ZERO]], [[OFFSET]], s[0:3], s33 offen{{$}}
	; FLATSCR-NEXT: s_add_i32 s1, s33, 0x2000			; FLATSCR-NEXT: s_add_i32 s1, s33, 0x2000
	; FLATSCR-NEXT: scratch_store_dword off, [[ZERO]], s1			; FLATSCR-NEXT: scratch_store_dword off, [[ZERO]], s1
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_add_i32 s32, s32, 0xffe80000			; MUBUF-NEXT: s_add_i32 s32, s33, 0xfff80040
	; FLATSCR-NEXT: s_addk_i32 s32, 0xa000			; FLATSCR-NEXT: s_add_i32 s32, s33, 0xffffe001
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @realign_stack_no_fp_elim() #1 {			define void @realign_stack_no_fp_elim() #1 {
	%alloca = alloca i32, align 8192, addrspace(5)			%alloca = alloca i32, align 8192, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	Show All 15 Lines
	; MUBUF: s_addk_i32 s32, 0x300			; MUBUF: s_addk_i32 s32, 0x300
	; FLATSCR: s_add_i32 s32, s32, 12			; FLATSCR: s_add_i32 s32, s32, 12
	; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1			; GCN: v_readlane_b32 s31, [[CSR_VGPR]], 1
	; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0			; GCN: v_readlane_b32 s30, [[CSR_VGPR]], 0
	; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_xor_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_addk_i32 s32, 0xfd00			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -12			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, vcc_lo			; GCN-NEXT: s_mov_b32 s33, vcc_lo
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @no_unused_non_csr_sgpr_for_fp() #1 {			define void @no_unused_non_csr_sgpr_for_fp() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca

	; Use all clobberable registers, so FP has to spill to a VGPR.			; Use all clobberable registers, so FP has to spill to a VGPR.
	Show All 21 Lines
	; MUBUF: s_addk_i32 s32, 0x300{{$}}			; MUBUF: s_addk_i32 s32, 0x300{{$}}
	; FLATSCR: s_add_i32 s32, s32, 12{{$}}			; FLATSCR: s_add_i32 s32, s32, 12{{$}}

	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_addk_i32 s32, 0xfd00{{$}}			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -12{{$}}			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, vcc_lo			; GCN-NEXT: s_mov_b32 s33, vcc_lo
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {			define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca

	; Use all clobberable registers, so FP has to spill to a VGPR.			; Use all clobberable registers, so FP has to spill to a VGPR.
	Show All 31 Lines

	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100			; MUBUF-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x40100
	; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload			; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], [[SCRATCH_SGPR]] ; 4-byte Folded Reload
	; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004			; FLATSCR-NEXT: s_add_i32 [[SCRATCH_SGPR:s[0-9]+]], s33, 0x1004
	; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload			; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; MUBUF: s_add_i32 s32, s32, 0xfffbfd00{{$}}			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_addk_i32 s32, 0xeff4{{$}}			; FLATSCR: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, vcc_lo			; GCN-NEXT: s_mov_b32 s33, vcc_lo
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @scratch_reg_needed_mubuf_offset(ptr addrspace(5) byval([4096 x i8]) align 4 %arg) #1 {			define void @scratch_reg_needed_mubuf_offset(ptr addrspace(5) byval([4096 x i8]) align 4 %arg) #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca

	; Use all clobberable registers, so FP has to spill to a VGPR.			; Use all clobberable registers, so FP has to spill to a VGPR.
	Show All 25 Lines
	; GCN-LABEL: {{^}}ipra_call_with_stack:			; GCN-LABEL: {{^}}ipra_call_with_stack:
	; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33			; GCN: s_mov_b32 [[TMP_SGPR:s[0-9]+]], s33
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; MUBUF: s_addk_i32 s32, 0x400			; MUBUF: s_addk_i32 s32, 0x400
	; FLATSCR: s_add_i32 s32, s32, 16			; FLATSCR: s_add_i32 s32, s32, 16
	; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}			; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}
	; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}			; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; MUBUF: s_addk_i32 s32, 0xfc00			; MUBUF: s_mov_b32 s32, s33
	; FLATSCR: s_add_i32 s32, s32, -16			; FLATSCR: s_mov_b32 s32, s33
	; GCN: s_mov_b32 s33, [[TMP_SGPR]]			; GCN: s_mov_b32 s33, [[TMP_SGPR]]
	define void @ipra_call_with_stack() #0 {			define void @ipra_call_with_stack() #0 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void @local_empty_func()			call void @local_empty_func()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll

	; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: buffer_load_dword [[TMP_REG:v[0-9]+]], off, s[0:3], s33{{$}}			; GCN-DAG: buffer_load_dword [[TMP_REG:v[0-9]+]], off, s[0:3], s33{{$}}

	; GCN: buffer_store_dword [[TMP_REG]], off, s[0:3], s32{{$}}			; GCN: buffer_store_dword [[TMP_REG]], off, s[0:3], s32{{$}}

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN: s_addk_i32 s32, 0xfc00{{$}}			; GCN: s_mov_b32 s32, s33{{$}}
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @too_many_args_call_too_many_args_use_workitem_id_x(			define void @too_many_args_call_too_many_args_use_workitem_id_x(
	i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,			i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,
	i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,			i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,
	i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,			i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,
	i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {			i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {
	call void @too_many_args_use_workitem_id_x(			call void @too_many_args_use_workitem_id_x(
	i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,			i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,

llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll

	; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill			; GCN-DAG: buffer_store_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Spill
	; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}			; GCN-DAG: buffer_load_dword v32, off, s[0:3], s33{{$}}

	; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}			; GCN: buffer_store_dword v32, off, s[0:3], s32{{$}}

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN: s_addk_i32 s32, 0xfc00{{$}}			; GCN: s_mov_b32 s32, s33{{$}}
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @too_many_args_call_too_many_args_use_workitem_id_x(			define void @too_many_args_call_too_many_args_use_workitem_id_x(
	i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,			i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,
	i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,			i32 %arg8, i32 %arg9, i32 %arg10, i32 %arg11, i32 %arg12, i32 %arg13, i32 %arg14, i32 %arg15,
	i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,			i32 %arg16, i32 %arg17, i32 %arg18, i32 %arg19, i32 %arg20, i32 %arg21, i32 %arg22, i32 %arg23,
	i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {			i32 %arg24, i32 %arg25, i32 %arg26, i32 %arg27, i32 %arg28, i32 %arg29, i32 %arg30, i32 %arg31) #1 {
	call void @too_many_args_use_workitem_id_x(			call void @too_many_args_use_workitem_id_x(
	i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,			i32 %arg0, i32 %arg1, i32 %arg2, i32 %arg3, i32 %arg4, i32 %arg5, i32 %arg6, i32 %arg7,

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	Show All 40 Lines
	; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v2f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 2			; GCN-NEXT: v_readlane_b32 s4, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <2 x float> @func_v2f32()			%split.ret.type = call <2 x float> @func_v2f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	Show All 19 Lines
	; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v3f32@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 2			; GCN-NEXT: v_readlane_b32 s4, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <3 x float> @func_v3f32()			%split.ret.type = call <3 x float> @func_v3f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	Show All 19 Lines
	; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12			; GCN-NEXT: s_addc_u32 s17, s17, func_v4f16@rel32@hi+12
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 2			; GCN-NEXT: v_readlane_b32 s4, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call <4 x half> @func_v4f16()			%split.ret.type = call <4 x half> @func_v4f16()
	br label %bb1			br label %bb1

	bb1:			bb1:
	Show All 20 Lines
	; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_mov_b32_e32 v1, v4			; GCN-NEXT: v_mov_b32_e32 v1, v4
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 2			; GCN-NEXT: v_readlane_b32 s4, v40, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()			%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()
	br label %bb1			br label %bb1

	bb1:			bb1:

llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll

	; CHECK-NEXT: v_readlane_b32 s35, v40, 3			; CHECK-NEXT: v_readlane_b32 s35, v40, 3
	; CHECK-NEXT: v_readlane_b32 s34, v40, 2			; CHECK-NEXT: v_readlane_b32 s34, v40, 2
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: v_readlane_b32 s4, v40, 16			; CHECK-NEXT: v_readlane_b32 s4, v40, 16
	; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1			; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[6:7]			; CHECK-NEXT: s_mov_b64 exec, s[6:7]
	; CHECK-NEXT: s_addk_i32 s32, 0xfc00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	%2 = call ptr @__kmpc_alloc_shared(), !dbg !43			%2 = call ptr @__kmpc_alloc_shared(), !dbg !43
	%3 = call ptr @__kmpc_alloc_shared()			%3 = call ptr @__kmpc_alloc_shared()
	store i32 0, ptr %3, align 4			store i32 0, ptr %3, align 4
	call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43			call void @llvm.dbg.declare(metadata ptr %3, metadata !40, metadata !DIExpression()), !dbg !43

llvm/test/CodeGen/AMDGPU/dynamic_stackalloc.ll

	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_lshl_b32 s34, s34, 6			; GFX9-NEXT: s_lshl_b32 s34, s34, 6
	; GFX9-NEXT: s_add_i32 s34, s32, s34			; GFX9-NEXT: s_add_i32 s34, s32, s34
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_add_i32 s0, s32, s0			; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
				; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%dyn.alloca = alloca i32, i32 %n, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %dyn.alloca			store volatile i32 0, ptr addrspace(5) %dyn.alloca
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_dynamic_stackalloc_function_uniform_other_object(i32 inreg %n) {			define amdgpu_gfx void @test_dynamic_stackalloc_function_uniform_other_object(i32 inreg %n) {
	; GFX9-LABEL: test_dynamic_stackalloc_function_uniform_other_object:			; GFX9-LABEL: test_dynamic_stackalloc_function_uniform_other_object:
	Show All 13 Lines
	; GFX9-NEXT: s_add_i32 s34, s32, s34			; GFX9-NEXT: s_add_i32 s34, s32, s34
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:60			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:60
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, 3			; GFX9-NEXT: v_mov_b32_e32 v0, 3
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s32, 0xec00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_other_object:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_other_object:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
				; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_addk_i32 s32, 0x50			; GFX11-NEXT: s_addk_i32 s32, 0x50
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: scratch_store_b32 off, v0, s33 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s33 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: v_mov_b32_e32 v0, 3			; GFX11-NEXT: v_mov_b32_e32 v0, 3
	; GFX11-NEXT: s_add_i32 s0, s32, s0			; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:60 dlc			; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:60 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xffb0			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%regular.object = alloca [16 x i32], addrspace(5)			%regular.object = alloca [16 x i32], addrspace(5)
	store volatile i32 1, ptr addrspace(5) %regular.object			store volatile i32 1, ptr addrspace(5) %regular.object
	%regular.object.last = getelementptr inbounds [16 x i32], ptr addrspace(5) %regular.object, i32 0, i32 15			%regular.object.last = getelementptr inbounds [16 x i32], ptr addrspace(5) %regular.object, i32 0, i32 15
	store volatile i32 2, ptr addrspace(5) %regular.object.last			store volatile i32 2, ptr addrspace(5) %regular.object.last
	%dynamic.alloca = alloca i32, i32 %n, addrspace(5)			%dynamic.alloca = alloca i32, i32 %n, addrspace(5)
	store volatile i32 3, ptr addrspace(5) %dynamic.alloca			store volatile i32 3, ptr addrspace(5) %dynamic.alloca
	ret void			ret void
	Show All 13 Lines
	; GFX9-NEXT: s_add_i32 s34, s32, s34			; GFX9-NEXT: s_add_i32 s34, s32, s34
	; GFX9-NEXT: s_and_b32 s34, s34, 0xfffff800			; GFX9-NEXT: s_and_b32 s34, s34, 0xfffff800
	; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800			; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s32, 0xf000			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 31			; GFX11-NEXT: s_add_i32 s33, s32, 31
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: s_add_i32 s32, s32, 64			; GFX11-NEXT: s_add_i32 s32, s32, 64
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_and_not1_b32 s33, s33, 31			; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
				; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00			; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00
				; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xffc0			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
				; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, i32 %n, align 32, addrspace(5)			%alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_dynamic_stackalloc_function_uniform_realign32_other_object(i32 inreg %n) {			define amdgpu_gfx void @test_dynamic_stackalloc_function_uniform_realign32_other_object(i32 inreg %n) {
	; GFX9-LABEL: test_dynamic_stackalloc_function_uniform_realign32_other_object:			; GFX9-LABEL: test_dynamic_stackalloc_function_uniform_realign32_other_object:
	Show All 15 Lines
	; GFX9-NEXT: s_and_b32 s34, s34, 0xfffff800			; GFX9-NEXT: s_and_b32 s34, s34, 0xfffff800
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:60			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:60
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s32, 0xe000			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32_other_object:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32_other_object:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 31			; GFX11-NEXT: s_add_i32 s33, s32, 31
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2			; GFX11-NEXT: v_dual_mov_b32 v0, 1 :: v_dual_mov_b32 v1, 2
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_addk_i32 s32, 0x80
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_and_not1_b32 s33, s33, 31			; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
				; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
				; GFX11-NEXT: s_addk_i32 s32, 0x80
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: scratch_store_b32 off, v0, s33 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s33 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_add_i32 s0, s32, s0			; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00			; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00
	; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:60 dlc			; GFX11-NEXT: scratch_store_b32 off, v1, s33 offset:60 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xff80			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%regular.object = alloca [16 x i32], addrspace(5)			%regular.object = alloca [16 x i32], addrspace(5)
	store volatile i32 1, ptr addrspace(5) %regular.object			store volatile i32 1, ptr addrspace(5) %regular.object
	%regular.object.last = getelementptr inbounds [16 x i32], ptr addrspace(5) %regular.object, i32 0, i32 15			%regular.object.last = getelementptr inbounds [16 x i32], ptr addrspace(5) %regular.object, i32 0, i32 15
	store volatile i32 2, ptr addrspace(5) %regular.object.last			store volatile i32 2, ptr addrspace(5) %regular.object.last
	%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %dyn.alloca			store volatile i32 0, ptr addrspace(5) %dyn.alloca
	ret void			ret void
	Show All 11 Lines
	; GFX9-NEXT: s_addk_i32 s32, 0xc00			; GFX9-NEXT: s_addk_i32 s32, 0xc00
	; GFX9-NEXT: s_lshl_b32 s34, s34, 6			; GFX9-NEXT: s_lshl_b32 s34, s34, 6
	; GFX9-NEXT: s_add_i32 s34, s32, s34			; GFX9-NEXT: s_add_i32 s34, s32, s34
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_addk_i32 s32, 0xf400			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform:			; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_add_i32 s32, s32, 48			; GFX11-NEXT: s_add_i32 s32, s32, 48
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_add_i32 s0, s32, s0			; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
				; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xffd0			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, i32 %n, addrspace(5)			%alloca = alloca i32, i32 %n, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform(ptr addrspace(5) byval([40 x i32]) %byval.arg, i32 inreg %n) {			define amdgpu_gfx void @test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform(ptr addrspace(5) byval([40 x i32]) %byval.arg, i32 inreg %n) {
	; GFX9-LABEL: test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform:			; GFX9-LABEL: test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: s_lshl_b32 s34, s4, 2			; GFX9-NEXT: s_lshl_b32 s34, s4, 2
	; GFX9-NEXT: s_add_i32 s34, s34, 15			; GFX9-NEXT: s_add_i32 s34, s34, 15
	; GFX9-NEXT: s_and_b32 s34, s34, 0x3fffff0			; GFX9-NEXT: s_and_b32 s34, s34, 0x3fffff0
	; GFX9-NEXT: s_mov_b32 s35, s33			; GFX9-NEXT: s_mov_b32 s35, s33
	; GFX9-NEXT: s_mov_b32 s33, s32			; GFX9-NEXT: s_mov_b32 s33, s32
	; GFX9-NEXT: s_addk_i32 s32, 0x400			; GFX9-NEXT: s_addk_i32 s32, 0x400
	; GFX9-NEXT: s_lshl_b32 s34, s34, 6			; GFX9-NEXT: s_lshl_b32 s34, s34, 6
	; GFX9-NEXT: s_add_i32 s34, s32, s34			; GFX9-NEXT: s_add_i32 s34, s32, s34
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: v_mov_b32_e32 v1, s34			; GFX9-NEXT: v_mov_b32_e32 v1, s34
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform:			; GFX11-LABEL: test_dynamic_stackalloc_byval_stack_arg_usage_function_uniform:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: scratch_load_b32 v0, v0, off			; GFX11-NEXT: scratch_load_b32 v0, v0, off
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s33, s32			; GFX11-NEXT: s_mov_b32 s33, s32
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_add_i32 s32, s32, 16			; GFX11-NEXT: s_add_i32 s32, s32, 16
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) \| instskip(NEXT) \| instid1(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1) \| instskip(NEXT) \| instid1(SALU_CYCLE_1)
				; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
				; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, i32 %n, addrspace(5)			%alloca = alloca i32, i32 %n, addrspace(5)
	%byval.0 = load i32, ptr addrspace(5) %byval.arg			%byval.0 = load i32, ptr addrspace(5) %byval.arg
	store volatile i32 %byval.0, ptr addrspace(5) %alloca			store volatile i32 %byval.0, ptr addrspace(5) %alloca
	%byval.39.ptr = getelementptr inbounds [16 x i32], ptr addrspace(5) %byval.arg, i32 0, i32 39			%byval.39.ptr = getelementptr inbounds [16 x i32], ptr addrspace(5) %byval.arg, i32 0, i32 39
	Show All 18 Lines
	; GFX9-NEXT: s_and_b32 s35, s35, 0xfffff800			; GFX9-NEXT: s_and_b32 s35, s35, 0xfffff800
	; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800			; GFX9-NEXT: s_and_b32 s33, s33, 0xfffff800
	; GFX9-NEXT: s_mov_b32 s32, s35			; GFX9-NEXT: s_mov_b32 s32, s35
	; GFX9-NEXT: v_mov_b32_e32 v0, 0			; GFX9-NEXT: v_mov_b32_e32 v0, 0
	; GFX9-NEXT: v_mov_b32_e32 v1, s35			; GFX9-NEXT: v_mov_b32_e32 v1, s35
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s34, s37			; GFX9-NEXT: s_mov_b32 s34, s37
	; GFX9-NEXT: s_addk_i32 s32, 0xe800			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32:			; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 31			; GFX11-NEXT: s_add_i32 s33, s32, 31
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: s_mov_b32 s2, s34			; GFX11-NEXT: s_mov_b32 s2, s34
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s34, s32			; GFX11-NEXT: s_mov_b32 s34, s32
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
	; GFX11-NEXT: s_addk_i32 s32, 0x60			; GFX11-NEXT: s_addk_i32 s32, 0x60
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_add_i32 s0, s32, s0			; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: s_and_not1_b32 s33, s33, 31			; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
	; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00			; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00
	; GFX11-NEXT: s_mov_b32 s34, s2			; GFX11-NEXT: s_mov_b32 s34, s2
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xffa0			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, i32 %n, align 32, addrspace(5)			%alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32_other_object([40 x i32] %stack.args, i32 inreg %n) {			define amdgpu_gfx void @test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32_other_object([40 x i32] %stack.args, i32 inreg %n) {
	Show All 16 Lines
	; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:32			; GFX9-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:32
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s32, s35			; GFX9-NEXT: s_mov_b32 s32, s35
	; GFX9-NEXT: v_mov_b32_e32 v0, 10			; GFX9-NEXT: v_mov_b32_e32 v0, 10
	; GFX9-NEXT: v_mov_b32_e32 v1, s35			; GFX9-NEXT: v_mov_b32_e32 v1, s35
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_mov_b32 s34, s37			; GFX9-NEXT: s_mov_b32 s34, s37
	; GFX9-NEXT: s_addk_i32 s32, 0xb800			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32_other_object:			; GFX11-LABEL: test_dynamic_stackalloc_stack_arg_usage_function_uniform_realign32_other_object:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	; GFX11-NEXT: s_add_i32 s33, s32, 31			; GFX11-NEXT: s_add_i32 s33, s32, 31
	; GFX11-NEXT: s_lshl_b32 s0, s4, 2			; GFX11-NEXT: s_lshl_b32 s0, s4, 2
	; GFX11-NEXT: v_mov_b32_e32 v0, 9			; GFX11-NEXT: v_mov_b32_e32 v0, 9
	; GFX11-NEXT: s_add_i32 s0, s0, 15			; GFX11-NEXT: s_add_i32 s0, s0, 15
	; GFX11-NEXT: s_mov_b32 s2, s34			; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
	; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0			; GFX11-NEXT: s_and_b32 s0, s0, 0x7fffff0
				; GFX11-NEXT: s_mov_b32 s2, s34
	; GFX11-NEXT: s_mov_b32 s34, s32			; GFX11-NEXT: s_mov_b32 s34, s32
	; GFX11-NEXT: s_addk_i32 s32, 0x120			; GFX11-NEXT: s_addk_i32 s32, 0x120
	; GFX11-NEXT: s_lshl_b32 s0, s0, 5			; GFX11-NEXT: s_lshl_b32 s0, s0, 5
	; GFX11-NEXT: s_and_not1_b32 s33, s33, 31
	; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: scratch_store_b32 off, v0, s33 offset:32 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s33 offset:32 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_add_i32 s0, s32, s0
	; GFX11-NEXT: v_mov_b32_e32 v0, 10			; GFX11-NEXT: v_mov_b32_e32 v0, 10
	; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00			; GFX11-NEXT: s_and_b32 s0, s0, 0xfffffc00
	; GFX11-NEXT: s_mov_b32 s34, s2			; GFX11-NEXT: s_mov_b32 s34, s2
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s0 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_addk_i32 s32, 0xfee0			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%fixed.realign = alloca [42 x i32], align 32, addrspace(5)			%fixed.realign = alloca [42 x i32], align 32, addrspace(5)
	store volatile i32 9, ptr addrspace(5) %fixed.realign			store volatile i32 9, ptr addrspace(5) %fixed.realign
	%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 10, ptr addrspace(5) %dyn.alloca			store volatile i32 10, ptr addrspace(5) %dyn.alloca
	ret void			ret void
	}			}

	▲ Show 20 Lines • Show All 73 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_outgoing_stack_args:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_outgoing_stack_args:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: scratch_store_b32 off, v0, s4 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%dyn.alloca = alloca i32, i32 %n, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, addrspace(5)
	store volatile i32 1, ptr addrspace(5) %dyn.alloca			store volatile i32 1, ptr addrspace(5) %dyn.alloca
	call amdgpu_gfx void @uses_stack_args([40 x i32] zeroinitializer)			call amdgpu_gfx void @uses_stack_args([40 x i32] zeroinitializer)
	store volatile i32 2, ptr addrspace(5) %dyn.alloca			store volatile i32 2, ptr addrspace(5) %dyn.alloca
	ret void			ret void
	▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:224 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:224 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xb800			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_outgoing_stack_args_realign32:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_outgoing_stack_args_realign32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: scratch_store_b32 off, v0, s4 dlc			; GFX11-NEXT: scratch_store_b32 off, v0, s4 dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:192 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:192 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xff00			; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%fixed.realign = alloca [42 x i32], align 32, addrspace(5)			%fixed.realign = alloca [42 x i32], align 32, addrspace(5)
	store volatile i32 9, ptr addrspace(5) %fixed.realign			store volatile i32 9, ptr addrspace(5) %fixed.realign
	%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 1, ptr addrspace(5) %dyn.alloca			store volatile i32 1, ptr addrspace(5) %dyn.alloca
	call amdgpu_gfx void @uses_stack_args([40 x i32] zeroinitializer)			call amdgpu_gfx void @uses_stack_args([40 x i32] zeroinitializer)
	Show All 28 Lines
	; GFX9-NEXT: ; clobber v40			; GFX9-NEXT: ; clobber v40
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: ;;#ASMSTART			; GFX9-NEXT: ;;#ASMSTART
	; GFX9-NEXT: ; clobber v41			; GFX9-NEXT: ; clobber v41
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b32 s32, s34			; GFX9-NEXT: s_mov_b32 s32, s34
	; GFX9-NEXT: s_addk_i32 s32, 0xb800			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffff840
	; GFX9-NEXT: s_mov_b32 s33, s35			; GFX9-NEXT: s_mov_b32 s33, s35
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32_csr_spilling:			; GFX11-LABEL: test_dynamic_stackalloc_function_uniform_realign32_csr_spilling:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s1, s33			; GFX11-NEXT: s_mov_b32 s1, s33
	Show All 21 Lines
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: ;;#ASMSTART			; GFX11-NEXT: ;;#ASMSTART
	; GFX11-NEXT: ; clobber v41			; GFX11-NEXT: ; clobber v41
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: s_clause 0x1			; GFX11-NEXT: s_clause 0x1
	; GFX11-NEXT: scratch_load_b32 v41, off, s33			; GFX11-NEXT: scratch_load_b32 v41, off, s33
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4
	; GFX11-NEXT: s_mov_b32 s32, s0			; GFX11-NEXT: s_mov_b32 s32, s0
				; GFX11-NEXT: s_add_i32 s32, s33, 0xffffffe1
	; GFX11-NEXT: s_mov_b32 s33, s1			; GFX11-NEXT: s_mov_b32 s33, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xfee0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%fixed.realign = alloca [42 x i32], align 32, addrspace(5)			%fixed.realign = alloca [42 x i32], align 32, addrspace(5)
	store volatile i32 9, ptr addrspace(5) %fixed.realign			store volatile i32 9, ptr addrspace(5) %fixed.realign
	%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)			%dyn.alloca = alloca i32, i32 %n, align 32, addrspace(5)
	store volatile i32 2, ptr addrspace(5) %dyn.alloca			store volatile i32 2, ptr addrspace(5) %dyn.alloca
	call void asm sideeffect "; clobber v40", "~{v40}" ()			call void asm sideeffect "; clobber v40", "~{v40}" ()
	call void asm sideeffect "; clobber v41", "~{v41}" ()			call void asm sideeffect "; clobber v41", "~{v41}" ()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll

	Show All 36 Lines
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: v_readlane_b32 s31, v42, 1			; GCN-NEXT: v_readlane_b32 s31, v42, 1
	; GCN-NEXT: v_readlane_b32 s30, v42, 0			; GCN-NEXT: v_readlane_b32 s30, v42, 0
	; GCN-NEXT: v_readlane_b32 s4, v42, 2			; GCN-NEXT: v_readlane_b32 s4, v42, 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca.val = alloca <8 x i32>, align 64, addrspace(5)			%alloca.val = alloca <8 x i32>, align 64, addrspace(5)
	store volatile <8 x i32> %val, ptr addrspace(5) %alloca.val, align 64			store volatile <8 x i32> %val, ptr addrspace(5) %alloca.val, align 64
	call void asm sideeffect "", "~{v40}" ()			call void asm sideeffect "", "~{v40}" ()
	call void asm sideeffect "", "~{v41}" ()			call void asm sideeffect "", "~{v41}" ()
	call void @extern_func(i32 %idx)			call void @extern_func(i32 %idx)
	ret void			ret void
	}			}

	declare void @extern_func(i32) #0			declare void @extern_func(i32) #0

	attributes #0 = { noinline nounwind }			attributes #0 = { noinline nounwind }

llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll

	Show All 26 Lines
	; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12			; SPILL-TO-VGPR-NEXT: s_addc_u32 s5, s5, external_void_func_void@rel32@hi+12
	; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]			; SPILL-TO-VGPR-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s31, v40, 1
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v40, 0
	; SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v40, 2			; SPILL-TO-VGPR-NEXT: v_readlane_b32 s4, v40, 2
	; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[6:7], -1			; SPILL-TO-VGPR-NEXT: s_or_saveexec_b64 s[6:7], -1
	; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; SPILL-TO-VGPR-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]			; SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[6:7]
	; SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xfc00			; SPILL-TO-VGPR-NEXT: s_mov_b32 s32, s33
	; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s4			; SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s4
	; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]			; SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]
	;			;
	; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:			; NO-SPILL-TO-VGPR-LABEL: callee_with_stack_and_call:
	; NO-SPILL-TO-VGPR: ; %bb.0:			; NO-SPILL-TO-VGPR: ; %bb.0:
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s4, s33			; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s4, s33
	Show All 38 Lines
	; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v0, 0			; NO-SPILL-TO-VGPR-NEXT: v_readlane_b32 s30, v0, 0
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:16
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]			; NO-SPILL-TO-VGPR-NEXT: s_mov_b64 exec, s[4:5]
	; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; NO-SPILL-TO-VGPR-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; NO-SPILL-TO-VGPR-NEXT: s_addk_i32 s32, 0xf800			; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s32, s33
	; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)			; NO-SPILL-TO-VGPR-NEXT: s_waitcnt vmcnt(0)
	; NO-SPILL-TO-VGPR-NEXT: v_readfirstlane_b32 s4, v0			; NO-SPILL-TO-VGPR-NEXT: v_readfirstlane_b32 s4, v0
	; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s4			; NO-SPILL-TO-VGPR-NEXT: s_mov_b32 s33, s4
	; NO-SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]			; NO-SPILL-TO-VGPR-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }			attributes #0 = { nounwind "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" }

llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll

	Show First 20 Lines • Show All 74 Lines • ▼ Show 20 Lines
	; SDAG-NEXT: v_readlane_b32 s8, v40, 4			; SDAG-NEXT: v_readlane_b32 s8, v40, 4
	; SDAG-NEXT: v_readlane_b32 s7, v40, 3			; SDAG-NEXT: v_readlane_b32 s7, v40, 3
	; SDAG-NEXT: v_readlane_b32 s6, v40, 2			; SDAG-NEXT: v_readlane_b32 s6, v40, 2
	; SDAG-NEXT: v_readlane_b32 s5, v40, 1			; SDAG-NEXT: v_readlane_b32 s5, v40, 1
	; SDAG-NEXT: v_readlane_b32 s4, v40, 0			; SDAG-NEXT: v_readlane_b32 s4, v40, 0
	; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1			; SDAG-NEXT: s_or_saveexec_b64 s[34:35], -1
	; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; SDAG-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; SDAG-NEXT: s_mov_b64 exec, s[34:35]			; SDAG-NEXT: s_mov_b64 exec, s[34:35]
	; SDAG-NEXT: s_addk_i32 s32, 0xfc00			; SDAG-NEXT: s_mov_b32 s32, s33
	; SDAG-NEXT: s_mov_b32 s33, s36			; SDAG-NEXT: s_mov_b32 s33, s36
	; SDAG-NEXT: s_waitcnt vmcnt(0)			; SDAG-NEXT: s_waitcnt vmcnt(0)
	; SDAG-NEXT: s_setpc_b64 s[30:31]			; SDAG-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: gfx_func:			; GISEL-LABEL: gfx_func:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s36, s33			; GISEL-NEXT: s_mov_b32 s36, s33
	(63 lines not shown)
	; GISEL-NEXT: v_readlane_b32 s8, v40, 4			; GISEL-NEXT: v_readlane_b32 s8, v40, 4
	; GISEL-NEXT: v_readlane_b32 s7, v40, 3			; GISEL-NEXT: v_readlane_b32 s7, v40, 3
	; GISEL-NEXT: v_readlane_b32 s6, v40, 2			; GISEL-NEXT: v_readlane_b32 s6, v40, 2
	; GISEL-NEXT: v_readlane_b32 s5, v40, 1			; GISEL-NEXT: v_readlane_b32 s5, v40, 1
	; GISEL-NEXT: v_readlane_b32 s4, v40, 0			; GISEL-NEXT: v_readlane_b32 s4, v40, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1			; GISEL-NEXT: s_or_saveexec_b64 s[34:35], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[34:35]			; GISEL-NEXT: s_mov_b64 exec, s[34:35]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s36			; GISEL-NEXT: s_mov_b32 s33, s36
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void @extern_c_func()			call void @extern_c_func()
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll

	(112 lines not shown)
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm:			; GFX10-LABEL: test_call_external_void_func_i1_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	(14 lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_imm:			; GFX11-LABEL: test_call_external_void_func_i1_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	(13 lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	(14 lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1(i1 true)			call amdgpu_gfx void @external_void_func_i1(i1 true)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i1_signext(i32) #0 {
	(18 lines not shown)
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_signext:			; GFX10-LABEL: test_call_external_void_func_i1_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	(16 lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_signext:			; GFX11-LABEL: test_call_external_void_func_i1_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	(14 lines not shown)
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	(16 lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, ptr addrspace(1) undef			%var = load volatile i1, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)			call amdgpu_gfx void @external_void_func_i1_signext(i1 signext%var)
	ret void			ret void
	}			}

	(19 lines not shown)
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	(16 lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_zeroext:			; GFX11-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	(14 lines not shown)
	; GFX11-NEXT: scratch_store_b8 off, v0, s32			; GFX11-NEXT: scratch_store_b8 off, v0, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	(16 lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i1, ptr addrspace(1) undef			%var = load volatile i1, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)			call amdgpu_gfx void @external_void_func_i1_zeroext(i1 zeroext %var)
	ret void			ret void
	}			}

	(16 lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm:			; GFX10-LABEL: test_call_external_void_func_i8_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	(13 lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_imm:			; GFX11-LABEL: test_call_external_void_func_i8_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	(12 lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	(13 lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8(i8 123)			call amdgpu_gfx void @external_void_func_i8(i8 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_signext(i32) #0 {
	(16 lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_signext:			; GFX10-LABEL: test_call_external_void_func_i8_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	(14 lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_signext:			; GFX11-LABEL: test_call_external_void_func_i8_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	(13 lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	(14 lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, ptr addrspace(1) undef			%var = load volatile i8, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)			call amdgpu_gfx void @external_void_func_i8_signext(i8 signext %var)
	ret void			ret void
	}			}

	(17 lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i8_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[14 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_zeroext:			; GFX11-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[14 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i8, ptr addrspace(1) undef			%var = load volatile i8, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)			call amdgpu_gfx void @external_void_func_i8_zeroext(i8 zeroext %var)
	ret void			ret void
	}			}

	[16 unchanged lines hidden]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm:			; GFX10-LABEL: test_call_external_void_func_i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[13 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_imm:			; GFX11-LABEL: test_call_external_void_func_i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[12 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[13 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16(i16 123)			call amdgpu_gfx void @external_void_func_i16(i16 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_signext(i32) #0 {
	[16 unchanged lines hidden]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_signext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_signext:			; GFX10-LABEL: test_call_external_void_func_i16_signext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[14 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_signext:			; GFX11-LABEL: test_call_external_void_func_i16_signext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_signext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[14 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, ptr addrspace(1) undef			%var = load volatile i16, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)			call amdgpu_gfx void @external_void_func_i16_signext(i16 signext %var)
	ret void			ret void
	}			}

	[17 unchanged lines hidden]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i16_zeroext@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[14 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_zeroext:			; GFX11-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_zeroext:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[14 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%var = load volatile i16, ptr addrspace(1) undef			%var = load volatile i16, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)			call amdgpu_gfx void @external_void_func_i16_zeroext(i16 zeroext %var)
	ret void			ret void
	}			}

	[16 unchanged lines hidden]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm:			; GFX10-LABEL: test_call_external_void_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[13 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i32_imm:			; GFX11-LABEL: test_call_external_void_func_i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[12 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[13 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32(i32 42)			call amdgpu_gfx void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm() #0 {
	[16 unchanged lines hidden]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm:			; GFX10-LABEL: test_call_external_void_func_i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[14 unchanged lines hidden]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i64_imm:			; GFX11-LABEL: test_call_external_void_func_i64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[12 unchanged lines hidden]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[14 unchanged lines hidden]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64(i64 123)			call amdgpu_gfx void @external_void_func_i64(i64 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64() #0 {
	; [17 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64:			; GFX10-LABEL: test_call_external_void_func_v2i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [15 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64:			; GFX11-LABEL: test_call_external_void_func_v2i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; [14 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	; [15 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, ptr addrspace(1) null			%val = load <2 x i64>, ptr addrspace(1) null
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> %val)
	ret void			ret void
	}			}

	; [19 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [16 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_imm:			; GFX11-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; [13 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	; [16 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64(<2 x i64> <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64() #0 {
	; [19 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64:			; GFX10-LABEL: test_call_external_void_func_v3i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [17 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i64:			; GFX11-LABEL: test_call_external_void_func_v3i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; [14 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	; [17 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(1) null			%load = load <2 x i64>, ptr addrspace(1) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)			call amdgpu_gfx void @external_void_func_v3i64(<3 x i64> %val)
	ret void			ret void
	; [24 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64:			; GFX10-LABEL: test_call_external_void_func_v4i64:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [19 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i64:			; GFX11-LABEL: test_call_external_void_func_v4i64:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; [15 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	; [19 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(1) null			%load = load <2 x i64>, ptr addrspace(1) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)			call amdgpu_gfx void @external_void_func_v4i64(<4 x i64> %val)
	ret void			ret void
	}			}
	; [17 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm:			; GFX10-LABEL: test_call_external_void_func_f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [13 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f16_imm:			; GFX11-LABEL: test_call_external_void_func_f16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; [12 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	; [13 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16(half 4.0)			call amdgpu_gfx void @external_void_func_f16(half 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm() #0 {
	; [15 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm:			; GFX10-LABEL: test_call_external_void_func_f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	; [13 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f32_imm:			; GFX11-LABEL: test_call_external_void_func_f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32(float 4.0)			call amdgpu_gfx void @external_void_func_f32(float 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm() #0 {
	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f32_imm:			; GFX11-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32(<2 x float> <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm() #0 {
	Show All 17 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 15 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f32_imm:			; GFX11-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 15 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32(<3 x float> <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm() #0 {
	Show All 19 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 17 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5f32_imm:			; GFX11-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 17 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32(<5 x float> <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm() #0 {
	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm:			; GFX10-LABEL: test_call_external_void_func_f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f64_imm:			; GFX11-LABEL: test_call_external_void_func_f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64(double 4.0)			call amdgpu_gfx void @external_void_func_f64(double 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm() #0 {
	Show All 18 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 16 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f64_imm:			; GFX11-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 16 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64(<2 x double> <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm() #0 {
	Show All 20 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f64@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 18 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f64_imm:			; GFX11-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 18 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64(<3 x double> <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16() #0 {
	Show All 15 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16:			; GFX10-LABEL: test_call_external_void_func_v2i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i16:			; GFX11-LABEL: test_call_external_void_func_v2i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, ptr addrspace(1) undef			%val = load <2 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)			call amdgpu_gfx void @external_void_func_v2i16(<2 x i16> %val)
	ret void			ret void
	}			}

	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16:			; GFX10-LABEL: test_call_external_void_func_v3i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16:			; GFX11-LABEL: test_call_external_void_func_v3i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, ptr addrspace(1) undef			%val = load <3 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> %val)
	ret void			ret void
	}			}

	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16:			; GFX10-LABEL: test_call_external_void_func_v3f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16:			; GFX11-LABEL: test_call_external_void_func_v3f16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, ptr addrspace(1) undef			%val = load <3 x half>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> %val)
	ret void			ret void
	}			}

	Show All 17 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_imm:			; GFX11-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16(<3 x i16> <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm() #0 {
	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_imm:			; GFX11-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16(<3 x half> <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16() #0 {
	Show All 15 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16:			; GFX10-LABEL: test_call_external_void_func_v4i16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16:			; GFX11-LABEL: test_call_external_void_func_v4i16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, ptr addrspace(1) undef			%val = load <4 x i16>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> %val)
	ret void			ret void
	}			}

	Show All 17 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_imm:			; GFX11-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16(<4 x i16> <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16() #0 {
	Show All 15 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2f16@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16:			; GFX10-LABEL: test_call_external_void_func_v2f16:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f16:			; GFX11-LABEL: test_call_external_void_func_v2f16:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, ptr addrspace(1) undef			%val = load <2 x half>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)			call amdgpu_gfx void @external_void_func_v2f16(<2 x half> %val)
	ret void			ret void
	}			}

	Show All 16 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32:			; GFX10-LABEL: test_call_external_void_func_v2i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 13 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32:			; GFX11-LABEL: test_call_external_void_func_v2i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 13 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, ptr addrspace(1) undef			%val = load <2 x i32>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> %val)
	ret void			ret void
	}			}

	Show All 17 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v2i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 14 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_imm:			; GFX11-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 12 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 14 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32(<2 x i32> <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm(i32) #0 {
	[17 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_imm:			; GFX11-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32(<3 x i32> <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32(i32) #0 {
	[18 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v3i32_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[16 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_i32:			; GFX11-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[16 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)			call amdgpu_gfx void @external_void_func_v3i32_i32(<3 x i32> <i32 3, i32 4, i32 5>, i32 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32() #0 {
	[15 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32:			; GFX10-LABEL: test_call_external_void_func_v4i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[13 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32:			; GFX11-LABEL: test_call_external_void_func_v4i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[12 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[13 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, ptr addrspace(1) undef			%val = load <4 x i32>, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> %val)
	ret void			ret void
	}			}

	[19 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v4i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[16 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_imm:			; GFX11-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[16 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm() #0 {
	[19 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[17 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5i32_imm:			; GFX11-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 lines not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[17 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32(<5 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32() #0 {
	[19 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32:			; GFX10-LABEL: test_call_external_void_func_v8i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[18 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32:			; GFX11-LABEL: test_call_external_void_func_v8i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (17 unchanged lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (18 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <8 x i32>, ptr addrspace(1) %ptr			%val = load <8 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> %val)
	ret void			ret void
	}			}
	... (24 unchanged lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v8i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	... (20 unchanged lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_imm:			; GFX11-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (15 unchanged lines not shown)
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (20 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32(<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32() #0 {
	... (21 unchanged lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v16i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32:			; GFX10-LABEL: test_call_external_void_func_v16i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	... (20 unchanged lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i32:			; GFX11-LABEL: test_call_external_void_func_v16i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (19 unchanged lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (20 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <16 x i32>, ptr addrspace(1) %ptr			%val = load <16 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)			call amdgpu_gfx void @external_void_func_v16i32(<16 x i32> %val)
	ret void			ret void
	}			}
	... (28 unchanged lines not shown)
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_v32i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32:			; GFX10-LABEL: test_call_external_void_func_v32i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	... (24 unchanged lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32:			; GFX11-LABEL: test_call_external_void_func_v32i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (23 unchanged lines not shown)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (24 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <32 x i32>, ptr addrspace(1) %ptr			%val = load <32 x i32>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)			call amdgpu_gfx void @external_void_func_v32i32(<32 x i32> %val)
	ret void			ret void
	}			}
	... (31 unchanged lines not shown)
	; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32			; GFX9-NEXT: buffer_store_dword v32, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	... (27 unchanged lines not shown)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_i32:			; GFX11-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (25 unchanged lines not shown)
	; GFX11-NEXT: scratch_store_b32 off, v32, s32			; GFX11-NEXT: scratch_store_b32 off, v32, s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (27 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef
	%val0 = load <32 x i32>, ptr addrspace(1) %ptr0			%val0 = load <32 x i32>, ptr addrspace(1) %ptr0
	%val1 = load i32, ptr addrspace(1) undef			%val1 = load i32, ptr addrspace(1) undef
	call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32(<32 x i32> %val0, i32 %val1)
	ret void			ret void
	... (26 unchanged lines not shown)
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	... (22 unchanged lines not shown)
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_i32_func_i32_imm:			; GFX11-LABEL: test_call_external_i32_func_i32_imm:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	... (20 unchanged lines not shown)
	; GFX11-NEXT: scratch_load_b32 v42, off, s33			; GFX11-NEXT: scratch_load_b32 v42, off, s33
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:			; GFX10-SCRATCH-LABEL: test_call_external_i32_func_i32_imm:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	... (22 unchanged lines not shown)
	; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4			; GFX10-SCRATCH-NEXT: scratch_load_dword v41, off, s33 offset:4
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)			%val = call amdgpu_gfx i32 @external_i32_func_i32(i32 42)
	store volatile i32 %val, ptr addrspace(1) %out			store volatile i32 %val, ptr addrspace(1) %out
	ret void			ret void
	}			}

	Show All 20 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 18 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 17 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 18 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load { i8, i32 }, ptr addrspace(1) %ptr0			%val = load { i8, i32 }, ptr addrspace(1) %ptr0
	call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)			call amdgpu_gfx void @external_void_func_struct_i8_i32({ i8, i32 } %val)
	ret void			ret void
	}			}
	Show All 21 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_byval_struct_i8_i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 17 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 16 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 17 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = alloca { i8, i32 }, align 4, addrspace(5)			%val = alloca { i8, i32 }, align 4, addrspace(5)
	%gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %val, i32 0, i32 1
	store i8 3, ptr addrspace(5) %gep0			store i8 3, ptr addrspace(5) %gep0
	store i32 8, ptr addrspace(5) %gep1			store i32 8, ptr addrspace(5) %gep1
	Show All 33 Lines
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_byte v[0:1], v0, off			; GFX9-NEXT: global_store_byte v[0:1], v0, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: global_store_dword v[0:1], v1, off			; GFX9-NEXT: global_store_dword v[0:1], v1, off
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 27 Lines
	; GFX10-NEXT: global_store_byte v[0:1], v0, off			; GFX10-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: global_store_dword v[0:1], v1, off			; GFX10-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX11-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 24 Lines
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: global_store_b8 v[0:1], v0, off dlc			; GFX11-NEXT: global_store_b8 v[0:1], v0, off dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: global_store_b32 v[0:1], v1, off dlc			; GFX11-NEXT: global_store_b32 v[0:1], v1, off dlc
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_sret_struct_i8_i32_byval_struct_i8_i32:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 27 Lines
	; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off			; GFX10-SCRATCH-NEXT: global_store_byte v[0:1], v0, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off			; GFX10-SCRATCH-NEXT: global_store_dword v[0:1], v1, off
	; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0			; GFX10-SCRATCH-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:16 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:16 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%in.val = alloca { i8, i32 }, align 4, addrspace(5)			%in.val = alloca { i8, i32 }, align 4, addrspace(5)
	%out.val = alloca { i8, i32 }, align 4, addrspace(5)			%out.val = alloca { i8, i32 }, align 4, addrspace(5)
	%in.gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 0			%in.gep0 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 0
	%in.gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 1			%in.gep1 = getelementptr inbounds { i8, i32 }, ptr addrspace(5) %in.val, i32 0, i32 1
	store i8 3, ptr addrspace(5) %in.gep0			store i8 3, ptr addrspace(5) %in.gep0
	Show All 50 Lines
	; GFX9-NEXT: v_mov_b32_e32 v3, v18			; GFX9-NEXT: v_mov_b32_e32 v3, v18
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i8:			; GFX10-LABEL: test_call_external_void_func_v16i8:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 36 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i8:			; GFX11-LABEL: test_call_external_void_func_v16i8:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 31 Lines
	; GFX11-NEXT: v_dual_mov_b32 v1, v16 :: v_dual_mov_b32 v2, v17			; GFX11-NEXT: v_dual_mov_b32 v1, v16 :: v_dual_mov_b32 v2, v17
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i8:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 36 Lines
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(1), ptr addrspace(4) undef			%ptr = load ptr addrspace(1), ptr addrspace(4) undef
	%val = load <16 x i8>, ptr addrspace(1) %ptr			%val = load <16 x i8>, ptr addrspace(1) %ptr
	call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)			call amdgpu_gfx void @external_void_func_v16i8(<16 x i8> %val)
	ret void			ret void
	}			}
	Show All 81 Lines
	; GFX9-NEXT: v_readlane_b32 s36, v40, 4			; GFX9-NEXT: v_readlane_b32 s36, v40, 4
	; GFX9-NEXT: v_readlane_b32 s35, v40, 3			; GFX9-NEXT: v_readlane_b32 s35, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1			; GFX9-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[4:5]			; GFX9-NEXT: s_mov_b64 exec, s[4:5]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s6			; GFX9-NEXT: s_mov_b32 s33, s6
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: tail_call_byval_align16:			; GFX10-LABEL: tail_call_byval_align16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s6, s33			; GFX10-NEXT: s_mov_b32 s6, s33
	Show All 78 Lines
	; GFX10-NEXT: v_readlane_b32 s35, v40, 3			; GFX10-NEXT: v_readlane_b32 s35, v40, 3
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_or_saveexec_b32 s4, -1			; GFX10-NEXT: s_or_saveexec_b32 s4, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s4			; GFX10-NEXT: s_mov_b32 exec_lo, s4
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s6			; GFX10-NEXT: s_mov_b32 s33, s6
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: tail_call_byval_align16:			; GFX11-LABEL: tail_call_byval_align16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s4, s33			; GFX11-NEXT: s_mov_b32 s4, s33
	Show All 73 Lines
	; GFX11-NEXT: v_readlane_b32 s36, v40, 4			; GFX11-NEXT: v_readlane_b32 s36, v40, 4
	; GFX11-NEXT: v_readlane_b32 s35, v40, 3			; GFX11-NEXT: v_readlane_b32 s35, v40, 3
	; GFX11-NEXT: v_readlane_b32 s34, v40, 2			; GFX11-NEXT: v_readlane_b32 s34, v40, 2
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:24 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:24 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s4			; GFX11-NEXT: s_mov_b32 s33, s4
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: tail_call_byval_align16:			; GFX10-SCRATCH-LABEL: tail_call_byval_align16:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s4, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s4, s33
	Show All 75 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s35, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s35, v40, 3
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s0, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:24 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:24 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s0
	; GFX10-SCRATCH-NEXT: s_addk_i32 s32, 0xffe0			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s4			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s4
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%alloca = alloca double, align 8, addrspace(5)			%alloca = alloca double, align 8, addrspace(5)
	tail call amdgpu_gfx void @byval_align16_f64_arg(<32 x i32> %val, ptr addrspace(5) byval(double) align 16 %alloca)			tail call amdgpu_gfx void @byval_align16_f64_arg(<32 x i32> %val, ptr addrspace(5) byval(double) align 16 %alloca)
	ret void			ret void
	}			}
	Show All 19 Lines
	; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32			; GFX9-NEXT: buffer_store_byte v0, off, s[0:3], s32
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[14 lines of unchanged context not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[13 lines of unchanged context not shown]
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i1_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[14 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)			call amdgpu_gfx void @external_void_func_i1_inreg(i1 inreg true)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i8_imm_inreg(i32) #0 {
	[17 lines of unchanged context not shown]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 lines of unchanged context not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 lines of unchanged context not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i8_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)			call amdgpu_gfx void @external_void_func_i8_inreg(i8 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i16_imm_inreg() #0 {
	[17 lines of unchanged context not shown]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 lines of unchanged context not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 lines of unchanged context not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)			call amdgpu_gfx void @external_void_func_i16_inreg(i16 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_i32_imm_inreg(i32) #0 {
	[17 lines of unchanged context not shown]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 lines of unchanged context not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 lines of unchanged context not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)			call amdgpu_gfx void @external_void_func_i32_inreg(i32 inreg 42)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_i64_imm_inreg() #0 {
	[20 lines of unchanged context not shown]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[18 lines of unchanged context not shown]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[17 lines of unchanged context not shown]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[18 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)			call amdgpu_gfx void @external_void_func_i64_inreg(i64 inreg 123)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i64_inreg() #0 {
	[24 lines of unchanged context not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[22 lines of unchanged context not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[21 lines of unchanged context not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[22 lines of unchanged context not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i64>, ptr addrspace(4) null			%val = load <2 x i64>, ptr addrspace(4) null
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg %val)
	ret void			ret void
	}			}

	[27 unchanged context lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[24 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[23 unchanged context lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[24 unchanged context lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)			call amdgpu_gfx void @external_void_func_v2i64_inreg(<2 x i64> inreg <i64 8589934593, i64 17179869187>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i64_inreg() #0 {
	[30 unchanged context lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 8			; GFX9-NEXT: v_readlane_b32 s34, v40, 8
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[28 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 8			; GFX10-NEXT: v_readlane_b32 s34, v40, 8
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[27 unchanged context lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 8			; GFX11-NEXT: v_readlane_b32 s0, v40, 8
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[28 unchanged context lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 8
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(4) null			%load = load <2 x i64>, ptr addrspace(4) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 undef>, <3 x i32> <i32 0, i32 1, i32 2>

	call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v3i64_inreg(<3 x i64> inreg %val)
	ret void			ret void
	[39 unchanged context lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 10			; GFX9-NEXT: v_readlane_b32 s34, v40, 10
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[34 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 10			; GFX10-NEXT: v_readlane_b32 s34, v40, 10
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[33 unchanged context lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 10			; GFX11-NEXT: v_readlane_b32 s0, v40, 10
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i64_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[34 unchanged context lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%load = load <2 x i64>, ptr addrspace(4) null			%load = load <2 x i64>, ptr addrspace(4) null
	%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%val = shufflevector <2 x i64> %load, <2 x i64> <i64 8589934593, i64 17179869187>, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)			call amdgpu_gfx void @external_void_func_v4i64_inreg(<4 x i64> inreg %val)
	ret void			ret void
	}			}
	[19 unchanged context lines not shown]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 unchanged context lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 unchanged context lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)			call amdgpu_gfx void @external_void_func_f16_inreg(half inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f32_imm_inreg() #0 {
	[17 unchanged context lines not shown]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[15 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[14 unchanged context lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[15 unchanged context lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)			call amdgpu_gfx void @external_void_func_f32_inreg(float inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f32_imm_inreg() #0 {
	[20 unchanged context lines not shown]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[18 unchanged context lines not shown]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 17 Lines
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 18 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)			call amdgpu_gfx void @external_void_func_v2f32_inreg(<2 x float> inreg <float 1.0, float 2.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f32_imm_inreg() #0 {
	Show All 23 Lines
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 5			; GFX9-NEXT: v_readlane_b32 s34, v40, 5
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 21 Lines
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 5			; GFX10-NEXT: v_readlane_b32 s34, v40, 5
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 20 Lines
	; GFX11-NEXT: v_readlane_b32 s30, v40, 3			; GFX11-NEXT: v_readlane_b32 s30, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 5			; GFX11-NEXT: v_readlane_b32 s0, v40, 5
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 21 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 5
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)			call amdgpu_gfx void @external_void_func_v3f32_inreg(<3 x float> inreg <float 1.0, float 2.0, float 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5f32_imm_inreg() #0 {
	Show All 29 Lines
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 7			; GFX9-NEXT: v_readlane_b32 s34, v40, 7
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 27 Lines
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 7			; GFX10-NEXT: v_readlane_b32 s34, v40, 7
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 26 Lines
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 7			; GFX11-NEXT: v_readlane_b32 s0, v40, 7
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5f32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 27 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 7
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)			call amdgpu_gfx void @external_void_func_v5f32_inreg(<5 x float> inreg <float 1.0, float 2.0, float 4.0, float -1.0, float 0.5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_f64_imm_inreg() #0 {
	Show All 20 Lines
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 18 Lines
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 17 Lines
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 18 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)			call amdgpu_gfx void @external_void_func_f64_inreg(double inreg 4.0)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f64_imm_inreg() #0 {
	Show All 26 Lines
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 24 Lines
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 23 Lines
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 24 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)			call amdgpu_gfx void @external_void_func_v2f64_inreg(<2 x double> inreg <double 2.0, double 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f64_imm_inreg() #0 {
	Show All 32 Lines
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 8			; GFX9-NEXT: v_readlane_b32 s34, v40, 8
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 30 Lines
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 8			; GFX10-NEXT: v_readlane_b32 s34, v40, 8
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 29 Lines
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 8			; GFX11-NEXT: v_readlane_b32 s0, v40, 8
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f64_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	Show All 30 Lines
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 8			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 8
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)			call amdgpu_gfx void @external_void_func_v3f64_inreg(<3 x double> inreg <double 2.0, double 4.0, double 8.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2i16_inreg() #0 {
	[... 17 lines not shown ...]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 15 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 14 lines not shown ...]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 15 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i16>, ptr addrspace(4) undef			%val = load <2 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v2i16_inreg(<2 x i16> inreg %val)
	ret void			ret void
	}			}

	[... 20 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 17 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 16 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 17 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x i16>, ptr addrspace(4) undef			%val = load <3 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg %val)
	ret void			ret void
	}			}

	[... 20 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 17 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 16 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 17 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <3 x half>, ptr addrspace(4) undef			%val = load <3 x half>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg %val)
	ret void			ret void
	}			}

	[... 21 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 18 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 17 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 18 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)			call amdgpu_gfx void @external_void_func_v3i16_inreg(<3 x i16> inreg <i16 1, i16 2, i16 3>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v3f16_imm_inreg() #0 {
	[... 20 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 18 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 17 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3f16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 18 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)			call amdgpu_gfx void @external_void_func_v3f16_inreg(<3 x half> inreg <half 1.0, half 2.0, half 4.0>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i16_inreg() #0 {
	[... 19 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 17 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 16 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 17 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i16>, ptr addrspace(4) undef			%val = load <4 x i16>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg %val)
	ret void			ret void
	}			}

	[... 21 lines collapsed ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 18 lines collapsed ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 17 lines collapsed ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i16_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 18 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)			call amdgpu_gfx void @external_void_func_v4i16_inreg(<4 x i16> inreg <i16 1, i16 2, i16 3, i16 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v2f16_inreg() #0 {
	[... 17 lines collapsed ...]
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 15 lines collapsed ...]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX11-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 14 lines collapsed ...]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2f16_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 15 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 3
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x half>, ptr addrspace(4) undef			%val = load <2 x half>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)			call amdgpu_gfx void @external_void_func_v2f16_inreg(<2 x half> inreg %val)
	ret void			ret void
	}			}

	[... 20 lines collapsed ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 17 lines collapsed ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 16 lines collapsed ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 17 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <2 x i32>, ptr addrspace(4) undef			%val = load <2 x i32>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg %val)
	ret void			ret void
	}			}

	[... 21 lines collapsed ...]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 18 lines collapsed ...]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 17 lines collapsed ...]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v2i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 18 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 4
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)			call amdgpu_gfx void @external_void_func_v2i32_inreg(<2 x i32> inreg <i32 1, i32 2>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_imm_inreg(i32) #0 {
	[... 23 lines collapsed ...]
	; GFX9-NEXT: v_readlane_b32 s30, v40, 3			; GFX9-NEXT: v_readlane_b32 s30, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 5			; GFX9-NEXT: v_readlane_b32 s34, v40, 5
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 21 lines collapsed ...]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 5			; GFX10-NEXT: v_readlane_b32 s34, v40, 5
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 20 lines collapsed ...]
	; GFX11-NEXT: v_readlane_b32 s30, v40, 3			; GFX11-NEXT: v_readlane_b32 s30, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 5			; GFX11-NEXT: v_readlane_b32 s0, v40, 5
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 21 lines collapsed ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 5			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 5
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v3i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {			define amdgpu_gfx void @test_call_external_void_func_v3i32_i32_inreg(i32) #0 {
	[... 26 lines collapsed ...]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[24 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[23 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v3i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[24 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)			call amdgpu_gfx void @external_void_func_v3i32_i32_inreg(<3 x i32> inreg <i32 3, i32 4, i32 5>, i32 inreg 6)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v4i32_inreg() #0 {
	[23 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[21 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[20 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[21 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%val = load <4 x i32>, ptr addrspace(4) undef			%val = load <4 x i32>, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg %val)
	ret void			ret void
	}			}

	[27 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 6			; GFX9-NEXT: v_readlane_b32 s34, v40, 6
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[24 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 6			; GFX10-NEXT: v_readlane_b32 s34, v40, 6
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[23 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 6			; GFX11-NEXT: v_readlane_b32 s0, v40, 6
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v4i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[24 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 6
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)			call amdgpu_gfx void @external_void_func_v4i32_inreg(<4 x i32> inreg <i32 1, i32 2, i32 3, i32 4>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v5i32_imm_inreg() #0 {
	[29 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 7			; GFX9-NEXT: v_readlane_b32 s34, v40, 7
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[27 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 7			; GFX10-NEXT: v_readlane_b32 s34, v40, 7
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[26 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 7			; GFX11-NEXT: v_readlane_b32 s0, v40, 7
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v5i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[27 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 7			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 7
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)			call amdgpu_gfx void @external_void_func_v5i32_inreg(<5 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v8i32_inreg() #0 {
	[33 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 10			; GFX9-NEXT: v_readlane_b32 s34, v40, 10
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[31 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 10			; GFX10-NEXT: v_readlane_b32 s34, v40, 10
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[30 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 10			; GFX11-NEXT: v_readlane_b32 s0, v40, 10
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[31 lines not shown]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <8 x i32>, ptr addrspace(4) %ptr			%val = load <8 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg %val)
	ret void			ret void
	}			}
	[40 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 10			; GFX9-NEXT: v_readlane_b32 s34, v40, 10
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[36 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 10			; GFX10-NEXT: v_readlane_b32 s34, v40, 10
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX11-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[35 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 10			; GFX11-NEXT: v_readlane_b32 s0, v40, 10
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v8i32_imm_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 36 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 10
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)			call amdgpu_gfx void @external_void_func_v8i32_inreg(<8 x i32> inreg <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>)
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {			define amdgpu_gfx void @test_call_external_void_func_v16i32_inreg() #0 {
	[... 49 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 18			; GFX9-NEXT: v_readlane_b32 s34, v40, 18
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 47 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 18			; GFX10-NEXT: v_readlane_b32 s34, v40, 18
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 46 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 18			; GFX11-NEXT: v_readlane_b32 s0, v40, 18
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v16i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 47 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 18			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 18
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <16 x i32>, ptr addrspace(4) %ptr			%val = load <16 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v16i32_inreg(<16 x i32> inreg %val)
	ret void			ret void
	}			}
	[... 95 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 28			; GFX9-NEXT: v_readlane_b32 s34, v40, 28
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 92 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 28			; GFX10-NEXT: v_readlane_b32 s34, v40, 28
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 86 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 28			; GFX11-NEXT: v_readlane_b32 s0, v40, 28
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 89 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 28
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr = load ptr addrspace(4), ptr addrspace(4) undef			%ptr = load ptr addrspace(4), ptr addrspace(4) undef
	%val = load <32 x i32>, ptr addrspace(4) %ptr			%val = load <32 x i32>, ptr addrspace(4) %ptr
	call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)			call amdgpu_gfx void @external_void_func_v32i32_inreg(<32 x i32> inreg %val)
	ret void			ret void
	}			}
	[... 100 lines not shown ...]
	; GFX9-NEXT: v_readlane_b32 s7, v40, 3			; GFX9-NEXT: v_readlane_b32 s7, v40, 3
	; GFX9-NEXT: v_readlane_b32 s6, v40, 2			; GFX9-NEXT: v_readlane_b32 s6, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 28			; GFX9-NEXT: v_readlane_b32 s34, v40, 28
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 97 lines not shown ...]
	; GFX10-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 28			; GFX10-NEXT: v_readlane_b32 s34, v40, 28
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX11-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 89 lines not shown ...]
	; GFX11-NEXT: v_readlane_b32 s7, v40, 3			; GFX11-NEXT: v_readlane_b32 s7, v40, 3
	; GFX11-NEXT: v_readlane_b32 s6, v40, 2			; GFX11-NEXT: v_readlane_b32 s6, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 28			; GFX11-NEXT: v_readlane_b32 s0, v40, 28
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:			; GFX10-SCRATCH-LABEL: test_call_external_void_func_v32i32_i32_inreg:
	; GFX10-SCRATCH: ; %bb.0:			; GFX10-SCRATCH: ; %bb.0:
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 95 lines not shown ...]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s6, v40, 2
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 28			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 28
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	%ptr0 = load ptr addrspace(4), ptr addrspace(4) undef			%ptr0 = load ptr addrspace(4), ptr addrspace(4) undef
	%val0 = load <32 x i32>, ptr addrspace(4) %ptr0			%val0 = load <32 x i32>, ptr addrspace(4) %ptr0
	%val1 = load i32, ptr addrspace(4) undef			%val1 = load i32, ptr addrspace(4) undef
	call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)			call amdgpu_gfx void @external_void_func_v32i32_i32_inreg(<32 x i32> inreg %val0, i32 inreg %val1)
	ret void			ret void
	[... 23 lines not shown ...]
	; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4			; GFX9-NEXT: buffer_store_dword v33, off, s[0:3], s32 offset:4
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 19 lines not shown ...]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX11-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 13 lines not shown ...]
	; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32			; GFX11-NEXT: scratch_store_b64 off, v[32:33], s32
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:			; GFX10-SCRATCH-LABEL: stack_passed_arg_alignment_v32i32_f64:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 15 lines not shown ...]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 offset:8 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)			call amdgpu_gfx void @stack_passed_f64_arg(<32 x i32> %val, double %tmp)
	ret void			ret void
	}			}

	[... 55 lines not shown ...]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_12xv3i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_12xv3i32:			; GFX10-LABEL: stack_12xv3i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[... 52 lines not shown ...]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_12xv3i32:			; GFX11-LABEL: stack_12xv3i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[... 30 lines not shown ...]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_12xv3i32:			; GFX10-SCRATCH-LABEL: stack_12xv3i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[... 49 lines not shown ...]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_12xv3i32(			call amdgpu_gfx void @external_void_func_12xv3i32(
	<3 x i32><i32 0, i32 0, i32 0>,			<3 x i32><i32 0, i32 0, i32 0>,
	<3 x i32><i32 1, i32 1, i32 1>,			<3 x i32><i32 1, i32 1, i32 1>,
	<3 x i32><i32 2, i32 2, i32 2>,			<3 x i32><i32 2, i32 2, i32 2>,
	[... 75 lines not shown ...]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5i32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5i32:			; GFX10-LABEL: stack_8xv5i32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[60 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5i32:			; GFX11-LABEL: stack_8xv5i32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[34 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5i32:			; GFX10-SCRATCH-LABEL: stack_8xv5i32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[55 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5i32(			call amdgpu_gfx void @external_void_func_8xv5i32(
	<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,			<5 x i32><i32 0, i32 0, i32 0, i32 0, i32 0>,
	<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,			<5 x i32><i32 1, i32 1, i32 1, i32 1, i32 1>,
	<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,			<5 x i32><i32 2, i32 2, i32 2, i32 2, i32 2>,
	[71 lines not shown]
	; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, external_void_func_8xv5f32@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: stack_8xv5f32:			; GFX10-LABEL: stack_8xv5f32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[60 lines not shown]
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: stack_8xv5f32:			; GFX11-LABEL: stack_8xv5f32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[40 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-SCRATCH-LABEL: stack_8xv5f32:			; GFX10-SCRATCH-LABEL: stack_8xv5f32:
	; GFX10-SCRATCH: ; %bb.0: ; %entry			; GFX10-SCRATCH: ; %bb.0: ; %entry
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33			; GFX10-SCRATCH-NEXT: s_mov_b32 s0, s33
	[55 lines not shown]
	; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX10-SCRATCH-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-SCRATCH-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-SCRATCH-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2			; GFX10-SCRATCH-NEXT: v_readlane_b32 s0, v40, 2
	; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1			; GFX10-SCRATCH-NEXT: s_or_saveexec_b32 s1, -1
	; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload			; GFX10-SCRATCH-NEXT: scratch_load_dword v40, off, s33 ; 4-byte Folded Reload
	; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-SCRATCH-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1			; GFX10-SCRATCH-NEXT: s_mov_b32 exec_lo, s1
	; GFX10-SCRATCH-NEXT: s_add_i32 s32, s32, -16			; GFX10-SCRATCH-NEXT: s_mov_b32 s32, s33
	; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0			; GFX10-SCRATCH-NEXT: s_mov_b32 s33, s0
	; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)			; GFX10-SCRATCH-NEXT: s_waitcnt vmcnt(0)
	; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]			; GFX10-SCRATCH-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx void @external_void_func_8xv5f32(			call amdgpu_gfx void @external_void_func_8xv5f32(
	<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,			<5 x float><float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>,
	<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,			<5 x float><float 1.0, float 1.0, float 1.0, float 1.0, float 1.0>,
	<5 x float><float 2.0, float 2.0, float 2.0, float 2.0, float 2.0>,			<5 x float><float 2.0, float 2.0, float 2.0, float 2.0, float 2.0>,
	[20 lines not shown]

llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll

	[29 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 3			; GFX9-NEXT: v_readlane_b32 s31, v40, 3
	; GFX9-NEXT: v_readlane_b32 s30, v40, 2			; GFX9-NEXT: v_readlane_b32 s30, v40, 2
	; GFX9-NEXT: v_readlane_b32 s5, v40, 1			; GFX9-NEXT: v_readlane_b32 s5, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 4			; GFX9-NEXT: v_readlane_b32 s34, v40, 4
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX10-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[19 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s30, v40, 2			; GFX10-NEXT: v_readlane_b32 s30, v40, 2
	; GFX10-NEXT: v_readlane_b32 s5, v40, 1			; GFX10-NEXT: v_readlane_b32 s5, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 4			; GFX10-NEXT: v_readlane_b32 s34, v40, 4
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GFX11-LABEL: test_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[18 lines not shown]
	; GFX11-NEXT: v_readlane_b32 s31, v40, 3			; GFX11-NEXT: v_readlane_b32 s31, v40, 3
	; GFX11-NEXT: v_readlane_b32 s30, v40, 2			; GFX11-NEXT: v_readlane_b32 s30, v40, 2
	; GFX11-NEXT: v_readlane_b32 s5, v40, 1			; GFX11-NEXT: v_readlane_b32 s5, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 4			; GFX11-NEXT: v_readlane_b32 s0, v40, 4
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "", ""() #0			call void asm sideeffect "", ""() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	ret void			ret void
	}			}
	[112 lines not shown]
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[22 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_mayclobber_s31:			; GFX11-LABEL: test_call_void_func_void_mayclobber_s31:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[20 lines not shown]
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s31 = call i32 asm sideeffect "; def $0", "={s31}"()			%s31 = call i32 asm sideeffect "; def $0", "={s31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s31}"(i32 %s31)			call void asm sideeffect "; use $0", "{s31}"(i32 %s31)
	ret void			ret void
	}			}
	[26 lines not shown]
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX10-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[22 lines not shown]
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:			; GFX11-LABEL: test_call_void_func_void_mayclobber_v31:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[20 lines not shown]
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%v31 = call i32 asm sideeffect "; def $0", "={v31}"()			%v31 = call i32 asm sideeffect "; def $0", "={v31}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v31}"(i32 %v31)			call void asm sideeffect "; use $0", "{v31}"(i32 %v31)
	ret void			ret void
	}			}
	[27 lines not shown]
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s33:			; GFX10-LABEL: test_call_void_func_void_preserves_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[22 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_s33:			; GFX11-LABEL: test_call_void_func_void_preserves_s33:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[21 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s33 = call i32 asm sideeffect "; def $0", "={s33}"()			%s33 = call i32 asm sideeffect "; def $0", "={s33}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s33}"(i32 %s33)			call void asm sideeffect "; use $0", "{s33}"(i32 %s33)
	ret void			ret void
	}			}
	[26 lines not shown]
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_s34:			; GFX10-LABEL: test_call_void_func_void_preserves_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	[22 lines not shown]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_s34:			; GFX11-LABEL: test_call_void_func_void_preserves_s34:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	[21 lines not shown]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s34 = call i32 asm sideeffect "; def $0", "={s34}"()			%s34 = call i32 asm sideeffect "; def $0", "={s34}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{s34}"(i32 %s34)			call void asm sideeffect "; use $0", "{s34}"(i32 %s34)
	ret void			ret void
	}			}
	Show All 24 Lines
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v41, 1			; GFX9-NEXT: v_readlane_b32 s31, v41, 1
	; GFX9-NEXT: v_readlane_b32 s30, v41, 0			; GFX9-NEXT: v_readlane_b32 s30, v41, 0
	; GFX9-NEXT: v_readlane_b32 s34, v41, 2			; GFX9-NEXT: v_readlane_b32 s34, v41, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_preserves_v40:			; GFX10-LABEL: test_call_void_func_void_preserves_v40:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 20 Lines
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: v_readlane_b32 s31, v41, 1			; GFX10-NEXT: v_readlane_b32 s31, v41, 1
	; GFX10-NEXT: v_readlane_b32 s30, v41, 0			; GFX10-NEXT: v_readlane_b32 s30, v41, 0
	; GFX10-NEXT: v_readlane_b32 s34, v41, 2			; GFX10-NEXT: v_readlane_b32 s34, v41, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_preserves_v40:			; GFX11-LABEL: test_call_void_func_void_preserves_v40:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 18 Lines
	; GFX11-NEXT: ;;#ASMEND			; GFX11-NEXT: ;;#ASMEND
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v41, 1			; GFX11-NEXT: v_readlane_b32 s31, v41, 1
	; GFX11-NEXT: v_readlane_b32 s30, v41, 0			; GFX11-NEXT: v_readlane_b32 s30, v41, 0
	; GFX11-NEXT: v_readlane_b32 s0, v41, 2			; GFX11-NEXT: v_readlane_b32 s0, v41, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:4 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%v40 = call i32 asm sideeffect "; def $0", "={v40}"()			%v40 = call i32 asm sideeffect "; def $0", "={v40}"()
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "{v40}"(i32 %v40)			call void asm sideeffect "; use $0", "{v40}"(i32 %v40)
	ret void			ret void
	}			}
	Show All 132 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s33@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s33:			; GFX10-LABEL: test_call_void_func_void_clobber_s33:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 12 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_clobber_s33:			; GFX11-LABEL: test_call_void_func_void_clobber_s33:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 11 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s33()			call amdgpu_gfx void @void_func_void_clobber_s33()
	ret void			ret void
	}			}

	define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {			define amdgpu_gfx void @test_call_void_func_void_clobber_s34() #0 {
	Show All 14 Lines
	; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12			; GFX9-NEXT: s_addc_u32 s35, s35, void_func_void_clobber_s34@rel32@hi+12
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: test_call_void_func_void_clobber_s34:			; GFX10-LABEL: test_call_void_func_void_clobber_s34:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 12 Lines
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 2			; GFX10-NEXT: v_readlane_b32 s34, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: test_call_void_func_void_clobber_s34:			; GFX11-LABEL: test_call_void_func_void_clobber_s34:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 11 Lines
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void @void_func_void_clobber_s34()			call amdgpu_gfx void @void_func_void_clobber_s34()
	ret void			ret void
	}			}

	define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {			define amdgpu_gfx void @callee_saved_sgpr_kernel() #1 {
	Show All 23 Lines
	; GFX9-NEXT: ;;#ASMEND			; GFX9-NEXT: ;;#ASMEND
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 21 Lines
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: callee_saved_sgpr_kernel:			; GFX11-LABEL: callee_saved_sgpr_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 20 Lines
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	ret void			ret void
	}			}
	Show All 34 Lines
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 2			; GFX9-NEXT: v_readlane_b32 s31, v40, 2
	; GFX9-NEXT: v_readlane_b32 s30, v40, 1			; GFX9-NEXT: v_readlane_b32 s30, v40, 1
	; GFX9-NEXT: v_readlane_b32 s4, v40, 0			; GFX9-NEXT: v_readlane_b32 s4, v40, 0
	; GFX9-NEXT: v_readlane_b32 s34, v40, 3			; GFX9-NEXT: v_readlane_b32 s34, v40, 3
	; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1			; GFX9-NEXT: s_or_saveexec_b64 s[36:37], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[36:37]			; GFX9-NEXT: s_mov_b64 exec, s[36:37]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s34			; GFX9-NEXT: s_mov_b32 s33, s34
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX10-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX10: ; %bb.0:			; GFX10: ; %bb.0:
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s34, s33			; GFX10-NEXT: s_mov_b32 s34, s33
	Show All 30 Lines
	; GFX10-NEXT: v_readlane_b32 s31, v40, 2			; GFX10-NEXT: v_readlane_b32 s31, v40, 2
	; GFX10-NEXT: v_readlane_b32 s30, v40, 1			; GFX10-NEXT: v_readlane_b32 s30, v40, 1
	; GFX10-NEXT: v_readlane_b32 s4, v40, 0			; GFX10-NEXT: v_readlane_b32 s4, v40, 0
	; GFX10-NEXT: v_readlane_b32 s34, v40, 3			; GFX10-NEXT: v_readlane_b32 s34, v40, 3
	; GFX10-NEXT: s_or_saveexec_b32 s35, -1			; GFX10-NEXT: s_or_saveexec_b32 s35, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s35			; GFX10-NEXT: s_mov_b32 exec_lo, s35
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s34			; GFX10-NEXT: s_mov_b32 s33, s34
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:			; GFX11-LABEL: callee_saved_sgpr_vgpr_kernel:
	; GFX11: ; %bb.0:			; GFX11: ; %bb.0:
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	Show All 28 Lines
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v41, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: v_readlane_b32 s31, v40, 2			; GFX11-NEXT: v_readlane_b32 s31, v40, 2
	; GFX11-NEXT: v_readlane_b32 s30, v40, 1			; GFX11-NEXT: v_readlane_b32 s30, v40, 1
	; GFX11-NEXT: v_readlane_b32 s4, v40, 0			; GFX11-NEXT: v_readlane_b32 s4, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 3			; GFX11-NEXT: v_readlane_b32 s0, v40, 3
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:4 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0			%s40 = call i32 asm sideeffect "; def s40", "={s40}"() #0
	%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0			%v32 = call i32 asm sideeffect "; def v32", "={v32}"() #0
	call amdgpu_gfx void @external_void_func_void()			call amdgpu_gfx void @external_void_func_void()
	call void asm sideeffect "; use $0", "s"(i32 %s40) #0			call void asm sideeffect "; use $0", "s"(i32 %s40) #0
	call void asm sideeffect "; use $0", "v"(i32 %v32) #0			call void asm sideeffect "; use $0", "v"(i32 %v32) #0
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind noinline }			attributes #1 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll

	Show All 36 Lines
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i1:			; GFX10-LABEL: call_i1:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	Show All 12 Lines
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i1:			; GFX11-LABEL: call_i1:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	Show All 11 Lines
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i1 @return_i1()			call amdgpu_gfx i1 @return_i1()
	ret void			ret void
	}			}

	Show All 31 Lines
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_i16:			; GFX10-LABEL: call_i16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	Show All 12 Lines
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_i16:			; GFX11-LABEL: call_i16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	Show All 11 Lines
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx i16 @return_i16()			call amdgpu_gfx i16 @return_i16()
	ret void			ret void
	}			}

	Show All 31 Lines
	; GFX9-NEXT: v_writelane_b32 v1, s31, 1			; GFX9-NEXT: v_writelane_b32 v1, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v1, 1			; GFX9-NEXT: v_readlane_b32 s31, v1, 1
	; GFX9-NEXT: v_readlane_b32 s30, v1, 0			; GFX9-NEXT: v_readlane_b32 s30, v1, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_2xi16:			; GFX10-LABEL: call_2xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	Show All 12 Lines
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v1, 1			; GFX10-NEXT: v_readlane_b32 s31, v1, 1
	; GFX10-NEXT: v_readlane_b32 s30, v1, 0			; GFX10-NEXT: v_readlane_b32 s30, v1, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_2xi16:			; GFX11-LABEL: call_2xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	Show All 11 Lines
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v1, 1			; GFX11-NEXT: v_readlane_b32 s31, v1, 1
	; GFX11-NEXT: v_readlane_b32 s30, v1, 0			; GFX11-NEXT: v_readlane_b32 s30, v1, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v1, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <2 x i16> @return_2xi16()			call amdgpu_gfx <2 x i16> @return_2xi16()
	ret void			ret void
	}			}

	[39 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_3xi16:			; GFX10-LABEL: call_3xi16:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	[12 lines not shown]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xfe00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_3xi16:			; GFX11-LABEL: call_3xi16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	[11 lines not shown]
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v2, 1			; GFX11-NEXT: v_readlane_b32 s31, v2, 1
	; GFX11-NEXT: v_readlane_b32 s30, v2, 0			; GFX11-NEXT: v_readlane_b32 s30, v2, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v2, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v2, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <3 x i16> @return_3xi16()			call amdgpu_gfx <3 x i16> @return_3xi16()
	ret void			ret void
	}			}

	[349 lines not shown]
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:116 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:116 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:120 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:120 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:124 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:124 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v100, 1			; GFX9-NEXT: v_readlane_b32 s31, v100, 1
	; GFX9-NEXT: v_readlane_b32 s30, v100, 0			; GFX9-NEXT: v_readlane_b32 s30, v100, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v100, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v100, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_addk_i32 s32, 0xdc00			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_100xi32:			; GFX10-LABEL: call_100xi32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	[77 lines not shown]
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:120			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:120
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:124			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:124
	; GFX10-NEXT: v_readlane_b32 s31, v100, 1			; GFX10-NEXT: v_readlane_b32 s31, v100, 1
	; GFX10-NEXT: v_readlane_b32 s30, v100, 0			; GFX10-NEXT: v_readlane_b32 s30, v100, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v100, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v100, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_addk_i32 s32, 0xee00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_100xi32:			; GFX11-LABEL: call_100xi32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s2, s33			; GFX11-NEXT: s_mov_b32 s2, s33
	[76 lines not shown]
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:116			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:116
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:120			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:120
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:124			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:124
	; GFX11-NEXT: v_readlane_b32 s31, v100, 1			; GFX11-NEXT: v_readlane_b32 s31, v100, 1
	; GFX11-NEXT: v_readlane_b32 s30, v100, 0			; GFX11-NEXT: v_readlane_b32 s30, v100, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v100, off, s33 offset:128 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v100, off, s33 offset:128 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_addk_i32 s32, 0xff70			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <100 x i32> @return_100xi32()			call amdgpu_gfx <100 x i32> @return_100xi32()
	ret void			ret void
	}			}

	[1,329 lines not shown]
	; GFX9-NEXT: v_writelane_b32 v2, s31, 1			; GFX9-NEXT: v_writelane_b32 v2, s31, 1
	; GFX9-NEXT: s_waitcnt lgkmcnt(0)			; GFX9-NEXT: s_waitcnt lgkmcnt(0)
	; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX9-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX9-NEXT: v_readlane_b32 s31, v2, 1			; GFX9-NEXT: v_readlane_b32 s31, v2, 1
	; GFX9-NEXT: v_readlane_b32 s30, v2, 0			; GFX9-NEXT: v_readlane_b32 s30, v2, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_add_i32 s32, s32, 0xfffa0000			; GFX9-NEXT: s_add_i32 s32, s33, 0xfffe0040
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_512xi32:			; GFX10-LABEL: call_512xi32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	[14 lines not shown]
	; GFX10-NEXT: s_waitcnt lgkmcnt(0)			; GFX10-NEXT: s_waitcnt lgkmcnt(0)
	; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GFX10-NEXT: s_swappc_b64 s[30:31], s[34:35]
	; GFX10-NEXT: v_readlane_b32 s31, v2, 1			; GFX10-NEXT: v_readlane_b32 s31, v2, 1
	; GFX10-NEXT: v_readlane_b32 s30, v2, 0			; GFX10-NEXT: v_readlane_b32 s30, v2, 0
	; GFX10-NEXT: s_xor_saveexec_b32 s34, -1			; GFX10-NEXT: s_xor_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:2048 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_add_i32 s32, s32, 0xfffd0000			; GFX10-NEXT: s_add_i32 s32, s33, 0xffff0020
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_512xi32:			; GFX11-LABEL: call_512xi32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s34, s33			; GFX11-NEXT: s_mov_b32 s34, s33
	[14 lines not shown]
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v5, 1			; GFX11-NEXT: v_readlane_b32 s31, v5, 1
	; GFX11-NEXT: v_readlane_b32 s30, v5, 0			; GFX11-NEXT: v_readlane_b32 s30, v5, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v5, off, s33 offset:2048 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v5, off, s33 offset:2048 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_addk_i32 s32, 0xe800			; GFX11-NEXT: s_add_i32 s32, s33, 0xfffff801
	; GFX11-NEXT: s_mov_b32 s33, s34			; GFX11-NEXT: s_mov_b32 s33, s34
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	call amdgpu_gfx <512 x i32> @return_512xi32()			call amdgpu_gfx <512 x i32> @return_512xi32()
	ret void			ret void
	}			}

	[705 lines not shown]
	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:60 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v33, 1			; GFX9-NEXT: v_readlane_b32 s31, v33, 1
	; GFX9-NEXT: v_readlane_b32 s30, v33, 0			; GFX9-NEXT: v_readlane_b32 s30, v33, 0
	; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:1536 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:1536 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-NEXT: s_add_i32 s32, s32, 0xfffd8000			; GFX9-NEXT: s_add_i32 s32, s33, 0xffff8040
	; GFX9-NEXT: s_mov_b32 s33, s36			; GFX9-NEXT: s_mov_b32 s33, s36
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_72xi32:			; GFX10-LABEL: call_72xi32:
	; GFX10: ; %bb.0: ; %entry			; GFX10: ; %bb.0: ; %entry
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s36, s33			; GFX10-NEXT: s_mov_b32 s36, s33
	[262 lines not shown]
	; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52			; GFX10-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:52
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:56
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: s_or_saveexec_b32 s34, -1			; GFX10-NEXT: s_or_saveexec_b32 s34, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:1536 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:1536 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s34			; GFX10-NEXT: s_mov_b32 exec_lo, s34
	; GFX10-NEXT: s_add_i32 s32, s32, 0xfffec000			; GFX10-NEXT: s_add_i32 s32, s33, 0xffffc020
	; GFX10-NEXT: s_mov_b32 s33, s36			; GFX10-NEXT: s_mov_b32 s33, s36
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_72xi32:			; GFX11-LABEL: call_72xi32:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s45, s33			; GFX11-NEXT: s_mov_b32 s45, s33
	[191 lines not shown]
	; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:48			; GFX11-NEXT: scratch_load_b32 v43, off, s33 offset:48
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:52			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:52
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:56			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:56
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: s_or_saveexec_b32 s0, -1			; GFX11-NEXT: s_or_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:1536 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:1536 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_addk_i32 s32, 0xf600			; GFX11-NEXT: s_add_i32 s32, s33, 0xfffffe01
	; GFX11-NEXT: s_mov_b32 s33, s45			; GFX11-NEXT: s_mov_b32 s33, s45
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%ret.0 = call amdgpu_gfx <72 x i32> @return_72xi32(<72 x i32> zeroinitializer)			%ret.0 = call amdgpu_gfx <72 x i32> @return_72xi32(<72 x i32> zeroinitializer)
	%val.0 = insertelement <72 x i32> %ret.0, i32 42, i32 0			%val.0 = insertelement <72 x i32> %ret.0, i32 42, i32 0
	%val.1 = insertelement <72 x i32> %val.0, i32 24, i32 58			%val.1 = insertelement <72 x i32> %val.0, i32 24, i32 58
	%ret.1 = call amdgpu_gfx <72 x i32> @return_72xi32(<72 x i32> %val.1)			%ret.1 = call amdgpu_gfx <72 x i32> @return_72xi32(<72 x i32> %val.1)
	ret void			ret void
	}			}

	; Ensure all VGPRs are available			; Ensure all VGPRs are available
	attributes #0 = { nounwind "amdgpu-waves-per-eu"="1,1" "amdgpu-flat-work-group-size"="1,1" }			attributes #0 = { nounwind "amdgpu-waves-per-eu"="1,1" "amdgpu-flat-work-group-size"="1,1" }

	; Limit to 64 VGPRs			; Limit to 64 VGPRs
	attributes #1 = { nounwind "amdgpu-num-vgpr"="64" }			attributes #1 = { nounwind "amdgpu-num-vgpr"="64" }

llvm/test/CodeGen/AMDGPU/indirect-call.ll

	[463 lines not shown]
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 18			; GCN-NEXT: v_readlane_b32 s4, v40, 18
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s16, s33			; GISEL-NEXT: s_mov_b32 s16, s33
	[67 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: v_readlane_b32 s4, v40, 18			; GISEL-NEXT: v_readlane_b32 s4, v40, 18
	; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1			; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[6:7]			; GISEL-NEXT: s_mov_b64 exec, s[6:7]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s4			; GISEL-NEXT: s_mov_b32 s33, s4
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr()			call void %fptr()
	ret void			ret void
	}			}

	define void @test_indirect_call_vgpr_ptr_arg(ptr %fptr) {			define void @test_indirect_call_vgpr_ptr_arg(ptr %fptr) {
	[74 lines not shown]
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 18			; GCN-NEXT: v_readlane_b32 s4, v40, 18
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s16, s33			; GISEL-NEXT: s_mov_b32 s16, s33
	[68 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: v_readlane_b32 s4, v40, 18			; GISEL-NEXT: v_readlane_b32 s4, v40, 18
	; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1			; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[6:7]			; GISEL-NEXT: s_mov_b64 exec, s[6:7]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s4			; GISEL-NEXT: s_mov_b32 s33, s4
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call void %fptr(i32 123)			call void %fptr(i32 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_ret(ptr %fptr) {			define i32 @test_indirect_call_vgpr_ptr_ret(ptr %fptr) {
	[73 lines not shown]
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 18			; GCN-NEXT: v_readlane_b32 s4, v40, 18
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_ret:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s16, s33			; GISEL-NEXT: s_mov_b32 s16, s33
	[69 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: v_readlane_b32 s4, v40, 18			; GISEL-NEXT: v_readlane_b32 s4, v40, 18
	; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1			; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[6:7]			; GISEL-NEXT: s_mov_b64 exec, s[6:7]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s4			; GISEL-NEXT: s_mov_b32 s33, s4
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%a = call i32 %fptr()			%a = call i32 %fptr()
	%b = add i32 %a, 1			%b = add i32 %a, 1
	ret i32 %b			ret i32 %b
	}			}

	[83 lines not shown]
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: v_readlane_b32 s4, v40, 20			; GCN-NEXT: v_readlane_b32 s4, v40, 20
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_in_branch:
	; GISEL: ; %bb.0: ; %bb0			; GISEL: ; %bb.0: ; %bb0
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s16, s33			; GISEL-NEXT: s_mov_b32 s16, s33
	[78 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: v_readlane_b32 s4, v40, 20			; GISEL-NEXT: v_readlane_b32 s4, v40, 20
	; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1			; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[6:7]			; GISEL-NEXT: s_mov_b64 exec, s[6:7]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s4			; GISEL-NEXT: s_mov_b32 s33, s4
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	bb0:			bb0:
	br i1 %cond, label %bb1, label %bb2			br i1 %cond, label %bb1, label %bb2

	bb1:			bb1:
	call void %fptr()			call void %fptr()
	[88 lines not shown]
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s5			; GCN-NEXT: s_mov_b32 s33, s5
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_inreg_arg:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s5, s33			; GISEL-NEXT: s_mov_b32 s5, s33
	[77 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1			; GISEL-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[6:7]			; GISEL-NEXT: s_mov_b64 exec, s[6:7]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s5			; GISEL-NEXT: s_mov_b32 s33, s5
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 inreg 123)			call amdgpu_gfx void %fptr(i32 inreg 123)
	ret void			ret void
	}			}

	define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, ptr %fptr) {			define i32 @test_indirect_call_vgpr_ptr_arg_and_reuse(i32 %i, ptr %fptr) {
	[85 lines not shown]
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s10			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_reuse:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s10, s33			; GISEL-NEXT: s_mov_b32 s10, s33
	[81 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s10			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	call amdgpu_gfx void %fptr(i32 %i)			call amdgpu_gfx void %fptr(i32 %i)
	ret i32 %i			ret i32 %i
	}			}

	; Use a variable inside a waterfall loop and use the return variable after the loop.			; Use a variable inside a waterfall loop and use the return variable after the loop.
	[87 lines not shown]
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s10			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:			; GISEL-LABEL: test_indirect_call_vgpr_ptr_arg_and_return:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s10, s33			; GISEL-NEXT: s_mov_b32 s10, s33
	[79 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s10			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	%ret = call amdgpu_gfx i32 %fptr(i32 %i)			%ret = call amdgpu_gfx i32 %fptr(i32 %i)
	ret i32 %ret			ret i32 %ret
	}			}

	; Calling a vgpr can never be a tail call.			; Calling a vgpr can never be a tail call.
	[81 lines not shown]
	; GCN-NEXT: v_readlane_b32 s36, v40, 4			; GCN-NEXT: v_readlane_b32 s36, v40, 4
	; GCN-NEXT: v_readlane_b32 s35, v40, 3			; GCN-NEXT: v_readlane_b32 s35, v40, 3
	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s10			; GCN-NEXT: s_mov_b32 s33, s10
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:			; GISEL-LABEL: test_indirect_tail_call_vgpr_ptr:
	; GISEL: ; %bb.0:			; GISEL: ; %bb.0:
	; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GISEL-NEXT: s_mov_b32 s10, s33			; GISEL-NEXT: s_mov_b32 s10, s33
	[76 lines not shown]
	; GISEL-NEXT: v_readlane_b32 s36, v40, 4			; GISEL-NEXT: v_readlane_b32 s36, v40, 4
	; GISEL-NEXT: v_readlane_b32 s35, v40, 3			; GISEL-NEXT: v_readlane_b32 s35, v40, 3
	; GISEL-NEXT: v_readlane_b32 s34, v40, 2			; GISEL-NEXT: v_readlane_b32 s34, v40, 2
	; GISEL-NEXT: v_readlane_b32 s31, v40, 1			; GISEL-NEXT: v_readlane_b32 s31, v40, 1
	; GISEL-NEXT: v_readlane_b32 s30, v40, 0			; GISEL-NEXT: v_readlane_b32 s30, v40, 0
	; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1			; GISEL-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GISEL-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GISEL-NEXT: s_mov_b64 exec, s[4:5]			; GISEL-NEXT: s_mov_b64 exec, s[4:5]
	; GISEL-NEXT: s_addk_i32 s32, 0xfc00			; GISEL-NEXT: s_mov_b32 s32, s33
	; GISEL-NEXT: s_mov_b32 s33, s10			; GISEL-NEXT: s_mov_b32 s33, s10
	; GISEL-NEXT: s_waitcnt vmcnt(0)			; GISEL-NEXT: s_waitcnt vmcnt(0)
	; GISEL-NEXT: s_setpc_b64 s[30:31]			; GISEL-NEXT: s_setpc_b64 s[30:31]
	tail call amdgpu_gfx void %fptr()			tail call amdgpu_gfx void %fptr()
	ret void			ret void
	}			}

	!llvm.module.flags = !{!0}			!llvm.module.flags = !{!0}
	!0 = !{i32 1, !"amdgpu_code_object_version", i32 200}			!0 = !{i32 1, !"amdgpu_code_object_version", i32 200}

llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll

	[21 lines not shown]
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]			; GFX11-NEXT: s_swappc_b64 s[30:31], s[0:1]
	; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)			; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
	; GFX11-NEXT: v_readlane_b32 s31, v4, 1			; GFX11-NEXT: v_readlane_b32 s31, v4, 1
	; GFX11-NEXT: v_readlane_b32 s30, v4, 0			; GFX11-NEXT: v_readlane_b32 s30, v4, 0
	; GFX11-NEXT: s_xor_saveexec_b32 s0, -1			; GFX11-NEXT: s_xor_saveexec_b32 s0, -1
	; GFX11-NEXT: scratch_load_b32 v4, off, s33 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v4, off, s33 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s0			; GFX11-NEXT: s_mov_b32 exec_lo, s0
	; GFX11-NEXT: s_add_i32 s32, s32, -16			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s2			; GFX11-NEXT: s_mov_b32 s33, s2
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%i = call <2 x i64> @f1()			%i = call <2 x i64> @f1()
	ret void			ret void
	}			}

	[208 lines not shown]

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

	[133 lines not shown]
	; MUBUF-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen glc			; MUBUF-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: buffer_load_dword v5, v3, s[0:3], 0 offen offset:4 glc			; MUBUF-NEXT: buffer_load_dword v5, v3, s[0:3], 0 offen offset:4 glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen glc			; MUBUF-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: buffer_load_dword v7, v2, s[0:3], 0 offen offset:4 glc			; MUBUF-NEXT: buffer_load_dword v7, v2, s[0:3], 0 offen offset:4 glc
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_add_i32 s32, s32, 0xffe00000			; MUBUF-NEXT: s_add_i32 s32, s33, 0xfff80040
	; MUBUF-NEXT: s_mov_b32 s33, s5			; MUBUF-NEXT: s_mov_b32 s33, s5
	; MUBUF-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6			; MUBUF-NEXT: v_add_co_u32_e32 v2, vcc, v4, v6
	; MUBUF-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc			; MUBUF-NEXT: v_addc_co_u32_e32 v3, vcc, v5, v7, vcc
	; MUBUF-NEXT: global_store_dwordx2 v[0:1], v[2:3], off			; MUBUF-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_local_stack_offset_uses_sp:			; FLATSCR-LABEL: func_local_stack_offset_uses_sp:
	[21 lines not shown]
	; FLATSCR-NEXT: s_movk_i32 s0, 0x2000			; FLATSCR-NEXT: s_movk_i32 s0, 0x2000
	; FLATSCR-NEXT: s_add_i32 s1, s33, 0x3000			; FLATSCR-NEXT: s_add_i32 s1, s33, 0x3000
	; FLATSCR-NEXT: s_add_i32 s0, s0, s1			; FLATSCR-NEXT: s_add_i32 s0, s0, s1
	; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s0 offset:208 glc			; FLATSCR-NEXT: scratch_load_dwordx2 v[2:3], off, s0 offset:208 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_add_i32 s0, s33, 0x3000			; FLATSCR-NEXT: s_add_i32 s0, s33, 0x3000
	; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s0 offset:64 glc			; FLATSCR-NEXT: scratch_load_dwordx2 v[4:5], off, s0 offset:64 glc
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_addk_i32 s32, 0x8000			; FLATSCR-NEXT: s_add_i32 s32, s33, 0xffffe001
	; FLATSCR-NEXT: s_mov_b32 s33, s2			; FLATSCR-NEXT: s_mov_b32 s33, s2
	; FLATSCR-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4			; FLATSCR-NEXT: v_add_co_u32_e32 v2, vcc, v2, v4
	; FLATSCR-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v5, vcc			; FLATSCR-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v5, vcc
	; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off			; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%pin.low = alloca i32, align 8192, addrspace(5)			%pin.low = alloca i32, align 8192, addrspace(5)
	[142 lines not shown]

llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll

	[227 lines not shown]
	; GFX9-NEXT: v_readlane_b32 s36, v40, 3			; GFX9-NEXT: v_readlane_b32 s36, v40, 3
	; GFX9-NEXT: v_readlane_b32 s34, v40, 2			; GFX9-NEXT: v_readlane_b32 s34, v40, 2
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 5			; GFX9-NEXT: v_readlane_b32 s4, v40, 5
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	%b = and i32 %b.arg, 16777215			%b = and i32 %b.arg, 16777215
	%s = and i32 %s.arg, 16777215			%s = and i32 %s.arg, 16777215

	; CHECK-LABEL: @slsr1(			; CHECK-LABEL: @slsr1(
	; foo(b * s);			; foo(b * s);
	[26 lines not shown]

llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -O0 -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -O0 -verify-machineinstrs < %s | FileCheck %s

	; FP is in CSR range, modified.			; FP is in CSR range, modified.
	define hidden fastcc void @callee_has_fp() #1 {			define hidden fastcc void @callee_has_fp() #1 {
	; CHECK-LABEL: callee_has_fp:			; CHECK-LABEL: callee_has_fp:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s4, s33			; CHECK-NEXT: s_mov_b32 s4, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: s_add_i32 s32, s32, 0x200			; CHECK-NEXT: s_add_i32 s32, s32, 0x200
	; CHECK-NEXT: v_mov_b32_e32 v0, 1			; CHECK-NEXT: v_mov_b32_e32 v0, 1
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffe00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 1, ptr addrspace(5) %alloca			store volatile i32 1, ptr addrspace(5) %alloca
	ret void			ret void
	}			}

	; Has no stack objects, but introduces them due to the CSR spill. We			; Has no stack objects, but introduces them due to the CSR spill. We
	[24 lines not shown]
	; CHECK-NEXT: ; clobber csr v40			; CHECK-NEXT: ; clobber csr v40
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s18			; CHECK-NEXT: s_mov_b32 s33, s18
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	call fastcc void @callee_has_fp()			call fastcc void @callee_has_fp()
	call void asm sideeffect "; clobber csr v40", "~{v40}"()			call void asm sideeffect "; clobber csr v40", "~{v40}"()
	ret void			ret void
	}			}
	[108 lines not shown]

	define hidden i32 @tail_call() #1 {			define hidden i32 @tail_call() #1 {
	; CHECK-LABEL: tail_call:			; CHECK-LABEL: tail_call:
	; CHECK: ; %bb.0: ; %entry			; CHECK: ; %bb.0: ; %entry
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s4, s33			; CHECK-NEXT: s_mov_b32 s4, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	ret i32 0			ret i32 0
	}			}

	define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {			define hidden i32 @caller_save_vgpr_spill_fp_tail_call() #0 {
	; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:			; CHECK-LABEL: caller_save_vgpr_spill_fp_tail_call:
	[15 lines not shown]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: v_readlane_b32 s31, v1, 1			; CHECK-NEXT: v_readlane_b32 s31, v1, 1
	; CHECK-NEXT: v_readlane_b32 s30, v1, 0			; CHECK-NEXT: v_readlane_b32 s30, v1, 0
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s18			; CHECK-NEXT: s_mov_b32 s33, s18
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @tail_call()			%call = call i32 @tail_call()
	ret i32 %call			ret i32 %call
	}			}

	[17 lines not shown]
	; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]			; CHECK-NEXT: s_mov_b64 s[0:1], s[20:21]
	; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]			; CHECK-NEXT: s_mov_b64 s[2:3], s[22:23]
	; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]			; CHECK-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; CHECK-NEXT: v_readlane_b32 s31, v2, 1			; CHECK-NEXT: v_readlane_b32 s31, v2, 1
	; CHECK-NEXT: v_readlane_b32 s30, v2, 0			; CHECK-NEXT: v_readlane_b32 s30, v2, 0
	; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1			; CHECK-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[4:5]			; CHECK-NEXT: s_mov_b64 exec, s[4:5]
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s19			; CHECK-NEXT: s_mov_b32 s33, s19
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%call = call i32 @caller_save_vgpr_spill_fp_tail_call()			%call = call i32 @caller_save_vgpr_spill_fp_tail_call()
	ret i32 %call			ret i32 %call
	}			}

	[43 lines not shown]

llvm/test/CodeGen/AMDGPU/nested-calls.ll

	[23 lines not shown]

	; GCN: v_readlane_b32 s31, v40, 1			; GCN: v_readlane_b32 s31, v40, 1
	; GCN: v_readlane_b32 s30, v40, 0			; GCN: v_readlane_b32 s30, v40, 0

	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], v40, 2			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], v40, 2
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	define void @test_func_call_external_void_func_i32_imm() #0 {			define void @test_func_call_external_void_func_i32_imm() #0 {
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:			; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN-DAG: s_addk_i32 s32, 0x1400{{$}}			; GCN-DAG: s_addk_i32 s32, 0x1400{{$}}
	; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:			; GCN-DAG: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: s_addk_i32 s32, 0xec00{{$}}			; GCN: s_mov_b32 s32, s33{{$}}
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @test_func_call_external_void_func_i32_imm_stack_use() #0 {			define void @test_func_call_external_void_func_i32_imm_stack_use() #0 {
	%alloca = alloca [16 x i32], align 4, addrspace(5)			%alloca = alloca [16 x i32], align 4, addrspace(5)
	%gep15 = getelementptr inbounds [16 x i32], ptr addrspace(5) %alloca, i32 0, i32 16			%gep15 = getelementptr inbounds [16 x i32], ptr addrspace(5) %alloca, i32 0, i32 16
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca
	store volatile i32 0, ptr addrspace(5) %gep15			store volatile i32 0, ptr addrspace(5) %gep15
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
	attributes #1 = { nounwind readnone }			attributes #1 = { nounwind readnone }
	attributes #2 = { nounwind noinline }			attributes #2 = { nounwind noinline }

llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll

	[37 lines not shown]
	; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 ; lane-info.cpp:32:1
	; CHECK-NEXT: v_readlane_b32 s31, v40, 1			; CHECK-NEXT: v_readlane_b32 s31, v40, 1
	; CHECK-NEXT: v_readlane_b32 s30, v40, 0			; CHECK-NEXT: v_readlane_b32 s30, v40, 0
	; CHECK-NEXT: v_readlane_b32 s4, v40, 2			; CHECK-NEXT: v_readlane_b32 s4, v40, 2
	; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1			; CHECK-NEXT: s_or_saveexec_b64 s[6:7], -1
	; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; CHECK-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; CHECK-NEXT: s_mov_b64 exec, s[6:7]			; CHECK-NEXT: s_mov_b64 exec, s[6:7]
	; CHECK-NEXT: .loc 0 32 1 epilogue_begin is_stmt 0 ; lane-info.cpp:32:1			; CHECK-NEXT: .loc 0 32 1 epilogue_begin is_stmt 0 ; lane-info.cpp:32:1
	; CHECK-NEXT: s_add_i32 s32, s32, 0xfffffc00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_mov_b32 s33, s4
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	; CHECK-NEXT: .Ltmp2:			; CHECK-NEXT: .Ltmp2:
	entry:			entry:
	call void @_ZL13sleep_foreverv(), !dbg !1646			call void @_ZL13sleep_foreverv(), !dbg !1646
	ret void, !dbg !1647			ret void, !dbg !1647
	}			}
	[20 lines not shown]

llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll

	[241 lines not shown]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3			; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
	; MUBUF-NEXT: global_store_dword v[0:1], v2, off			; MUBUF-NEXT: global_store_dword v[0:1], v2, off
	; MUBUF-NEXT: .LBB2_3: ; %bb.2			; MUBUF-NEXT: .LBB2_3: ; %bb.2
	; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]			; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
	; MUBUF-NEXT: v_mov_b32_e32 v0, 0			; MUBUF-NEXT: v_mov_b32_e32 v0, 0
	; MUBUF-NEXT: global_store_dword v[0:1], v0, off			; MUBUF-NEXT: global_store_dword v[0:1], v0, off
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_addk_i32 s32, 0xfc00			; MUBUF-NEXT: s_mov_b32 s32, s33
	; MUBUF-NEXT: s_mov_b32 s33, s7			; MUBUF-NEXT: s_mov_b32 s33, s7
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_non_entry_block_static_alloca_align4:			; FLATSCR-LABEL: func_non_entry_block_static_alloca_align4:
	; FLATSCR: ; %bb.0: ; %entry			; FLATSCR: ; %bb.0: ; %entry
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_mov_b32 s3, s33			; FLATSCR-NEXT: s_mov_b32 s3, s33
	; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2			; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
	[17 lines not shown]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3			; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
	; FLATSCR-NEXT: global_store_dword v[0:1], v2, off			; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
	; FLATSCR-NEXT: .LBB2_3: ; %bb.2			; FLATSCR-NEXT: .LBB2_3: ; %bb.2
	; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]			; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 0			; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
	; FLATSCR-NEXT: global_store_dword v[0:1], v0, off			; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_add_i32 s32, s32, -16			; FLATSCR-NEXT: s_mov_b32 s32, s33
	; FLATSCR-NEXT: s_mov_b32 s33, s3			; FLATSCR-NEXT: s_mov_b32 s33, s3
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]

	entry:			entry:
	%cond0 = icmp eq i32 %arg.cond0, 0			%cond0 = icmp eq i32 %arg.cond0, 0
	br i1 %cond0, label %bb.0, label %bb.2			br i1 %cond0, label %bb.0, label %bb.2

	bb.0:			bb.0:
	[44 lines not shown]
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3			; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
	; MUBUF-NEXT: global_store_dword v[0:1], v2, off			; MUBUF-NEXT: global_store_dword v[0:1], v2, off
	; MUBUF-NEXT: .LBB3_2: ; %bb.1			; MUBUF-NEXT: .LBB3_2: ; %bb.1
	; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]			; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
	; MUBUF-NEXT: v_mov_b32_e32 v0, 0			; MUBUF-NEXT: v_mov_b32_e32 v0, 0
	; MUBUF-NEXT: global_store_dword v[0:1], v0, off			; MUBUF-NEXT: global_store_dword v[0:1], v0, off
	; MUBUF-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; MUBUF-NEXT: s_addk_i32 s32, 0xe000			; MUBUF-NEXT: s_add_i32 s32, s33, 0xfffff040
	; MUBUF-NEXT: s_mov_b32 s33, s7			; MUBUF-NEXT: s_mov_b32 s33, s7
	; MUBUF-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
	;			;
	; FLATSCR-LABEL: func_non_entry_block_static_alloca_align64:			; FLATSCR-LABEL: func_non_entry_block_static_alloca_align64:
	; FLATSCR: ; %bb.0: ; %entry			; FLATSCR: ; %bb.0: ; %entry
	; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; FLATSCR-NEXT: s_mov_b32 s3, s33			; FLATSCR-NEXT: s_mov_b32 s3, s33
	; FLATSCR-NEXT: s_add_i32 s33, s32, 63			; FLATSCR-NEXT: s_add_i32 s33, s32, 63
	[15 lines not shown]
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3			; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
	; FLATSCR-NEXT: global_store_dword v[0:1], v2, off			; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
	; FLATSCR-NEXT: .LBB3_2: ; %bb.1			; FLATSCR-NEXT: .LBB3_2: ; %bb.1
	; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]			; FLATSCR-NEXT: s_or_b64 exec, exec, s[0:1]
	; FLATSCR-NEXT: v_mov_b32_e32 v0, 0			; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
	; FLATSCR-NEXT: global_store_dword v[0:1], v0, off			; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
	; FLATSCR-NEXT: s_waitcnt vmcnt(0)			; FLATSCR-NEXT: s_waitcnt vmcnt(0)
	; FLATSCR-NEXT: s_addk_i32 s32, 0xff80			; FLATSCR-NEXT: s_add_i32 s32, s33, 0xffffffc1
	; FLATSCR-NEXT: s_mov_b32 s33, s3			; FLATSCR-NEXT: s_mov_b32 s33, s3
	; FLATSCR-NEXT: s_setpc_b64 s[30:31]			; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%cond = icmp eq i32 %arg.cond, 0			%cond = icmp eq i32 %arg.cond, 0
	br i1 %cond, label %bb.0, label %bb.1			br i1 %cond, label %bb.0, label %bb.1

	bb.0:			bb.0:
	%alloca = alloca [16 x i32], align 64, addrspace(5)			%alloca = alloca [16 x i32], align 64, addrspace(5)
	[22 lines not shown]

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-carry-out.mir

[49 lines not shown]	bb.0:
; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -16384, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -16384, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; CHECK-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; CHECK-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; CHECK-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc		; CHECK-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc
; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; CHECK-NEXT: $sgpr33 = COPY $sgpr4		; CHECK-NEXT: $sgpr33 = COPY $sgpr4
; CHECK-NEXT: S_ENDPGM 0, implicit $vcc		; CHECK-NEXT: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

bb.0:
; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -8192, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -8192, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, 16384, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, 16384, implicit-def $scc
; CHECK-NEXT: $vgpr2 = COPY killed $sgpr33		; CHECK-NEXT: $vgpr2 = COPY killed $sgpr33
; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -16384, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -16384, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31		; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31
; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr29		; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr29
; CHECK-NEXT: S_ENDPGM 0, implicit $vcc		; CHECK-NEXT: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

bb.0:
; CHECK-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		; CHECK-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
; CHECK-NEXT: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $sgpr29 = S_ADD_I32 killed $sgpr29, 8192, implicit-def $scc		; CHECK-NEXT: $sgpr29 = S_ADD_I32 killed $sgpr29, 8192, implicit-def $scc
; CHECK-NEXT: $vgpr0 = COPY killed $sgpr29		; CHECK-NEXT: $vgpr0 = COPY killed $sgpr29
; CHECK-NEXT: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr29 = S_LSHR_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $sgpr29 = S_ADD_I32 killed $sgpr29, 16384, implicit-def $scc		; CHECK-NEXT: $sgpr29 = S_ADD_I32 killed $sgpr29, 16384, implicit-def $scc
; CHECK-NEXT: $vgpr2 = COPY killed $sgpr29		; CHECK-NEXT: $vgpr2 = COPY killed $sgpr29
; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31		; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr28		; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr28
; CHECK-NEXT: S_ENDPGM 0, implicit $vcc		; CHECK-NEXT: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

bb.0:
; CHECK-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31		; CHECK-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31
; CHECK-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; CHECK-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; CHECK-NEXT: $vcc_lo = S_MOV_B32 8192		; CHECK-NEXT: $vcc_lo = S_MOV_B32 8192
; CHECK-NEXT: $vgpr0, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr0, 0, implicit $exec		; CHECK-NEXT: $vgpr0, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr0, 0, implicit $exec
; CHECK-NEXT: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; CHECK-NEXT: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; CHECK-NEXT: $vcc_lo = S_MOV_B32 16384		; CHECK-NEXT: $vcc_lo = S_MOV_B32 16384
; CHECK-NEXT: $vgpr2, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr2, 0, implicit $exec		; CHECK-NEXT: $vgpr2, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr2, 0, implicit $exec
; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31		; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr28		; CHECK-NEXT: $sgpr33 = frame-destroy COPY $sgpr28
; CHECK-NEXT: S_ENDPGM 0		; CHECK-NEXT: S_ENDPGM 0
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr30, implicit-def $sgpr31
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr31
S_ENDPGM 0		S_ENDPGM 0
...		...

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

bb.0:
; MUBUF-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; MUBUF-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; MUBUF-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec		; MUBUF-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec
; MUBUF-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		; MUBUF-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
; MUBUF-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; MUBUF-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; MUBUF-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; MUBUF-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; MUBUF-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc		; MUBUF-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc
; MUBUF-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; MUBUF-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; MUBUF-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; MUBUF-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; MUBUF-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; MUBUF-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; MUBUF-NEXT: $sgpr33 = COPY $sgpr4		; MUBUF-NEXT: $sgpr33 = COPY $sgpr4
; MUBUF-NEXT: S_ENDPGM 0, implicit $vcc		; MUBUF-NEXT: S_ENDPGM 0, implicit $vcc
		;
; FLATSCR-LABEL: name: scavenge_sgpr_pei_no_sgprs		; FLATSCR-LABEL: name: scavenge_sgpr_pei_no_sgprs
; FLATSCR: liveins: $vgpr1, $vgpr2		; FLATSCR: liveins: $vgpr1, $vgpr2
; FLATSCR-NEXT: {{ $}}		; FLATSCR-NEXT: {{ $}}
; FLATSCR-NEXT: $sgpr4 = COPY $sgpr33		; FLATSCR-NEXT: $sgpr4 = COPY $sgpr33
; FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
; FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc
; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc
; FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)		; FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)
; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2		; FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2
; FLATSCR-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 32768, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 32768, implicit-def dead $scc
; FLATSCR-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		; FLATSCR-NEXT: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, 8192, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, 8192, implicit-def $scc
; FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec		; FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 $sgpr33, implicit $exec
; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, -8192, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, -8192, implicit-def $scc
; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, 16384, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, 16384, implicit-def $scc
; FLATSCR-NEXT: $vgpr0 = V_OR_B32_e32 $sgpr33, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		; FLATSCR-NEXT: $vgpr0 = V_OR_B32_e32 $sgpr33, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, -16384, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = S_ADD_I32 $sgpr33, -16384, implicit-def $scc
; FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc
; FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.3, addrspace 5)		; FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.3, addrspace 5)
; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -32768, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -8191, implicit-def dead $scc
; FLATSCR-NEXT: $sgpr33 = COPY $sgpr4		; FLATSCR-NEXT: $sgpr33 = COPY $sgpr4
; FLATSCR-NEXT: S_ENDPGM 0, implicit $vcc		; FLATSCR-NEXT: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr.mir

bb.0:
; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -4096, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_ADD_I32 killed $sgpr33, -4096, implicit-def $scc
; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc		; CHECK-NEXT: $sgpr33 = S_LSHL_B32 $sgpr33, 6, implicit-def $scc
; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		; CHECK-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; CHECK-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; CHECK-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; CHECK-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; CHECK-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 262400, implicit-def dead $scc		; CHECK-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 262400, implicit-def dead $scc
; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)		; CHECK-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.2, addrspace 5)
; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; CHECK-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -786432, implicit-def dead $scc		; CHECK-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -262080, implicit-def dead $scc
; CHECK-NEXT: $sgpr33 = COPY $sgpr4		; CHECK-NEXT: $sgpr33 = COPY $sgpr4
; CHECK-NEXT: S_ENDPGM 0, implicit $vcc		; CHECK-NEXT: S_ENDPGM 0, implicit $vcc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31		$vgpr0 = V_OR_B32_e32 %stack.0, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr27, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
S_ENDPGM 0, implicit $vcc		S_ENDPGM 0, implicit $vcc
...		...

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir

Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	bb.0:
; GFX8-NEXT: $vcc_lo = S_MOV_B32 16384		; GFX8-NEXT: $vcc_lo = S_MOV_B32 16384
; GFX8-NEXT: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec		; GFX8-NEXT: $vgpr3, dead $vcc = V_ADD_CO_U32_e64 killed $vcc_lo, killed $vgpr3, 0, implicit $exec
; GFX8-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX8-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX8-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; GFX8-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; GFX8-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX8-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX8-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc		; GFX8-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc
; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX8-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; GFX8-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; GFX8-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; GFX8-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; GFX8-NEXT: $sgpr33 = COPY $sgpr4		; GFX8-NEXT: $sgpr33 = COPY $sgpr4
; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX8-NEXT: S_ENDPGM 0, amdgpu_allvgprs
		;
; GFX9-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-LABEL: name: pei_scavenge_vgpr_spill
; GFX9: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239		; GFX9: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-NEXT: {{ $}}		; GFX9-NEXT: {{ $}}
; GFX9-NEXT: $sgpr4 = COPY $sgpr33		; GFX9-NEXT: $sgpr4 = COPY $sgpr33
; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 524224, implicit-def $scc
; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc		; GFX9-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294443008, implicit-def dead $scc
; GFX9-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc
; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)		; GFX9-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr2, $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2		; GFX9-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2
; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 2097152, implicit-def dead $scc
; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr0 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec		; GFX9-NEXT: $vgpr0 = V_ADD_U32_e32 8192, killed $vgpr0, implicit $exec
; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9-NEXT: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec		; GFX9-NEXT: $vgpr3 = V_ADD_U32_e32 16384, killed $vgpr3, implicit $exec
; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX9-NEXT: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX9-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; GFX9-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; GFX9-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc		; GFX9-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 1048832, implicit-def dead $scc
; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)		; GFX9-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; GFX9-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -2097152, implicit-def dead $scc		; GFX9-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; GFX9-NEXT: $sgpr33 = COPY $sgpr4		; GFX9-NEXT: $sgpr33 = COPY $sgpr4
; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-NEXT: S_ENDPGM 0, amdgpu_allvgprs
		;
; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill		; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
; GFX9-FLATSCR: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239		; GFX9-FLATSCR: liveins: $vgpr2, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239
; GFX9-FLATSCR-NEXT: {{ $}}		; GFX9-FLATSCR-NEXT: {{ $}}
; GFX9-FLATSCR-NEXT: $sgpr4 = COPY $sgpr33		; GFX9-FLATSCR-NEXT: $sgpr4 = COPY $sgpr33
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)		; GFX9-FLATSCR-NEXT: SCRATCH_STORE_DWORD_SADDR $vgpr2, killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (store (s32) into %stack.3, addrspace 5)
; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2		; GFX9-FLATSCR-NEXT: $vgpr2 = V_WRITELANE_B32 $sgpr4, 0, undef $vgpr2
; GFX9-FLATSCR-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 32768, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr32 = frame-setup S_ADD_I32 $sgpr32, 32768, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: $sgpr4 = S_ADD_I32 $sgpr33, 8192, implicit-def $scc		; GFX9-FLATSCR-NEXT: $sgpr4 = S_ADD_I32 $sgpr33, 8192, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr4, implicit $exec		; GFX9-FLATSCR-NEXT: $vgpr0 = V_MOV_B32_e32 killed $sgpr4, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr4 = S_ADD_I32 $sgpr33, 16384, implicit-def $scc		; GFX9-FLATSCR-NEXT: $sgpr4 = S_ADD_I32 $sgpr33, 16384, implicit-def $scc
; GFX9-FLATSCR-NEXT: $vgpr0 = V_OR_B32_e32 killed $sgpr4, $vgpr1, implicit $exec		; GFX9-FLATSCR-NEXT: $vgpr0 = V_OR_B32_e32 killed $sgpr4, $vgpr1, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; GFX9-FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; GFX9-FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; GFX9-FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; GFX9-FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 16388, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.3, addrspace 5)		; GFX9-FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.3, addrspace 5)
; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; GFX9-FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; GFX9-FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -32768, implicit-def dead $scc		; GFX9-FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -8191, implicit-def dead $scc
; GFX9-FLATSCR-NEXT: $sgpr33 = COPY $sgpr4		; GFX9-FLATSCR-NEXT: $sgpr33 = COPY $sgpr4
; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs		; GFX9-FLATSCR-NEXT: S_ENDPGM 0, amdgpu_allvgprs
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec
S_ENDPGM 0, amdgpu_allvgprs		S_ENDPGM 0, amdgpu_allvgprs
...		...
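
Reading aid for the pei-scavenge-vgpr-spill.mir hunk above: a small Python model of the SP/FP arithmetic that the GFX9 CHECK lines spell out. It is only a sketch of the constants as they appear in the test (524224, 4294443008, 2097152), not a description of what SIFrameLowering.cpp does. The function names are hypothetical, and the model assumes 32-bit wraparound and no dynamic stack adjustment between prologue and epilogue.

```python
MASK32 = 0xFFFFFFFF  # SGPRs are 32 bits wide; model wraparound explicitly


def frame_setup_fp(sp_in, add_imm=524224, and_mask=4294443008):
    """Model: $sgpr33 = S_ADD_I32 $sgpr32, add_imm; $sgpr33 = S_AND_B32 $sgpr33, and_mask."""
    return ((sp_in + add_imm) & MASK32) & and_mask


def restore_from_sp(sp_in, frame_size=2097152):
    """Left-hand epilogue: undo the prologue's SP bump by subtracting the frame size."""
    sp_in_body = (sp_in + frame_size) & MASK32   # frame-setup S_ADD_I32 $sgpr32, frame_size
    return (sp_in_body - frame_size) & MASK32    # frame-destroy S_ADD_I32 $sgpr32, -frame_size


def restore_from_fp(fp, sub_imm=524224):
    """Right-hand epilogue: recompute SP from the frame pointer ($sgpr33 - sub_imm)."""
    return (fp - sub_imm) & MASK32


if __name__ == "__main__":
    # Sample incoming SP values chosen for illustration only.
    for sp_in in (0, 64, 4096):
        fp = frame_setup_fp(sp_in)
        print(sp_in, fp, restore_from_sp(sp_in), restore_from_fp(fp))
```

For these sample inputs the two epilogue forms agree when the incoming SP is 64 and differ when it is 0 or 4096. Whether that difference matters depends on invariants about the incoming SP that are not visible in this hunk.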

llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll

	Show First 20 Lines • Show All 357 Lines • ▼ Show 20 Lines
	; GFX906-NEXT: s_waitcnt vmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0)
	; GFX906-NEXT: s_xor_saveexec_b64 s[6:7], -1			; GFX906-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GFX906-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:144 ; 4-byte Folded Reload			; GFX906-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:144 ; 4-byte Folded Reload
	; GFX906-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:152 ; 4-byte Folded Reload			; GFX906-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:152 ; 4-byte Folded Reload
	; GFX906-NEXT: s_mov_b64 exec, -1			; GFX906-NEXT: s_mov_b64 exec, -1
	; GFX906-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX906-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX906-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:148 ; 4-byte Folded Reload			; GFX906-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:148 ; 4-byte Folded Reload
	; GFX906-NEXT: s_mov_b64 exec, s[6:7]			; GFX906-NEXT: s_mov_b64 exec, s[6:7]
	; GFX906-NEXT: s_addk_i32 s32, 0xd800			; GFX906-NEXT: s_mov_b32 s32, s33
	; GFX906-NEXT: s_mov_b32 s33, s4			; GFX906-NEXT: s_mov_b32 s33, s4
	; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX906-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX906-NEXT: s_setpc_b64 s[30:31]			; GFX906-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX908-LABEL: preserve_wwm_copy_dstreg:			; GFX908-LABEL: preserve_wwm_copy_dstreg:
	; GFX908: ; %bb.0:			; GFX908: ; %bb.0:
	; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX908-NEXT: s_mov_b32 s16, s33			; GFX908-NEXT: s_mov_b32 s16, s33
	▲ Show 20 Lines • Show All 375 Lines • ▼ Show 20 Lines
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: v_readfirstlane_b32 s4, v0			; GFX908-NEXT: v_readfirstlane_b32 s4, v0
	; GFX908-NEXT: s_xor_saveexec_b64 s[6:7], -1			; GFX908-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GFX908-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:148 ; 4-byte Folded Reload			; GFX908-NEXT: buffer_load_dword v33, off, s[0:3], s33 offset:148 ; 4-byte Folded Reload
	; GFX908-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:156 ; 4-byte Folded Reload			; GFX908-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:156 ; 4-byte Folded Reload
	; GFX908-NEXT: s_mov_b64 exec, -1			; GFX908-NEXT: s_mov_b64 exec, -1
	; GFX908-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:152 ; 4-byte Folded Reload			; GFX908-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:152 ; 4-byte Folded Reload
	; GFX908-NEXT: s_mov_b64 exec, s[6:7]			; GFX908-NEXT: s_mov_b64 exec, s[6:7]
	; GFX908-NEXT: s_addk_i32 s32, 0xd400			; GFX908-NEXT: s_mov_b32 s32, s33
	; GFX908-NEXT: s_mov_b32 s33, s4			; GFX908-NEXT: s_mov_b32 s33, s4
	; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX908-NEXT: s_setpc_b64 s[30:31]			; GFX908-NEXT: s_setpc_b64 s[30:31]
	%vreg0 = call <32 x float> asm sideeffect "; def $0", "=v"()			%vreg0 = call <32 x float> asm sideeffect "; def $0", "=v"()
	%v40 = call i32 asm sideeffect "; def $0","=${v40}"()			%v40 = call i32 asm sideeffect "; def $0","=${v40}"()

	%s11 = call i32 asm sideeffect "; def $0","=${s11}"()			%s11 = call i32 asm sideeffect "; def $0","=${s11}"()
	%s12 = call i32 asm sideeffect "; def $0","=${s12}"()			%s12 = call i32 asm sideeffect "; def $0","=${s12}"()
	▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll

	Show First 20 Lines • Show All 262 Lines • ▼ Show 20 Lines
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v255, off, s[0:3], s33 offset:448 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s18			; GCN-NEXT: s_mov_b32 s33, s18
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca

	call void asm sideeffect "",			call void asm sideeffect "",
	"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}			"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
	▲ Show 20 Lines • Show All 275 Lines • ▼ Show 20 Lines
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v254, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s18			; GCN-NEXT: s_mov_b32 s33, s18
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %alloca			store volatile i32 0, ptr addrspace(5) %alloca

	call void asm sideeffect "",			call void asm sideeffect "",
	"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}			"~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7},~{v8},~{v9}
	▲ Show 20 Lines • Show All 1,220 Lines • ▼ Show 20 Lines
	; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v47, off, s[0:3], s33 offset:416 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:420 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 offset:424 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:428 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:432 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:436 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:440 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:444 ; 4-byte Folded Reload
	; GCN-NEXT: s_add_i32 s32, s32, 0xffff8c00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s18			; GCN-NEXT: s_mov_b32 s33, s18
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	call void @child_function_ipra()			call void @child_function_ipra()
	ret void			ret void
	}			}

	define internal void @child_function_ipra_tail_call() #0 {			define internal void @child_function_ipra_tail_call() #0 {
	▲ Show 20 Lines • Show All 275 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/sibling-call.ll

	Show First 20 Lines • Show All 224 Lines • ▼ Show 20 Lines
	; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, sibling_call_i32_fastcc_i32_i32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12			; GCN-NEXT: s_addc_u32 s5, s5, sibling_call_i32_fastcc_i32_i32@rel32@hi+12
	; GCN-NEXT: v_readlane_b32 s31, [[CSRV]], 1			; GCN-NEXT: v_readlane_b32 s31, [[CSRV]], 1
	; GCN-NEXT: v_readlane_b32 s30, [[CSRV]], 0			; GCN-NEXT: v_readlane_b32 s30, [[CSRV]], 0
	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSRV]], 2			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[CSRV]], 2
	; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1			; GCN-NEXT: s_or_saveexec_b64 s[8:9], -1
	; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSRV]], off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[8:9]			; GCN-NEXT: s_mov_b64 exec, s[8:9]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN-NEXT: s_setpc_b64 s[4:5]			; GCN-NEXT: s_setpc_b64 s[4:5]
	define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {			define fastcc i32 @sibling_call_i32_fastcc_i32_i32_other_call(i32 %a, i32 %b, i32 %c) #1 {
	entry:			entry:
	%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)			%other.call = tail call fastcc i32 @i32_fastcc_i32_i32(i32 %a, i32 %b)
	%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)			%ret = tail call fastcc i32 @sibling_call_i32_fastcc_i32_i32(i32 %a, i32 %b, i32 %other.call)
	ret i32 %ret			ret i32 %ret
	}			}

llvm/test/CodeGen/AMDGPU/stack-realign.ll


	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: s_addk_i32 s32, 0x2800{{$}}			; GCN: s_addk_i32 s32, 0x2800{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_addk_i32 s32, 0xd800			; GCN: s_add_i32 s32, s33, 0xfffffc40

	; GCN: ; ScratchSize: 160			; GCN: ; ScratchSize: 160
	define void @needs_align16_stack_align4(i32 %idx) #2 {			define void @needs_align16_stack_align4(i32 %idx) #2 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 16, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, ptr addrspace(5) %gep0, align 16			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, ptr addrspace(5) %gep0, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}needs_align32:			; GCN-LABEL: {{^}}needs_align32:
	; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}			; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0x7c0{{$}}
	; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffff800			; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xfffff800

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: v_or_b32_e32 v{{[0-9]+}}, 12			; GCN: v_or_b32_e32 v{{[0-9]+}}, 12
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: s_addk_i32 s32, 0x3000{{$}}			; GCN: s_addk_i32 s32, 0x3000{{$}}
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen

	; GCN: s_addk_i32 s32, 0xd000			; GCN: s_add_i32 s32, s33, 0xfffff840

	; GCN: ; ScratchSize: 192			; GCN: ; ScratchSize: 192
	define void @needs_align32(i32 %idx) #0 {			define void @needs_align32(i32 %idx) #0 {
	%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)			%alloca.align16 = alloca [8 x <4 x i32>], align 32, addrspace(5)
	%gep0 = getelementptr inbounds [8 x <4 x i32>], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x <4 x i32>], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx
	store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, ptr addrspace(5) %gep0, align 32			store volatile <4 x i32> <i32 1, i32 2, i32 3, i32 4>, ptr addrspace(5) %gep0, align 32
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}force_realign4:			; GCN-LABEL: {{^}}force_realign4:
	; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}			; GCN: s_add_i32 [[SCRATCH_REG:s[0-9]+]], s32, 0xc0{{$}}
	; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffffff00			; GCN: s_and_b32 s33, [[SCRATCH_REG]], 0xffffff00
	; GCN: s_addk_i32 s32, 0xd00{{$}}			; GCN: s_addk_i32 s32, 0xd00{{$}}

	; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen			; GCN: buffer_store_dword v{{[0-9]+}}, v{{[0-9]+}}, s[0:3], 0 offen
	; GCN: s_addk_i32 s32, 0xf300			; GCN: s_add_i32 s32, s33, 0xffffff40

	; GCN: ; ScratchSize: 52			; GCN: ; ScratchSize: 52
	define void @force_realign4(i32 %idx) #1 {			define void @force_realign4(i32 %idx) #1 {
	%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)			%alloca.align16 = alloca [8 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [8 x i32], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx			%gep0 = getelementptr inbounds [8 x i32], ptr addrspace(5) %alloca.align16, i32 0, i32 %idx
	store volatile i32 3, ptr addrspace(5) %gep0, align 4			store volatile i32 3, ptr addrspace(5) %gep0, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}default_realign_align128:			; GCN-LABEL: {{^}}default_realign_align128:
	; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33			; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
	; GCN-NEXT: s_add_i32 s33, s32, 0x1fc0			; GCN-NEXT: s_add_i32 s33, s32, 0x1fc0
	; GCN-NEXT: s_and_b32 s33, s33, 0xffffe000			; GCN-NEXT: s_and_b32 s33, s33, 0xffffe000
	; GCN-NEXT: s_addk_i32 s32, 0x4000			; GCN-NEXT: s_addk_i32 s32, 0x4000
	; GCN-NOT: s33			; GCN-NOT: s33
	; GCN: buffer_store_dword v0, off, s[0:3], s33{{$}}			; GCN: buffer_store_dword v0, off, s[0:3], s33{{$}}
	; GCN: s_addk_i32 s32, 0xc000			; GCN: s_add_i32 s32, s33, 0xffffe040
	; GCN: s_mov_b32 s33, [[FP_COPY]]			; GCN: s_mov_b32 s33, [[FP_COPY]]
	define void @default_realign_align128(i32 %idx) #0 {			define void @default_realign_align128(i32 %idx) #0 {
	%alloca.align = alloca i32, align 128, addrspace(5)			%alloca.align = alloca i32, align 128, addrspace(5)
	store volatile i32 9, ptr addrspace(5) %alloca.align, align 128			store volatile i32 9, ptr addrspace(5) %alloca.align, align 128
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}disable_realign_align128:			; GCN-LABEL: {{^}}disable_realign_align128:

	; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1			; GCN: v_readlane_b32 s31, [[VGPR_REG]], 1
	; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0			; GCN: v_readlane_b32 s30, [[VGPR_REG]], 0
	; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3			; GCN-NEXT: v_readlane_b32 s34, [[VGPR_REG]], 3
	; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[VGPR_REG]], 2			; GCN-NEXT: v_readlane_b32 [[FP_SCRATCH_COPY:s[0-9]+]], [[VGPR_REG]], 2
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s33 offset:1028 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[VGPR_REG]], off, s[0:3], s33 offset:1028 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_add_i32 s32, s32, 0xfffd0000			; GCN-NEXT: s_add_i32 s32, s33, 0xffff0040
	; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_SCRATCH_COPY]]
	; GCN: s_setpc_b64 s[30:31]			; GCN: s_setpc_b64 s[30:31]
	%temp = alloca i32, align 1024, addrspace(5)			%temp = alloca i32, align 1024, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %temp, align 1024			store volatile i32 0, ptr addrspace(5) %temp, align 1024
	call void @extern_func(<32 x i32> %a, i32 %b)			call void @extern_func(<32 x i32> %a, i32 %b)
	ret void			ret void
	}			}

	; GCN-NEXT: s_and_b32 s33, s33, 0xffff0000			; GCN-NEXT: s_and_b32 s33, s33, 0xffff0000
	; GCN-NEXT: v_lshrrev_b32_e64 [[VGPR_REG:v[0-9]+]], 6, s34			; GCN-NEXT: v_lshrrev_b32_e64 [[VGPR_REG:v[0-9]+]], 6, s34
	; GCN-NEXT: v_mov_b32_e32 v{{[0-9]+}}, 0			; GCN-NEXT: v_mov_b32_e32 v{{[0-9]+}}, 0
	; GCN: s_add_i32 s32, s32, 0x30000			; GCN: s_add_i32 s32, s32, 0x30000
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:1024			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:1024
	; GCN: buffer_load_dword v{{[0-9]+}}, [[VGPR_REG]], s[0:3], 0 offen			; GCN: buffer_load_dword v{{[0-9]+}}, [[VGPR_REG]], s[0:3], 0 offen
	; GCN: v_add_u32_e32 [[VGPR_REG]], vcc, 4, [[VGPR_REG]]			; GCN: v_add_u32_e32 [[VGPR_REG]], vcc, 4, [[VGPR_REG]]
	; GCN: s_mov_b32 s34, [[BP_COPY]]			; GCN: s_mov_b32 s34, [[BP_COPY]]
	; GCN-NEXT: s_add_i32 s32, s32, 0xfffd0000			; GCN-NEXT: s_add_i32 s32, s33, 0xffff0040
	; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]			; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	begin:			begin:
	%local_var = alloca i32, align 1024, addrspace(5)			%local_var = alloca i32, align 1024, addrspace(5)
	store volatile i32 0, ptr addrspace(5) %local_var, align 1024			store volatile i32 0, ptr addrspace(5) %local_var, align 1024
	br label %loop_body			br label %loop_body

	loop_end: ; preds = %loop_body			loop_end: ; preds = %loop_body

llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll

	; WAVE32-OPT-NEXT: ; use s7			; WAVE32-OPT-NEXT: ; use s7
	; WAVE32-OPT-NEXT: ;;#ASMEND			; WAVE32-OPT-NEXT: ;;#ASMEND
	; WAVE32-OPT-NEXT: s_mov_b32 s32, s6			; WAVE32-OPT-NEXT: s_mov_b32 s32, s6
	; WAVE32-OPT-NEXT: v_readlane_b32 s31, v31, 1			; WAVE32-OPT-NEXT: v_readlane_b32 s31, v31, 1
	; WAVE32-OPT-NEXT: v_readlane_b32 s30, v31, 0			; WAVE32-OPT-NEXT: v_readlane_b32 s30, v31, 0
	; WAVE32-OPT-NEXT: s_xor_saveexec_b32 s4, -1			; WAVE32-OPT-NEXT: s_xor_saveexec_b32 s4, -1
	; WAVE32-OPT-NEXT: buffer_load_dword v31, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; WAVE32-OPT-NEXT: buffer_load_dword v31, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; WAVE32-OPT-NEXT: s_mov_b32 exec_lo, s4			; WAVE32-OPT-NEXT: s_mov_b32 exec_lo, s4
	; WAVE32-OPT-NEXT: s_addk_i32 s32, 0xee00			; WAVE32-OPT-NEXT: s_mov_b32 s32, s33
	; WAVE32-OPT-NEXT: s_mov_b32 s33, s8			; WAVE32-OPT-NEXT: s_mov_b32 s33, s8
	; WAVE32-OPT-NEXT: s_waitcnt vmcnt(0)			; WAVE32-OPT-NEXT: s_waitcnt vmcnt(0)
	; WAVE32-OPT-NEXT: s_setpc_b64 s[30:31]			; WAVE32-OPT-NEXT: s_setpc_b64 s[30:31]
	;			;
	; WAVE64-OPT-LABEL: func_stacksave_stackrestore_call_with_stack_objects:			; WAVE64-OPT-LABEL: func_stacksave_stackrestore_call_with_stack_objects:
	; WAVE64-OPT: ; %bb.0:			; WAVE64-OPT: ; %bb.0:
	; WAVE64-OPT-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; WAVE64-OPT-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; WAVE64-OPT-NEXT: s_mov_b32 s8, s33			; WAVE64-OPT-NEXT: s_mov_b32 s8, s33
	; WAVE64-OPT-NEXT: ; use s7			; WAVE64-OPT-NEXT: ; use s7
	; WAVE64-OPT-NEXT: ;;#ASMEND			; WAVE64-OPT-NEXT: ;;#ASMEND
	; WAVE64-OPT-NEXT: s_mov_b32 s32, s6			; WAVE64-OPT-NEXT: s_mov_b32 s32, s6
	; WAVE64-OPT-NEXT: v_readlane_b32 s31, v31, 1			; WAVE64-OPT-NEXT: v_readlane_b32 s31, v31, 1
	; WAVE64-OPT-NEXT: v_readlane_b32 s30, v31, 0			; WAVE64-OPT-NEXT: v_readlane_b32 s30, v31, 0
	; WAVE64-OPT-NEXT: s_xor_saveexec_b64 s[4:5], -1			; WAVE64-OPT-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; WAVE64-OPT-NEXT: buffer_load_dword v31, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; WAVE64-OPT-NEXT: buffer_load_dword v31, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; WAVE64-OPT-NEXT: s_mov_b64 exec, s[4:5]			; WAVE64-OPT-NEXT: s_mov_b64 exec, s[4:5]
	; WAVE64-OPT-NEXT: s_addk_i32 s32, 0xdc00			; WAVE64-OPT-NEXT: s_mov_b32 s32, s33
	; WAVE64-OPT-NEXT: s_mov_b32 s33, s8			; WAVE64-OPT-NEXT: s_mov_b32 s33, s8
	; WAVE64-OPT-NEXT: s_waitcnt vmcnt(0)			; WAVE64-OPT-NEXT: s_waitcnt vmcnt(0)
	; WAVE64-OPT-NEXT: s_setpc_b64 s[30:31]			; WAVE64-OPT-NEXT: s_setpc_b64 s[30:31]
	;			;
	; WAVE32-O0-LABEL: func_stacksave_stackrestore_call_with_stack_objects:			; WAVE32-O0-LABEL: func_stacksave_stackrestore_call_with_stack_objects:
	; WAVE32-O0: ; %bb.0:			; WAVE32-O0: ; %bb.0:
	; WAVE32-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; WAVE32-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; WAVE32-O0-NEXT: s_mov_b32 s25, s33			; WAVE32-O0-NEXT: s_mov_b32 s25, s33
	; WAVE32-O0-NEXT: s_mov_b32 s32, s4			; WAVE32-O0-NEXT: s_mov_b32 s32, s4
	; WAVE32-O0-NEXT: v_readlane_b32 s31, v32, 1			; WAVE32-O0-NEXT: v_readlane_b32 s31, v32, 1
	; WAVE32-O0-NEXT: v_readlane_b32 s30, v32, 0			; WAVE32-O0-NEXT: v_readlane_b32 s30, v32, 0
	; WAVE32-O0-NEXT: ; kill: killed $vgpr0			; WAVE32-O0-NEXT: ; kill: killed $vgpr0
	; WAVE32-O0-NEXT: s_xor_saveexec_b32 s4, -1			; WAVE32-O0-NEXT: s_xor_saveexec_b32 s4, -1
	; WAVE32-O0-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; WAVE32-O0-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; WAVE32-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:136 ; 4-byte Folded Reload			; WAVE32-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:136 ; 4-byte Folded Reload
	; WAVE32-O0-NEXT: s_mov_b32 exec_lo, s4			; WAVE32-O0-NEXT: s_mov_b32 exec_lo, s4
	; WAVE32-O0-NEXT: s_add_i32 s32, s32, 0xffffee00			; WAVE32-O0-NEXT: s_mov_b32 s32, s33
	; WAVE32-O0-NEXT: s_mov_b32 s33, s25			; WAVE32-O0-NEXT: s_mov_b32 s33, s25
	; WAVE32-O0-NEXT: s_waitcnt vmcnt(0)			; WAVE32-O0-NEXT: s_waitcnt vmcnt(0)
	; WAVE32-O0-NEXT: s_setpc_b64 s[30:31]			; WAVE32-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; WAVE64-O0-LABEL: func_stacksave_stackrestore_call_with_stack_objects:			; WAVE64-O0-LABEL: func_stacksave_stackrestore_call_with_stack_objects:
	; WAVE64-O0: ; %bb.0:			; WAVE64-O0: ; %bb.0:
	; WAVE64-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; WAVE64-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; WAVE64-O0-NEXT: s_mov_b32 s19, s33			; WAVE64-O0-NEXT: s_mov_b32 s19, s33
	; WAVE64-O0-NEXT: s_mov_b32 s32, s4			; WAVE64-O0-NEXT: s_mov_b32 s32, s4
	; WAVE64-O0-NEXT: v_readlane_b32 s31, v32, 1			; WAVE64-O0-NEXT: v_readlane_b32 s31, v32, 1
	; WAVE64-O0-NEXT: v_readlane_b32 s30, v32, 0			; WAVE64-O0-NEXT: v_readlane_b32 s30, v32, 0
	; WAVE64-O0-NEXT: ; kill: killed $vgpr0			; WAVE64-O0-NEXT: ; kill: killed $vgpr0
	; WAVE64-O0-NEXT: s_xor_saveexec_b64 s[4:5], -1			; WAVE64-O0-NEXT: s_xor_saveexec_b64 s[4:5], -1
	; WAVE64-O0-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload			; WAVE64-O0-NEXT: buffer_load_dword v32, off, s[0:3], s33 offset:128 ; 4-byte Folded Reload
	; WAVE64-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:136 ; 4-byte Folded Reload			; WAVE64-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:136 ; 4-byte Folded Reload
	; WAVE64-O0-NEXT: s_mov_b64 exec, s[4:5]			; WAVE64-O0-NEXT: s_mov_b64 exec, s[4:5]
	; WAVE64-O0-NEXT: s_add_i32 s32, s32, 0xffffdc00			; WAVE64-O0-NEXT: s_mov_b32 s32, s33
	; WAVE64-O0-NEXT: s_mov_b32 s33, s19			; WAVE64-O0-NEXT: s_mov_b32 s33, s19
	; WAVE64-O0-NEXT: s_waitcnt vmcnt(0)			; WAVE64-O0-NEXT: s_waitcnt vmcnt(0)
	; WAVE64-O0-NEXT: s_setpc_b64 s[30:31]			; WAVE64-O0-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca [32 x i32], addrspace(5)			%alloca = alloca [32 x i32], addrspace(5)
	%stacksave = call ptr addrspace(5) @llvm.stacksave.p5()			%stacksave = call ptr addrspace(5) @llvm.stacksave.p5()
	store volatile i32 42, ptr addrspace(5) %alloca			store volatile i32 42, ptr addrspace(5) %alloca
	call void @stack_passed_argument([32 x i32] poison, i32 17)			call void @stack_passed_argument([32 x i32] poison, i32 17)
	call void asm sideeffect "; use $0", "s"(ptr addrspace(5) %stacksave)			call void asm sideeffect "; use $0", "s"(ptr addrspace(5) %stacksave)
	call void @llvm.stackrestore.p5(ptr addrspace(5) %stacksave)			call void @llvm.stackrestore.p5(ptr addrspace(5) %stacksave)
	ret void			ret void
	}			}
	;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:			;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
	; WAVE32: {{.*}}			; WAVE32: {{.*}}
	; WAVE64: {{.*}}			; WAVE64: {{.*}}

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; GCN-NEXT: v_readlane_b32 s34, v40, 2			; GCN-NEXT: v_readlane_b32 s34, v40, 2
	; GCN-NEXT: v_readlane_b32 s31, v40, 1			; GCN-NEXT: v_readlane_b32 s31, v40, 1
	; GCN-NEXT: v_readlane_b32 s30, v40, 0			; GCN-NEXT: v_readlane_b32 s30, v40, 0
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: v_readlane_b32 s4, v40, 16			; GCN-NEXT: v_readlane_b32 s4, v40, 16
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xfc00			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	; GCN-NEXT: .LBB0_9: ; %bb2			; GCN-NEXT: .LBB0_9: ; %bb2
	; GCN-NEXT: v_cmp_eq_u32_e64 s[46:47], 21, v0			; GCN-NEXT: v_cmp_eq_u32_e64 s[46:47], 21, v0
	; GCN-NEXT: v_cmp_ne_u32_e64 s[6:7], 21, v0			; GCN-NEXT: v_cmp_ne_u32_e64 s[6:7], 21, v0
	; GCN-NEXT: s_mov_b64 vcc, exec			; GCN-NEXT: s_mov_b64 vcc, exec
	; GCN-NEXT: s_cbranch_execnz .LBB0_2			; GCN-NEXT: s_cbranch_execnz .LBB0_2
	; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GCN-NEXT: v_readlane_b32 s4, v40, 28			; GCN-NEXT: v_readlane_b32 s4, v40, 28
	; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xf800			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = load float, ptr null, align 16			%tmp = load float, ptr null, align 16
	br label %bb2			br label %bb2

	bb1: ; preds = %bb8, %bb6			bb1: ; preds = %bb8, %bb6

llvm/test/CodeGen/AMDGPU/use_restore_frame_reg.mir

; MUBUF-NEXT: bb.2:		; MUBUF-NEXT: bb.2:
; MUBUF-NEXT: liveins: $vgpr2		; MUBUF-NEXT: liveins: $vgpr2
; MUBUF-NEXT: {{ $}}		; MUBUF-NEXT: {{ $}}
; MUBUF-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; MUBUF-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; MUBUF-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; MUBUF-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; MUBUF-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 9961728, implicit-def dead $scc		; MUBUF-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 9961728, implicit-def dead $scc
; MUBUF-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.20, addrspace 5)		; MUBUF-NEXT: $vgpr2 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr5, 0, 0, 0, implicit $exec :: (load (s32) from %stack.20, addrspace 5)
; MUBUF-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; MUBUF-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; MUBUF-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -11010048, implicit-def dead $scc		; MUBUF-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -524224, implicit-def dead $scc
; MUBUF-NEXT: $sgpr33 = COPY $sgpr4		; MUBUF-NEXT: $sgpr33 = COPY $sgpr4
; MUBUF-NEXT: S_ENDPGM 0		; MUBUF-NEXT: S_ENDPGM 0
		;
; FLATSCR-LABEL: name: use_restore_frame_reg		; FLATSCR-LABEL: name: use_restore_frame_reg
; FLATSCR: bb.0:		; FLATSCR: bb.0:
; FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)		; FLATSCR-NEXT: successors: %bb.2(0x40000000), %bb.1(0x40000000)
; FLATSCR-NEXT: liveins: $vgpr1, $vgpr2		; FLATSCR-NEXT: liveins: $vgpr1, $vgpr2
; FLATSCR-NEXT: {{ $}}		; FLATSCR-NEXT: {{ $}}
; FLATSCR-NEXT: $sgpr4 = COPY $sgpr33		; FLATSCR-NEXT: $sgpr4 = COPY $sgpr33
; FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc		; FLATSCR-NEXT: $sgpr33 = frame-setup S_ADD_I32 $sgpr32, 8191, implicit-def $scc
; FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr33 = frame-setup S_AND_B32 killed $sgpr33, 4294959104, implicit-def dead $scc
; FLATSCR-NEXT: bb.2:		; FLATSCR-NEXT: bb.2:
; FLATSCR-NEXT: liveins: $vgpr2		; FLATSCR-NEXT: liveins: $vgpr2
; FLATSCR-NEXT: {{ $}}		; FLATSCR-NEXT: {{ $}}
; FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0		; FLATSCR-NEXT: $sgpr4 = V_READLANE_B32 $vgpr2, 0
; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec		; FLATSCR-NEXT: $sgpr6_sgpr7 = S_XOR_SAVEEXEC_B64 -1, implicit-def $exec, implicit-def dead $scc, implicit $exec
; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 155652, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr5 = S_ADD_I32 $sgpr33, 155652, implicit-def dead $scc
; FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.20, addrspace 5)		; FLATSCR-NEXT: $vgpr2 = SCRATCH_LOAD_DWORD_SADDR killed $sgpr5, 0, 0, implicit $exec, implicit $flat_scr :: (load (s32) from %stack.20, addrspace 5)
; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7		; FLATSCR-NEXT: $exec = S_MOV_B64 killed $sgpr6_sgpr7
; FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr32, -172032, implicit-def dead $scc		; FLATSCR-NEXT: $sgpr32 = frame-destroy S_ADD_I32 $sgpr33, -8191, implicit-def dead $scc
; FLATSCR-NEXT: $sgpr33 = COPY $sgpr4		; FLATSCR-NEXT: $sgpr33 = COPY $sgpr4
; FLATSCR-NEXT: S_ENDPGM 0		; FLATSCR-NEXT: S_ENDPGM 0
bb.0:		bb.0:
liveins: $vgpr1		liveins: $vgpr1

S_CMP_EQ_U32 0, 0, implicit-def $scc		S_CMP_EQ_U32 0, 0, implicit-def $scc
S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc		S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr27, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec		$vgpr0 = V_MOV_B32_e32 %stack.0, implicit $exec

llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll

	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: non_preserved_vgpr_tuple8:			; GFX10-LABEL: non_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:12
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s4, v40, 2			; GFX10-NEXT: v_readlane_b32 s4, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: non_preserved_vgpr_tuple8:			; GFX11-LABEL: non_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:8
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:12
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:16 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]





	; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-NEXT: v_readlane_b32 s31, v40, 1			; GFX9-NEXT: v_readlane_b32 s31, v40, 1
	; GFX9-NEXT: v_readlane_b32 s30, v40, 0			; GFX9-NEXT: v_readlane_b32 s30, v40, 0
	; GFX9-NEXT: v_readlane_b32 s4, v40, 2			; GFX9-NEXT: v_readlane_b32 s4, v40, 2
	; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX9-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GFX9-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GFX9-NEXT: s_mov_b64 exec, s[6:7]			; GFX9-NEXT: s_mov_b64 exec, s[6:7]
	; GFX9-NEXT: s_addk_i32 s32, 0xf800			; GFX9-NEXT: s_mov_b32 s32, s33
	; GFX9-NEXT: s_mov_b32 s33, s4			; GFX9-NEXT: s_mov_b32 s33, s4
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX10-LABEL: call_preserved_vgpr_tuple8:			; GFX10-LABEL: call_preserved_vgpr_tuple8:
	; GFX10: ; %bb.0: ; %main_body			; GFX10: ; %bb.0: ; %main_body
	; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX10-NEXT: s_mov_b32 s4, s33			; GFX10-NEXT: s_mov_b32 s4, s33
	; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16			; GFX10-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16
	; GFX10-NEXT: v_readlane_b32 s31, v40, 1			; GFX10-NEXT: v_readlane_b32 s31, v40, 1
	; GFX10-NEXT: v_readlane_b32 s30, v40, 0			; GFX10-NEXT: v_readlane_b32 s30, v40, 0
	; GFX10-NEXT: v_readlane_b32 s4, v40, 2			; GFX10-NEXT: v_readlane_b32 s4, v40, 2
	; GFX10-NEXT: s_or_saveexec_b32 s5, -1			; GFX10-NEXT: s_or_saveexec_b32 s5, -1
	; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GFX10-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GFX10-NEXT: s_waitcnt_depctr 0xffe3			; GFX10-NEXT: s_waitcnt_depctr 0xffe3
	; GFX10-NEXT: s_mov_b32 exec_lo, s5			; GFX10-NEXT: s_mov_b32 exec_lo, s5
	; GFX10-NEXT: s_addk_i32 s32, 0xfc00			; GFX10-NEXT: s_mov_b32 s32, s33
	; GFX10-NEXT: s_mov_b32 s33, s4			; GFX10-NEXT: s_mov_b32 s33, s4
	; GFX10-NEXT: s_waitcnt vmcnt(0)			; GFX10-NEXT: s_waitcnt vmcnt(0)
	; GFX10-NEXT: s_setpc_b64 s[30:31]			; GFX10-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX11-LABEL: call_preserved_vgpr_tuple8:			; GFX11-LABEL: call_preserved_vgpr_tuple8:
	; GFX11: ; %bb.0: ; %main_body			; GFX11: ; %bb.0: ; %main_body
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_mov_b32 s0, s33			; GFX11-NEXT: s_mov_b32 s0, s33
	; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:12			; GFX11-NEXT: scratch_load_b32 v42, off, s33 offset:12
	; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:16			; GFX11-NEXT: scratch_load_b32 v41, off, s33 offset:16
	; GFX11-NEXT: v_readlane_b32 s31, v40, 1			; GFX11-NEXT: v_readlane_b32 s31, v40, 1
	; GFX11-NEXT: v_readlane_b32 s30, v40, 0			; GFX11-NEXT: v_readlane_b32 s30, v40, 0
	; GFX11-NEXT: v_readlane_b32 s0, v40, 2			; GFX11-NEXT: v_readlane_b32 s0, v40, 2
	; GFX11-NEXT: s_or_saveexec_b32 s1, -1			; GFX11-NEXT: s_or_saveexec_b32 s1, -1
	; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:20 ; 4-byte Folded Reload			; GFX11-NEXT: scratch_load_b32 v40, off, s33 offset:20 ; 4-byte Folded Reload
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_addk_i32 s32, 0xffe0			; GFX11-NEXT: s_mov_b32 s32, s33
	; GFX11-NEXT: s_mov_b32 s33, s0			; GFX11-NEXT: s_mov_b32 s33, s0
	; GFX11-NEXT: s_waitcnt vmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0)
	; GFX11-NEXT: s_setpc_b64 s[30:31]			; GFX11-NEXT: s_setpc_b64 s[30:31]





	Show All 15 Lines

llvm/test/CodeGen/AMDGPU/wave32.ll

	Show First 20 Lines • Show All 2,869 Lines • ▼ Show 20 Lines
	; GFX1032-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GFX1032-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GFX1032-NEXT: v_readlane_b32 s31, v40, 1			; GFX1032-NEXT: v_readlane_b32 s31, v40, 1
	; GFX1032-NEXT: v_readlane_b32 s30, v40, 0			; GFX1032-NEXT: v_readlane_b32 s30, v40, 0
	; GFX1032-NEXT: v_readlane_b32 s4, v40, 2			; GFX1032-NEXT: v_readlane_b32 s4, v40, 2
	; GFX1032-NEXT: s_or_saveexec_b32 s5, -1			; GFX1032-NEXT: s_or_saveexec_b32 s5, -1
	; GFX1032-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX1032-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX1032-NEXT: s_waitcnt_depctr 0xffe3			; GFX1032-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1032-NEXT: s_mov_b32 exec_lo, s5			; GFX1032-NEXT: s_mov_b32 exec_lo, s5
	; GFX1032-NEXT: s_addk_i32 s32, 0xfe00			; GFX1032-NEXT: s_mov_b32 s32, s33
	; GFX1032-NEXT: s_mov_b32 s33, s4			; GFX1032-NEXT: s_mov_b32 s33, s4
	; GFX1032-NEXT: s_waitcnt vmcnt(0)			; GFX1032-NEXT: s_waitcnt vmcnt(0)
	; GFX1032-NEXT: s_setpc_b64 s[30:31]			; GFX1032-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX1064-LABEL: callee_no_stack_with_call:			; GFX1064-LABEL: callee_no_stack_with_call:
	; GFX1064: ; %bb.0:			; GFX1064: ; %bb.0:
	; GFX1064-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX1064-NEXT: s_mov_b32 s16, s33			; GFX1064-NEXT: s_mov_b32 s16, s33
	Show All 14 Lines
	; GFX1064-NEXT: s_swappc_b64 s[30:31], s[16:17]			; GFX1064-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GFX1064-NEXT: v_readlane_b32 s31, v40, 1			; GFX1064-NEXT: v_readlane_b32 s31, v40, 1
	; GFX1064-NEXT: v_readlane_b32 s30, v40, 0			; GFX1064-NEXT: v_readlane_b32 s30, v40, 0
	; GFX1064-NEXT: v_readlane_b32 s4, v40, 2			; GFX1064-NEXT: v_readlane_b32 s4, v40, 2
	; GFX1064-NEXT: s_or_saveexec_b64 s[6:7], -1			; GFX1064-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GFX1064-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX1064-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX1064-NEXT: s_waitcnt_depctr 0xffe3			; GFX1064-NEXT: s_waitcnt_depctr 0xffe3
	; GFX1064-NEXT: s_mov_b64 exec, s[6:7]			; GFX1064-NEXT: s_mov_b64 exec, s[6:7]
	; GFX1064-NEXT: s_addk_i32 s32, 0xfc00			; GFX1064-NEXT: s_mov_b32 s32, s33
	; GFX1064-NEXT: s_mov_b32 s33, s4			; GFX1064-NEXT: s_mov_b32 s33, s4
	; GFX1064-NEXT: s_waitcnt vmcnt(0)			; GFX1064-NEXT: s_waitcnt vmcnt(0)
	; GFX1064-NEXT: s_setpc_b64 s[30:31]			; GFX1064-NEXT: s_setpc_b64 s[30:31]
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}


	Show All 38 Lines

llvm/test/CodeGen/AMDGPU/whole-wave-register-copy.ll

	Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; GFX90A-NEXT: ; kill: killed $vgpr0			; GFX90A-NEXT: ; kill: killed $vgpr0
	; GFX90A-NEXT: v_readlane_b32 s4, v40, 2			; GFX90A-NEXT: v_readlane_b32 s4, v40, 2
	; GFX90A-NEXT: s_xor_saveexec_b64 s[6:7], -1			; GFX90A-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GFX90A-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX90A-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX90A-NEXT: s_mov_b64 exec, -1			; GFX90A-NEXT: s_mov_b64 exec, -1
	; GFX90A-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX90A-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX90A-NEXT: buffer_load_dword a32, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX90A-NEXT: buffer_load_dword a32, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX90A-NEXT: s_mov_b64 exec, s[6:7]			; GFX90A-NEXT: s_mov_b64 exec, s[6:7]
	; GFX90A-NEXT: s_addk_i32 s32, 0xfc00			; GFX90A-NEXT: s_mov_b32 s32, s33
	; GFX90A-NEXT: s_mov_b32 s33, s4			; GFX90A-NEXT: s_mov_b32 s33, s4
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: s_setpc_b64 s[30:31]			; GFX90A-NEXT: s_setpc_b64 s[30:31]
	%s20 = call i32 asm sideeffect "; def $0","=${s20}"()			%s20 = call i32 asm sideeffect "; def $0","=${s20}"()
	call void @foo()			call void @foo()
	call void asm sideeffect "; use $0","${s20}"(i32 %s20)			call void asm sideeffect "; use $0","${s20}"(i32 %s20)
	ret void			ret void
	}			}

	declare void @foo()			declare void @foo()

	attributes #0 = { "amdgpu-num-vgpr"="41" "amdgpu-num-sgpr"="34"}			attributes #0 = { "amdgpu-num-vgpr"="41" "amdgpu-num-sgpr"="34"}

llvm/test/CodeGen/AMDGPU/whole-wave-register-spill.ll

	Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
	; GCN-NEXT: v_readlane_b32 s29, v40, 3			; GCN-NEXT: v_readlane_b32 s29, v40, 3
	; GCN-NEXT: v_readlane_b32 s4, v40, 4			; GCN-NEXT: v_readlane_b32 s4, v40, 4
	; GCN-NEXT: s_xor_saveexec_b64 s[6:7], -1			; GCN-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, -1			; GCN-NEXT: s_mov_b64 exec, -1
	; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[6:7]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_addk_i32 s32, 0xf800			; GCN-NEXT: s_mov_b32 s32, s33
	; GCN-NEXT: s_mov_b32 s33, s4			; GCN-NEXT: s_mov_b32 s33, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GCN-O0-LABEL: test:			; GCN-O0-LABEL: test:
	; GCN-O0: ; %bb.0:			; GCN-O0: ; %bb.0:
	; GCN-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-O0-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-O0-NEXT: s_mov_b32 s16, s33			; GCN-O0-NEXT: s_mov_b32 s16, s33
	▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines
	; GCN-O0-NEXT: v_readlane_b32 s28, v40, 2			; GCN-O0-NEXT: v_readlane_b32 s28, v40, 2
	; GCN-O0-NEXT: v_readlane_b32 s29, v40, 3			; GCN-O0-NEXT: v_readlane_b32 s29, v40, 3
	; GCN-O0-NEXT: v_readlane_b32 s4, v40, 4			; GCN-O0-NEXT: v_readlane_b32 s4, v40, 4
	; GCN-O0-NEXT: s_xor_saveexec_b64 s[6:7], -1			; GCN-O0-NEXT: s_xor_saveexec_b64 s[6:7], -1
	; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GCN-O0-NEXT: buffer_load_dword v0, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GCN-O0-NEXT: s_mov_b64 exec, -1			; GCN-O0-NEXT: s_mov_b64 exec, -1
	; GCN-O0-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload			; GCN-O0-NEXT: buffer_load_dword v40, off, s[0:3], s33 ; 4-byte Folded Reload
	; GCN-O0-NEXT: s_mov_b64 exec, s[6:7]			; GCN-O0-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00			; GCN-O0-NEXT: s_mov_b32 s32, s33
	; GCN-O0-NEXT: s_mov_b32 s33, s4			; GCN-O0-NEXT: s_mov_b32 s33, s4
	; GCN-O0-NEXT: s_waitcnt vmcnt(0)			; GCN-O0-NEXT: s_waitcnt vmcnt(0)
	; GCN-O0-NEXT: s_setpc_b64 s[30:31]			; GCN-O0-NEXT: s_setpc_b64 s[30:31]
	%sgpr = call i32 asm sideeffect "; def $0", "=s" () #0			%sgpr = call i32 asm sideeffect "; def $0", "=s" () #0
	call void @ext_func()			call void @ext_func()
	store volatile i32 %sgpr, ptr addrspace(1) undef			store volatile i32 %sgpr, ptr addrspace(1) undef
	ret void			ret void
	}			}

	declare void @ext_func();			declare void @ext_func();

	attributes #0 = { nounwind "amdgpu-num-vgpr"="41" "amdgpu-num-sgpr"="34"}			attributes #0 = { nounwind "amdgpu-num-vgpr"="41" "amdgpu-num-sgpr"="34"}

llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll

	Show First 20 Lines • Show All 413 Lines • ▼ Show 20 Lines
	; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4			; GFX9-O0-NEXT: buffer_store_dword v0, off, s[36:39], s34 offset:4
	; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O0-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O0-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-O0-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffffc00			; GFX9-O0-NEXT: s_mov_b32 s32, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s48			; GFX9-O0-NEXT: s_mov_b32 s33, s48
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: s_setpc_b64 s[30:31]			; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-O3-LABEL: strict_wwm_call:			; GFX9-O3-LABEL: strict_wwm_call:
	; GFX9-O3: ; %bb.0:			; GFX9-O3: ; %bb.0:
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O3-NEXT: s_mov_b32 s38, s33			; GFX9-O3-NEXT: s_mov_b32 s38, s33
	Show All 23 Lines
	; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4			; GFX9-O3-NEXT: buffer_store_dword v0, off, s[4:7], 0 offset:4
	; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1			; GFX9-O3-NEXT: v_readlane_b32 s31, v3, 1
	; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0			; GFX9-O3-NEXT: v_readlane_b32 s30, v3, 0
	; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1			; GFX9-O3-NEXT: s_xor_saveexec_b64 s[34:35], -1
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v1, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xfc00			; GFX9-O3-NEXT: s_mov_b32 s32, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s38			; GFX9-O3-NEXT: s_mov_b32 s33, s38
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]			; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
	%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)			%tmp107 = tail call i32 @llvm.amdgcn.set.inactive.i32(i32 %arg, i32 0)
	%tmp134 = call amdgpu_gfx i32 @strict_wwm_called(i32 %tmp107)			%tmp134 = call amdgpu_gfx i32 @strict_wwm_called(i32 %tmp107)
	%tmp136 = add i32 %tmp134, %tmp107			%tmp136 = add i32 %tmp134, %tmp107
	%tmp137 = tail call i32 @llvm.amdgcn.strict.wwm.i32(i32 %tmp136)			%tmp137 = tail call i32 @llvm.amdgcn.strict.wwm.i32(i32 %tmp136)
	call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 %tmp137, ptr addrspace(8) %tmp14, i32 4, i32 0, i32 0)			call void @llvm.amdgcn.raw.ptr.buffer.store.i32(i32 %tmp137, ptr addrspace(8) %tmp14, i32 4, i32 0, i32 0)
	▲ Show 20 Lines • Show All 202 Lines • ▼ Show 20 Lines
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:28 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:32 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:36 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:40 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:44 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v4, off, s[0:3], s33 offset:48 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload			; GFX9-O0-NEXT: buffer_load_dword v5, off, s[0:3], s33 offset:52 ; 4-byte Folded Reload
	; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O0-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O0-NEXT: s_add_i32 s32, s32, 0xfffff000			; GFX9-O0-NEXT: s_mov_b32 s32, s33
	; GFX9-O0-NEXT: s_mov_b32 s33, s48			; GFX9-O0-NEXT: s_mov_b32 s33, s48
	; GFX9-O0-NEXT: s_waitcnt vmcnt(0)			; GFX9-O0-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O0-NEXT: s_setpc_b64 s[30:31]			; GFX9-O0-NEXT: s_setpc_b64 s[30:31]
	;			;
	; GFX9-O3-LABEL: strict_wwm_call_i64:			; GFX9-O3-LABEL: strict_wwm_call_i64:
	; GFX9-O3: ; %bb.0:			; GFX9-O3: ; %bb.0:
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-O3-NEXT: s_mov_b32 s40, s33			; GFX9-O3-NEXT: s_mov_b32 s40, s33
	▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s33 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v8, off, s[0:3], s33 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v6, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v7, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v2, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload			; GFX9-O3-NEXT: buffer_load_dword v3, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
	; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]			; GFX9-O3-NEXT: s_mov_b64 exec, s[34:35]
	; GFX9-O3-NEXT: s_addk_i32 s32, 0xf800			; GFX9-O3-NEXT: s_mov_b32 s32, s33
	; GFX9-O3-NEXT: s_mov_b32 s33, s40			; GFX9-O3-NEXT: s_mov_b32 s33, s40
	; GFX9-O3-NEXT: s_waitcnt vmcnt(0)			; GFX9-O3-NEXT: s_waitcnt vmcnt(0)
	; GFX9-O3-NEXT: s_setpc_b64 s[30:31]			; GFX9-O3-NEXT: s_setpc_b64 s[30:31]
	%tmp107 = tail call i64 @llvm.amdgcn.set.inactive.i64(i64 %arg, i64 0)			%tmp107 = tail call i64 @llvm.amdgcn.set.inactive.i64(i64 %arg, i64 0)
	%tmp134 = call amdgpu_gfx i64 @strict_wwm_called_i64(i64 %tmp107)			%tmp134 = call amdgpu_gfx i64 @strict_wwm_called_i64(i64 %tmp107)
	%tmp136 = add i64 %tmp134, %tmp107			%tmp136 = add i64 %tmp134, %tmp107
	%tmp137 = tail call i64 @llvm.amdgcn.strict.wwm.i64(i64 %tmp136)			%tmp137 = tail call i64 @llvm.amdgcn.strict.wwm.i64(i64 %tmp136)
	%tmp138 = bitcast i64 %tmp137 to <2 x i32>			%tmp138 = bitcast i64 %tmp137 to <2 x i32>
	▲ Show 20 Lines • Show All 657 Lines • Show Last 20 Lines

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected

	Show First 20 Lines • Show All 95 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]			; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
	; CHECK-NEXT: s_cbranch_execz .LBB0_4			; CHECK-NEXT: s_cbranch_execz .LBB0_4
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: v_mov_b32_e32 v0, 1			; CHECK-NEXT: v_mov_b32_e32 v0, 1
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:12			; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:12
	; CHECK-NEXT: .LBB0_4:			; CHECK-NEXT: .LBB0_4:
	; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]			; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: s_addk_i32 s32, 0xfa00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s8			; CHECK-NEXT: s_mov_b32 s33, s8
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: main$local:			; CHECK: main$local:
	; CHECK-NEXT: .type main$local,@function			; CHECK-NEXT: .type main$local,@function
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	Show All 20 Lines
	; CHECK-NEXT: flat_store_dword v[0:1], v2			; CHECK-NEXT: flat_store_dword v[0:1], v2
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4			; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4
	; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8			; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8
	; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12			; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16			; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16
	; CHECK-NEXT: s_addk_i32 s32, 0xfa00			; CHECK-NEXT: s_mov_b32 s32, s33
	; CHECK-NEXT: s_mov_b32 s33, s6			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]

Diff 551152
