This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Created a sub-register class for the return address operand in the return instruction.
Closed · Public

Authored by cdevadas on Jun 28 2019, 3:26 AM.


Details

Reviewers

Commits

rGb2d24bd5400d: [AMDGPU] Created a sub-register class for the return address operand in the…
rL365512: [AMDGPU] Created a sub-register class for the return address operand in the…

Summary

Function return instruction lowering currently uses the fixed register pair s[30:31] to hold the return address.
It can be any SGPR pair other than the CSRs (callee-saved registers). Created an SGPR pair sub-register class that excludes the CSRs, and used this register class when lowering the return instruction.
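
The constraint in the summary can be illustrated with a small standalone model. This is only a sketch, not code from the patch: it assumes that the callee-saved SGPRs begin at s32 (consistent with the `(trunc SGPR_64, 16)` definition in the diff, which keeps the first 16 pairs), and the names `kFirstCalleeSavedSGPR`, `SGPRPair`, `isEligibleReturnAddressPair`, `eligibleReturnAddressPairs`, and `toString` are hypothetical.

```cpp
#include <string>
#include <vector>

// Assumption for illustration only: callee-saved SGPRs start at s32, so every
// aligned pair that lies fully below that index is call-clobbered.
constexpr unsigned kFirstCalleeSavedSGPR = 32;

// Hypothetical model of a 64-bit SGPR pair s[Lo:Lo+1].
struct SGPRPair {
  unsigned Lo; // index of the low 32-bit SGPR
};

// A pair may hold the return address if it is 64-bit aligned and neither
// half is a callee-saved register.
constexpr bool isEligibleReturnAddressPair(SGPRPair P) {
  return P.Lo % 2 == 0 && P.Lo + 1 < kFirstCalleeSavedSGPR;
}

// Enumerate every eligible pair: s[0:1], s[2:3], ..., s[30:31].
std::vector<SGPRPair> eligibleReturnAddressPairs() {
  std::vector<SGPRPair> Pairs;
  for (unsigned Lo = 0; Lo + 1 < kFirstCalleeSavedSGPR; Lo += 2)
    Pairs.push_back(SGPRPair{Lo});
  return Pairs;
}

// Render a pair in assembler syntax, e.g. "s[4:5]".
std::string toString(SGPRPair P) {
  return "s[" + std::to_string(P.Lo) + ":" + std::to_string(P.Lo + 1) + "]";
}
```

Under that assumption, s[4:5] (which the updated tests return through) is eligible, while a pair such as s[36:37] is not.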

Diff Detail

Repository: rL LLVM

Event Timeline

cdevadas created this revision. Jun 28 2019, 3:26 AM

Herald added a project: Restricted Project. Jun 28 2019, 3:26 AM

Herald added subscribers: llvm-commits, t-tye, tpr and 6 others.

Hi Matt,

The codegen is different now; the scheduler and the register allocator introduce most of the changes in the test cases.

arsenm added inline comments. Jun 28 2019, 5:05 PM

lib/Target/AMDGPU/SOPInstructions.td, lines 133–137 (on Diff #207023):
The name doesn't match what this is for. I would rather not introduce a separate instruction class for this. You can instead parameterize the existing class with the (ins) for the instruction, defaulting to (ins SReg_64:$src0).

test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll, line 32 (on Diff #207023):
It looks like you manually added these checks instead of using update_llc_test_checks?

test/CodeGen/AMDGPU/nested-calls.ll, line 33 (on Diff #207023):
Are the results actually better looking? Can you add explicit checks for the register s_setpc_b64 is using?

Parameterized the existing class 'SOP1_1' to accommodate different register classes.
Added an explicit check for the register pair used by the return instruction 's_setpc_b64' (in the nested-calls.ll test).

arsenm added inline comments. Jul 3 2019, 12:32 PM

lib/Target/AMDGPU/SIISelLowering.cpp, line 2237 (on Diff #207250):
This line looks too long? Run clang-format?

lib/Target/AMDGPU/SIRegisterInfo.td, line 492 (on Diff #207250):
You should be able to use SGPR_64.RegTypes to avoid repeating the type list.

lib/Target/AMDGPU/SIRegisterInfo.td, lines 493–494 (on Diff #207250):
You can also avoid repeating these with SGPR_64.CopyCost, and AllocationPriority.

Reused the existing register class's parameters for the new class.
Also ran clang-format to fix the long lines.
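
As a rough analogy for the reuse described here, the following sketch models a register class as a plain record and derives a new class from an existing one by copying its parameters and keeping only the first N members, much as the TableGen definition reads `SGPR_64.RegTypes`, `SGPR_64.CopyCost`, and `SGPR_64.AllocationPriority` and applies `trunc`. It is not LLVM code; `RegClassModel` and `deriveTruncated` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a TableGen RegisterClass record.
struct RegClassModel {
  std::string Name;
  std::vector<std::string> RegTypes; // value types the class can hold
  int CopyCost;
  int AllocationPriority;
  std::vector<std::string> Members;  // registers, in allocation order
};

// Derive a new class that inherits every parameter of Base and keeps only
// the first N members, so no parameter has to be repeated by hand.
RegClassModel deriveTruncated(const RegClassModel &Base, std::string Name,
                              std::size_t N) {
  RegClassModel Derived = Base;   // copies RegTypes, CopyCost, priority
  Derived.Name = std::move(Name); // only the name is new
  if (N < Derived.Members.size())
    Derived.Members.resize(N);    // keep the first N members
  return Derived;
}
```

Because the derived record copies from the base, a later change to the base parameters carries over without touching the derived definition, which is the point of the review suggestion.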

LGTM

This revision is now accepted and ready to land. Jul 8 2019, 9:46 AM

Closed by commit rL365512: [AMDGPU] Created a sub-register class for the return address operand in the… (authored by cdevadas). Jul 9 2019, 9:48 AM

This revision was automatically updated to reflect the committed changes.

is this the cause of http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/62962/testReport/junit/LLVM/CodeGen_AMDGPU/spill_before_exec_mir/ ?

In D63924#1576560, @jfb wrote:

> is this the cause of http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/62962/testReport/junit/LLVM/CodeGen_AMDGPU/spill_before_exec_mir/ ?

Matt says it's fixed in llvm.org/r365521. Thanks!

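Revision Contents

Path	Size
llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp	15 lines
llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.td	6 lines
llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td	6 lines
llvm/trunk/test/CodeGen/AMDGPU/call-graph-register-usage.ll	22 lines
llvm/trunk/test/CodeGen/AMDGPU/call-preserved-registers.ll	10 lines
llvm/trunk/test/CodeGen/AMDGPU/callee-frame-setup.ll	55 lines
llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll	5 lines
llvm/trunk/test/CodeGen/AMDGPU/chain-hi-to-lo.ll	4 lines
llvm/trunk/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll	64 lines
(file name not shown)	10 lines
(file name not shown)	10 lines
(file name not shown)	8 lines
(file name not shown)	10 lines
(file name not shown)	8 lines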

Diff 208717

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

Context: SITargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,

   RetOps.push_back(Chain); // Operand #0 = Chain (updated below)

   // Add return address for callable functions.
   if (!Info->isEntryFunction()) {
     const SIRegisterInfo *TRI = getSubtarget()->getRegisterInfo();
     SDValue ReturnAddrReg = CreateLiveInRegister(
       DAG, &AMDGPU::SReg_64RegClass, TRI->getReturnAddressReg(MF), MVT::i64);

-    // FIXME: Should be able to use a vreg here, but need a way to prevent it
-    // from being allcoated to a CSR.
-
-    SDValue PhysReturnAddrReg = DAG.getRegister(TRI->getReturnAddressReg(MF),
-                                                MVT::i64);
-    Chain = DAG.getCopyToReg(Chain, DL, PhysReturnAddrReg, ReturnAddrReg, Flag);
+    SDValue ReturnAddrVirtualReg = DAG.getRegister(
+        MF.getRegInfo().createVirtualRegister(&AMDGPU::CCR_SGPR_64RegClass),
+        MVT::i64);
+    Chain =
+        DAG.getCopyToReg(Chain, DL, ReturnAddrVirtualReg, ReturnAddrReg, Flag);
     Flag = Chain.getValue(1);
-    RetOps.push_back(PhysReturnAddrReg);
+    RetOps.push_back(ReturnAddrVirtualReg);
   }

   // Copy the result values into the output registers.
   for (unsigned I = 0, RealRVLocIdx = 0, E = RVLocs.size(); I != E;
        ++I, ++RealRVLocIdx) {
     CCValAssign &VA = RVLocs[I];
     assert(VA.isRegLoc() && "Can only return in registers!");
     // TODO: Partially return in registers if return values don't fit.

llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.td

Context: def SRegOrLds_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16, i1], 32,

   let isAllocatable = 0;
 }

 def SGPR_64 : RegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, v4i16, v4f16], 32, (add SGPR_64Regs)> {
   let CopyCost = 1;
   let AllocationPriority = 9;
 }

+// CCR (call clobbered registers) SGPR 64-bit registers
+def CCR_SGPR_64 : RegisterClass<"AMDGPU", SGPR_64.RegTypes, 32, (add (trunc SGPR_64, 16))> {
+  let CopyCost = SGPR_64.CopyCost;
+  let AllocationPriority = SGPR_64.AllocationPriority;
+}
+
 def TTMP_64 : RegisterClass<"AMDGPU", [v2i32, i64, f64, v4i16, v4f16], 32, (add TTMP_64Regs)> {
   let isAllocatable = 0;
 }

 def SReg_64_XEXEC : RegisterClass<"AMDGPU", [v2i32, i64, v2f32, f64, i1, v4i16, v4f16], 32,
   (add SGPR_64, VCC, FLAT_SCR, XNACK_MASK, TTMP_64, TBA, TMA)> {
   let CopyCost = 1;
   let AllocationPriority = 9;

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

 // no input, 64-bit output.
 class SOP1_64_0 <string opName, list<dag> pattern=[]> : SOP1_Pseudo <
   opName, (outs SReg_64:$sdst), (ins), "$sdst", pattern> {
   let has_src0 = 0;
 }

 // 64-bit input, no output
-class SOP1_1 <string opName, list<dag> pattern=[]> : SOP1_Pseudo <
-  opName, (outs), (ins SReg_64:$src0), "$src0", pattern> {
+class SOP1_1 <string opName, RegisterClass rc = SReg_64, list<dag> pattern=[]> : SOP1_Pseudo <
+  opName, (outs), (ins rc:$src0), "$src0", pattern> {
   let has_sdst = 0;
 }


 let isMoveImm = 1 in {
   let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
     def S_MOV_B32 : SOP1_32 <"s_mov_b32">;
     def S_MOV_B64 : SOP1_64 <"s_mov_b64">;
 [...]
 let isTerminator = 1, isBarrier = 1, SchedRW = [WriteBranch] in {

 let isBranch = 1, isIndirectBranch = 1 in {
 def S_SETPC_B64 : SOP1_1 <"s_setpc_b64">;
 } // End isBranch = 1, isIndirectBranch = 1

 let isReturn = 1 in {
 // Define variant marked as return rather than branch.
-def S_SETPC_B64_return : SOP1_1<"", [(AMDGPUret_flag i64:$src0)]>;
+def S_SETPC_B64_return : SOP1_1<"", CCR_SGPR_64, [(AMDGPUret_flag i64:$src0)]>;
 }
 } // End isTerminator = 1, isBarrier = 1

 let isCall = 1 in {
 def S_SWAPPC_B64 : SOP1_64 <"s_swappc_b64"
 >;
 }

llvm/trunk/test/CodeGen/AMDGPU/call-graph-register-usage.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,CI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,CI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,VI,VI-NOBUG %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,VI,VI-BUG %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mattr=-code-object-v3 -mcpu=iceland -enable-ipra=0 -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,VI,VI-BUG %s

	; Make sure to run a GPU with the SGPR allocation bug.			; Make sure to run a GPU with the SGPR allocation bug.

	; GCN-LABEL: {{^}}use_vcc:			; GCN-LABEL: {{^}}use_vcc:
	; GCN: ; NumSgprs: 34			; GCN: ; NumSgprs: 34
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_vcc() #1 {			define void @use_vcc() #1 {
	call void asm sideeffect "", "~{vcc}" () #0			call void asm sideeffect "", "~{vcc}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_vcc:			; GCN-LABEL: {{^}}indirect_use_vcc:
	; GCN: v_writelane_b32 v32, s34, 2			; GCN: v_writelane_b32 v32, s34, 2
	; GCN: v_writelane_b32 v32, s36, 0			; GCN: v_writelane_b32 v32, s30, 0
	; GCN: v_writelane_b32 v32, s37, 1			; GCN: v_writelane_b32 v32, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN: v_readlane_b32 s37, v32, 1			; GCN: v_readlane_b32 s4, v32, 0
	; GCN: v_readlane_b32 s36, v32, 0			; GCN: v_readlane_b32 s5, v32, 1
	; GCN: v_readlane_b32 s34, v32, 2			; GCN: v_readlane_b32 s34, v32, 2
	; GCN: ; NumSgprs: 40			; GCN: ; NumSgprs: 37
	; GCN: ; NumVgprs: 33			; GCN: ; NumVgprs: 33
	define void @indirect_use_vcc() #1 {			define void @indirect_use_vcc() #1 {
	call void @use_vcc()			call void @use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_vcc_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 42			; CI: ; NumSgprs: 39
	; VI-NOBUG: ; NumSgprs: 44			; VI-NOBUG: ; NumSgprs: 41
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 33			; GCN: ; NumVgprs: 33
	define amdgpu_kernel void @indirect_2level_use_vcc_kernel(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @indirect_2level_use_vcc_kernel(i32 addrspace(1)* %out) #0 {
	call void @indirect_use_vcc()			call void @indirect_use_vcc()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_flat_scratch:			; GCN-LABEL: {{^}}use_flat_scratch:
	; CI: ; NumSgprs: 36			; CI: ; NumSgprs: 36
	; VI: ; NumSgprs: 38			; VI: ; NumSgprs: 38
	; GCN: ; NumVgprs: 0			; GCN: ; NumVgprs: 0
	define void @use_flat_scratch() #1 {			define void @use_flat_scratch() #1 {
	call void asm sideeffect "", "~{flat_scratch}" () #0			call void asm sideeffect "", "~{flat_scratch}" () #0
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_use_flat_scratch:			; GCN-LABEL: {{^}}indirect_use_flat_scratch:
	; CI: ; NumSgprs: 42			; CI: ; NumSgprs: 39
	; VI: ; NumSgprs: 44			; VI: ; NumSgprs: 41
	; GCN: ; NumVgprs: 33			; GCN: ; NumVgprs: 33
	define void @indirect_use_flat_scratch() #1 {			define void @indirect_use_flat_scratch() #1 {
	call void @use_flat_scratch()			call void @use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:			; GCN-LABEL: {{^}}indirect_2level_use_flat_scratch_kernel:
	; GCN: is_dynamic_callstack = 0			; GCN: is_dynamic_callstack = 0
	; CI: ; NumSgprs: 42			; CI: ; NumSgprs: 39
	; VI-NOBUG: ; NumSgprs: 44			; VI-NOBUG: ; NumSgprs: 41
	; VI-BUG: ; NumSgprs: 96			; VI-BUG: ; NumSgprs: 96
	; GCN: ; NumVgprs: 33			; GCN: ; NumVgprs: 33
	define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(i32 addrspace(1)* %out) #0 {			define amdgpu_kernel void @indirect_2level_use_flat_scratch_kernel(i32 addrspace(1)* %out) #0 {
	call void @indirect_use_flat_scratch()			call void @indirect_use_flat_scratch()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_10_vgpr:			; GCN-LABEL: {{^}}use_10_vgpr:

llvm/trunk/test/CodeGen/AMDGPU/call-preserved-registers.ll

Context: define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:		; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
; GCN: buffer_store_dword		; GCN: buffer_store_dword
; GCN: v_writelane_b32 v32, s34, 4		; GCN: v_writelane_b32 v32, s34, 4
; GCN: v_writelane_b32 v32, s36, 0		; GCN: v_writelane_b32 v32, s36, 0
; GCN: v_writelane_b32 v32, s37, 1		; GCN: v_writelane_b32 v32, s37, 1
; GCN: v_writelane_b32 v32, s38, 2		; GCN: v_writelane_b32 v32, s30, 2
		; GCN: v_writelane_b32 v32, s31, 3

; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
; GCN-NEXT: s_swappc_b64		; GCN-NEXT: s_swappc_b64
; GCN-DAG: v_readlane_b32 s39, v32, 3		; GCN-DAG: v_readlane_b32 s4, v32, 2
; GCN-DAG: v_readlane_b32 s38, v32, 2		; GCN-DAG: v_readlane_b32 s5, v32, 3
; GCN: v_readlane_b32 s37, v32, 1		; GCN: v_readlane_b32 s37, v32, 1
; GCN: v_readlane_b32 s36, v32, 0		; GCN: v_readlane_b32 s36, v32, 0

; GCN: v_readlane_b32 s34, v32, 4		; GCN: v_readlane_b32 s34, v32, 4
; GCN: buffer_load_dword		; GCN: buffer_load_dword
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {		define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
[...]
}		}

; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:		; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b64 [[SAVEPC:s\[[0-9]+:[0-9]+\]]], s[30:31]		; GCN-NEXT: s_mov_b64 [[SAVEPC:s\[[0-9]+:[0-9]+\]]], s[30:31]
; GCN-NEXT: #ASMSTART		; GCN-NEXT: #ASMSTART
; GCN: ; clobber		; GCN: ; clobber
; GCN-NEXT: #ASMEND		; GCN-NEXT: #ASMEND
; GCN-NEXT: s_mov_b64 s[30:31], [[SAVEPC]]		; GCN-NEXT: s_setpc_b64 [[SAVEPC]]
; GCN-NEXT: s_setpc_b64 s[30:31]
define void @void_func_void_clobber_s30_s31() #2 {		define void @void_func_void_clobber_s30_s31() #2 {
call void asm sideeffect "; clobber", "~{s[30:31]}"() #0		call void asm sideeffect "; clobber", "~{s[30:31]}"() #0
ret void		ret void
}		}

; GCN-LABEL: {{^}}void_func_void_clobber_vcc:		; GCN-LABEL: {{^}}void_func_void_clobber_vcc:
; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART

llvm/trunk/test/CodeGen/AMDGPU/callee-frame-setup.ll

	; GCN-NEXT: s_waitcnt			; GCN-NEXT: s_waitcnt
	; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN: v_writelane_b32 [[CSR_VGPR]], s34, 2			; GCN: v_writelane_b32 [[CSR_VGPR]], s34, 2
	; GCN-DAG: s_mov_b32 s34, s32			; GCN-DAG: s_mov_b32 s34, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}			; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}
	; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}			; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
	; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s36,			; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,
	; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s37,			; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,

	; GCN-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s34{{$}}			; GCN-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s34{{$}}

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: v_readlane_b32 s36, [[CSR_VGPR]]			; GCN-DAG: v_readlane_b32 s5, [[CSR_VGPR]]
	; GCN-DAG: v_readlane_b32 s37, [[CSR_VGPR]]			; GCN-DAG: v_readlane_b32 s4, [[CSR_VGPR]]

	; GCN: s_sub_u32 s32, s32, 0x400{{$}}			; GCN: s_sub_u32 s32, s32, 0x400{{$}}
	; GCN-NEXT: v_readlane_b32 s34, [[CSR_VGPR]], 2			; GCN-NEXT: v_readlane_b32 s34, [[CSR_VGPR]], 2
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)

	[...]
	; GCN-LABEL: {{^}}callee_no_stack_with_call:			; GCN-LABEL: {{^}}callee_no_stack_with_call:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_add_u32 s32, s32, 0x400
	; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s34, [[FP_SPILL_LANE:[0-9]+]]			; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s34, [[FP_SPILL_LANE:[0-9]+]]

	; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s36, 0			; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
	; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s37, 1			; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN-DAG: v_readlane_b32 s36, v32, 0			; GCN-DAG: v_readlane_b32 s4, v32, 0
	; GCN-DAG: v_readlane_b32 s37, v32, 1			; GCN-DAG: v_readlane_b32 s5, v32, 1

	; GCN: s_sub_u32 s32, s32, 0x400			; GCN: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, [[CSR_VGPR]], [[FP_SPILL_LANE]]			; GCN-NEXT: v_readlane_b32 s34, [[CSR_VGPR]], [[FP_SPILL_LANE]]
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	[...]
	define void @realign_stack_no_fp_elim() #1 {			define void @realign_stack_no_fp_elim() #1 {
	%alloca = alloca i32, align 8192, addrspace(5)			%alloca = alloca i32, align 8192, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:			; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: v_writelane_b32 v1, s34, 0			; GCN-NEXT: v_writelane_b32 v1, s34, 2
				; GCN-NEXT: v_writelane_b32 v1, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0			; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
				; GCN: v_writelane_b32 v1, s31, 1
	; GCN: buffer_store_dword [[ZERO]], off, s[0:3], s34 offset:4			; GCN: buffer_store_dword [[ZERO]], off, s[0:3], s34 offset:4
	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: s_add_u32 s32, s32, 0x200			; GCN: v_readlane_b32 s4, v1, 0
	; GCN-NEXT: s_mov_b64 s[30:31], vcc			; GCN-NEXT: s_add_u32 s32, s32, 0x200
				; GCN-NEXT: v_readlane_b32 s5, v1, 1
	; GCN-NEXT: s_sub_u32 s32, s32, 0x200			; GCN-NEXT: s_sub_u32 s32, s32, 0x200
	; GCN-NEXT: v_readlane_b32 s34, v1, 0			; GCN-NEXT: v_readlane_b32 s34, v1, 2
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[4:5]
	define void @no_unused_non_csr_sgpr_for_fp() #1 {			define void @no_unused_non_csr_sgpr_for_fp() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca

	; Use all clobberable registers, so FP has to spill to a VGPR.			; Use all clobberable registers, so FP has to spill to a VGPR.
	call void asm sideeffect "",			call void asm sideeffect "",
	"~{s0},~{s1},~{s2},~{s3},~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}			"~{s0},~{s1},~{s2},~{s3},~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
	,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}			,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
	,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}			,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
	,~{s30},~{s31}"() #0			,~{s30},~{s31}"() #0

	ret void			ret void
	}			}

	; Need a new CSR VGPR to satisfy the FP spill.			; Need a new CSR VGPR to satisfy the FP spill.
	; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:			; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-NEXT: v_writelane_b32 v32, s34, 0			; GCN-NEXT: v_writelane_b32 v32, s34, 2
				; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN: s_add_u32 s32, s32, 0x300{{$}}

	; GCN-DAG: s_mov_b64 vcc, s[30:31]			; GCN-DAG: v_writelane_b32 v32, s31, 1
	; GCN-DAG: buffer_store_dword			; GCN-DAG: buffer_store_dword
				; GCN: s_add_u32 s32, s32, 0x300{{$}}

	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: s_mov_b64 s[30:31], vcc

	; GCN: s_sub_u32 s32, s32, 0x300{{$}}			; GCN: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s34, v32, 0			; GCN-NEXT: v_readlane_b32 s5, v32, 1
				; GCN-NEXT: s_sub_u32 s32, s32, 0x300{{$}}
				; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {			define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca
	[...]
	; The byval argument exceeds the MUBUF constant offset, so a scratch			; The byval argument exceeds the MUBUF constant offset, so a scratch
	; register is needed to access the CSR VGPR slot.			; register is needed to access the CSR VGPR slot.
	; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:			; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008			; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008
	; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-NEXT: v_writelane_b32 v32, s34, 0			; GCN-NEXT: v_writelane_b32 v32, s34, 2
				; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
				; GCN-DAG: v_writelane_b32 v32, s31, 1
	; GCN-DAG: s_add_u32 s32, s32, 0x40300{{$}}			; GCN-DAG: s_add_u32 s32, s32, 0x40300{{$}}
	; GCN-DAG: s_mov_b64 vcc, s[30:31]
	; GCN-DAG: buffer_store_dword			; GCN-DAG: buffer_store_dword

	; GCN: ;;#ASMSTART			; GCN: ;;#ASMSTART
	; GCN: s_mov_b64 s[30:31], vcc

	; GCN: s_sub_u32 s32, s32, 0x40300{{$}}			; GCN: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s34, v32, 0			; GCN-NEXT: v_readlane_b32 s5, v32, 1
				; GCN-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}
				; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008			; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008
	; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64
	define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #1 {			define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #1 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)

llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

	define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {			define amdgpu_kernel void @kern_indirect_use_workgroup_id_yz() #1 {
	call void @use_workgroup_id_yz()			call void @use_workgroup_id_yz()
	ret void			ret void
	}			}

	; Argument is in right place already			; Argument is in right place already
	; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:			; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_x:
	; GCN-NOT: s4			; GCN-NOT: s4
				; GCN: v_readlane_b32 s4, v32, 0
	define hidden void @func_indirect_use_workgroup_id_x() #1 {			define hidden void @func_indirect_use_workgroup_id_x() #1 {
	call void @use_workgroup_id_x()			call void @use_workgroup_id_x()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_y:			; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_y:
	; GCN-NOT: s4			; GCN-NOT: s4
				; GCN: v_readlane_b32 s4, v32, 0
	define hidden void @func_indirect_use_workgroup_id_y() #1 {			define hidden void @func_indirect_use_workgroup_id_y() #1 {
	call void @use_workgroup_id_y()			call void @use_workgroup_id_y()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_z:			; GCN-LABEL: {{^}}func_indirect_use_workgroup_id_z:
	; GCN-NOT: s4			; GCN-NOT: s4
				; GCN: v_readlane_b32 s4, v32, 0
	define hidden void @func_indirect_use_workgroup_id_z() #1 {			define hidden void @func_indirect_use_workgroup_id_z() #1 {
	call void @use_workgroup_id_z()			call void @use_workgroup_id_z()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}other_arg_use_workgroup_id_x:			; GCN-LABEL: {{^}}other_arg_use_workgroup_id_x:
	; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0			; GCN: {{flat\|global}}_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
	; GCN: ; use s4			; GCN: ; use s4
	[...]
	; GCN-NOT: s10			; GCN-NOT: s10
	; GCN-NOT: s11			; GCN-NOT: s11
	; GCN-NOT: s12			; GCN-NOT: s12
	; GCN-NOT: s13			; GCN-NOT: s13
	; GCN-NOT: s[6:7]			; GCN-NOT: s[6:7]
	; GCN-NOT: s[8:9]			; GCN-NOT: s[8:9]
	; GCN-NOT: s[10:11]			; GCN-NOT: s[10:11]
	; GCN-NOT: s[12:13]			; GCN-NOT: s[12:13]
	; GCN: s_or_saveexec_b64 s[4:5], -1			; GCN: s_or_saveexec_b64 s[16:17], -1
	define hidden void @func_indirect_use_every_sgpr_input() #1 {			define hidden void @func_indirect_use_every_sgpr_input() #1 {
	call void @use_every_sgpr_input()			call void @use_every_sgpr_input()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_use_every_sgpr_input_call_use_workgroup_id_xyz:			; GCN-LABEL: {{^}}func_use_every_sgpr_input_call_use_workgroup_id_xyz:
	; GCN: s_mov_b32 s4, s12			; GCN: s_mov_b32 s4, s12
	; GCN: s_mov_b32 s5, s13			; GCN: s_mov_b32 s5, s13

llvm/trunk/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

Context: bb:
%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1		%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1
%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>		%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>
%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0
ret <2 x i16> %result		ret <2 x i16> %result
}		}

; GCN-LABEL: {{^}}chain_hi_to_lo_group_may_alias_store:		; GCN-LABEL: {{^}}chain_hi_to_lo_group_may_alias_store:
; GFX900: v_mov_b32_e32 [[K:v[0-9]+]], 0x7b		; GFX900: v_mov_b32_e32 [[K:v[0-9]+]], 0x7b
; GFX900-NEXT: ds_read_u16 v3, v0		; GFX900-NEXT: ds_read_u16 v2, v0
; GFX900-NEXT: ds_write_b16 v1, [[K]]		; GFX900-NEXT: ds_write_b16 v1, [[K]]
; GFX900-NEXT: ds_read_u16 v0, v0 offset:2		; GFX900-NEXT: ds_read_u16 v0, v0 offset:2
; GFX900-NEXT: s_waitcnt lgkmcnt(0)		; GFX900-NEXT: s_waitcnt lgkmcnt(0)
; GFX900-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX900-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX900-NEXT: v_lshl_or_b32 v0, v3, 16, v0		; GFX900-NEXT: v_lshl_or_b32 v0, v2, 16, v0
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64
define <2 x i16> @chain_hi_to_lo_group_may_alias_store(i16 addrspace(3)* %ptr, i16 addrspace(3)* %may.alias) {		define <2 x i16> @chain_hi_to_lo_group_may_alias_store(i16 addrspace(3)* %ptr, i16 addrspace(3)* %may.alias) {
bb:		bb:
%gep_lo = getelementptr inbounds i16, i16 addrspace(3)* %ptr, i64 1		%gep_lo = getelementptr inbounds i16, i16 addrspace(3)* %ptr, i64 1
%gep_hi = getelementptr inbounds i16, i16 addrspace(3)* %ptr, i64 0		%gep_hi = getelementptr inbounds i16, i16 addrspace(3)* %ptr, i64 0
%load_hi = load i16, i16 addrspace(3)* %gep_hi		%load_hi = load i16, i16 addrspace(3)* %gep_hi
store i16 123, i16 addrspace(3)* %may.alias		store i16 123, i16 addrspace(3)* %may.alias
%load_lo = load i16, i16 addrspace(3)* %gep_lo		%load_lo = load i16, i16 addrspace(3)* %gep_lo

%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1		%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1
%result = insertelement <2 x i16> %to.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %to.hi, i16 %load_lo, i32 0
ret <2 x i16> %result		ret <2 x i16> %result
}		}

llvm/trunk/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

	Show All 25 Lines
	define float @call_split_type_used_outside_block_v2f32() #0 {			define float @call_split_type_used_outside_block_v2f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v2f32:			; GCN-LABEL: call_split_type_used_outside_block_v2f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v32, s34, 2			; GCN-NEXT: v_writelane_b32 v32, s34, 2
	; GCN-NEXT: v_writelane_b32 v32, s36, 0			; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s37, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v2f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v2f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v2f32@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func_v2f32@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[36:37], s[30:31]			; GCN-NEXT: v_writelane_b32 v32, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_mov_b64 s[30:31], s[36:37]			; GCN-NEXT: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s37, v32, 1			; GCN-NEXT: v_readlane_b32 s5, v32, 1
	; GCN-NEXT: v_readlane_b32 s36, v32, 0
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, v32, 2			; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <2 x float> @func_v2f32()			%split.ret.type = call <2 x float> @func_v2f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <2 x float> %split.ret.type, i32 0			%extract = extractelement <2 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define float @call_split_type_used_outside_block_v3f32() #0 {			define float @call_split_type_used_outside_block_v3f32() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v3f32:			; GCN-LABEL: call_split_type_used_outside_block_v3f32:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v32, s34, 2			; GCN-NEXT: v_writelane_b32 v32, s34, 2
	; GCN-NEXT: v_writelane_b32 v32, s36, 0			; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s37, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v3f32@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v3f32@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v3f32@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func_v3f32@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[36:37], s[30:31]			; GCN-NEXT: v_writelane_b32 v32, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_mov_b64 s[30:31], s[36:37]			; GCN-NEXT: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s37, v32, 1			; GCN-NEXT: v_readlane_b32 s5, v32, 1
	; GCN-NEXT: v_readlane_b32 s36, v32, 0
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, v32, 2			; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <3 x float> @func_v3f32()			%split.ret.type = call <3 x float> @func_v3f32()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <3 x float> %split.ret.type, i32 0			%extract = extractelement <3 x float> %split.ret.type, i32 0
	ret float %extract			ret float %extract
	}			}

	define half @call_split_type_used_outside_block_v4f16() #0 {			define half @call_split_type_used_outside_block_v4f16() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_v4f16:			; GCN-LABEL: call_split_type_used_outside_block_v4f16:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v32, s34, 2			; GCN-NEXT: v_writelane_b32 v32, s34, 2
	; GCN-NEXT: v_writelane_b32 v32, s36, 0			; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s37, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_v4f16@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_v4f16@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_v4f16@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func_v4f16@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[36:37], s[30:31]			; GCN-NEXT: v_writelane_b32 v32, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_mov_b64 s[30:31], s[36:37]			; GCN-NEXT: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s37, v32, 1			; GCN-NEXT: v_readlane_b32 s5, v32, 1
	; GCN-NEXT: v_readlane_b32 s36, v32, 0
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, v32, 2			; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call <4 x half> @func_v4f16()			%split.ret.type = call <4 x half> @func_v4f16()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%extract = extractelement <4 x half> %split.ret.type, i32 0			%extract = extractelement <4 x half> %split.ret.type, i32 0
	ret half %extract			ret half %extract
	}			}

	define { i32, half } @call_split_type_used_outside_block_struct() #0 {			define { i32, half } @call_split_type_used_outside_block_struct() #0 {
	; GCN-LABEL: call_split_type_used_outside_block_struct:			; GCN-LABEL: call_split_type_used_outside_block_struct:
	; GCN: ; %bb.0: ; %bb0			; GCN: ; %bb.0: ; %bb0
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[4:5]
	; GCN-NEXT: v_writelane_b32 v32, s34, 2			; GCN-NEXT: v_writelane_b32 v32, s34, 2
	; GCN-NEXT: v_writelane_b32 v32, s36, 0			; GCN-NEXT: v_writelane_b32 v32, s30, 0
	; GCN-NEXT: s_mov_b32 s34, s32			; GCN-NEXT: s_mov_b32 s34, s32
	; GCN-NEXT: s_add_u32 s32, s32, 0x400			; GCN-NEXT: s_add_u32 s32, s32, 0x400
	; GCN-NEXT: v_writelane_b32 v32, s37, 1
	; GCN-NEXT: s_getpc_b64 s[4:5]			; GCN-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, func_struct@rel32@lo+4			; GCN-NEXT: s_add_u32 s4, s4, func_struct@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, func_struct@rel32@hi+4			; GCN-NEXT: s_addc_u32 s5, s5, func_struct@rel32@hi+4
	; GCN-NEXT: s_mov_b64 s[36:37], s[30:31]			; GCN-NEXT: v_writelane_b32 v32, s31, 1
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: s_mov_b64 s[30:31], s[36:37]			; GCN-NEXT: v_readlane_b32 s4, v32, 0
	; GCN-NEXT: v_readlane_b32 s37, v32, 1			; GCN-NEXT: v_readlane_b32 s5, v32, 1
	; GCN-NEXT: v_readlane_b32 s36, v32, 0
	; GCN-NEXT: v_mov_b32_e32 v1, v4			; GCN-NEXT: v_mov_b32_e32 v1, v4
	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, v32, 2			; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN-NEXT: s_or_saveexec_b64 s[4:5], -1			; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, s[4:5]			; GCN-NEXT: s_mov_b64 exec, s[6:7]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[4:5]
	bb0:			bb0:
	%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()			%split.ret.type = call { <4 x i32>, <4 x half> } @func_struct()
	br label %bb1			br label %bb1

	bb1:			bb1:
	%val0 = extractvalue { <4 x i32>, <4 x half> } %split.ret.type, 0			%val0 = extractvalue { <4 x i32>, <4 x half> } %split.ret.type, 0
	%val1 = extractvalue { <4 x i32>, <4 x half> } %split.ret.type, 1			%val1 = extractvalue { <4 x i32>, <4 x half> } %split.ret.type, 1
	%extract0 = extractelement <4 x i32> %val0, i32 0			%extract0 = extractelement <4 x i32> %val0, i32 0
	Show All 15 Lines

llvm/trunk/test/CodeGen/AMDGPU/llvm.log.f16.ll

	Show All 29 Lines

	; FUNC-LABEL: {{^}}log_v2f16			; FUNC-LABEL: {{^}}log_v2f16
	; SI: buffer_load_dword v[[A_F16_0:[0-9]+]]			; SI: buffer_load_dword v[[A_F16_0:[0-9]+]]
	; VI: flat_load_dword v[[A_F16_0:[0-9]+]]			; VI: flat_load_dword v[[A_F16_0:[0-9]+]]
	; GFX9: global_load_dword v[[A_F16_0:[0-9]+]]			; GFX9: global_load_dword v[[A_F16_0:[0-9]+]]
	; SI: s_mov_b32 [[A_F32_2:s[0-9]+]], 0x3f317218			; SI: s_mov_b32 [[A_F32_2:s[0-9]+]], 0x3f317218
	; VIGFX9: s_movk_i32 [[A_F32_2:s[0-9]+]], 0x398c			; VIGFX9: s_movk_i32 [[A_F32_2:s[0-9]+]], 0x398c
	; VI: v_mov_b32_e32 [[A_F32_2_V:v[0-9]+]], [[A_F32_2]]			; VI: v_mov_b32_e32 [[A_F32_2_V:v[0-9]+]], [[A_F32_2]]
	; SI: v_cvt_f32_f16_e32 v[[A_F32_1:[0-9]+]], v[[A_F16_0]]
	; SI: v_lshrrev_b32_e32 v[[A_F16_1:[0-9]+]], 16, v[[A_F16_0]]			; SI: v_lshrrev_b32_e32 v[[A_F16_1:[0-9]+]], 16, v[[A_F16_0]]
	; SI: v_cvt_f32_f16_e32 v[[A_F32_0:[0-9]+]], v[[A_F16_0]]			; SI: v_cvt_f32_f16_e32 v[[A_F32_0:[0-9]+]], v[[A_F16_1]]
	; SI: v_log_f32_e32 v[[R_F32_1:[0-9]+]], v[[A_F32_1]]			; SI: v_cvt_f32_f16_e32 v[[A_F32_1:[0-9]+]], v[[A_F16_0]]
	; SI: v_log_f32_e32 v[[R_F32_0:[0-9]+]], v[[A_F32_0]]			; SI: v_log_f32_e32 v[[R_F32_0:[0-9]+]], v[[A_F32_0]]
	; SI: v_mul_f32_e32 v[[R_F32_6:[0-9]+]], [[A_F32_2]], v[[R_F32_1]]			; SI: v_log_f32_e32 v[[R_F32_1:[0-9]+]], v[[A_F32_1]]
	; SI: v_cvt_f16_f32_e32 v[[R_F16_1:[0-9]+]], v[[R_F32_6]]
	; SI: v_mul_f32_e32 v[[R_F32_5:[0-9]+]], [[A_F32_2]], v[[R_F32_0]]			; SI: v_mul_f32_e32 v[[R_F32_5:[0-9]+]], [[A_F32_2]], v[[R_F32_0]]
	; SI: v_cvt_f16_f32_e32 v[[R_F16_0:[0-9]+]], v[[R_F32_5]]			; SI: v_cvt_f16_f32_e32 v[[R_F16_0:[0-9]+]], v[[R_F32_5]]
				; SI: v_mul_f32_e32 v[[R_F32_6:[0-9]+]], [[A_F32_2]], v[[R_F32_1]]
				; SI: v_cvt_f16_f32_e32 v[[R_F16_1:[0-9]+]], v[[R_F32_6]]
	; GFX9: v_log_f16_e32 v[[R_F16_2:[0-9]+]], v[[A_F16_0]]			; GFX9: v_log_f16_e32 v[[R_F16_2:[0-9]+]], v[[A_F16_0]]
	; VIGFX9: v_log_f16_sdwa v[[R_F16_1:[0-9]+]], v[[A_F16_0]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; VIGFX9: v_log_f16_sdwa v[[R_F16_1:[0-9]+]], v[[A_F16_0]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; VI: v_log_f16_e32 v[[R_F16_0:[0-9]+]], v[[A_F16_0]]			; VI: v_log_f16_e32 v[[R_F16_0:[0-9]+]], v[[A_F16_0]]
	; VI: v_mul_f16_sdwa v[[R_F16_2:[0-9]+]], v[[R_F16_1]], [[A_F32_2_V]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD			; VI: v_mul_f16_sdwa v[[R_F16_2:[0-9]+]], v[[R_F16_1]], [[A_F32_2_V]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
	; GFX9: v_mul_f16_e32 v[[R_F32_3:[0-9]+]], [[A_F32_2]], v[[R_F16_2]]			; GFX9: v_mul_f16_e32 v[[R_F32_3:[0-9]+]], [[A_F32_2]], v[[R_F16_2]]
	; VIGFX9: v_mul_f16_e32 v[[R_F32_2:[0-9]+]], [[A_F32_2]], v[[R_F16_0]]			; VIGFX9: v_mul_f16_e32 v[[R_F32_2:[0-9]+]], [[A_F32_2]], v[[R_F16_0]]
	; SI: v_lshlrev_b32_e32 v[[R_F16_HI:[0-9]+]], 16, v[[R_F16_0]]			; SI: v_lshlrev_b32_e32 v[[R_F16_HI:[0-9]+]], 16, v[[R_F16_0]]
	; SI-NOT: v_and_b32_e32			; SI-NOT: v_and_b32_e32
	Show All 17 Lines

llvm/trunk/test/CodeGen/AMDGPU/llvm.log10.f16.ll

	Show All 29 Lines

	; GCN-LABEL: {{^}}log10_v2f16			; GCN-LABEL: {{^}}log10_v2f16
	; SI: buffer_load_dword v[[A_F16_0:[0-9]+]]			; SI: buffer_load_dword v[[A_F16_0:[0-9]+]]
	; VI: flat_load_dword v[[A_F16_0:[0-9]+]]			; VI: flat_load_dword v[[A_F16_0:[0-9]+]]
	; GFX9: global_load_dword v[[A_F16_0:[0-9]+]]			; GFX9: global_load_dword v[[A_F16_0:[0-9]+]]
	; SI: s_mov_b32 [[A_F32_2:s[0-9]+]], 0x3e9a209a			; SI: s_mov_b32 [[A_F32_2:s[0-9]+]], 0x3e9a209a
	; VIGFX9: s_movk_i32 [[A_F32_2:s[0-9]+]], 0x34d1			; VIGFX9: s_movk_i32 [[A_F32_2:s[0-9]+]], 0x34d1
	; VI: v_mov_b32_e32 [[A_F32_2_V:v[0-9]+]], [[A_F32_2]]			; VI: v_mov_b32_e32 [[A_F32_2_V:v[0-9]+]], [[A_F32_2]]
	; SI: v_cvt_f32_f16_e32 v[[A_F32_1:[0-9]+]], v[[A_F16_0]]
	; SI: v_lshrrev_b32_e32 v[[A_F16_1:[0-9]+]], 16, v[[A_F16_0]]			; SI: v_lshrrev_b32_e32 v[[A_F16_1:[0-9]+]], 16, v[[A_F16_0]]
	; SI: v_cvt_f32_f16_e32 v[[A_F32_0:[0-9]+]], v[[A_F16_0]]			; SI: v_cvt_f32_f16_e32 v[[A_F32_0:[0-9]+]], v[[A_F16_1]]
	; SI: v_log_f32_e32 v[[R_F32_1:[0-9]+]], v[[A_F32_1]]			; SI: v_cvt_f32_f16_e32 v[[A_F32_1:[0-9]+]], v[[A_F16_0]]
	; SI: v_log_f32_e32 v[[R_F32_0:[0-9]+]], v[[A_F32_0]]			; SI: v_log_f32_e32 v[[R_F32_0:[0-9]+]], v[[A_F32_0]]
	; SI: v_mul_f32_e32 v[[R_F32_6:[0-9]+]], [[A_F32_2]], v[[R_F32_1]]			; SI: v_log_f32_e32 v[[R_F32_1:[0-9]+]], v[[A_F32_1]]
	; SI: v_cvt_f16_f32_e32 v[[R_F16_1:[0-9]+]], v[[R_F32_6]]
	; SI: v_mul_f32_e32 v[[R_F32_5:[0-9]+]], [[A_F32_2]], v[[R_F32_0]]			; SI: v_mul_f32_e32 v[[R_F32_5:[0-9]+]], [[A_F32_2]], v[[R_F32_0]]
	; SI: v_cvt_f16_f32_e32 v[[R_F16_0:[0-9]+]], v[[R_F32_5]]			; SI: v_cvt_f16_f32_e32 v[[R_F16_0:[0-9]+]], v[[R_F32_5]]
				; SI: v_mul_f32_e32 v[[R_F32_6:[0-9]+]], [[A_F32_2]], v[[R_F32_1]]
				; SI: v_cvt_f16_f32_e32 v[[R_F16_1:[0-9]+]], v[[R_F32_6]]
	; GFX9: v_log_f16_e32 v[[R_F16_2:[0-9]+]], v[[A_F16_0]]			; GFX9: v_log_f16_e32 v[[R_F16_2:[0-9]+]], v[[A_F16_0]]
	; VIGFX9: v_log_f16_sdwa v[[R_F16_1:[0-9]+]], v[[A_F16_0]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; VIGFX9: v_log_f16_sdwa v[[R_F16_1:[0-9]+]], v[[A_F16_0]] dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; VI: v_log_f16_e32 v[[R_F16_0:[0-9]+]], v[[A_F16_0]]			; VI: v_log_f16_e32 v[[R_F16_0:[0-9]+]], v[[A_F16_0]]
	; VI: v_mul_f16_sdwa v[[R_F16_2:[0-9]+]], v[[R_F16_1]], [[A_F32_2_V]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD			; VI: v_mul_f16_sdwa v[[R_F16_2:[0-9]+]], v[[R_F16_1]], [[A_F32_2_V]] dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
	; GFX9: v_mul_f16_e32 v[[R_F32_3:[0-9]+]], [[A_F32_2]], v[[R_F16_2]]			; GFX9: v_mul_f16_e32 v[[R_F32_3:[0-9]+]], [[A_F32_2]], v[[R_F16_2]]
	; VIGFX9: v_mul_f16_e32 v[[R_F32_2:[0-9]+]], [[A_F32_2]], v[[R_F16_0]]			; VIGFX9: v_mul_f16_e32 v[[R_F32_2:[0-9]+]], [[A_F32_2]], v[[R_F16_0]]
	; SI: v_lshlrev_b32_e32 v[[R_F16_HI:[0-9]+]], 16, v[[R_F16_0]]			; SI: v_lshlrev_b32_e32 v[[R_F16_HI:[0-9]+]], 16, v[[R_F16_0]]
	; SI-NOT: v_and_b32_e32			; SI-NOT: v_and_b32_e32
	Show All 17 Lines

llvm/trunk/test/CodeGen/AMDGPU/load-lo16.ll

Show First 20 Lines • Show All 265 Lines • ▼ Show 20 Lines	entry:
%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_local_lo_v2i16_reghi_vreg_multi_use_lohi:		; GCN-LABEL: {{^}}load_local_lo_v2i16_reghi_vreg_multi_use_lohi:
; GFX900: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900: ds_read_u16 v0, v0		; GFX900: ds_read_u16 v0, v0
; GFX900: v_lshrrev_b32_e32 v4, 16, v1		; GFX900: v_lshrrev_b32_e32 v[[A_F16:[0-9]+]], 16, v1
		; GFX900: v_mov_b32_e32 v[[A_F32:[0-9]+]], 0xffff
; GFX900: s_waitcnt lgkmcnt(0)		; GFX900: s_waitcnt lgkmcnt(0)
; GFX900: ds_write_b16 v2, v0		; GFX900: ds_write_b16 v2, v0
; GFX900: ds_write_b16 v3, v4		; GFX900: ds_write_b16 v3, v[[A_F16]]
; GFX900: v_mov_b32_e32 v2, 0xffff		; GFX900: v_bfi_b32 v0, v[[A_F32]], v0, v1
; GFX900: v_bfi_b32 v0, v2, v0, v1
; GFX900: global_store_dword v[0:1], v0, off		; GFX900: global_store_dword v[0:1], v0, off
; GFX900: s_waitcnt vmcnt(0) lgkmcnt(0)		; GFX900: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX900: s_setpc_b64 s[30:31]		; GFX900: s_setpc_b64 s[30:31]

; NO-D16-HI: ds_read_u16 v		; NO-D16-HI: ds_read_u16 v
define void @load_local_lo_v2i16_reghi_vreg_multi_use_lohi(i16 addrspace(3)* noalias %in, <2 x i16> %reg, i16 addrspace(3)* noalias %out0, i16 addrspace(3)* noalias %out1) #0 {		define void @load_local_lo_v2i16_reghi_vreg_multi_use_lohi(i16 addrspace(3)* noalias %in, <2 x i16> %reg, i16 addrspace(3)* noalias %out0, i16 addrspace(3)* noalias %out1) #0 {
entry:		entry:
%load = load i16, i16 addrspace(3)* %in		%load = load i16, i16 addrspace(3)* %in
▲ Show 20 Lines • Show All 710 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AMDGPU/nested-calls.ll

	Show All 11 Lines

	; Spill CSR VGPR used for SGPR spilling			; Spill CSR VGPR used for SGPR spilling
	; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill			; GCN-NEXT: buffer_store_dword v32, off, s[0:3], s32 ; 4-byte Folded Spill
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
	; GCN-DAG: v_writelane_b32 v32, s34, 2			; GCN-DAG: v_writelane_b32 v32, s34, 2
	; GCN-DAG: s_mov_b32 s34, s32			; GCN-DAG: s_mov_b32 s34, s32
	; GCN-DAG: s_add_u32 s32, s32, 0x400			; GCN-DAG: s_add_u32 s32, s32, 0x400
	; GCN-DAG: v_writelane_b32 v32, s36, 0			; GCN-DAG: v_writelane_b32 v32, s30, 0
	; GCN-DAG: v_writelane_b32 v32, s37, 1			; GCN-DAG: v_writelane_b32 v32, s31, 1

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: v_readlane_b32 s37, v32, 1			; GCN: v_readlane_b32 s4, v32, 0
	; GCN: v_readlane_b32 s36, v32, 0			; GCN: v_readlane_b32 s5, v32, 1

	; GCN-NEXT: s_sub_u32 s32, s32, 0x400			; GCN-NEXT: s_sub_u32 s32, s32, 0x400
	; GCN-NEXT: v_readlane_b32 s34, v32, 2			; GCN-NEXT: v_readlane_b32 s34, v32, 2
	; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]			; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64 s[4:5]
	define void @test_func_call_external_void_func_i32_imm() #0 {			define void @test_func_call_external_void_func_i32_imm() #0 {
	call void @external_void_func_i32(i32 42)			call void @external_void_func_i32(i32 42)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:			; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm_stack_use:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN: s_mov_b32 s34, s32			; GCN: s_mov_b32 s34, s32
	Show All 18 Lines

llvm/trunk/test/CodeGen/AMDGPU/wave32.ll

	Show First 20 Lines • Show All 1,076 Lines • ▼ Show 20 Lines
	; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]			; GFX1032-NEXT: s_mov_b32 exec_lo, [[COPY_EXEC0]]

	; GCN-NEXT: v_writelane_b32 v32, s34, 2			; GCN-NEXT: v_writelane_b32 v32, s34, 2
	; GCN: s_mov_b32 s34, s32			; GCN: s_mov_b32 s34, s32
	; GFX1064: s_add_u32 s32, s32, 0x400			; GFX1064: s_add_u32 s32, s32, 0x400
	; GFX1032: s_add_u32 s32, s32, 0x200			; GFX1032: s_add_u32 s32, s32, 0x200


	; GCN-DAG: v_writelane_b32 v32, s36, 0			; GCN-DAG: v_writelane_b32 v32, s30, 0
	; GCN-DAG: v_writelane_b32 v32, s37, 1			; GCN-DAG: v_writelane_b32 v32, s31, 1
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-DAG: v_readlane_b32 s36, v32, 0			; GCN-DAG: v_readlane_b32 s4, v32, 0
	; GCN-DAG: v_readlane_b32 s37, v32, 1			; GCN-DAG: v_readlane_b32 s5, v32, 1


	; GFX1064: s_sub_u32 s32, s32, 0x400			; GFX1064: s_sub_u32 s32, s32, 0x400
	; GFX1032: s_sub_u32 s32, s32, 0x200			; GFX1032: s_sub_u32 s32, s32, 0x200
	; GCN: v_readlane_b32 s34, v32, 2			; GCN: v_readlane_b32 s34, v32, 2
	; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GFX1064: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
	; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}			; GFX1032: s_or_saveexec_b32 [[COPY_EXEC1:s[0-9]]], -1{{$}}
	; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload			; GCN-NEXT: buffer_load_dword v32, off, s[0:3], s32 ; 4-byte Folded Reload
	▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

Revision Contents

Diff 208717

llvm/trunk/lib/Target/AMDGPU/SIISelLowering.cpp

llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.td

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

llvm/trunk/test/CodeGen/AMDGPU/call-graph-register-usage.ll

llvm/trunk/test/CodeGen/AMDGPU/call-preserved-registers.ll

llvm/trunk/test/CodeGen/AMDGPU/callee-frame-setup.ll

llvm/trunk/test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

llvm/trunk/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

llvm/trunk/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

llvm/trunk/test/CodeGen/AMDGPU/llvm.log.f16.ll

llvm/trunk/test/CodeGen/AMDGPU/llvm.log10.f16.ll

llvm/trunk/test/CodeGen/AMDGPU/load-lo16.ll

llvm/trunk/test/CodeGen/AMDGPU/nested-calls.ll

llvm/trunk/test/CodeGen/AMDGPU/wave32.ll
