This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Select VOP3 form of add
Closed (Public)

Authored by arsenm on May 5 2019, 2:26 PM.


Details

Summary

The VOP3 form should always be the preferred selection form, to be
shrunk later. This should only be an optimization issue, but it
partially works around a problem with VCC being clobbered when
SIFixSGPRCopies rewrites an SCC-defining operation directly to VCC.
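The paragraph above describes a "select wide, shrink later" strategy. The minimal Python sketch below models it on a toy straight-line block in which each instruction records only whether it reads or writes VCC.

The sketch makes several simplifying assumptions:
- `Inst`, `vcc_live_at`, `shrink_adds`, and the opcode strings are hypothetical stand-ins. They are not the actual shrinking pass (SIShrinkInstructions).
- The model covers only the SI/VI carry-producing add. GFX9's `v_add_u32` has no carry-out.

The sketch shows why starting from the VOP3 (`_e64`) form is safe. The VOP2 (`_e32`) form implicitly writes VCC, so the model picks it only where the VCC value that is live across the add is not read afterwards.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Inst:
    """Toy instruction that only records how it touches VCC (hypothetical)."""

    name: str
    reads_vcc: bool = False
    writes_vcc: bool = False


def vcc_live_at(block: list[Inst], pos: int) -> bool:
    """Return True if the VCC value present before block[pos] is read later
    in the block before being overwritten. Toy assumption: VCC is not live
    out of the block."""
    for inst in block[pos:]:
        if inst.reads_vcc:
            return True
        if inst.writes_vcc:
            return False
    return False


def shrink_adds(block: list[Inst]) -> None:
    """Rewrite each VOP3 add (carry-out in an arbitrary SGPR pair) into the
    VOP2 form, whose carry-out implicitly overwrites VCC, but only where the
    VCC value live across the add is not needed afterwards. Assumes the
    carry-out of these adds is otherwise unused."""
    for i, inst in enumerate(block):
        if inst.name == "v_add_e64" and not vcc_live_at(block, i + 1):
            block[i] = Inst("v_add_e32", writes_vcc=True)


# Same shape as the add_select_vop3 test: VCC is defined, the add is
# selected, and VCC is read afterwards.
live = [Inst("def_vcc", writes_vcc=True), Inst("v_add_e64"),
        Inst("ds_write"), Inst("use_vcc", reads_vcc=True)]
shrink_adds(live)
print(live[1].name)  # v_add_e64: shrinking would clobber the live VCC

# Without the later read, the add can safely take the VOP2 form.
dead = [Inst("def_vcc", writes_vcc=True), Inst("v_add_e64"), Inst("ds_write")]
shrink_adds(dead)
print(dead[1].name)  # v_add_e32
```

In the first block the add keeps its VOP3 form, which matches what the new add_select_vop3 test expects on SI and VI. Selecting the VOP2 form up front would instead require the pair of VCC copies that the test comment mentions.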

Three of the test cases regress because the immediate is no longer
folded in cases where it should be. These regressions can be avoided by
improving the VCC liveness handling in SIFoldOperands. Simply increasing
the threshold passed to computeRegisterLiveness works, although this is
common enough that VCC liveness should probably be tracked throughout the
pass. The hack of leaving behind an implicit_def instruction to avoid
breaking iterators wastes instruction count in that search, which
inhibits finding the VCC def in long chains of adds. Fixing the folding,
however, exposes different, worse-looking regressions caused by poor
scheduling. These could probably be worked around by forcing the shrink
of the addc here, but the scheduler should probably be fixed instead.
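The threshold trade-off described above can be sketched as a bounded liveness query. This is a toy Python model under stated assumptions, not SIFoldOperands.

The model's assumptions:
- `Liveness`, `Inst`, `query_vcc`, and `can_fold_literal` are hypothetical names.
- The scan only looks forward.
- VCC is assumed dead at the end of the block.
- On the pre-GFX10 targets in these tests, as I understand it, a 32-bit literal such as 0x900 can only be encoded in the VCC-writing VOP2 form. Folding it is therefore only safe once VCC is proven dead.

LLVM's `MachineBasicBlock::computeRegisterLiveness` similarly takes a neighborhood limit and can answer "unknown". The sketch shows how filler instructions, such as leftover IMPLICIT_DEFs, use up that limit so that a conservative folder gives up, while a larger threshold recovers the fold.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Liveness(Enum):
    LIVE = "live"
    DEAD = "dead"
    UNKNOWN = "unknown"


@dataclass
class Inst:
    """Toy instruction that only records how it touches VCC (hypothetical)."""

    name: str
    reads_vcc: bool = False
    writes_vcc: bool = False


def query_vcc(block: list[Inst], pos: int, neighborhood: int) -> Liveness:
    """Decide whether VCC is live before block[pos], examining at most
    `neighborhood` instructions going forward."""
    for inst in block[pos:pos + neighborhood]:
        if inst.reads_vcc:
            return Liveness.LIVE   # the current value is still needed
        if inst.writes_vcc:
            return Liveness.DEAD   # overwritten before any read
    if pos + neighborhood >= len(block):
        return Liveness.DEAD       # toy assumption: VCC is not live-out
    return Liveness.UNKNOWN        # budget ran out before an answer


def can_fold_literal(block: list[Inst], pos: int, neighborhood: int) -> bool:
    """Conservative folder: switch to the VCC-writing form only when the
    query proves VCC dead; UNKNOWN blocks the fold."""
    return query_vcc(block, pos, neighborhood) is Liveness.DEAD


# Two leftover IMPLICIT_DEF placeholders sit between the query point and the
# next add that overwrites VCC.
padded = [Inst("implicit_def"), Inst("implicit_def"),
          Inst("v_add_e32", writes_vcc=True), Inst("s_endpgm")]
print(can_fold_literal(padded, 0, 2))  # False: the VCC def is out of reach
print(can_fold_literal(padded, 0, 3))  # True: a larger threshold finds it
```

Treating UNKNOWN as "do not fold" is the safe default, because a wrong guess would clobber a live VCC. This is why the regressions in the diff show literals being materialized with `s_movk_i32` instead of folded.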

The R600 add checks need to be split out into r600.add.ll because the
R600 backend asserts on the arguments of the new test during calling
convention lowering.

Diff Detail

Event Timeline

arsenm created this revision. May 5 2019, 2:26 PM

Herald added subscribers: t-tye, tpr, dstuttard and 5 others. May 5 2019, 2:26 PM

rampitec added a comment.

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

arsenm added a comment.

In D61575#1492340, @rampitec wrote:

I am in favor of this change in general, but can we fix folding issues before? We may have unwanted performance regressions otherwise.

I put a lot of time into trying, but fixing all of the issues will take time and this is an important workaround. The regression in the clmem lit test from increasing the folding threshold was worse. The folding pass needs more work to track VCC accurately, and the scheduler needs work to not regress it. An alternative might be to force shrinking of the addc.

rampitec added a comment.

In D61575#1492375, @arsenm wrote:

I put a lot of time into trying, but fixing all of the issues will take time and this is an important workaround. The regression in the clmem lit test from increasing the folding threshold was worse. The folding pass needs more work to track VCC accurately, and the scheduler needs work to not regress it. An alternative might be to force shrinking of the addc.

We should probably wait with this change then.

rampitec added inline comments. May 8 2019, 11:57 AM

lib/Target/AMDGPU/VOP2Instructions.td, line 537: While you are here, please move this brace back up a couple of lines.

arsenm added a comment.

In D61575#1492381, @rampitec wrote:

We should probably wait with this change then.

I thought about forcing the addc to be shrunk here, but I don't think it's really good; it is purely a scheduler workaround. It isn't inherently better, and it introduces new physical register constraints. I'm looking at how to fix the scheduler, but that could take a while.

LGTM

This revision is now accepted and ready to land. May 8 2019, 2:27 PM

r360293

Revision Contents

Path                                                              Size
lib/Target/AMDGPU/VOP2Instructions.td                             4 lines
test/CodeGen/AMDGPU/add.ll                                        83 lines
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll                              3 lines
test/CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll    6 lines
test/CodeGen/AMDGPU/fence-barrier.ll                              3 lines
test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll                     3 lines
test/CodeGen/AMDGPU/r600.add.ll                                   167 lines
test/CodeGen/AMDGPU/salu-to-valu.ll                               2 lines

Diff 198202

lib/Target/AMDGPU/VOP2Instructions.td

[... 518 unchanged lines not shown ...]
@@ ... @@ GCNPat<
 )
 >;

 def : DivergentBinOp<srl, V_LSHRREV_B32_e64>;
 def : DivergentBinOp<sra, V_ASHRREV_I32_e64>;
 def : DivergentBinOp<shl, V_LSHLREV_B32_e64>;

 let SubtargetPredicate = HasAddNoCarryInsts in {
-def : DivergentBinOp<add, V_ADD_U32_e32>;
+def : DivergentClampingBinOp<add, V_ADD_U32_e64>;
 def : DivergentClampingBinOp<sub, V_SUB_U32_e64>;
 }

 let SubtargetPredicate = isGFX6GFX7GFX8GFX9, Predicates = [isGFX6GFX7GFX8GFX9] in {
-def : DivergentBinOp<add, V_ADD_I32_e32>;
+def : DivergentClampingBinOp<add, V_ADD_I32_e64>;
 def : DivergentClampingBinOp<sub, V_SUB_I32_e64>;

 def : DivergentBinOp<adde, V_ADDC_U32_e32>;
 def : DivergentBinOp<sube, V_SUBB_U32_e32>;
 }

 class divergent_i64_BinOp <SDPatternOperator Op, Instruction Inst> :
 GCNPat<
 (getDivergentFrag<Op>.ret i64:$src0, i64:$src1),
 (REG_SEQUENCE VReg_64,
 (Inst
 (i32 (EXTRACT_SUBREG $src0, sub0)),
 (i32 (EXTRACT_SUBREG $src1, sub0))
[... 811 unchanged lines not shown ...]

test/CodeGen/AMDGPU/add.ll

 ; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SIVI,FUNC %s
 ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SIVI,FUNC %s
 ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,FUNC %s
-; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -enable-var-scope -check-prefix=EG -check-prefix=FUNC %s

 ; FUNC-LABEL: {{^}}s_add_i32:
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-
 ; GCN: s_add_i32 s[[REG:[0-9]+]], {{s[0-9]+, s[0-9]+}}
 ; GCN: v_mov_b32_e32 v[[V_REG:[0-9]+]], s[[REG]]
 ; GCN: buffer_store_dword v[[V_REG]],
 define amdgpu_kernel void @s_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
 %b_ptr = getelementptr i32, i32 addrspace(1)* %in, i32 1
 %a = load i32, i32 addrspace(1)* %in
 %b = load i32, i32 addrspace(1)* %b_ptr
 %result = add i32 %a, %b
 store i32 %result, i32 addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}s_add_v2i32:
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 define amdgpu_kernel void @s_add_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
 %b_ptr = getelementptr <2 x i32>, <2 x i32> addrspace(1)* %in, i32 1
 %a = load <2 x i32>, <2 x i32> addrspace(1)* %in
 %b = load <2 x i32>, <2 x i32> addrspace(1)* %b_ptr
 %result = add <2 x i32> %a, %b
 store <2 x i32> %result, <2 x i32> addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}s_add_v4i32:
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
-
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 ; GCN: s_add_i32 s{{[0-9]+, s[0-9]+, s[0-9]+}}
 define amdgpu_kernel void @s_add_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
 %b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1
 %a = load <4 x i32>, <4 x i32> addrspace(1)* %in
 %b = load <4 x i32>, <4 x i32> addrspace(1)* %b_ptr
 %result = add <4 x i32> %a, %b
 store <4 x i32> %result, <4 x i32> addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}s_add_v8i32:
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 define amdgpu_kernel void @s_add_v8i32(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) {
 entry:
 %0 = add <8 x i32> %a, %b
 store <8 x i32> %0, <8 x i32> addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}s_add_v16i32:
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-; EG: ADD_INT
-
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
 ; GCN: s_add_i32
[... 13 unchanged lines not shown ...]
 }

 ; FUNC-LABEL: {{^}}v_add_i32:
 ; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
 ; GCN: {{buffer|flat|global}}_load_dword [[B:v[0-9]+]]
 ; SIVI: v_add_{{i|u}}32_e32 v{{[0-9]+}}, vcc, [[A]], [[B]]
 ; GFX9: v_add_u32_e32 v{{[0-9]+}}, [[A]], [[B]]
 define amdgpu_kernel void @v_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
-%tid = call i32 @llvm.r600.read.tidig.x()
+%tid = call i32 @llvm.amdgcn.workitem.id.x()
 %gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid
 %b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1
 %a = load volatile i32, i32 addrspace(1)* %gep
 %b = load volatile i32, i32 addrspace(1)* %b_ptr
 %result = add i32 %a, %b
 store i32 %result, i32 addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}v_add_imm_i32:
 ; GCN: {{buffer|flat|global}}_load_dword [[A:v[0-9]+]]
 ; SIVI: v_add_{{i|u}}32_e32 v{{[0-9]+}}, vcc, 0x7b, [[A]]
 ; GFX9: v_add_u32_e32 v{{[0-9]+}}, 0x7b, [[A]]
 define amdgpu_kernel void @v_add_imm_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
-%tid = call i32 @llvm.r600.read.tidig.x()
+%tid = call i32 @llvm.amdgcn.workitem.id.x()
 %gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid
 %b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1
 %a = load volatile i32, i32 addrspace(1)* %gep
 %result = add i32 %a, 123
 store i32 %result, i32 addrspace(1)* %out
 ret void
 }

 ; FUNC-LABEL: {{^}}add64:
 ; GCN: s_add_u32
 ; GCN: s_addc_u32
-
-; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-DAG: ADDC_UINT
-; EG-DAG: ADD_INT
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-NOT: SUB
 define amdgpu_kernel void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
 entry:
 %add = add i64 %a, %b
 store i64 %add, i64 addrspace(1)* %out
 ret void
 }

 ; The v_addc_u32 and v_add_i32 instruction can't read SGPRs, because they
 ; use VCC. The test is designed so that %a will be stored in an SGPR and
 ; %0 will be stored in a VGPR, so the comiler will be forced to copy %a
 ; to a VGPR before doing the add.

 ; FUNC-LABEL: {{^}}add64_sgpr_vgpr:
 ; GCN-NOT: v_addc_u32_e32 s
-
-; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-DAG: ADDC_UINT
-; EG-DAG: ADD_INT
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-NOT: SUB
 define amdgpu_kernel void @add64_sgpr_vgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) {
 entry:
 %0 = load i64, i64 addrspace(1)* %in
 %1 = add i64 %a, %0
 store i64 %1, i64 addrspace(1)* %out
 ret void
 }

 ; Test i64 add inside a branch.
 ; FUNC-LABEL: {{^}}add64_in_branch:
 ; GCN: s_add_u32
 ; GCN: s_addc_u32
-
-; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-DAG: ADDC_UINT
-; EG-DAG: ADD_INT
-; EG-DAG: ADD_INT {{[* ]*}}
-; EG-NOT: SUB
 define amdgpu_kernel void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
 entry:
 %0 = icmp eq i64 %a, 0
 br i1 %0, label %if, label %else

 if:
 %1 = load i64, i64 addrspace(1)* %in
 br label %endif

 else:
 %2 = add i64 %a, %b
 br label %endif

 endif:
 %3 = phi i64 [%1, %if], [%2, %else]
 store i64 %3, i64 addrspace(1)* %out
 ret void
 }

-declare i32 @llvm.r600.read.tidig.x() #1
+; Make sure the VOP3 form of add is initially selected. Otherwise pair
+; of opies from/to VCC would be necessary
+
+; GCN-LABEL: {{^}}add_select_vop3:
+; SI: v_add_i32_e64 v0, s[0:1], s0, v0
+; VI: v_add_u32_e64 v0, s[0:1], s0, v0
+; GFX9: v_add_u32_e32 v0, s0, v0
+
+; GCN: ; def vcc
+; GCN: ds_write_b32
+; GCN: ; use vcc
+define amdgpu_ps void @add_select_vop3(i32 inreg %s, i32 %v) {
+%vcc = call i64 asm sideeffect "; def vcc", "={vcc}"()
+%sub = add i32 %v, %s
+store i32 %sub, i32 addrspace(3)* undef
+call void asm sideeffect "; use vcc", "{vcc}"(i64 %vcc)
+ret void
+}
+
+declare i32 @llvm.amdgcn.workitem.id.x() #1

 attributes #0 = { nounwind }
 attributes #1 = { nounwind readnone speculatable }

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

[... 274 unchanged lines not shown ...]
 ; SI-NEXT: s_waitcnt lgkmcnt(0)
 ; SI-NEXT: buffer_load_dword v1, v[0:1], s[4:7], 0 addr64
 ; SI-NEXT: s_load_dwordx2 s[8:9], s[0:1], 0x9
 ; SI-NEXT: s_mov_b32 s2, -1
 ; SI-NEXT: s_movk_i32 s12, 0xff
 ; SI-NEXT: s_mov_b32 s10, s2
 ; SI-NEXT: s_mov_b32 s11, s3
 ; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb
+; SI-NEXT: s_movk_i32 s13, 0x900
 ; SI-NEXT: s_waitcnt vmcnt(0)
 ; SI-NEXT: v_lshrrev_b32_e32 v4, 16, v1
 ; SI-NEXT: v_add_i32_e32 v7, vcc, 9, v1
 ; SI-NEXT: v_and_b32_e32 v6, 0xff00, v1
 ; SI-NEXT: v_lshrrev_b32_e32 v5, 24, v1
 ; SI-NEXT: v_cvt_f32_ubyte3_e32 v3, v1
 ; SI-NEXT: v_cvt_f32_ubyte2_e32 v2, v1
 ; SI-NEXT: v_cvt_f32_ubyte0_e32 v0, v1
 ; SI-NEXT: v_cvt_f32_ubyte1_e32 v1, v6
 ; SI-NEXT: v_and_b32_e32 v7, s12, v7
 ; SI-NEXT: v_add_i32_e32 v4, vcc, 9, v4
 ; SI-NEXT: s_waitcnt lgkmcnt(0)
 ; SI-NEXT: buffer_store_dwordx4 v[0:3], off, s[8:11], 0
 ; SI-NEXT: s_waitcnt expcnt(0)
 ; SI-NEXT: v_or_b32_e32 v0, v6, v7
 ; SI-NEXT: v_lshlrev_b32_e32 v5, 8, v5
 ; SI-NEXT: v_and_b32_e32 v1, s12, v4
-; SI-NEXT: v_add_i32_e32 v0, vcc, 0x900, v0
+; SI-NEXT: v_add_i32_e32 v0, vcc, s13, v0
 ; SI-NEXT: v_or_b32_e32 v1, v5, v1
 ; SI-NEXT: v_and_b32_e32 v0, 0xffff, v0
 ; SI-NEXT: v_lshlrev_b32_e32 v1, 16, v1
 ; SI-NEXT: v_or_b32_e32 v0, v1, v0
 ; SI-NEXT: v_add_i32_e32 v0, vcc, 0x9000000, v0
 ; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
 ; SI-NEXT: s_endpgm
 ;
[... 648 unchanged lines not shown ...]

test/CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll

 ; RUN: llc -march=amdgcn -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -check-prefix=SI --check-prefix=CHECK %s
 ; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -mattr=+load-store-opt < %s | FileCheck -check-prefix=CI --check-prefix=CHECK %s
 ; RUN: llc -march=amdgcn -verify-machineinstrs -mattr=+load-store-opt,+unsafe-ds-offset-folding < %s | FileCheck -check-prefix=CI --check-prefix=CHECK %s

 declare i32 @llvm.amdgcn.workitem.id.x() #0
 declare void @llvm.amdgcn.s.barrier() #1

 ; Function Attrs: nounwind
 ; CHECK-LABEL: {{^}}signed_ds_offset_addressing_loop:
+; SI: s_movk_i32 [[K_0X88:s[0-9]+]], 0x
+; SI: s_movk_i32 [[K_0X100:s[0-9]+]], 0x100
 ; CHECK: BB0_1:
 ; CHECK: v_add_i32_e32 [[VADDR:v[0-9]+]],
 ; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR]]
 ; SI-DAG: v_add_i32_e32 [[VADDR8:v[0-9]+]], vcc, 8, [[VADDR]]
 ; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR8]]
 ; SI-DAG: v_add_i32_e32 [[VADDR0x80:v[0-9]+]], vcc, 0x80, [[VADDR]]
 ; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR0x80]]
-; SI-DAG: v_add_i32_e32 [[VADDR0x88:v[0-9]+]], vcc, 0x88, [[VADDR]]
+; SI-DAG: v_add_i32_e32 [[VADDR0x88:v[0-9]+]], vcc, [[K_0X88]], [[VADDR]]
 ; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR0x88]]
-; SI-DAG: v_add_i32_e32 [[VADDR0x100:v[0-9]+]], vcc, 0x100, [[VADDR]]
+; SI-DAG: v_add_i32_e32 [[VADDR0x100:v[0-9]+]], vcc, [[K_0X100]], [[VADDR]]
 ; SI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR0x100]]

 ; CI-DAG: ds_read2_b32 v{{\[[0-9]+:[0-9]+\]}}, [[VADDR]] offset1:2
 ; CI-DAG: ds_read2_b32 v{{\[[0-9]+:[0-9]+\]}}, [[VADDR]] offset0:32 offset1:34
 ; CI-DAG: ds_read_b32 v{{[0-9]+}}, [[VADDR]] offset:256
 ; CHECK: s_endpgm
 define amdgpu_kernel void @signed_ds_offset_addressing_loop(float addrspace(1)* noalias nocapture %out, float addrspace(3)* noalias nocapture readonly %lptr, i32 %n) #2 {
 entry:
[... 43 unchanged lines not shown ...]

test/CodeGen/AMDGPU/fence-barrier.ll

[... 48 unchanged lines not shown ...]
@@ ... @@ ; <label>:7: ; preds = %6, %1
 %22 = load i64, i64 addrspace(4)* %21, align 8
 %23 = add i64 %22, %20
 %24 = getelementptr inbounds i32, i32 addrspace(1)* %9, i64 %23
 store i32 %8, i32 addrspace(1)* %24, align 4
 ret void
 }

 ; GCN-LABEL: {{^}}test_global
-; GCN: v_add_u32_e32 v{{[0-9]+}}, vcc, 0x888, v{{[0-9]+}}
+; GCN: s_movk_i32 [[K:s[0-9]+]], 0x888
+; GCN: v_add_u32_e32 v{{[0-9]+}}, vcc, [[K]], v{{[0-9]+}}
 ; GCN: flat_store_dword
 ; GCN: s_waitcnt vmcnt(0) lgkmcnt(0){{$}}
 ; GCN-NEXT: s_barrier
 define amdgpu_kernel void @test_global(i32 addrspace(1)*) {
 %2 = alloca i32 addrspace(1)*, align 4, addrspace(5)
 %3 = alloca i32, align 4, addrspace(5)
 store i32 addrspace(1)* %0, i32 addrspace(1)* addrspace(5)* %2, align 4
 store i32 0, i32 addrspace(5)* %3, align 4
[... 133 unchanged lines not shown ...]

test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll

[... 9 unchanged lines not shown ...]
 ; VI: v_mov_b32_dpp v0, v1 quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0 ; encoding: [0xfa,0x02,0x00,0x7e,0x01,0x01,0x08,0x11]
 define amdgpu_kernel void @dpp_test(i32 addrspace(1)* %out, i32 %in1, i32 %in2) {
 %tmp0 = call i32 @llvm.amdgcn.update.dpp.i32(i32 %in1, i32 %in2, i32 1, i32 1, i32 1, i1 1) #0
 store i32 %tmp0, i32 addrspace(1)* %out
 ret void
 }

 ; VI-LABEL: {{^}}dpp_test1:
-; VI: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}
+; VI-OPT: v_add_u32_e32 [[REG:v[0-9]+]], vcc, v{{[0-9]+}}, v{{[0-9]+}}
+; VI-NOOPT: v_add_u32_e64 [[REG:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, v{{[0-9]+}}
 ; VI-NOOPT: v_mov_b32_e32 v{{[0-9]+}}, 0
 ; VI-NEXT: s_nop 0
 ; VI-NEXT: s_nop 0
 ; VI-NEXT: v_mov_b32_dpp {{v[0-9]+}}, [[REG]] quad_perm:[1,0,3,2] row_mask:0xf bank_mask:0xf
 @0 = internal unnamed_addr addrspace(3) global [448 x i32] undef, align 4
 define weak_odr amdgpu_kernel void @dpp_test1(i32* %arg) local_unnamed_addr {
 bb:
 %tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
[... 19 unchanged lines not shown ...]

test/CodeGen/AMDGPU/r600.add.ll

This file was added.

				; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -enable-var-scope -check-prefix=EG -check-prefix=FUNC %s

				; FUNC-LABEL: {{^}}s_add_i32:
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				define amdgpu_kernel void @s_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
				%b_ptr = getelementptr i32, i32 addrspace(1)* %in, i32 1
				%a = load i32, i32 addrspace(1)* %in
				%b = load i32, i32 addrspace(1)* %b_ptr
				%result = add i32 %a, %b
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}s_add_v2i32:
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				define amdgpu_kernel void @s_add_v2i32(<2 x i32> addrspace(1)* %out, <2 x i32> addrspace(1)* %in) {
				%b_ptr = getelementptr <2 x i32>, <2 x i32> addrspace(1)* %in, i32 1
				%a = load <2 x i32>, <2 x i32> addrspace(1)* %in
				%b = load <2 x i32>, <2 x i32> addrspace(1)* %b_ptr
				%result = add <2 x i32> %a, %b
				store <2 x i32> %result, <2 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}s_add_v4i32:
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				; EG: ADD_INT {{[* ]*}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
				define amdgpu_kernel void @s_add_v4i32(<4 x i32> addrspace(1)* %out, <4 x i32> addrspace(1)* %in) {
				%b_ptr = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %in, i32 1
				%a = load <4 x i32>, <4 x i32> addrspace(1)* %in
				%b = load <4 x i32>, <4 x i32> addrspace(1)* %b_ptr
				%result = add <4 x i32> %a, %b
				store <4 x i32> %result, <4 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}s_add_v8i32:
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				define amdgpu_kernel void @s_add_v8i32(<8 x i32> addrspace(1)* %out, <8 x i32> %a, <8 x i32> %b) {
				entry:
				%0 = add <8 x i32> %a, %b
				store <8 x i32> %0, <8 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}s_add_v16i32:
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				; EG: ADD_INT
				define amdgpu_kernel void @s_add_v16i32(<16 x i32> addrspace(1)* %out, <16 x i32> %a, <16 x i32> %b) {
				entry:
				%0 = add <16 x i32> %a, %b
				store <16 x i32> %0, <16 x i32> addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_add_i32:
				define amdgpu_kernel void @v_add_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
				%tid = call i32 @llvm.r600.read.tidig.x()
				%gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid
				%b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1
				%a = load volatile i32, i32 addrspace(1)* %gep
				%b = load volatile i32, i32 addrspace(1)* %b_ptr
				%result = add i32 %a, %b
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_add_imm_i32:
				define amdgpu_kernel void @v_add_imm_i32(i32 addrspace(1)* %out, i32 addrspace(1)* %in) #0 {
				%tid = call i32 @llvm.r600.read.tidig.x()
				%gep = getelementptr inbounds i32, i32 addrspace(1)* %in, i32 %tid
				%b_ptr = getelementptr i32, i32 addrspace(1)* %gep, i32 1
				%a = load volatile i32, i32 addrspace(1)* %gep
				%result = add i32 %a, 123
				store i32 %result, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}add64:
				; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-DAG: ADDC_UINT
				; EG-DAG: ADD_INT
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-NOT: SUB
				define amdgpu_kernel void @add64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
				entry:
				%add = add i64 %a, %b
				store i64 %add, i64 addrspace(1)* %out
				ret void
				}

				; The v_addc_u32 and v_add_i32 instructions can't read SGPRs, because they
				; use VCC. The test is designed so that %a will be stored in an SGPR and
				; %0 will be stored in a VGPR, so the compiler will be forced to copy %a
				; to a VGPR before doing the add.

				; FUNC-LABEL: {{^}}add64_sgpr_vgpr:
				; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-DAG: ADDC_UINT
				; EG-DAG: ADD_INT
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-NOT: SUB
				define amdgpu_kernel void @add64_sgpr_vgpr(i64 addrspace(1)* %out, i64 %a, i64 addrspace(1)* %in) {
				entry:
				%0 = load i64, i64 addrspace(1)* %in
				%1 = add i64 %a, %0
				store i64 %1, i64 addrspace(1)* %out
				ret void
				}

				; Test i64 add inside a branch.
				; FUNC-LABEL: {{^}}add64_in_branch:
				; EG: MEM_RAT_CACHELESS STORE_RAW [[LO:T[0-9]+\.XY]]
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-DAG: ADDC_UINT
				; EG-DAG: ADD_INT
				; EG-DAG: ADD_INT {{[* ]*}}
				; EG-NOT: SUB
				define amdgpu_kernel void @add64_in_branch(i64 addrspace(1)* %out, i64 addrspace(1)* %in, i64 %a, i64 %b, i64 %c) {
				entry:
				%0 = icmp eq i64 %a, 0
				br i1 %0, label %if, label %else

				if:
				%1 = load i64, i64 addrspace(1)* %in
				br label %endif

				else:
				%2 = add i64 %a, %b
				br label %endif

				endif:
				%3 = phi i64 [%1, %if], [%2, %else]
				store i64 %3, i64 addrspace(1)* %out
				ret void
				}

				declare i32 @llvm.r600.read.tidig.x() #1

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone speculatable }
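The add64, add64_sgpr_vgpr and add64_in_branch checks expect Evergreen to lower a 64-bit add into 32-bit operations: ADD_INT instructions for the low and high halves, an ADDC_UINT for the carry out of the low half, another ADD_INT that folds that carry into the high half, and no SUB. The following C sketch models only that arithmetic. It assumes ADDC_UINT returns the carry bit (0 or 1) of an unsigned 32-bit add; the function names are hypothetical, and this is not the R600 backend's lowering code.

```c
#include <stdint.h>

/* Hypothetical model of ADDC_UINT: carry-out (0 or 1) of the
   unsigned 32-bit addition a + b. */
static uint32_t addc_uint(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b) < a ? 1u : 0u;
}

/* Adds two 64-bit values using only 32-bit operations, mirroring the
   ADD_INT / ADDC_UINT / ADD_INT / ADD_INT pattern the EG checks expect. */
static uint64_t add64_split(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;           /* ADD_INT: low halves (wraps)  */
    uint32_t carry = addc_uint(a_lo, b_lo); /* ADDC_UINT: carry of low add  */
    uint32_t hi    = a_hi + b_hi;           /* ADD_INT: high halves         */
    hi = hi + carry;                        /* ADD_INT: fold in the carry   */

    return ((uint64_t)hi << 32) | lo;
}
```

For example, add64_split(0x1FFFFFFFFull, 1) returns 0x200000000ull: the low half wraps to zero and its carry propagates into the high half.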

test/CodeGen/AMDGPU/salu-to-valu.ll

...
 bb6:
 store i32 1, i32 addrspace(1)* %out
 br label %bb7

 bb7: ; preds = %bb3
 ret void
 }

 ; GCN-LABEL: {{^}}phi_visit_order:
-; GCN: v_add_i32_e32 v{{[0-9]+}}, vcc, 1, v{{[0-9]+}}
+; GCN: v_add_i32_e64 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, 1, v{{[0-9]+}}
 define amdgpu_kernel void @phi_visit_order() {
 bb:
 br label %bb1

 bb1:
 %tmp = phi i32 [ 0, %bb ], [ %tmp5, %bb4 ]
 %tid = call i32 @llvm.amdgcn.workitem.id.x()
 %cnd = icmp eq i32 %tid, 0
...
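In the updated phi_visit_order check, the selected add is the VOP3 (_e64) form, whose carry-out goes to an arbitrary SGPR pair instead of implicitly to VCC as in the VOP2 (_e32) form. The summary describes VOP3 as the form to select first and shrink later. The following C sketch models that later shrink decision under deliberately simplified rules: the e32 form always writes its carry to VCC, and its src1 must be a VGPR. The type and function names are hypothetical and do not correspond to LLVM's SIShrinkInstructions code, which considers more conditions than shown here.

```c
#include <stdbool.h>

/* Hypothetical, simplified operand model for a carry-producing add. */
typedef enum { OPND_VGPR, OPND_SGPR, OPND_INLINE_CONST } OperandKind;

typedef struct {
    OperandKind src0;
    OperandKind src1;
    bool carry_out_is_vcc; /* true if the e64 carry-out register is VCC */
} AddE64;

/* Returns true if a v_add_i32_e64 of this shape could be rewritten as
   v_add_i32_e32 under the simplified rules assumed here:
   - the e32 (VOP2) form implicitly writes its carry-out to VCC, so the
     e64 carry-out must already be VCC;
   - the VOP2 encoding accepts only a VGPR in src1 (operand commuting,
     which a real pass may try, is ignored in this sketch). */
static bool can_shrink_to_e32(const AddE64 *add) {
    if (!add->carry_out_is_vcc)
        return false;
    return add->src1 == OPND_VGPR;
}
```

Under these rules, the add in the new check (inline constant 1 in src0, a VGPR in src1, carry-out in an SGPR pair other than VCC) would stay in the e64 form, while the same add with its carry-out in VCC could be shrunk.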

Revision Contents

Diff 198202

lib/Target/AMDGPU/VOP2Instructions.td

test/CodeGen/AMDGPU/add.ll

test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

test/CodeGen/AMDGPU/ds-negative-offset-addressing-mode-loop.ll

test/CodeGen/AMDGPU/fence-barrier.ll

test/CodeGen/AMDGPU/llvm.amdgcn.update.dpp.ll

test/CodeGen/AMDGPU/r600.add.ll

test/CodeGen/AMDGPU/salu-to-valu.ll
