This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] enable scalar compare in truncate selection
Closed · Public

Authored by alex-t on Aug 30 2021, 10:33 AM.


Details

Reviewers

rampitec
foad
critson

Commits

rGe3cbf1d43741: [AMDGPU] enable scalar compare in truncate selection

Summary

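Currently, an i1 truncate DAG node is always selected as a bitwise AND with 1 followed by a vector compare (V_CMP_EQ_U32_e64) against 1, even when the truncate is uniform. This change adds patterns that select a scalar compare (S_CMP_EQ_U32) instead when the truncate node is uniform, for i16, i32 and i64 sources.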
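To make the difference concrete, here is a minimal Python model of the two selections (illustrative only: the helper names are hypothetical, not LLVM APIs, and inactive lanes are simply given a 0 bit in the modeled V_CMP result). Both compute bit 0 of the input, but the old sequence spends a VALU compare to build a per-lane mask, while the new uniform pattern leaves the result in the single SCC bit.

```python
# Minimal model of the two selections of `trunc iN %a to i1`.
# Illustrative only: none of these helpers are LLVM APIs.
MASK32 = 0xFFFFFFFF
WAVE64_ALL = (1 << 64) - 1


def trunc_to_i1(a):
    # IR semantics: truncating to i1 keeps only bit 0.
    return (a & 1) == 1


def s_and_b32(x, y):
    # 32-bit scalar AND (the SCC side effect is ignored in this helper).
    return (x & y) & MASK32


def v_cmp_eq_u32_e64(lhs, rhs, exec_mask, wave_size=64):
    # Vector compare with uniform operands: every active lane computes the
    # same result, written as one bit per lane; inactive lanes get 0 here.
    mask = 0
    for lane in range(wave_size):
        if (exec_mask >> lane) & 1 and lhs == rhs:
            mask |= 1 << lane
    return mask


def s_cmp_eq_u32(lhs, rhs):
    # Scalar compare: the result is the single SCC bit.
    return (lhs & MASK32) == (rhs & MASK32)


def old_selection(a, exec_mask=WAVE64_ALL):
    # (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1)) -> lane mask
    return v_cmp_eq_u32_e64(s_and_b32(1, a), 1, exec_mask)


def new_uniform_selection(a):
    # (S_CMP_EQ_U32 (S_AND_B32 (i32 1), $a), (i32 1)) -> SCC
    return s_cmp_eq_u32(s_and_b32(1, a), 1)
```

For example, new_uniform_selection(3) is True and old_selection(3) is a mask with all 64 lanes set; both encode the same truncated bit in different register classes.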

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

alex-t created this revision. · Aug 30 2021, 10:33 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. · Aug 30 2021, 10:33 AM

alex-t requested review of this revision. · Aug 30 2021, 10:33 AM

Herald added a project: Restricted Project. · Aug 30 2021, 10:33 AM

Herald added a subscriber: wdng.

alex-t added a reviewer: critson. · Aug 30 2021, 10:34 AM

Harbormaster completed remote builds in B121767: Diff 369484. · Aug 30 2021, 11:16 AM

rampitec added inline comments. · Aug 30 2021, 5:01 PM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 2112: I think we can later fold it into s_bitcmp1_b32 $a, 0
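The suggested fold can be checked with a small Python model, assuming the GCN ISA definition of S_BITCMP1_B32 (SCC = 1 when bit S1[4:0] of S0 is set). It compares s_bitcmp1_b32 $a, 0 against the s_and_b32/s_cmp_eq_u32 pair the new pattern emits; the fold needs one instruction and no temporary SGPR. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def s_bitcmp1_b32(s0, s1):
    # SCC = (S0[S1[4:0]] == 1): only the low 5 bits of S1 select the bit.
    bit = s1 & 0x1F
    return (((s0 & MASK32) >> bit) & 1) == 1


def and_then_cmp(a):
    # What the new pattern selects:
    #   s_and_b32    tmp, 1, a
    #   s_cmp_eq_u32 tmp, 1      ; SCC = (tmp == 1)
    tmp = (1 & a) & MASK32
    return tmp == 1


def fold_is_equivalent(values):
    # True if the single s_bitcmp1_b32 matches the two-instruction pair.
    return all(s_bitcmp1_b32(a, 0) == and_then_cmp(a) for a in values)
```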

I can confirm this resolves the Vulkan CTS failures caused by D106079.

alex-t added inline comments. · Aug 31 2021, 7:33 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 2112: For now, I would prefer to leave a TODO here. Any objections?

rampitec added inline comments. · Aug 31 2021, 9:43 AM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 2112: No objections. I also do not think we have to select it this way; we can rather combine it later.

LGTM

This revision is now accepted and ready to land. · Aug 31 2021, 9:46 AM

rampitec added inline comments. · Aug 31 2021, 12:19 PM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 2112: Interestingly, S_AND_B32 by itself already sets SCC = 1 on a non-zero result! Once you land this, we may experiment with removing the S_CMP from this pattern completely, although I am not sure how well the rest of the backend is prepared for the missing compare, or what the pattern would look like after propagating SCC from moveToVALU().
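A minimal Python model of this observation, assuming the usual GCN SOP2 definition of S_AND_B32 (D = S0 & S1, SCC = D != 0). With a mask of 1 the AND result can only be 0 or 1, so the SCC it already produces equals what the following s_cmp_eq_u32 computes. Helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def s_and_b32(s0, s1):
    # Returns (D, SCC): D = S0 & S1, SCC = (D != 0).
    d = (s0 & s1) & MASK32
    return d, d != 0


def scc_with_compare(a, mask=1):
    # s_and_b32 d, mask, a ; s_cmp_eq_u32 d, mask  (the compare overwrites SCC)
    d, _ = s_and_b32(mask, a)
    return d == mask


def scc_without_compare(a, mask=1):
    # Rely on the SCC written by s_and_b32 itself.
    _, scc = s_and_b32(mask, a)
    return scc
```

The shortcut is specific to single-bit masks: with a mask of 3 and a = 1, the AND's SCC means "some masked bit is set" (True), while the compare asks whether all masked bits are set (False).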

rampitec added inline comments. · Aug 31 2021, 4:28 PM

llvm/lib/Target/AMDGPU/SIInstructions.td
Line 2112: D109031 will remove the compare, and it can be extended to use s_bitcmp. The latter would be generally beneficial for any scalar code that checks a bitfield, not just this pattern.
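The s_bitcmp generalization can be sketched as a small peephole model. This is a hypothetical illustration, not the logic of D109031 (which is not shown here): it only handles an s_and_b32 with a single-bit mask followed by s_cmp_eq_u32 or s_cmp_lg_u32, maps the pair to s_bitcmp0_b32 or s_bitcmp1_b32, and checks the rewrite against the original pair.

```python
MASK32 = 0xFFFFFFFF


def s_bitcmp(s0, bit, want):
    # s_bitcmp0_b32 (want=0) / s_bitcmp1_b32 (want=1): SCC = (S0[bit] == want).
    return (((s0 & MASK32) >> (bit & 0x1F)) & 1) == want


def fold_and_cmp(mask, cmp, rhs):
    """Hypothetical peephole: `s_and_b32 t, mask, a` followed by
    `s_cmp_<cmp>_u32 t, rhs` (cmp is "eq" or "lg") tests one bit of `a`
    when `mask` has exactly one bit set. Returns (want, bit) for the
    equivalent s_bitcmp<want>_b32 a, bit, or None if no fold applies."""
    if mask <= 0 or mask > MASK32 or mask & (mask - 1):
        return None  # not a single bit of a 32-bit value
    bit = mask.bit_length() - 1
    if (cmp == "eq" and rhs == mask) or (cmp == "lg" and rhs == 0):
        return (1, bit)  # SCC = bit is set
    if (cmp == "eq" and rhs == 0) or (cmp == "lg" and rhs == mask):
        return (0, bit)  # SCC = bit is clear
    return None


def reference(a, mask, cmp, rhs):
    # SCC computed by the original AND + compare pair.
    t = a & mask & MASK32
    return t == rhs if cmp == "eq" else t != rhs
```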

rampitec mentioned this in D109031: [AMDGPU] Introduce optimizeCompareInstr. · Aug 31 2021, 4:28 PM

Closed by commit rGe3cbf1d43741: [AMDGPU] enable scalar compare in truncate selection (authored by alex-t). · Sep 1 2021, 1:35 PM

This revision was automatically updated to reflect the committed changes.

alex-t added a commit: rGe3cbf1d43741: [AMDGPU] enable scalar compare in truncate selection.

piotr mentioned this in D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC. · Sep 17 2021, 12:26 AM

Revision Contents

Path                                                                   Size
llvm/lib/Target/AMDGPU/SIInstructions.td                               16 lines
llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll    3 lines
llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll            6 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.fmas.ll                       2 lines
llvm/test/CodeGen/AMDGPU/select-i1.ll                                  6 lines
llvm/test/CodeGen/AMDGPU/trunc.ll                                      4 lines
llvm/test/CodeGen/AMDGPU/wave32.ll                                     12 lines

Diff 370044

llvm/lib/Target/AMDGPU/SIInstructions.td

@@ ... @@ def : GCNPat<
   (COPY VSrc_b16:$src)>;
 
 def : GCNPat <
   (i32 (trunc i64:$a)),
   (EXTRACT_SUBREG $a, sub0)
 >;
 
 def : GCNPat <
+  (i1 (UniformUnaryFrag<trunc> i32:$a)),
+  (S_CMP_EQ_U32 (S_AND_B32 (i32 1), $a), (i32 1))
+>;
+
+def : GCNPat <
+  (i1 (UniformUnaryFrag<trunc> i16:$a)),
+  (S_CMP_EQ_U32 (S_AND_B32 (i32 1), $a), (i32 1))
+>;
+
+def : GCNPat <
+  (i1 (UniformUnaryFrag<trunc> i64:$a)),
+  (S_CMP_EQ_U32 (S_AND_B32 (i32 1),
+                (i32 (EXTRACT_SUBREG $a, sub0))), (i32 1))
+>;
+
+def : GCNPat <
   (i1 (trunc i32:$a)),
   (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
 >;
 
 def : GCNPat <
   (i1 (trunc i16:$a)),
   (V_CMP_EQ_U32_e64 (S_AND_B32 (i32 1), $a), (i32 1))
 >;
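The new i64 pattern feeds only the low half (sub0) of the 64-bit register pair into the AND. A short Python model, with hypothetical helper names, shows why that is sufficient: truncation to i1 reads only bit 0, which lives in sub0. The last helper mirrors the s_trunc_i64_to_i1 test in trunc.ll.

```python
MASK32 = 0xFFFFFFFF


def extract_sub0(a64):
    # EXTRACT_SUBREG $a, sub0: the low 32-bit half of a 64-bit SGPR pair.
    return a64 & MASK32


def uniform_trunc_i64_to_i1(a64):
    # (S_CMP_EQ_U32 (S_AND_B32 (i32 1), (EXTRACT_SUBREG $a, sub0)), (i32 1))
    return ((1 & extract_sub0(a64)) & MASK32) == 1


def trunc_i64_to_i1(a64):
    # IR semantics: keep bit 0 of the 64-bit value.
    return (a64 & 1) == 1


def s_trunc_i64_to_i1(x):
    # %sel = select i1 (trunc i64 %x to i1), i32 63, i32 -12
    return 63 if uniform_trunc_i64_to_i1(x) else -12
```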

llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll

@@ ... @@
 ;
 ; GCN-LABEL: select_add_lhs_const_i16:
 ; GCN: ; %bb.0:
 ; GCN-NEXT: s_load_dword s0, s[4:5], 0x0
 ; GCN-NEXT: v_mov_b32_e32 v0, 0x83
 ; GCN-NEXT: v_mov_b32_e32 v1, 0x80
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
 ; GCN-NEXT: s_and_b32 s0, 1, s0
-; GCN-NEXT: v_cmp_eq_u32_e64 vcc, s0, 1
+; GCN-NEXT: s_cmp_eq_u32 s0, 1
+; GCN-NEXT: s_cselect_b64 vcc, -1, 0
 ; GCN-NEXT: v_cndmask_b32_e32 v0, v0, v1, vcc
 ; GCN-NEXT: flat_store_short v[0:1], v0
 ; GCN-NEXT: s_endpgm
   %select = select i1 %cond, i16 5, i16 8
   %op = add i16 %select, 123
   store i16 %op, i16 addrspace(1)* undef
   ret void
 }
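With the uniform pattern the i1 lives in SCC, so when a VALU instruction such as v_cndmask needs it as a lane mask it is materialized with s_cselect_b64 vcc, -1, 0, as the updated checks show. The Python model below (hypothetical helper names; wave64 with all lanes active is assumed) replays the updated select_add_lhs_const_i16 sequence: select(cond, 5, 8) + 123 yields 128 (0x80) or 131 (0x83).

```python
WAVE = 64
ALL_LANES = (1 << WAVE) - 1


def s_cselect_b64(scc, s0, s1):
    # D = SCC ? S0 : S1; the inline constant -1 becomes all 64 bits set.
    return (s0 if scc else s1) & ALL_LANES


def v_cndmask_b32(src0, src1, vcc):
    # Per lane: D = VCC[lane] ? src1 : src0 (src0/src1 are per-lane lists).
    return [src1[lane] if (vcc >> lane) & 1 else src0[lane] for lane in range(WAVE)]


def select_add_lhs_const_i16(cond_dword):
    s0 = 1 & cond_dword                    # s_and_b32 s0, 1, s0
    scc = s0 == 1                          # s_cmp_eq_u32 s0, 1
    vcc = s_cselect_b64(scc, -1, 0)        # s_cselect_b64 vcc, -1, 0
    v0 = [0x83] * WAVE                     # v_mov_b32_e32 v0, 0x83
    v1 = [0x80] * WAVE                     # v_mov_b32_e32 v1, 0x80
    return v_cndmask_b32(v0, v1, vcc)      # v_cndmask_b32_e32 v0, v0, v1, vcc
```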

llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll

@@ ... @@
 ; GCN: ; %bb.0: ; %entry
 ; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
 ; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
 ; GCN-NEXT: s_add_u32 s0, s0, s9
 ; GCN-NEXT: s_addc_u32 s1, s1, 0
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
 ; GCN-NEXT: s_and_b32 s4, 1, s4
-; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
+; GCN-NEXT: s_cmp_eq_u32 s4, 1
+; GCN-NEXT: s_cselect_b64 s[4:5], -1, 0
 ; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
 ; GCN-NEXT: s_mov_b32 s32, 0
 ; GCN-NEXT: s_cbranch_vccnz BB4_2
 ; GCN-NEXT: ; %bb.1: ; %if.else
 ; GCN-NEXT: s_getpc_b64 s[4:5]
 ; GCN-NEXT: s_add_u32 s4, s4, func_v3i16@rel32@lo+4
 ; GCN-NEXT: s_addc_u32 s5, s5, func_v3i16@rel32@hi+12
 ; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]
@@ ... @@
 ; GCN: ; %bb.0: ; %entry
 ; GCN-NEXT: s_load_dword s4, s[4:5], 0x0
 ; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9
 ; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
 ; GCN-NEXT: s_add_u32 s0, s0, s9
 ; GCN-NEXT: s_addc_u32 s1, s1, 0
 ; GCN-NEXT: s_waitcnt lgkmcnt(0)
 ; GCN-NEXT: s_and_b32 s4, 1, s4
-; GCN-NEXT: v_cmp_eq_u32_e64 s[4:5], s4, 1
+; GCN-NEXT: s_cmp_eq_u32 s4, 1
+; GCN-NEXT: s_cselect_b64 s[4:5], -1, 0
 ; GCN-NEXT: s_and_b64 vcc, exec, s[4:5]
 ; GCN-NEXT: s_mov_b32 s32, 0
 ; GCN-NEXT: s_cbranch_vccnz BB5_2
 ; GCN-NEXT: ; %bb.1: ; %if.else
 ; GCN-NEXT: s_getpc_b64 s[4:5]
 ; GCN-NEXT: s_add_u32 s4, s4, func_v3f16@rel32@lo+4
 ; GCN-NEXT: s_addc_u32 s5, s5, func_v3f16@rel32@hi+12
 ; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.div.fmas.ll

@@ ... @@
 ; SI-DAG: s_load_dword [[SB:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x1c
 ; SI-DAG: s_load_dword [[SC:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x25
 
 ; VI-DAG: s_load_dword [[SA:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x4c
 ; VI-DAG: s_load_dword [[SB:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x70
 ; VI-DAG: s_load_dword [[SC:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x94
 
 ; GCN-DAG: s_and_b32 [[AND_I1:s[0-9]+]], 1, s{{[0-9]+}}
-; GCN: v_cmp_eq_u32_e64 vcc, [[AND_I1]], 1
+; GCN: s_cmp_eq_u32 [[AND_I1]], 1
 
 ; GCN-DAG: v_mov_b32_e32 [[VC:v[0-9]+]], [[SC]]
 ; GCN-DAG: v_mov_b32_e32 [[VB:v[0-9]+]], [[SB]]
 ; GCN-DAG: v_mov_b32_e32 [[VA:v[0-9]+]], [[SA]]
 ; GCN: v_div_fmas_f32 [[RESULT:v[0-9]+]], [[VA]], [[VB]], [[VC]]
 ; GCN: buffer_store_dword [[RESULT]],
 define amdgpu_kernel void @test_div_fmas_f32(float addrspace(1)* %out, [8 x i32], float %a, [8 x i32], float %b, [8 x i32], float %c, [8 x i32], i1 %d) nounwind {
   %result = call float @llvm.amdgcn.div.fmas.f32(float %a, float %b, float %c, i1 %d) nounwind readnone

llvm/test/CodeGen/AMDGPU/select-i1.ll

@@ ... @@ define amdgpu_kernel void @select_i1(i1 addrspace(1)* %out, i32 %cond, i1 %a, i1 %b) nounwind {
   %cmp = icmp ugt i32 %cond, 5
   %sel = select i1 %cmp, i1 %a, i1 %b
   store i1 %sel, i1 addrspace(1)* %out, align 4
   ret void
 }
 
 ; GCN-LABEL: {{^}}s_minmax_i1:
 ; GCN: s_load_dword [[LOAD:s[0-9]+]],
+; GCN: s_and_b32 [[COND:s[0-9]+]], 1, [[LOAD]]
+; GCN: s_cmp_eq_u32 [[COND]], 1
+; GCN: s_cselect_b64 vcc, -1, 0
 ; GCN-DAG: s_lshr_b32 [[A:s[0-9]+]], [[LOAD]], 8
 ; GCN-DAG: s_lshr_b32 [[B:s[0-9]+]], [[LOAD]], 16
-; GCN-DAG: s_and_b32 [[COND:s[0-9]+]], 1, [[LOAD]]
 ; GCN: v_mov_b32_e32 [[V_B:v[0-9]+]], [[B]]
 ; GCN: v_mov_b32_e32 [[V_A:v[0-9]+]], [[A]]
-; GCN: v_cmp_eq_u32_e64 vcc, [[COND]], 1
 ; GCN: v_cndmask_b32_e32 [[SEL:v[0-9]+]], [[V_B]], [[V_A]]
 ; GCN: v_and_b32_e32 v{{[0-9]+}}, 1, [[SEL]]
 define amdgpu_kernel void @s_minmax_i1(i1 addrspace(1)* %out, [8 x i32], i1 zeroext %cond, i1 zeroext %a, i1 zeroext %b) nounwind {
   %cmp = icmp slt i1 %cond, false
   %sel = select i1 %cmp, i1 %a, i1 %b
   store i1 %sel, i1 addrspace(1)* %out, align 4
   ret void
 }

llvm/test/CodeGen/AMDGPU/trunc.ll

@@ ... @@ define amdgpu_kernel void @sgpr_trunc_i32_to_i1(i32 addrspace(1)* %out, i32 %a) {
   store i32 %result, i32 addrspace(1)* %out, align 4
   ret void
 }
 
 ; GCN-LABEL: {{^}}s_trunc_i64_to_i1:
 ; SI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x13
 ; VI: s_load_dwordx2 s{{\[}}[[SLO:[0-9]+]]:{{[0-9]+\]}}, {{s\[[0-9]+:[0-9]+\]}}, 0x4c
 ; GCN: s_and_b32 [[MASKED:s[0-9]+]], 1, s[[SLO]]
-; GCN: v_cmp_eq_u32_e64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], [[MASKED]], 1{{$}}
+; GCN: s_cmp_eq_u32 [[MASKED]], 1{{$}}
+; SI: s_cselect_b64 s{{\[}}[[VLO:[0-9]+]]:[[VHI:[0-9]+]]], -1, 0
 ; SI: v_cndmask_b32_e64 {{v[0-9]+}}, -12, 63, s{{\[}}[[VLO]]:[[VHI]]]
-; VI: s_cmp_lg_u64 s{{\[}}[[VLO]]:[[VHI]]], 0
 ; VI: s_cselect_b32 {{s[0-9]+}}, 63, -12
 define amdgpu_kernel void @s_trunc_i64_to_i1(i32 addrspace(1)* %out, [8 x i32], i64 %x) {
   %trunc = trunc i64 %x to i1
   %sel = select i1 %trunc, i32 63, i32 -12
   store i32 %sel, i32 addrspace(1)* %out
   ret void
 }

llvm/test/CodeGen/AMDGPU/wave32.ll

@@ ... @@ define i64 @test_mad_u64_u32(i32 %arg0, i32 %arg1, i64 %arg2) #0 {
   %sext0 = zext i32 %arg0 to i64
   %sext1 = zext i32 %arg1 to i64
   %mul = mul i64 %sext0, %sext1
   %mad = add i64 %mul, %arg2
   ret i64 %mad
 }
 
 ; GCN-LABEL: {{^}}test_div_fmas_f32:
-; GFX1032: v_cmp_eq_u32_e64 vcc_lo,
-; GFX1064: v_cmp_eq_u32_e64 vcc,
+; GFX1032: s_cmp_eq_u32 s0, 1
+; GFX1032: s_cselect_b32 vcc_lo, -1, 0
+; GFX1064: s_cmp_eq_u32 s0, 1
+; GFX1064: s_cselect_b64 vcc, -1, 0
 ; GCN: v_div_fmas_f32 v{{[0-9]+}}, {{[vs][0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
 define amdgpu_kernel void @test_div_fmas_f32(float addrspace(1)* %out, float %a, float %b, float %c, i1 %d) nounwind {
   %result = call float @llvm.amdgcn.div.fmas.f32(float %a, float %b, float %c, i1 %d) nounwind readnone
   store float %result, float addrspace(1)* %out, align 4
   ret void
 }
 
 ; GCN-LABEL: {{^}}test_div_fmas_f64:
-; GFX1032: v_cmp_eq_u32_e64 vcc_lo,
-; GFX1064: v_cmp_eq_u32_e64 vcc,
+; GFX1032: s_cmp_eq_u32 s0, 1
+; GFX1032: s_cselect_b32 vcc_lo, -1, 0
+; GFX1064: s_cmp_eq_u32 s0, 1
+; GFX1064: s_cselect_b64 vcc, -1, 0
 ; GCN-DAG: v_div_fmas_f64 v[{{[0-9:]+}}], {{[vs]}}[{{[0-9:]+}}], v[{{[0-9:]+}}], v[{{[0-9:]+}}]
 define amdgpu_kernel void @test_div_fmas_f64(double addrspace(1)* %out, double %a, double %b, double %c, i1 %d) nounwind {
   %result = call double @llvm.amdgcn.div.fmas.f64(double %a, double %b, double %c, i1 %d) nounwind readnone
   store double %result, double addrspace(1)* %out, align 8
   ret void
 }
 
 ; GCN-LABEL: {{^}}test_div_fmas_f32_i1_phi_vcc:
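The wave32.ll updates differ between the two prefixes only in the width of the materialized mask: in wave32 (GFX1032) the lane mask is the 32-bit vcc_lo, so s_cselect_b32 is checked, while wave64 (GFX1064) uses s_cselect_b64 on the full vcc pair. A small Python model with a hypothetical helper name:

```python
def materialize_scc(scc, wave_size):
    """Model of `s_cselect_b<N> <mask>, -1, 0`: returns
    (mnemonic, destination, value) for the lane mask built from SCC."""
    if wave_size == 32:
        mnemonic, dest = "s_cselect_b32", "vcc_lo"
    elif wave_size == 64:
        mnemonic, dest = "s_cselect_b64", "vcc"
    else:
        raise ValueError("AMDGPU waves are 32 or 64 lanes wide")
    all_lanes = (1 << wave_size) - 1
    # -1 selects every lane of the wave; 0 selects none.
    return mnemonic, dest, (-1 if scc else 0) & all_lanes
```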