This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [mc] Corrected several VI opcodes to avoid printing _e64
Closed · Public

Authored by dp on May 12 2017, 4:22 AM.


Details

Reviewers

vpykhtin
artem.tamazov

Commits

rG167f8b69e339: [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64
rL303070: [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64

Summary

Some VI instructions can only be encoded in VOP3 format (on CI they have both VOP2 and VOP3 encodings). However, the disassembler appends an "_e64" suffix to them, which is misleading for the VI ISA.

Full list of opcodes affected by this issue:

v_bcnt_u32_b32
v_bfm_b32
v_cvt_pk_i16_i32
v_cvt_pk_u16_u32
v_cvt_pkaccum_u8_f32
v_cvt_pknorm_i16_f32
v_cvt_pknorm_u16_f32
v_cvt_pkrtz_f16_f32
v_ldexp_f32
v_mbcnt_hi_u32_b32
v_mbcnt_lo_u32_b32

See Bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936
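Below is a minimal Python sketch of the printing behavior the summary asks for. The idea is to print a size suffix only when the target really offers a choice of encodings for the opcode. The `ENCODINGS` table and the `print_mnemonic` function are hypothetical illustrations, not LLVM code. The patch itself makes the change in TableGen, through the new `VOP2_Real_e64only_vi` multiclass.

```python
# Hypothetical table: which encodings a target provides for an opcode.
# "e32" = VOP2 (32-bit encoding), "e64" = VOP3 (64-bit encoding).
ENCODINGS = {
    ("CI", "v_bfm_b32"): {"e32", "e64"},   # both forms exist on CI
    ("VI", "v_bfm_b32"): {"e64"},          # VOP3-only on VI
    ("VI", "v_add_f16"): {"e32", "e64"},   # still has both forms on VI
}


def print_mnemonic(target: str, opcode: str, encoding: str) -> str:
    """Return the mnemonic a disassembler following this rule would print."""
    available = ENCODINGS[(target, opcode)]
    if encoding not in available:
        raise ValueError(f"{opcode} has no {encoding} encoding on {target}")
    # A suffix only disambiguates when more than one encoding exists.
    if len(available) > 1:
        return f"{opcode}_{encoding}"
    return opcode


print(print_mnemonic("CI", "v_bfm_b32", "e64"))  # v_bfm_b32_e64
print(print_mnemonic("VI", "v_bfm_b32", "e64"))  # v_bfm_b32
print(print_mnemonic("VI", "v_add_f16", "e64"))  # v_add_f16_e64
```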

Diff Detail

Repository: rL LLVM

Event Timeline

dp created this revision. May 12 2017, 4:22 AM

Herald added subscribers: nhaehnle, arsenm. May 12 2017, 4:22 AM

dp retitled this revision from "Corrected several VI opcodes to avoid printing _e64" to "[AMDGPU] [mc] Corrected several VI opcodes to avoid printing _e64". May 12 2017, 4:23 AM

dp edited the summary of this revision. May 12 2017, 4:44 AM

Generally, looks good.
A subtle detail: I think we should ensure compatibility between ISA generations as much as possible.
Let's consider the following use case:

We have valid CI ISA code with VOP3 instructions.
Disassembling it for CI yields text with _e64 suffixes.
That text should assemble for VI without the need to remove the _e64 suffixes.

A similar use case: disassembled CI code with VOP2 instructions (with _e32 suffixes) should not assemble for VI.

Please check.

Replying to @artem.tamazov (D33123#753105):

This is how the current implementation already works.
The '_e32' and '_e64' suffixes are allowed for opcodes that have only one encoding, provided that the specified suffix does not conflict with the actual encoding size.

Examples:

v_bfm_b32 v5, s1, v2      // ok on SI, VI
v_bfm_b32_e32 v5, s1, v2  // ok on SI, error on VI
v_bfm_b32_e64 v5, s1, v2  // ok on SI, VI
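The acceptance rule above can be modeled in a few lines of Python. This is a hypothetical sketch: the `ENCODINGS` table and `accepts` are not LLVM code. It also covers opcodes with two encodings, where the suffix simply selects one. Applied to the three `v_bfm_b32` examples, it gives the same verdicts.

```python
# Hypothetical model of the assembler rule: an explicit "_e32"/"_e64"
# suffix is accepted only if the target provides that encoding for the
# opcode; without a suffix, any available encoding will do.
ENCODINGS = {
    ("SI", "v_bfm_b32"): {"e32", "e64"},
    ("VI", "v_bfm_b32"): {"e64"},
}


def accepts(target: str, mnemonic: str) -> bool:
    """Return True if this model assembler accepts `mnemonic` on `target`."""
    for suffix in ("_e32", "_e64"):
        if mnemonic.endswith(suffix):
            base, requested = mnemonic[: -len(suffix)], suffix[1:]
            return requested in ENCODINGS.get((target, base), set())
    # No suffix: accepted as long as the opcode exists on the target.
    return bool(ENCODINGS.get((target, mnemonic)))


for mnemonic in ("v_bfm_b32", "v_bfm_b32_e32", "v_bfm_b32_e64"):
    verdict = {target: accepts(target, mnemonic) for target in ("SI", "VI")}
    print(mnemonic, verdict)
```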

artem.tamazov added inline comments. May 12 2017, 5:23 AM

test/CodeGen/AMDGPU/constant-fold-mi-operands.ll

Line 28 (On Diff #98746): WRT all changes in CodeGen tests: is there a way to indicate "_e64 or nothing" in tests? I think that would minimize changes and copy-pasting.

dp added inline comments. May 12 2017, 7:39 AM

test/CodeGen/AMDGPU/constant-fold-mi-operands.ll

Line 28 (On Diff #98746): Looks like FileCheck supports only a subset of regular expressions. Empty expressions are not allowed:

v_mbcnt_lo_u32{{_e64|}}      // syntax error: empty subexpression

It is possible to specify a blank or a metasymbol like \b (word boundary), but it does not work as expected:

v_mbcnt_lo_u32{{_e64| }}     // does not match v_mbcnt_lo_u32 for some reason
v_mbcnt_lo_u32{{_e64|.}}     // does not work either

The following pattern works, but it looks ugly:

v_mbcnt_lo{{_u32_e64|_u32}}  // works

Any ideas?

artem.tamazov added inline comments. May 12 2017, 8:14 AM

test/CodeGen/AMDGPU/constant-fold-mi-operands.ll

Line 28 (On Diff #98746): {{...}} can contain any regexp, so could you try this:

v_mbcnt_lo_u32{{(?:_e64)?}}

Updated as suggested by Artem to minimize changes in CodeGen tests.
Unfortunately, {{(...)?}} does not work in many cases, so I had to use {{(...)*}}.
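To see what a `{{(_e64)*}}` pattern accepts, here is a rough check. It uses Python's `re` module as a stand-in for FileCheck's regex engine. The two engines differ in details, so this only approximates FileCheck's behavior. The sample output lines are made up. The pattern body is the one used in the updated ctpop tests, with the `{{...}}` delimiters removed.

```python
import re

# Body of the FileCheck pattern v_bcnt_u32_b32{{(_e64)*}} followed by a
# literal operand list. "(_e64)*" means zero or more copies of "_e64",
# so both the old and the new VI spelling match.
PATTERN = re.compile(r"v_bcnt_u32_b32(_e64)* v1, v2, 0")

# Hypothetical llc output lines.
accepted = [
    "v_bcnt_u32_b32_e64 v1, v2, 0",  # VOP3 form printed with a suffix
    "v_bcnt_u32_b32 v1, v2, 0",      # VI after this change: no suffix
]
rejected = [
    "v_bcnt_u32_b32_e32 v1, v2, 0",  # VOP2 form needs its own check line
]

for line in accepted:
    assert PATTERN.search(line), f"expected a match: {line}"
for line in rejected:
    assert not PATTERN.search(line), f"unexpected match: {line}"

# Side effect of "*" instead of "?": a repeated suffix also matches,
# which is harmless because the printer never emits one.
assert PATTERN.search("v_bcnt_u32_b32_e64_e64 v1, v2, 0")
print("pattern behaves as expected")
```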

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. May 12 2017, 10:21 AM

Thanks, good!

This revision is now accepted and ready to land. May 15 2017, 5:12 AM

Closed by commit rL303070: [AMDGPU][MC] Corrected several VI opcodes to avoid printing _e64 (authored by dpreobra). May 15 2017, 7:41 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

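Path (size)

llvm/trunk/lib/Target/AMDGPU/VOP2Instructions.td (33 lines)
llvm/trunk/test/CodeGen/AMDGPU/constant-fold-mi-operands.ll (12 lines)
llvm/trunk/test/CodeGen/AMDGPU/ctpop.ll (80 lines)
llvm/trunk/test/CodeGen/AMDGPU/ctpop64.ll (16 lines)
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll (18 lines)
llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.mbcnt.ll (4 lines)
llvm/trunk/test/MC/AMDGPU/vop2.s (22 lines)
llvm/trunk/test/MC/AMDGPU/vop3-convert.s (14 lines)
llvm/trunk/test/MC/Disassembler/AMDGPU/vop2_vi.txt (22 lines)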

Diff 99006

llvm/trunk/lib/Target/AMDGPU/VOP2Instructions.td

	... (earlier unchanged lines not shown; 651 lines in file) ...
	}			}

	multiclass VOP2_Real_e64_vi <bits<10> op> {			multiclass VOP2_Real_e64_vi <bits<10> op> {
	def _e64_vi :			def _e64_vi :
	VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,			VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
	VOP3e_vi <op, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;			VOP3e_vi <op, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;
	}			}

				multiclass VOP2_Real_e64only_vi <bits<10> op> {
				def _e64_vi :
				VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
				VOP3e_vi <op, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl> {
				// Hack to stop printing _e64
				VOP3_Pseudo ps = !cast<VOP3_Pseudo>(NAME#"_e64");
				let OutOperandList = (outs VGPR_32:$vdst);
				let AsmString = ps.Mnemonic # " " # ps.AsmOperands;
				}
				}

	multiclass Base_VOP2be_Real_e32e64_vi <bits<6> op> : VOP2_Real_e32_vi<op> {			multiclass Base_VOP2be_Real_e32e64_vi <bits<6> op> : VOP2_Real_e32_vi<op> {
	def _e64_vi :			def _e64_vi :
	VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,			VOP3_Real<!cast<VOP3_Pseudo>(NAME#"_e64"), SIEncodingFamily.VI>,
	VOP3be_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;			VOP3be_vi <{0, 1, 0, 0, op{5-0}}, !cast<VOP3_Pseudo>(NAME#"_e64").Pfl>;
	}			}

	multiclass Base_VOP2_Real_e32e64_vi <bits<6> op> :			multiclass Base_VOP2_Real_e32e64_vi <bits<6> op> :
	VOP2_Real_e32_vi<op>,			VOP2_Real_e32_vi<op>,
	... (51 unchanged lines not shown) ...
	defm V_SUBREV_I32 : VOP2be_Real_e32e64_vi <0x1b>;			defm V_SUBREV_I32 : VOP2be_Real_e32e64_vi <0x1b>;
	defm V_ADDC_U32 : VOP2be_Real_e32e64_vi <0x1c>;			defm V_ADDC_U32 : VOP2be_Real_e32e64_vi <0x1c>;
	defm V_SUBB_U32 : VOP2be_Real_e32e64_vi <0x1d>;			defm V_SUBB_U32 : VOP2be_Real_e32e64_vi <0x1d>;
	defm V_SUBBREV_U32 : VOP2be_Real_e32e64_vi <0x1e>;			defm V_SUBBREV_U32 : VOP2be_Real_e32e64_vi <0x1e>;

	defm V_READLANE_B32 : VOP32_Real_vi <0x289>;			defm V_READLANE_B32 : VOP32_Real_vi <0x289>;
	defm V_WRITELANE_B32 : VOP32_Real_vi <0x28a>;			defm V_WRITELANE_B32 : VOP32_Real_vi <0x28a>;

	defm V_BFM_B32 : VOP2_Real_e64_vi <0x293>;			defm V_BFM_B32 : VOP2_Real_e64only_vi <0x293>;
	defm V_BCNT_U32_B32 : VOP2_Real_e64_vi <0x28b>;			defm V_BCNT_U32_B32 : VOP2_Real_e64only_vi <0x28b>;
	defm V_MBCNT_LO_U32_B32 : VOP2_Real_e64_vi <0x28c>;			defm V_MBCNT_LO_U32_B32 : VOP2_Real_e64only_vi <0x28c>;
	defm V_MBCNT_HI_U32_B32 : VOP2_Real_e64_vi <0x28d>;			defm V_MBCNT_HI_U32_B32 : VOP2_Real_e64only_vi <0x28d>;
	defm V_LDEXP_F32 : VOP2_Real_e64_vi <0x288>;			defm V_LDEXP_F32 : VOP2_Real_e64only_vi <0x288>;
	defm V_CVT_PKACCUM_U8_F32 : VOP2_Real_e64_vi <0x1f0>;			defm V_CVT_PKACCUM_U8_F32 : VOP2_Real_e64only_vi <0x1f0>;
	defm V_CVT_PKNORM_I16_F32 : VOP2_Real_e64_vi <0x294>;			defm V_CVT_PKNORM_I16_F32 : VOP2_Real_e64only_vi <0x294>;
	defm V_CVT_PKNORM_U16_F32 : VOP2_Real_e64_vi <0x295>;			defm V_CVT_PKNORM_U16_F32 : VOP2_Real_e64only_vi <0x295>;
	defm V_CVT_PKRTZ_F16_F32 : VOP2_Real_e64_vi <0x296>;			defm V_CVT_PKRTZ_F16_F32 : VOP2_Real_e64only_vi <0x296>;
	defm V_CVT_PK_U16_U32 : VOP2_Real_e64_vi <0x297>;			defm V_CVT_PK_U16_U32 : VOP2_Real_e64only_vi <0x297>;
	defm V_CVT_PK_I16_I32 : VOP2_Real_e64_vi <0x298>;			defm V_CVT_PK_I16_I32 : VOP2_Real_e64only_vi <0x298>;

	defm V_ADD_F16 : VOP2_Real_e32e64_vi <0x1f>;			defm V_ADD_F16 : VOP2_Real_e32e64_vi <0x1f>;
	defm V_SUB_F16 : VOP2_Real_e32e64_vi <0x20>;			defm V_SUB_F16 : VOP2_Real_e32e64_vi <0x20>;
	defm V_SUBREV_F16 : VOP2_Real_e32e64_vi <0x21>;			defm V_SUBREV_F16 : VOP2_Real_e32e64_vi <0x21>;
	defm V_MUL_F16 : VOP2_Real_e32e64_vi <0x22>;			defm V_MUL_F16 : VOP2_Real_e32e64_vi <0x22>;
	defm V_MAC_F16 : VOP2_Real_e32e64_vi <0x23>;			defm V_MAC_F16 : VOP2_Real_e32e64_vi <0x23>;
	defm V_MADMK_F16 : VOP2_Real_MADK_vi <0x24>;			defm V_MADMK_F16 : VOP2_Real_MADK_vi <0x24>;
	defm V_MADAK_F16 : VOP2_Real_MADK_vi <0x25>;			defm V_MADAK_F16 : VOP2_Real_MADK_vi <0x25>;
	... (34 unchanged lines not shown) ...
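The `VOP2_Real_e64only_vi` override in the hunk above makes two changes. It replaces the destination operand with a plain `VGPR_32`, and it rebuilds `AsmString` as `ps.Mnemonic # " " # ps.AsmOperands`. The Python model below shows why that would remove the suffix. It rests on two assumptions that the review does not state. First, the `_e64 ` text is emitted by the printer of the VOP3 destination operand. Second, the pseudo's asm string therefore has no space between mnemonic and operands. `AsmRecord`, `render` and the operand values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AsmRecord:
    asm_string: str          # template with $name operand placeholders
    dst_prints_suffix: bool  # assumption: VOP3 dst printer emits "_e64 "


def render(rec: AsmRecord, operands: dict) -> str:
    """Expand $name placeholders; the dst printer may prepend a suffix."""
    # Hypothetical dst printer: emits "_e64 " before the register name.
    dst = ("_e64 " if rec.dst_prints_suffix else "") + operands["vdst"]
    out = rec.asm_string.replace("$vdst", dst)
    for name, value in operands.items():
        if name != "vdst":
            out = out.replace("$" + name, value)
    return out


mnemonic, asm_operands = "v_bfm_b32", "$vdst, $src0, $src1"
ops = {"vdst": "v5", "src0": "s1", "src1": "v2"}

# Default VI real: inherits the pseudo's string and suffix-printing dst.
default = AsmRecord(mnemonic + asm_operands, dst_prints_suffix=True)
# e64only real: plain register dst and an explicit space after the mnemonic.
e64only = AsmRecord(mnemonic + " " + asm_operands, dst_prints_suffix=False)

print(render(default, ops))  # v_bfm_b32_e64 v5, s1, v2
print(render(e64only, ops))  # v_bfm_b32 v5, s1, v2
```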

llvm/trunk/test/CodeGen/AMDGPU/constant-fold-mi-operands.ll

	... (19 unchanged lines not shown) ...
	define amdgpu_kernel void @fold_mi_s_and_0(i32 addrspace(1)* %out, i32 %x) #0 {			define amdgpu_kernel void @fold_mi_s_and_0(i32 addrspace(1)* %out, i32 %x) #0 {
	%size = call i32 @llvm.amdgcn.groupstaticsize()			%size = call i32 @llvm.amdgcn.groupstaticsize()
	%and = and i32 %size, %x			%and = and i32 %size, %x
	store i32 %and, i32 addrspace(1)* %out			store i32 %and, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}fold_mi_v_or_0:			; GCN-LABEL: {{^}}fold_mi_v_or_0:
	; GCN: v_mbcnt_lo_u32_b32_e64 [[RESULT:v[0-9]+]]			; GCN: v_mbcnt_lo_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]]
	; GCN-NOT: [[RESULT]]			; GCN-NOT: [[RESULT]]
	; GCN: buffer_store_dword [[RESULT]]			; GCN: buffer_store_dword [[RESULT]]
	define amdgpu_kernel void @fold_mi_v_or_0(i32 addrspace(1)* %out) {			define amdgpu_kernel void @fold_mi_v_or_0(i32 addrspace(1)* %out) {
	%x = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0			%x = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
	%size = call i32 @llvm.amdgcn.groupstaticsize()			%size = call i32 @llvm.amdgcn.groupstaticsize()
	%or = or i32 %size, %x			%or = or i32 %size, %x
	store i32 %or, i32 addrspace(1)* %out			store i32 %or, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}fold_mi_s_or_0:			; GCN-LABEL: {{^}}fold_mi_s_or_0:
	; GCN: s_load_dword [[SVAL:s[0-9]+]]			; GCN: s_load_dword [[SVAL:s[0-9]+]]
	; GCN-NOT: [[SVAL]]			; GCN-NOT: [[SVAL]]
	; GCN: v_mov_b32_e32 [[VVAL:v[0-9]+]], [[SVAL]]			; GCN: v_mov_b32_e32 [[VVAL:v[0-9]+]], [[SVAL]]
	; GCN-NOT: [[VVAL]]			; GCN-NOT: [[VVAL]]
	; GCN: buffer_store_dword [[VVAL]]			; GCN: buffer_store_dword [[VVAL]]
	define amdgpu_kernel void @fold_mi_s_or_0(i32 addrspace(1)* %out, i32 %x) #0 {			define amdgpu_kernel void @fold_mi_s_or_0(i32 addrspace(1)* %out, i32 %x) #0 {
	%size = call i32 @llvm.amdgcn.groupstaticsize()			%size = call i32 @llvm.amdgcn.groupstaticsize()
	%or = or i32 %size, %x			%or = or i32 %size, %x
	store i32 %or, i32 addrspace(1)* %out			store i32 %or, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}fold_mi_v_xor_0:			; GCN-LABEL: {{^}}fold_mi_v_xor_0:
	; GCN: v_mbcnt_lo_u32_b32_e64 [[RESULT:v[0-9]+]]			; GCN: v_mbcnt_lo_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]]
	; GCN-NOT: [[RESULT]]			; GCN-NOT: [[RESULT]]
	; GCN: buffer_store_dword [[RESULT]]			; GCN: buffer_store_dword [[RESULT]]
	define amdgpu_kernel void @fold_mi_v_xor_0(i32 addrspace(1)* %out) {			define amdgpu_kernel void @fold_mi_v_xor_0(i32 addrspace(1)* %out) {
	%x = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0			%x = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
	%size = call i32 @llvm.amdgcn.groupstaticsize()			%size = call i32 @llvm.amdgcn.groupstaticsize()
	%xor = xor i32 %size, %x			%xor = xor i32 %size, %x
	store i32 %xor, i32 addrspace(1)* %out			store i32 %xor, i32 addrspace(1)* %out
	ret void			ret void
	... (19 unchanged lines not shown) ...
	define amdgpu_kernel void @fold_mi_s_not_0(i32 addrspace(1)* %out, i32 %x) #0 {			define amdgpu_kernel void @fold_mi_s_not_0(i32 addrspace(1)* %out, i32 %x) #0 {
	%size = call i32 @llvm.amdgcn.groupstaticsize()			%size = call i32 @llvm.amdgcn.groupstaticsize()
	%xor = xor i32 %size, -1			%xor = xor i32 %size, -1
	store i32 %xor, i32 addrspace(1)* %out			store i32 %xor, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}fold_mi_v_not_0:			; GCN-LABEL: {{^}}fold_mi_v_not_0:
	; GCN: v_bcnt_u32_b32_e64 v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, 0{{$}}			; GCN: v_bcnt_u32_b32{{(_e64)*}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, 0{{$}}
	; GCN: v_bcnt_u32_b32_e{{[0-9]+}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, v[[RESULT_LO]]{{$}}			; GCN: v_bcnt_u32_b32{{(_e32)*(_e64)*}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, v[[RESULT_LO]]{{$}}
	; GCN-NEXT: v_not_b32_e32 v[[RESULT_LO]]			; GCN-NEXT: v_not_b32_e32 v[[RESULT_LO]]
	; GCN-NEXT: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], -1{{$}}			; GCN-NEXT: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], -1{{$}}
	; GCN-NEXT: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}			; GCN-NEXT: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}
	define amdgpu_kernel void @fold_mi_v_not_0(i64 addrspace(1)* %out) {			define amdgpu_kernel void @fold_mi_v_not_0(i64 addrspace(1)* %out) {
	%vreg = load volatile i64, i64 addrspace(1)* undef			%vreg = load volatile i64, i64 addrspace(1)* undef
	%ctpop = call i64 @llvm.ctpop.i64(i64 %vreg)			%ctpop = call i64 @llvm.ctpop.i64(i64 %vreg)
	%xor = xor i64 %ctpop, -1			%xor = xor i64 %ctpop, -1
	store i64 %xor, i64 addrspace(1)* %out			store i64 %xor, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	; The neg1 appears after folding the not 0			; The neg1 appears after folding the not 0
	; GCN-LABEL: {{^}}fold_mi_or_neg1:			; GCN-LABEL: {{^}}fold_mi_or_neg1:
	; GCN: buffer_load_dwordx2			; GCN: buffer_load_dwordx2
	; GCN: buffer_load_dwordx2 v{{\[}}[[VREG1_LO:[0-9]+]]:[[VREG1_HI:[0-9]+]]{{\]}}			; GCN: buffer_load_dwordx2 v{{\[}}[[VREG1_LO:[0-9]+]]:[[VREG1_HI:[0-9]+]]{{\]}}

	; GCN: v_bcnt_u32_b32_e64 v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, 0{{$}}			; GCN: v_bcnt_u32_b32{{(_e64)*}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, 0{{$}}
	; GCN: v_bcnt_u32_b32_e{{[0-9]+}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, v[[RESULT_LO]]{{$}}			; GCN: v_bcnt_u32_b32{{(_e32)*(_e64)*}} v[[RESULT_LO:[0-9]+]], v{{[0-9]+}}, v[[RESULT_LO]]{{$}}
	; GCN-DAG: v_not_b32_e32 v[[RESULT_LO]], v[[RESULT_LO]]			; GCN-DAG: v_not_b32_e32 v[[RESULT_LO]], v[[RESULT_LO]]
	; GCN-DAG: v_or_b32_e32 v[[RESULT_LO]], v[[VREG1_LO]], v[[RESULT_LO]]			; GCN-DAG: v_or_b32_e32 v[[RESULT_LO]], v[[VREG1_LO]], v[[RESULT_LO]]
	; GCN-DAG: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], v[[VREG1_HI]]			; GCN-DAG: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], v[[VREG1_HI]]
	; GCN: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}			; GCN: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}
	define amdgpu_kernel void @fold_mi_or_neg1(i64 addrspace(1)* %out) {			define amdgpu_kernel void @fold_mi_or_neg1(i64 addrspace(1)* %out) {
	%vreg0 = load volatile i64, i64 addrspace(1)* undef			%vreg0 = load volatile i64, i64 addrspace(1)* undef
	%vreg1 = load volatile i64, i64 addrspace(1)* undef			%vreg1 = load volatile i64, i64 addrspace(1)* undef
	%ctpop = call i64 @llvm.ctpop.i64(i64 %vreg0)			%ctpop = call i64 @llvm.ctpop.i64(i64 %vreg0)
	... (28 unchanged lines not shown) ...
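The `fold_mi_v_not_0` checks above encode a small piece of arithmetic. A 64-bit `ctpop` is built from two `v_bcnt_u32_b32` instructions, each of which counts the set bits in its first source and adds its second source. The count is at most 64, so the high word of `~ctpop(x)` is always all ones. That is why the checks expect `v_mov_b32_e32 ..., -1` for the high half. The Python model below checks this reasoning. The helper names are illustrative, not LLVM APIs.

```python
MASK32 = 0xFFFF_FFFF


def v_bcnt_u32_b32(src0: int, src1: int) -> int:
    """Set bits in the 32-bit src0, plus src1, truncated to 32 bits."""
    return (bin(src0 & MASK32).count("1") + src1) & MASK32


def not_ctpop_i64(x: int) -> tuple[int, int]:
    """Model of fold_mi_v_not_0: returns the (lo, hi) words of ~ctpop(x)."""
    lo_word, hi_word = x & MASK32, (x >> 32) & MASK32
    partial = v_bcnt_u32_b32(lo_word, 0)       # first bcnt: ..., 0
    count = v_bcnt_u32_b32(hi_word, partial)   # second bcnt accumulates
    # count <= 64, so the high word of ctpop is 0 and its complement is -1.
    return (~count) & MASK32, MASK32


def reference(x: int) -> tuple[int, int]:
    """Direct 64-bit computation of ~ctpop(x) for comparison."""
    r = (~bin(x & (2**64 - 1)).count("1")) & (2**64 - 1)
    return r & MASK32, r >> 32


for x in (0, 1, 0x0000_00FF_0000_000F, 2**64 - 1):
    assert not_ctpop_i64(x) == reference(x), hex(x)
print("two-step bcnt matches 64-bit ctpop")
```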

llvm/trunk/test/CodeGen/AMDGPU/ctpop.ll

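... (19 unchanged lines not shown) ...
define amdgpu_kernel void @s_ctpop_i32(i32 addrspace(1)* noalias %out, i32 %val) nounwind {		define amdgpu_kernel void @s_ctpop_i32(i32 addrspace(1)* noalias %out, i32 %val) nounwind {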
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
store i32 %ctpop, i32 addrspace(1)* %out, align 4		store i32 %ctpop, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; XXX - Why 0 in register?		; XXX - Why 0 in register?
; FUNC-LABEL: {{^}}v_ctpop_i32:		; FUNC-LABEL: {{^}}v_ctpop_i32:
; GCN: buffer_load_dword [[VAL:v[0-9]+]],		; GCN: buffer_load_dword [[VAL:v[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], 0		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL]], 0
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
store i32 %ctpop, i32 addrspace(1)* %out, align 4		store i32 %ctpop, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_add_chain_i32:		; FUNC-LABEL: {{^}}v_ctpop_add_chain_i32:
; GCN: buffer_load_dword [[VAL1:v[0-9]+]],		; GCN: buffer_load_dword [[VAL1:v[0-9]+]],
; GCN: buffer_load_dword [[VAL0:v[0-9]+]],		; GCN: buffer_load_dword [[VAL0:v[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[MIDRESULT:v[0-9]+]], [[VAL1]], 0		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT:v[0-9]+]], [[VAL1]], 0
; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]		; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]
; VI: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]		; VI: v_bcnt_u32_b32 [[RESULT:v[0-9]+]], [[VAL0]], [[MIDRESULT]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_add_chain_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in0, i32 addrspace(1)* noalias %in1) nounwind {		define amdgpu_kernel void @v_ctpop_add_chain_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in0, i32 addrspace(1)* noalias %in1) nounwind {
%val0 = load i32, i32 addrspace(1)* %in0, align 4		%val0 = load i32, i32 addrspace(1)* %in0, align 4
%val1 = load i32, i32 addrspace(1)* %in1, align 4		%val1 = load i32, i32 addrspace(1)* %in1, align 4
%ctpop0 = call i32 @llvm.ctpop.i32(i32 %val0) nounwind readnone		%ctpop0 = call i32 @llvm.ctpop.i32(i32 %val0) nounwind readnone
%ctpop1 = call i32 @llvm.ctpop.i32(i32 %val1) nounwind readnone		%ctpop1 = call i32 @llvm.ctpop.i32(i32 %val1) nounwind readnone
%add = add i32 %ctpop0, %ctpop1		%add = add i32 %ctpop0, %ctpop1
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_add_sgpr_i32:		; FUNC-LABEL: {{^}}v_ctpop_add_sgpr_i32:
; GCN: buffer_load_dword [[VAL0:v[0-9]+]],		; GCN: buffer_load_dword [[VAL0:v[0-9]+]],
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL0]], s{{[0-9]+}}		; GCN-NEXT: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL0]], s{{[0-9]+}}
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @v_ctpop_add_sgpr_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in0, i32 addrspace(1)* noalias %in1, i32 %sval) nounwind {		define amdgpu_kernel void @v_ctpop_add_sgpr_i32(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in0, i32 addrspace(1)* noalias %in1, i32 %sval) nounwind {
%val0 = load i32, i32 addrspace(1)* %in0, align 4		%val0 = load i32, i32 addrspace(1)* %in0, align 4
%ctpop0 = call i32 @llvm.ctpop.i32(i32 %val0) nounwind readnone		%ctpop0 = call i32 @llvm.ctpop.i32(i32 %val0) nounwind readnone
%add = add i32 %ctpop0, %sval		%add = add i32 %ctpop0, %sval
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_v2i32:		; FUNC-LABEL: {{^}}v_ctpop_v2i32:
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_v2i32(<2 x i32> addrspace(1)* noalias %out, <2 x i32> addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_v2i32(<2 x i32> addrspace(1)* noalias %out, <2 x i32> addrspace(1)* noalias %in) nounwind {
%val = load <2 x i32>, <2 x i32> addrspace(1)* %in, align 8		%val = load <2 x i32>, <2 x i32> addrspace(1)* %in, align 8
%ctpop = call <2 x i32> @llvm.ctpop.v2i32(<2 x i32> %val) nounwind readnone		%ctpop = call <2 x i32> @llvm.ctpop.v2i32(<2 x i32> %val) nounwind readnone
store <2 x i32> %ctpop, <2 x i32> addrspace(1)* %out, align 8		store <2 x i32> %ctpop, <2 x i32> addrspace(1)* %out, align 8
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_v4i32:		; FUNC-LABEL: {{^}}v_ctpop_v4i32:
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %in) nounwind {
%val = load <4 x i32>, <4 x i32> addrspace(1)* %in, align 16		%val = load <4 x i32>, <4 x i32> addrspace(1)* %in, align 16
%ctpop = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %val) nounwind readnone		%ctpop = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %val) nounwind readnone
store <4 x i32> %ctpop, <4 x i32> addrspace(1)* %out, align 16		store <4 x i32> %ctpop, <4 x i32> addrspace(1)* %out, align 16
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_v8i32:		; FUNC-LABEL: {{^}}v_ctpop_v8i32:
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_v8i32(<8 x i32> addrspace(1)* noalias %out, <8 x i32> addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_v8i32(<8 x i32> addrspace(1)* noalias %out, <8 x i32> addrspace(1)* noalias %in) nounwind {
%val = load <8 x i32>, <8 x i32> addrspace(1)* %in, align 32		%val = load <8 x i32>, <8 x i32> addrspace(1)* %in, align 32
%ctpop = call <8 x i32> @llvm.ctpop.v8i32(<8 x i32> %val) nounwind readnone		%ctpop = call <8 x i32> @llvm.ctpop.v8i32(<8 x i32> %val) nounwind readnone
store <8 x i32> %ctpop, <8 x i32> addrspace(1)* %out, align 32		store <8 x i32> %ctpop, <8 x i32> addrspace(1)* %out, align 32
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_v16i32:		; FUNC-LABEL: {{^}}v_ctpop_v16i32:
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: v_bcnt_u32_b32_e64		; GCN: v_bcnt_u32_b32{{(_e64)*}}
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
; EG: BCNT_INT		; EG: BCNT_INT
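... (11 unchanged lines not shown) ...
define amdgpu_kernel void @v_ctpop_v16i32(<16 x i32> addrspace(1)* noalias %out, <16 x i32> addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_v16i32(<16 x i32> addrspace(1)* noalias %out, <16 x i32> addrspace(1)* noalias %in) nounwind {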
%val = load <16 x i32>, <16 x i32> addrspace(1)* %in, align 32		%val = load <16 x i32>, <16 x i32> addrspace(1)* %in, align 32
%ctpop = call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %val) nounwind readnone		%ctpop = call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %val) nounwind readnone
store <16 x i32> %ctpop, <16 x i32> addrspace(1)* %out, align 32		store <16 x i32> %ctpop, <16 x i32> addrspace(1)* %out, align 32
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_inline_constant:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_inline_constant:
; GCN: buffer_load_dword [[VAL:v[0-9]+]],		; GCN: buffer_load_dword [[VAL:v[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], 4		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL]], 4
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32_add_inline_constant(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_inline_constant(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%add = add i32 %ctpop, 4		%add = add i32 %ctpop, 4
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_inline_constant_inv:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_inline_constant_inv:
; GCN: buffer_load_dword [[VAL:v[0-9]+]],		; GCN: buffer_load_dword [[VAL:v[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], 4		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL]], 4
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32_add_inline_constant_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_inline_constant_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%add = add i32 4, %ctpop		%add = add i32 4, %ctpop
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_literal:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_literal:
; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],		; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],
; GCN-DAG: v_mov_b32_e32 [[LIT:v[0-9]+]], 0x1869f		; GCN-DAG: v_mov_b32_e32 [[LIT:v[0-9]+]], 0x1869f
; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL]], [[LIT]]		; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL]], [[LIT]]
; VI: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], [[LIT]]		; VI: v_bcnt_u32_b32 [[RESULT:v[0-9]+]], [[VAL]], [[LIT]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @v_ctpop_i32_add_literal(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_literal(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%add = add i32 %ctpop, 99999		%add = add i32 %ctpop, 99999
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_var:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_var:
; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],		; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],
; GCN-DAG: s_load_dword [[VAR:s[0-9]+]],		; GCN-DAG: s_load_dword [[VAR:s[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32_add_var(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %const) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_var(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %const) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%add = add i32 %ctpop, %const		%add = add i32 %ctpop, %const
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_var_inv:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_var_inv:
; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],		; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]],
; GCN-DAG: s_load_dword [[VAR:s[0-9]+]],		; GCN-DAG: s_load_dword [[VAR:s[0-9]+]],
; GCN: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32_add_var_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %const) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_var_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 %const) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%add = add i32 %const, %ctpop		%add = add i32 %const, %ctpop
store i32 %add, i32 addrspace(1)* %out, align 4		store i32 %add, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i32_add_vvar_inv:		; FUNC-LABEL: {{^}}v_ctpop_i32_add_vvar_inv:
; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], {{0$}}		; GCN-DAG: buffer_load_dword [[VAL:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], {{0$}}
; GCN-DAG: buffer_load_dword [[VAR:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16		; GCN-DAG: buffer_load_dword [[VAR:v[0-9]+]], off, s[{{[0-9]+:[0-9]+}}], 0 offset:16
; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]		; SI: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]
; VI: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]		; VI: v_bcnt_u32_b32 [[RESULT:v[0-9]+]], [[VAL]], [[VAR]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm

; EG: BCNT_INT		; EG: BCNT_INT
define amdgpu_kernel void @v_ctpop_i32_add_vvar_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 addrspace(1)* noalias %constptr) nounwind {		define amdgpu_kernel void @v_ctpop_i32_add_vvar_inv(i32 addrspace(1)* noalias %out, i32 addrspace(1)* noalias %in, i32 addrspace(1)* noalias %constptr) nounwind {
%val = load i32, i32 addrspace(1)* %in, align 4		%val = load i32, i32 addrspace(1)* %in, align 4
%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone		%ctpop = call i32 @llvm.ctpop.i32(i32 %val) nounwind readnone
%gep = getelementptr i32, i32 addrspace(1)* %constptr, i32 4		%gep = getelementptr i32, i32 addrspace(1)* %constptr, i32 4

llvm/trunk/test/CodeGen/AMDGPU/ctpop64.ll

define amdgpu_kernel void @s_ctpop_i64(i32 addrspace(1)* noalias %out, i64 %val) nounwind {		define amdgpu_kernel void @s_ctpop_i64(i32 addrspace(1)* noalias %out, i64 %val) nounwind {
%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone		%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone
%truncctpop = trunc i64 %ctpop to i32		%truncctpop = trunc i64 %ctpop to i32
store i32 %truncctpop, i32 addrspace(1)* %out, align 4		store i32 %truncctpop, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i64:		; FUNC-LABEL: {{^}}v_ctpop_i64:
; GCN: buffer_load_dwordx2 v{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}},		; GCN: buffer_load_dwordx2 v{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}},
; GCN: v_bcnt_u32_b32_e64 [[MIDRESULT:v[0-9]+]], v[[LOVAL]], 0		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT:v[0-9]+]], v[[LOVAL]], 0
; SI-NEXT: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]		; SI-NEXT: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]
; VI-NEXT: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]		; VI-NEXT: v_bcnt_u32_b32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]
; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @v_ctpop_i64(i32 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i64(i32 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in) nounwind {
%val = load i64, i64 addrspace(1)* %in, align 8		%val = load i64, i64 addrspace(1)* %in, align 8
%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone		%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone
%truncctpop = trunc i64 %ctpop to i32		%truncctpop = trunc i64 %ctpop to i32
store i32 %truncctpop, i32 addrspace(1)* %out, align 4		store i32 %truncctpop, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_ctpop_i64_user:		; FUNC-LABEL: {{^}}v_ctpop_i64_user:
; GCN: buffer_load_dwordx2 v{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}},		; GCN: buffer_load_dwordx2 v{{\[}}[[LOVAL:[0-9]+]]:[[HIVAL:[0-9]+]]{{\]}},
; GCN: v_bcnt_u32_b32_e64 [[MIDRESULT:v[0-9]+]], v[[LOVAL]], 0		; GCN: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT:v[0-9]+]], v[[LOVAL]], 0
; SI-NEXT: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]		; SI-NEXT: v_bcnt_u32_b32_e32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]
; VI-NEXT: v_bcnt_u32_b32_e64 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]		; VI-NEXT: v_bcnt_u32_b32 [[RESULT:v[0-9]+]], v[[HIVAL]], [[MIDRESULT]]
; GCN-DAG: v_or_b32_e32 v[[RESULT_LO:[0-9]+]], s{{[0-9]+}}, [[RESULT]]		; GCN-DAG: v_or_b32_e32 v[[RESULT_LO:[0-9]+]], s{{[0-9]+}}, [[RESULT]]
; GCN-DAG: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], s{{[0-9]+}}		; GCN-DAG: v_mov_b32_e32 v[[RESULT_HI:[0-9]+]], s{{[0-9]+}}
; GCN: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}		; GCN: buffer_store_dwordx2 v{{\[}}[[RESULT_LO]]:[[RESULT_HI]]{{\]}}
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @v_ctpop_i64_user(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in, i64 %s.val) nounwind {		define amdgpu_kernel void @v_ctpop_i64_user(i64 addrspace(1)* noalias %out, i64 addrspace(1)* noalias %in, i64 %s.val) nounwind {
%val = load i64, i64 addrspace(1)* %in, align 8		%val = load i64, i64 addrspace(1)* %in, align 8
%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone		%ctpop = call i64 @llvm.ctpop.i64(i64 %val) nounwind readnone
%or = or i64 %ctpop, %s.val		%or = or i64 %ctpop, %s.val
define amdgpu_kernel void @s_ctpop_i65(i32 addrspace(1)* noalias %out, i65 %val) nounwind {		define amdgpu_kernel void @s_ctpop_i65(i32 addrspace(1)* noalias %out, i65 %val) nounwind {
ret void		ret void
}		}

; FIXME: Should not have extra add		; FIXME: Should not have extra add

; FUNC-LABEL: {{^}}v_ctpop_i128:		; FUNC-LABEL: {{^}}v_ctpop_i128:
; GCN: buffer_load_dwordx4 v{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]{{\]}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}		; GCN: buffer_load_dwordx4 v{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]{{\]}}, off, s{{\[[0-9]+:[0-9]+\]}}, 0{{$}}

; GCN-DAG: v_bcnt_u32_b32_e64 [[MIDRESULT0:v[0-9]+]], v{{[0-9]+}}, 0		; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT0:v[0-9]+]], v{{[0-9]+}}, 0
; GCN-DAG: v_bcnt_u32_b32{{_e32|_e64}} [[MIDRESULT1:v[0-9]+]], v[[VAL3]], [[MIDRESULT0]]		; GCN-DAG: v_bcnt_u32_b32{{(_e32)*(_e64)*}} [[MIDRESULT1:v[0-9]+]], v[[VAL3]], [[MIDRESULT0]]

; GCN-DAG: v_bcnt_u32_b32_e64 [[MIDRESULT2:v[0-9]+]], v[[VAL0]], 0		; GCN-DAG: v_bcnt_u32_b32{{(_e64)*}} [[MIDRESULT2:v[0-9]+]], v[[VAL0]], 0
; GCN-DAG: v_bcnt_u32_b32{{_e32|_e64}} [[MIDRESULT3:v[0-9]+]], v{{[0-9]+}}, [[MIDRESULT2]]		; GCN-DAG: v_bcnt_u32_b32{{(_e32)*(_e64)*}} [[MIDRESULT3:v[0-9]+]], v{{[0-9]+}}, [[MIDRESULT2]]

; GCN: v_add_i32_e32 [[RESULT:v[0-9]+]], vcc, [[MIDRESULT1]], [[MIDRESULT2]]		; GCN: v_add_i32_e32 [[RESULT:v[0-9]+]], vcc, [[MIDRESULT1]], [[MIDRESULT2]]

; GCN: buffer_store_dword [[RESULT]],		; GCN: buffer_store_dword [[RESULT]],
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @v_ctpop_i128(i32 addrspace(1)* noalias %out, i128 addrspace(1)* noalias %in) nounwind {		define amdgpu_kernel void @v_ctpop_i128(i32 addrspace(1)* noalias %out, i128 addrspace(1)* noalias %in) nounwind {
%val = load i128, i128 addrspace(1)* %in, align 8		%val = load i128, i128 addrspace(1)* %in, align 8
%ctpop = call i128 @llvm.ctpop.i128(i128 %val) nounwind readnone		%ctpop = call i128 @llvm.ctpop.i128(i128 %val) nounwind readnone
%truncctpop = trunc i128 %ctpop to i32		%truncctpop = trunc i128 %ctpop to i32
store i32 %truncctpop, i32 addrspace(1)* %out, align 4		store i32 %truncctpop, i32 addrspace(1)* %out, align 4
ret void		ret void
}		}
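The check lines above change the mandatory `_e64` to `{{(_e64)*}}`, so a single GCN prefix accepts both the SI spelling (`v_bcnt_u32_b32_e64`) and the suffix-less VI spelling. Below is a minimal Python sketch of that matching logic. It assumes Python's `re` behaves like FileCheck's regex engine for these simple groups; FileCheck itself is not invoked. `filecheck_like` is a hypothetical helper, not part of LLVM, and the sample lines are illustrative, not real llc output.

```python
import re


def filecheck_like(mnemonic: str, suffix_regex: str, line: str) -> bool:
    """Rough stand-in for a FileCheck check `mnemonic{{suffix_regex}} ...`.

    Assumption: Python's `re` agrees with FileCheck's regex engine for these
    simple groups. FileCheck itself is not run here.
    """
    # Text outside {{...}} is literal in FileCheck, so escape it. The trailing
    # space stops the match from running into a longer mnemonic.
    pattern = re.escape(mnemonic) + suffix_regex + " "
    # Like FileCheck, accept a match anywhere in the line.
    return re.search(pattern, line) is not None


MNEMONIC = "v_bcnt_u32_b32"
OLD = "_e64"             # pre-patch check: the _e64 suffix is mandatory
NEW = "(_e64)*"          # post-patch check: the suffix is optional
BOTH = "(_e32)*(_e64)*"  # also tolerates the SI VOP2 (_e32) spelling

# Illustrative assembler output lines (hypothetical register numbers).
lines = {
    "SI VOP3": "v_bcnt_u32_b32_e64 v1, v2, 0",
    "SI VOP2": "v_bcnt_u32_b32_e32 v3, v4, v1",
    "VI": "v_bcnt_u32_b32 v1, v2, 0",
}

for target, text in lines.items():
    results = {name: filecheck_like(MNEMONIC, rx, text)
               for name, rx in (("old", OLD), ("new", NEW), ("both", BOTH))}
    print(f"{target:8} {results}")
```

With `*`, a doubled suffix such as `_e64_e64` would also be accepted. That is harmless here, but an optional `?` quantifier would probably be a slightly stricter choice.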

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GFX89 -check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GFX89 -check-prefix=VI %s
	; RUN: llc -march=amdgcn -mcpu=gfx901 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GFX89 -check-prefix=GFX9 %s			; RUN: llc -march=amdgcn -mcpu=gfx901 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GFX89 -check-prefix=GFX9 %s

	; GCN-LABEL: {{^}}s_cvt_pkrtz_v2f16_f32:			; GCN-LABEL: {{^}}s_cvt_pkrtz_v2f16_f32:
	; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}			; GCN-DAG: s_load_dword [[X:s[0-9]+]], s[0:1], 0x{{b|2c}}
	; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}			; GCN-DAG: s_load_dword [[SY:s[0-9]+]], s[0:1], 0x{{c|30}}
	; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]			; GCN: v_mov_b32_e32 [[VY:v[0-9]+]], [[SY]]
	; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, [[X]], [[VY]]			; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, [[X]], [[VY]]
	; GFX89: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, [[X]], [[VY]]			; GFX89: v_cvt_pkrtz_f16_f32 v{{[0-9]+}}, [[X]], [[VY]]
	define amdgpu_kernel void @s_cvt_pkrtz_v2f16_f32(<2 x half> addrspace(1)* %out, float %x, float %y) #0 {			define amdgpu_kernel void @s_cvt_pkrtz_v2f16_f32(<2 x half> addrspace(1)* %out, float %x, float %y) #0 {
	%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %y)			%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %y)
	store <2 x half> %result, <2 x half> addrspace(1)* %out			store <2 x half> %result, <2 x half> addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}s_cvt_pkrtz_samereg_v2f16_f32:			; GCN-LABEL: {{^}}s_cvt_pkrtz_samereg_v2f16_f32:
	; GCN: s_load_dword [[X:s[0-9]+]]			; GCN: s_load_dword [[X:s[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, [[X]], [[X]]			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, [[X]], [[X]]
	define amdgpu_kernel void @s_cvt_pkrtz_samereg_v2f16_f32(<2 x half> addrspace(1)* %out, float %x) #0 {			define amdgpu_kernel void @s_cvt_pkrtz_samereg_v2f16_f32(<2 x half> addrspace(1)* %out, float %x) #0 {
	%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %x)			%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %x, float %x)
	store <2 x half> %result, <2 x half> addrspace(1)* %out			store <2 x half> %result, <2 x half> addrspace(1)* %out
	ret void			ret void
	}			}

	; FIXME: Folds to 0 on gfx9			; FIXME: Folds to 0 on gfx9
	; GCN-LABEL: {{^}}s_cvt_pkrtz_undef_undef:			; GCN-LABEL: {{^}}s_cvt_pkrtz_undef_undef:
	; GCN-NEXT: ; BB#0			; GCN-NEXT: ; BB#0
	; SI-NEXT: s_endpgm			; SI-NEXT: s_endpgm
	; VI-NEXT: s_endpgm			; VI-NEXT: s_endpgm
	; GFX9: v_mov_b32_e32 v{{[0-9]+}}, 0{{$}}			; GFX9: v_mov_b32_e32 v{{[0-9]+}}, 0{{$}}
	define amdgpu_kernel void @s_cvt_pkrtz_undef_undef(<2 x half> addrspace(1)* %out) #0 {			define amdgpu_kernel void @s_cvt_pkrtz_undef_undef(<2 x half> addrspace(1)* %out) #0 {
	%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float undef)			%result = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float undef, float undef)
	store <2 x half> %result, <2 x half> addrspace(1)* %out			store <2 x half> %result, <2 x half> addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
	; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, [[A]], [[B]]			; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, [[A]], [[B]]
	; GFX89: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, [[A]], [[B]]			; GFX89: v_cvt_pkrtz_f16_f32 v{{[0-9]+}}, [[A]], [[B]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext			%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep			%b = load volatile float, float addrspace(1)* %b.gep
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %b)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %b)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_reg_imm:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_reg_imm:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, [[A]], 1.0			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], 1.0
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_reg_imm(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_reg_imm(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float 1.0)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float 1.0)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_imm_reg:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_imm_reg:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, 1.0, [[A]]			; SI: v_cvt_pkrtz_f16_f32_e32 v{{[0-9]+}}, 1.0, [[A]]
	; GFX89: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, 1.0, [[A]]			; GFX89: v_cvt_pkrtz_f16_f32 v{{[0-9]+}}, 1.0, [[A]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_imm_reg(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_imm_reg(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 1.0, float %a)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float 1.0, float %a)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_lo:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_lo:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, -[[A]], [[B]]			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], [[B]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_lo(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_lo(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext			%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep			%b = load volatile float, float addrspace(1)* %b.gep
	%neg.a = fsub float -0.0, %a			%neg.a = fsub float -0.0, %a
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %neg.a, float %b)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %neg.a, float %b)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_hi:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_hi:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, [[A]], -[[B]]			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, [[A]], -[[B]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext			%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep			%b = load volatile float, float addrspace(1)* %b.gep
	%neg.b = fsub float -0.0, %b			%neg.b = fsub float -0.0, %b
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %neg.b)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %a, float %neg.b)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_lo_hi:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_lo_hi:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, -[[A]], -[[B]]			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, -[[A]], -[[B]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_lo_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_lo_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext			%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep			%b = load volatile float, float addrspace(1)* %b.gep
	%neg.a = fsub float -0.0, %a			%neg.a = fsub float -0.0, %a
	%neg.b = fsub float -0.0, %b			%neg.b = fsub float -0.0, %b
	%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %neg.a, float %neg.b)			%cvt = call <2 x half> @llvm.amdgcn.cvt.pkrtz(float %neg.a, float %neg.b)
	store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep			store <2 x half> %cvt, <2 x half> addrspace(1)* %out.gep
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_fabs_lo_fneg_hi:			; GCN-LABEL: {{^}}v_cvt_pkrtz_v2f16_f32_fneg_fabs_lo_fneg_hi:
	; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[A:v[0-9]+]]
	; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]			; GCN: {{buffer|flat}}_load_dword [[B:v[0-9]+]]
	; GCN: v_cvt_pkrtz_f16_f32_e64 v{{[0-9]+}}, -|[[A]]|, -[[B]]			; GCN: v_cvt_pkrtz_f16_f32{{(_e64)*}} v{{[0-9]+}}, -|[[A]]|, -[[B]]
	define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_fabs_lo_fneg_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {			define amdgpu_kernel void @v_cvt_pkrtz_v2f16_f32_fneg_fabs_lo_fneg_hi(<2 x half> addrspace(1)* %out, float addrspace(1)* %a.ptr, float addrspace(1)* %b.ptr) #0 {
	%tid = call i32 @llvm.amdgcn.workitem.id.x()			%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%tid.ext = sext i32 %tid to i64			%tid.ext = sext i32 %tid to i64
	%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext			%a.gep = getelementptr inbounds float, float addrspace(1)* %a.ptr, i64 %tid.ext
	%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext			%b.gep = getelementptr inbounds float, float addrspace(1)* %b.ptr, i64 %tid.ext
	%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext			%out.gep = getelementptr inbounds <2 x half>, <2 x half> addrspace(1)* %out, i64 %tid.ext
	%a = load volatile float, float addrspace(1)* %a.gep			%a = load volatile float, float addrspace(1)* %a.gep
	%b = load volatile float, float addrspace(1)* %b.gep			%b = load volatile float, float addrspace(1)* %b.gep

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.mbcnt.ll

	; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s			; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI %s
	; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s			; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI %s

	; GCN-LABEL: {{^}}mbcnt_intrinsics:			; GCN-LABEL: {{^}}mbcnt_intrinsics:
	; GCN: v_mbcnt_lo_u32_b32_e64 [[LO:v[0-9]+]], -1, 0			; GCN: v_mbcnt_lo_u32_b32{{(_e64)*}} [[LO:v[0-9]+]], -1, 0
	; SI: v_mbcnt_hi_u32_b32_e32 {{v[0-9]+}}, -1, [[LO]]			; SI: v_mbcnt_hi_u32_b32_e32 {{v[0-9]+}}, -1, [[LO]]
	; VI: v_mbcnt_hi_u32_b32_e64 {{v[0-9]+}}, -1, [[LO]]			; VI: v_mbcnt_hi_u32_b32 {{v[0-9]+}}, -1, [[LO]]
	define amdgpu_ps void @mbcnt_intrinsics(<16 x i8> addrspace(2)* inreg %arg, <16 x i8> addrspace(2)* inreg %arg1, <32 x i8> addrspace(2)* inreg %arg2, i32 inreg %arg3) {			define amdgpu_ps void @mbcnt_intrinsics(<16 x i8> addrspace(2)* inreg %arg, <16 x i8> addrspace(2)* inreg %arg1, <32 x i8> addrspace(2)* inreg %arg2, i32 inreg %arg3) {
	main_body:			main_body:
	%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0			%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
	%hi = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo) #0			%hi = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo) #0
	%tmp = bitcast i32 %hi to float			%tmp = bitcast i32 %hi to float
	call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %tmp, float %tmp, float %tmp, float %tmp, i1 true, i1 true) #1			call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %tmp, float %tmp, float %tmp, float %tmp, i1 true, i1 true) #1
	ret void			ret void
	}			}

	declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #0			declare i32 @llvm.amdgcn.mbcnt.lo(i32, i32) #0
	declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #0			declare i32 @llvm.amdgcn.mbcnt.hi(i32, i32) #0
	declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #1			declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #1

	attributes #0 = { nounwind readnone }			attributes #0 = { nounwind readnone }
	attributes #1 = { nounwind }			attributes #1 = { nounwind }

llvm/trunk/test/MC/AMDGPU/vop2.s

	// VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]			// VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]
	v_or_b32_e32 v1, v2, v3			v_or_b32_e32 v1, v2, v3

	// SICI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3a]			// SICI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3a]
	// VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]			// VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]
	v_xor_b32_e32 v1, v2, v3			v_xor_b32_e32 v1, v2, v3

	// SICI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x3c,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x3c,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]			// VI: v_bfm_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]
	v_bfm_b32_e64 v1, v2, v3			v_bfm_b32_e64 v1, v2, v3

	// SICI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]			// SICI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]
	// VI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2c]			// VI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2c]
	v_mac_f32_e32 v1, v2, v3			v_mac_f32_e32 v1, v2, v3

	// SICI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x40,0x00,0x00,0x80,0x42]			// SICI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x40,0x00,0x00,0x80,0x42]
	// VI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x2e,0x00,0x00,0x80,0x42]			// VI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x2e,0x00,0x00,0x80,0x42]
	v_madmk_f32 v1, v2, 64.0, v3			v_madmk_f32 v1, v2, 64.0, v3

	// SICI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x42,0x00,0x00,0x80,0x42]			// SICI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x42,0x00,0x00,0x80,0x42]
	// VI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x30,0x00,0x00,0x80,0x42]			// VI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x30,0x00,0x00,0x80,0x42]
	v_madak_f32 v1, v2, v3, 64.0			v_madak_f32 v1, v2, v3, 64.0

	// SICI: v_bcnt_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x44,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_bcnt_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x44,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_bcnt_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]			// VI: v_bcnt_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]
	v_bcnt_u32_b32_e64 v1, v2, v3			v_bcnt_u32_b32_e64 v1, v2, v3

	// SICI: v_mbcnt_lo_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x46,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_mbcnt_lo_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x46,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_mbcnt_lo_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]			// VI: v_mbcnt_lo_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]
	v_mbcnt_lo_u32_b32_e64 v1, v2, v3			v_mbcnt_lo_u32_b32_e64 v1, v2, v3

	// SICI: v_mbcnt_hi_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_mbcnt_hi_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_mbcnt_hi_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]			// VI: v_mbcnt_hi_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]
	v_mbcnt_hi_u32_b32_e64 v1, v2, v3			v_mbcnt_hi_u32_b32_e64 v1, v2, v3

	// SICI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x4a]			// SICI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x4a]
	// VI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x32]			// VI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x32]
	v_add_i32_e32 v1, vcc, v2, v3			v_add_i32_e32 v1, vcc, v2, v3

	// SICI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x4a,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x4a,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x19,0xd1,0x02,0x07,0x02,0x00]			// VI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x19,0xd1,0x02,0x07,0x02,0x00]
	// VI: v_subbrev_u32_e32 v1, vcc, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x3c]			// VI: v_subbrev_u32_e32 v1, vcc, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x3c]
	v_subbrev_u32 v1, vcc, v2, v3, vcc			v_subbrev_u32 v1, vcc, v2, v3, vcc

	// SICI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x54,0xd2,0x02,0x07,0xaa,0x01]			// SICI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x54,0xd2,0x02,0x07,0xaa,0x01]
	// VI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x1e,0xd1,0x02,0x07,0xaa,0x01]			// VI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x1e,0xd1,0x02,0x07,0xaa,0x01]
	v_subbrev_u32 v1, s[0:1], v2, v3, vcc			v_subbrev_u32 v1, s[0:1], v2, v3, vcc

	// SICI: v_ldexp_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x56]			// SICI: v_ldexp_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x56]
	// VI: v_ldexp_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x88,0xd2,0x02,0x07,0x02,0x00]			// VI: v_ldexp_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x88,0xd2,0x02,0x07,0x02,0x00]
	v_ldexp_f32 v1, v2, v3			v_ldexp_f32 v1, v2, v3

	// SICI: v_cvt_pkaccum_u8_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x58]			// SICI: v_cvt_pkaccum_u8_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x58]
	// VI: v_cvt_pkaccum_u8_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0xf0,0xd1,0x02,0x07,0x02,0x00]			// VI: v_cvt_pkaccum_u8_f32 v1, v2, v3 ; encoding: [0x01,0x00,0xf0,0xd1,0x02,0x07,0x02,0x00]
	v_cvt_pkaccum_u8_f32 v1, v2, v3			v_cvt_pkaccum_u8_f32 v1, v2, v3

	// SICI: v_cvt_pknorm_i16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5a]			// SICI: v_cvt_pknorm_i16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5a]
	// VI: v_cvt_pknorm_i16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x94,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pknorm_i16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x94,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pknorm_i16_f32 v1, v2, v3			v_cvt_pknorm_i16_f32 v1, v2, v3

	// SICI: v_cvt_pknorm_u16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5c]			// SICI: v_cvt_pknorm_u16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5c]
	// VI: v_cvt_pknorm_u16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x95,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pknorm_u16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x95,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pknorm_u16_f32 v1, v2, v3			v_cvt_pknorm_u16_f32 v1, v2, v3

	// SICI: v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5e]			// SICI: v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x5e]
	// VI: v_cvt_pkrtz_f16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x96,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pkrtz_f16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x96,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pkrtz_f16_f32 v1, v2, v3			v_cvt_pkrtz_f16_f32 v1, v2, v3

	// SICI: v_cvt_pk_u16_u32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x60,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_cvt_pk_u16_u32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x60,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_cvt_pk_u16_u32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pk_u16_u32 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pk_u16_u32_e64 v1, v2, v3			v_cvt_pk_u16_u32_e64 v1, v2, v3

	// SICI: v_cvt_pk_i16_i32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x62,0xd2,0x02,0x07,0x02,0x00]			// SICI: v_cvt_pk_i16_i32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x62,0xd2,0x02,0x07,0x02,0x00]
	// VI: v_cvt_pk_i16_i32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pk_i16_i32 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pk_i16_i32_e64 v1, v2, v3			v_cvt_pk_i16_i32_e64 v1, v2, v3

	// NOSICI: error: instruction not supported on this GPU			// NOSICI: error: instruction not supported on this GPU
	// NOSICI: v_add_f16_e32 v1, v2, v3			// NOSICI: v_add_f16_e32 v1, v2, v3
	// VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]			// VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]
	v_add_f16_e32 v1, v2, v3			v_add_f16_e32 v1, v2, v3

	// NOSICI: error: instruction not supported on this GPU			// NOSICI: error: instruction not supported on this GPU

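Every VI check in the vop2.s hunk carries an 8-byte VOP3 encoding. The SICI checks for the same mnemonics are 4-byte VOP2 words, except for v_cvt_pk_u16_u32 and v_cvt_pk_i16_i32, which the hunk assembles in their _e64 form. The Python sketch below extracts the VOP3 opcode field from those byte lists. The field positions are assumptions taken from the public SI/CI and VI ISA manuals. On both generations, bits 31:26 hold 0b110100 and bits 7:0 hold vdst. The opcode is 9 bits wide (25:17) on SI/CI and 10 bits wide (25:16) on VI. The helper names are hypothetical and do not come from LLVM.

```python
# Sketch: pull the opcode out of the first dword of a VOP3 encoding, using
# the little-endian byte lists printed in the "; encoding: [...]" checks.
# Field layouts are assumed from the SI/CI and VI ISA manuals.

VOP3_ENCODING_ID = 0b110100  # bits 31:26 on SI, CI and VI


def first_dword(encoding):
    """Assemble the first 32-bit word from little-endian bytes."""
    return int.from_bytes(bytes(encoding[:4]), "little")


def vop3_opcode(encoding, target):
    """Return the VOP3 opcode field for target "vi" or "sici"."""
    word = first_dword(encoding)
    if word >> 26 != VOP3_ENCODING_ID:
        raise ValueError("not a VOP3 encoding")
    if target == "vi":
        return (word >> 16) & 0x3FF  # 10-bit opcode in bits 25:16
    if target == "sici":
        return (word >> 17) & 0x1FF  # 9-bit opcode in bits 25:17
    raise ValueError("unknown target: " + target)


def vop3_vdst(encoding):
    """Destination VGPR index, bits 7:0 of the first dword."""
    return first_dword(encoding) & 0xFF


# VI encodings copied from the vop2.s checks.
VI_SAMPLES = {
    "v_ldexp_f32": [0x01, 0x00, 0x88, 0xD2, 0x02, 0x07, 0x02, 0x00],
    "v_cvt_pkaccum_u8_f32": [0x01, 0x00, 0xF0, 0xD1, 0x02, 0x07, 0x02, 0x00],
    "v_cvt_pk_u16_u32": [0x01, 0x00, 0x97, 0xD2, 0x02, 0x07, 0x02, 0x00],
}

# SI/CI encoding of v_cvt_pk_u16_u32_e64 from the same hunk. On SI/CI a VOP2
# opcode N is assumed to appear in VOP3 as 0x100 + N; here N = 0x30, which
# matches the SICI VOP2 byte 0x60 in vop3-convert.s (0x60 >> 1 == 0x30).
SICI_PK_U16_U32_E64 = [0x01, 0x00, 0x60, 0xD2, 0x02, 0x07, 0x02, 0x00]

for name, enc in VI_SAMPLES.items():
    print(f"{name}: VI opcode {vop3_opcode(enc, 'vi'):#x}, vdst v{vop3_vdst(enc)}")
print(f"v_cvt_pk_u16_u32_e64: SI/CI opcode {vop3_opcode(SICI_PK_U16_U32_E64, 'sici'):#x}")
```

The VI opcodes (0x288, 0x1f0, 0x297) fall outside the range used by promoted VOP2 instructions. This is consistent with the review's point that these instructions have no 32-bit form on VI.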
llvm/trunk/test/MC/AMDGPU/vop3-convert.s

	// VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]			// VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]
	v_or_b32 v1, v2, v3			v_or_b32 v1, v2, v3

	// SICI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3a]			// SICI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3a]
	// VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]			// VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]
	v_xor_b32 v1, v2, v3			v_xor_b32 v1, v2, v3

	// SICI: v_bfm_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3c]			// SICI: v_bfm_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3c]
	// VI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]			// VI: v_bfm_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]
	v_bfm_b32 v1, v2, v3			v_bfm_b32 v1, v2, v3

	// SICI: v_bcnt_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x44]			// SICI: v_bcnt_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x44]
	// VI: v_bcnt_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]			// VI: v_bcnt_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]
	v_bcnt_u32_b32 v1, v2, v3			v_bcnt_u32_b32 v1, v2, v3

	// SICI: v_mbcnt_lo_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x46]			// SICI: v_mbcnt_lo_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x46]
	// VI: v_mbcnt_lo_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]			// VI: v_mbcnt_lo_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]
	v_mbcnt_lo_u32_b32 v1, v2, v3			v_mbcnt_lo_u32_b32 v1, v2, v3

	// SICI: v_mbcnt_hi_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x48]			// SICI: v_mbcnt_hi_u32_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x48]
	// VI: v_mbcnt_hi_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]			// VI: v_mbcnt_hi_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]
	v_mbcnt_hi_u32_b32 v1, v2, v3			v_mbcnt_hi_u32_b32 v1, v2, v3

	// SICI: v_cvt_pk_u16_u32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x60]			// SICI: v_cvt_pk_u16_u32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x60]
	// VI: v_cvt_pk_u16_u32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pk_u16_u32 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pk_u16_u32 v1, v2, v3			v_cvt_pk_u16_u32 v1, v2, v3

	// SICI: v_cvt_pk_i16_i32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x62]			// SICI: v_cvt_pk_i16_i32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x62]
	// VI: v_cvt_pk_i16_i32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]			// VI: v_cvt_pk_i16_i32 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]
	v_cvt_pk_i16_i32 v1, v2, v3			v_cvt_pk_i16_i32 v1, v2, v3

	// SICI: v_bfm_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3c]			// SICI: v_bfm_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3c]
	// VI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]			// VI: v_bfm_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]
	v_bfm_b32 v1, v2, v3			v_bfm_b32 v1, v2, v3

	// NOSICI: error: instruction not supported on this GPU			// NOSICI: error: instruction not supported on this GPU
	// NOSICI: v_add_f16 v1, v2, v3			// NOSICI: v_add_f16 v1, v2, v3
	// VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]			// VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]
	v_add_f16 v1, v2, v3			v_add_f16 v1, v2, v3

	// NOSICI: error: instruction not supported on this GPU			// NOSICI: error: instruction not supported on this GPU

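In vop3-convert.s, the SICI checks for v_bfm_b32, v_bcnt_u32_b32, v_mbcnt_*_u32_b32 and v_cvt_pk_* are 4-byte VOP2 words, while the VI checks are 8-byte VOP3 words. The following Python sketch decodes the VOP2 fields from the SICI bytes and shows that the VI bytes cannot be read as VOP2. The field layout is an assumption based on the SI/CI and VI ISA manuals: src0 8:0, vsrc1 16:9, vdst 24:17, op 30:25, with bit 31 clear. The function names are hypothetical.

```python
# Sketch: decode a 32-bit VOP2 word from the little-endian byte lists in
# the checks. Assumed layout: src0 8:0, vsrc1 16:9, vdst 24:17, op 30:25,
# bit 31 == 0 (op values 0x3e/0x3f belong to VOPC/VOP1 instead).


def vop2_fields(encoding):
    if len(encoding) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(bytes(encoding[:4]), "little")
    if word >> 31:
        raise ValueError("bit 31 is set: not a VOP2 word")
    op = (word >> 25) & 0x3F
    if op in (0x3E, 0x3F):
        raise ValueError("op 0x3e/0x3f select VOPC/VOP1, not VOP2")
    return {
        "op": op,
        "vdst": (word >> 17) & 0xFF,
        "vsrc1": (word >> 9) & 0xFF,
        "src0": word & 0x1FF,
    }


def format_vop2(mnemonic, fields):
    """Render 'mnemonic_e32 vD, vS0, vS1' for an all-VGPR VOP2 word."""
    src0 = fields["src0"]
    if src0 < 256:  # values 256..511 select v0..v255
        raise ValueError("this sketch only handles a VGPR src0")
    return f"{mnemonic}_e32 v{fields['vdst']}, v{src0 - 256}, v{fields['vsrc1']}"


sici_bfm = [0x02, 0x07, 0x02, 0x3C]  # SICI: v_bfm_b32_e32 v1, v2, v3
vi_bfm = [0x01, 0x00, 0x93, 0xD2, 0x02, 0x07, 0x02, 0x00]  # VI: VOP3 only

fields = vop2_fields(sici_bfm)
print(hex(fields["op"]), format_vop2("v_bfm_b32", fields))

try:
    vop2_fields(vi_bfm)
except ValueError as err:
    print("VI v_bfm_b32:", err)
```

The same decoder also reports different VOP2 opcodes for v_xor_b32 on each generation: 0x1d from the SICI byte 0x3a and 0x15 from the VI byte 0x2a. The VOP2 opcode map was renumbered on VI, which is why the test has separate SICI and VI checks even for instructions that stay VOP2.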
llvm/trunk/test/MC/Disassembler/AMDGPU/vop2_vi.txt

	0x02 0x07 0x02 0x26			0x02 0x07 0x02 0x26

	# VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]			# VI: v_or_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x28]
	0x02 0x07 0x02 0x28			0x02 0x07 0x02 0x28

	# VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]			# VI: v_xor_b32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2a]
	0x02 0x07 0x02 0x2a			0x02 0x07 0x02 0x2a

	# VI: v_bfm_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]			# VI: v_bfm_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x93,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x93 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x93 0xd2 0x02 0x07 0x02 0x00

	# VI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2c]			# VI: v_mac_f32_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x2c]
	0x02 0x07 0x02 0x2c			0x02 0x07 0x02 0x2c

	# VI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x2e,0x00,0x00,0x80,0x42]			# VI: v_madmk_f32 v1, v2, 0x42800000, v3 ; encoding: [0x02,0x07,0x02,0x2e,0x00,0x00,0x80,0x42]
	0x02 0x07 0x02 0x2e 0x00 0x00 0x80 0x42			0x02 0x07 0x02 0x2e 0x00 0x00 0x80 0x42

	# VI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x30,0x00,0x00,0x80,0x42]			# VI: v_madak_f32 v1, v2, v3, 0x42800000 ; encoding: [0x02,0x07,0x02,0x30,0x00,0x00,0x80,0x42]
	0x02 0x07 0x02 0x30 0x00 0x00 0x80 0x42			0x02 0x07 0x02 0x30 0x00 0x00 0x80 0x42

	# VI: v_bcnt_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]			# VI: v_bcnt_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8b,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x8b 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x8b 0xd2 0x02 0x07 0x02 0x00

	# VI: v_mbcnt_lo_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]			# VI: v_mbcnt_lo_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8c,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x8c 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x8c 0xd2 0x02 0x07 0x02 0x00

	# VI: v_mbcnt_hi_u32_b32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]			# VI: v_mbcnt_hi_u32_b32 v1, v2, v3 ; encoding: [0x01,0x00,0x8d,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x8d 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x8d 0xd2 0x02 0x07 0x02 0x00

	# VI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x32]			# VI: v_add_i32_e32 v1, vcc, v2, v3 ; encoding: [0x02,0x07,0x02,0x32]
	0x02 0x07 0x02 0x32			0x02 0x07 0x02 0x32

	# VI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x19,0xd1,0x02,0x07,0x02,0x00]			# VI: v_add_i32_e64 v1, s[0:1], v2, v3 ; encoding: [0x01,0x00,0x19,0xd1,0x02,0x07,0x02,0x00]
	0x01 0x00 0x19 0xd1 0x02 0x07 0x02 0x00			0x01 0x00 0x19 0xd1 0x02 0x07 0x02 0x00

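The vop2_vi.txt checks show which instructions keep the suffix and which lose it. v_add_i32_e64 keeps its suffix because v_add_i32 also has the 32-bit encoding checked just above it. v_bfm_b32, v_bcnt_u32_b32 and v_mbcnt_*_u32_b32 lose the suffix. The Python model below reproduces that split numerically. It assumes a VI VOP3 opcode map in which opcodes below 0x1C0 are promoted VOPC/VOP2/VOP1 instructions and opcodes from 0x1C0 upward are VOP3-only. It is only an illustration: the patch itself changes the TableGen definitions in VOP2Instructions.td, and the threshold and helper names here are hypothetical.

```python
# Illustrative model, not LLVM's implementation: decide whether a VI VOP3
# word would be printed with "_e64". Assumes VOP3 opcodes below 0x1C0 are
# promoted VOPC/VOP2/VOP1 instructions (they have a 32-bit twin) and that
# opcodes from 0x1C0 upward are VOP3-only.

VI_VOP3_ONLY_BASE = 0x1C0  # assumed boundary, see comment above


def vi_vop3_opcode(encoding):
    """10-bit VI VOP3 opcode from little-endian encoding bytes."""
    word = int.from_bytes(bytes(encoding[:4]), "little")
    if word >> 26 != 0b110100:
        raise ValueError("not a VOP3 encoding")
    return (word >> 16) & 0x3FF


def prints_e64_suffix(encoding):
    """True if the opcode also has a 32-bit form, so '_e64' disambiguates."""
    return vi_vop3_opcode(encoding) < VI_VOP3_ONLY_BASE


# Encodings copied from the vop2_vi.txt checks.
CASES = {
    "v_add_i32": [0x01, 0x00, 0x19, 0xD1, 0x02, 0x07, 0x02, 0x00],
    "v_bfm_b32": [0x01, 0x00, 0x93, 0xD2, 0x02, 0x07, 0x02, 0x00],
    "v_bcnt_u32_b32": [0x01, 0x00, 0x8B, 0xD2, 0x02, 0x07, 0x02, 0x00],
    "v_mbcnt_lo_u32_b32": [0x01, 0x00, 0x8C, 0xD2, 0x02, 0x07, 0x02, 0x00],
    "v_mbcnt_hi_u32_b32": [0x01, 0x00, 0x8D, 0xD2, 0x02, 0x07, 0x02, 0x00],
}

for name, enc in CASES.items():
    suffix = "_e64" if prints_e64_suffix(enc) else ""
    print(f"{name}{suffix}  (VI VOP3 opcode {vi_vop3_opcode(enc):#x})")
```

Under these assumptions, v_add_i32 (opcode 0x119) keeps "_e64", and the four VOP3-only opcodes (0x293, 0x28b, 0x28c, 0x28d) print bare. This matches the updated CHECK lines.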
	0x01 0x00 0x1d 0xd1 0x02 0x07 0xaa 0x01			0x01 0x00 0x1d 0xd1 0x02 0x07 0xaa 0x01

	# VI: v_subbrev_u32_e32 v1, vcc, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x3c]			# VI: v_subbrev_u32_e32 v1, vcc, v2, v3, vcc ; encoding: [0x02,0x07,0x02,0x3c]
	0x02 0x07 0x02 0x3c			0x02 0x07 0x02 0x3c

	# VI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x1e,0xd1,0x02,0x07,0xaa,0x01]			# VI: v_subbrev_u32_e64 v1, s[0:1], v2, v3, vcc ; encoding: [0x01,0x00,0x1e,0xd1,0x02,0x07,0xaa,0x01]
	0x01 0x00 0x1e 0xd1 0x02 0x07 0xaa 0x01			0x01 0x00 0x1e 0xd1 0x02 0x07 0xaa 0x01

	# VI: v_ldexp_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x88,0xd2,0x02,0x07,0x02,0x00]			# VI: v_ldexp_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x88,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x88 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x88 0xd2 0x02 0x07 0x02 0x00

	# VI: v_cvt_pkaccum_u8_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0xf0,0xd1,0x02,0x07,0x02,0x00]			# VI: v_cvt_pkaccum_u8_f32 v1, v2, v3 ; encoding: [0x01,0x00,0xf0,0xd1,0x02,0x07,0x02,0x00]
	0x01 0x00 0xf0 0xd1 0x02 0x07 0x02 0x00			0x01 0x00 0xf0 0xd1 0x02 0x07 0x02 0x00

	# VI: v_cvt_pknorm_i16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x94,0xd2,0x02,0x07,0x02,0x00]			# VI: v_cvt_pknorm_i16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x94,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x94 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x94 0xd2 0x02 0x07 0x02 0x00

	# VI: v_cvt_pknorm_u16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x95,0xd2,0x02,0x07,0x02,0x00]			# VI: v_cvt_pknorm_u16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x95,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x95 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x95 0xd2 0x02 0x07 0x02 0x00

	# VI: v_cvt_pkrtz_f16_f32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x96,0xd2,0x02,0x07,0x02,0x00]			# VI: v_cvt_pkrtz_f16_f32 v1, v2, v3 ; encoding: [0x01,0x00,0x96,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x96 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x96 0xd2 0x02 0x07 0x02 0x00

	# VI: v_cvt_pk_u16_u32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]			# VI: v_cvt_pk_u16_u32 v1, v2, v3 ; encoding: [0x01,0x00,0x97,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x97 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x97 0xd2 0x02 0x07 0x02 0x00

	# VI: v_cvt_pk_i16_i32_e64 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]			# VI: v_cvt_pk_i16_i32 v1, v2, v3 ; encoding: [0x01,0x00,0x98,0xd2,0x02,0x07,0x02,0x00]
	0x01 0x00 0x98 0xd2 0x02 0x07 0x02 0x00			0x01 0x00 0x98 0xd2 0x02 0x07 0x02 0x00

	# VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]			# VI: v_add_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x3e]
	0x02 0x07 0x02 0x3e			0x02 0x07 0x02 0x3e

	# VI: v_sub_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x40]			# VI: v_sub_f16_e32 v1, v2, v3 ; encoding: [0x02,0x07,0x02,0x40]
	0x02 0x07 0x02 0x40			0x02 0x07 0x02 0x40


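The last vop2_vi.txt hunk checks full disassembly text such as "v_ldexp_f32 v1, v2, v3". The Python sketch below rebuilds that text from the 8 encoding bytes by also reading the second VOP3 dword. The layout is assumed from the VI ISA manual (VOP3a): src0 in bits 8:0, src1 in 17:9 and src2 in 26:18 of the second dword. 9-bit operand codes 256-511 name v0-v255. The opcode-to-mnemonic table is hand-copied from the checks, and it and the function names are hypothetical, not LLVM data structures.

```python
# Sketch: reconstruct the text of the VI checks from their bytes.
# Assumed VOP3a layout: first dword vdst 7:0, opcode 25:16, id 0b110100 in
# 31:26; second dword src0 8:0, src1 17:9, src2 26:18 (unused here).

VI_VOP3_ONLY = {  # hypothetical table copied from the checks
    0x1F0: "v_cvt_pkaccum_u8_f32",
    0x288: "v_ldexp_f32",
    0x294: "v_cvt_pknorm_i16_f32",
    0x295: "v_cvt_pknorm_u16_f32",
    0x296: "v_cvt_pkrtz_f16_f32",
    0x297: "v_cvt_pk_u16_u32",
    0x298: "v_cvt_pk_i16_i32",
}


def operand_name(code):
    """Map a 9-bit VOP3 source operand code to assembler syntax (subset)."""
    if 256 <= code <= 511:
        return f"v{code - 256}"
    if code <= 101:
        return f"s{code}"
    if code == 106:
        return "vcc_lo"
    raise ValueError(f"operand code {code} not handled by this sketch")


def disassemble_vi_vop3(encoding):
    if len(encoding) != 8:
        raise ValueError("VOP3 instructions are 8 bytes")
    lo = int.from_bytes(bytes(encoding[:4]), "little")
    hi = int.from_bytes(bytes(encoding[4:]), "little")
    if lo >> 26 != 0b110100:
        raise ValueError("not a VOP3 encoding")
    name = VI_VOP3_ONLY[(lo >> 16) & 0x3FF]  # KeyError outside the table
    vdst = lo & 0xFF
    src0 = operand_name(hi & 0x1FF)
    src1 = operand_name((hi >> 9) & 0x1FF)
    # No "_e64": none of these opcodes has a 32-bit encoding on VI.
    return f"{name} v{vdst}, {src0}, {src1}"


SAMPLES = [
    [0x01, 0x00, 0x88, 0xD2, 0x02, 0x07, 0x02, 0x00],
    [0x01, 0x00, 0xF0, 0xD1, 0x02, 0x07, 0x02, 0x00],
    [0x01, 0x00, 0x98, 0xD2, 0x02, 0x07, 0x02, 0x00],
]
for enc in SAMPLES:
    print(disassemble_vi_vop3(enc))
```

All of these instructions take two sources, so the sketch ignores src2, which is zero in every sample.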
Revision Contents

Diff 99006

llvm/trunk/lib/Target/AMDGPU/VOP2Instructions.td

llvm/trunk/test/CodeGen/AMDGPU/constant-fold-mi-operands.ll

llvm/trunk/test/CodeGen/AMDGPU/ctpop.ll

llvm/trunk/test/CodeGen/AMDGPU/ctpop64.ll

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.pkrtz.ll

llvm/trunk/test/CodeGen/AMDGPU/llvm.amdgcn.mbcnt.ll

llvm/trunk/test/MC/AMDGPU/vop2.s

llvm/trunk/test/MC/AMDGPU/vop3-convert.s

llvm/trunk/test/MC/Disassembler/AMDGPU/vop2_vi.txt