Index: docs/AMDGPUUsage.rst =================================================================== --- docs/AMDGPUUsage.rst +++ docs/AMDGPUUsage.rst @@ -8,6 +8,15 @@ The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with the R600 family up until the current Volcanic Islands (GCN Gen 3). +The following additional documentation is available: + +* AMD GCN3 Instruction Set Architecture: `PDF `__ + +* Southern Islands Series Instruction Set Architecture: `PDF `__ + +* R600-Family Instruction Set Architecture: `PDF `__ + +* AMDGPU Compute Application Binary Interface: `MD `__ Conventions =========== @@ -35,96 +44,241 @@ Assembler ========= -The assembler is currently considered experimental. +AMDGPU backend has LLVM-MC based assembler which is currently in development. +It supports Southern Islands ISA, Sea Islands and Volcanic Islands. -For syntax examples look in test/MC/AMDGPU. +This document describes general syntax for instructions and operands. For more +information about instructions, their semantics and supported combinations +of operands, refer to one of Instruction Set Architecture manuals. -Below some of the currently supported features (modulo bugs). These -all apply to the Southern Islands ISA, Sea Islands and Volcanic Islands -are also supported but may be missing some instructions and have more bugs: +An instruction has the following syntax (register operands are +normally comma-separated while extra operands are space-separated): -DS Instructions ---------------- -All DS instructions are supported. +* , ... ...* -FLAT Instructions ------------------- -These instructions are only present in the Sea Islands and Volcanic Islands -instruction set. All FLAT instructions are supported for these architectures -MUBUF Instructions ------------------- -All non-atomic MUBUF instructions are supported. +Operands +-------- -SMRD Instructions ------------------ -Only the s_load_dword* SMRD instructions are supported. +The following syntax for register operands is supported: -SOP1 Instructions ------------------ -All SOP1 instructions are supported. +* SGPR registers: s0, ... or s[0], ... +* VGPR registers: v0, ... or v[0], ... +* TTMP registers: ttmp0, ... or ttmp[0], ... +* Special registers: exec (exec_lo, exec_hi), vcc (vcc_lo, vcc_hi), flat_scratch (flat_scratch_lo, flat_scratch_hi) +* Special trap registers: tba (tba_lo, tba_hi), tma (tma_lo, tma_hi) +* Register pairs, quads, etc: s[2:3], v[10:11], ttmp[5:6], s[4:7], v[12:15], ttmp[4:7], s[8:15], ... +* Register lists: [s0, s1], [ttmp0, ttmp1, ttmp2, ttmp3] +* Register index expressions: v[2*2], s[1-1:2-1] +* off indicates that an operand is not enabled -SOP2 Instructions ------------------ -All SOP2 instructions are supported. +The following extra operands are supported: -SOPC Instructions ------------------ -All SOPC instructions are supported. +* offset, offset0, offset1 +* idxen, offen bits +* glc, slc, tfe bits +* waitcnt: integer or combination of counter values +* VOP3 modifiers: -SOPP Instructions ------------------ + - abs (\| \|), neg (\-) -Unless otherwise mentioned, all SOPP instructions that have one or more -operands accept integer operands only. No verification is performed -on the operands, so it is up to the programmer to be familiar with the -range or acceptable values. +* DPP modifiers: + + - row_shl, row_shr, row_ror, row_rol + - row_mirror, row_half_mirror, row_bcast + - wave_shl, wave_shr, wave_ror, wave_rol, quad_perm + - row_mask, bank_mask, bound_ctrl + +* SDWA modifiers: -s_waitcnt -^^^^^^^^^ + - dst_sel, src0_sel, src1_sel (BYTE_N, WORD_M, DWORD) + - dst_unused (UNUSED_PAD, UNUSED_SEXT, UNUSED_PRESERVE) + - abs, neg, sext -s_waitcnt accepts named arguments to specify which memory counter(s) to -wait for. +DS Instructions Examples +------------------------ .. code-block:: nasm - ; Wait for all counters to be 0 - s_waitcnt 0 + ds_add_u32 v2, v4 offset:16 + ds_write_src2_b64 v2 offset0:4 offset1:8 + ds_cmpst_f32 v2, v4, v6 + ds_min_rtn_f64 v[8:9], v2, v[4:5] - ; Equivalent to s_waitcnt 0. Counter names can also be delimited by - ; '&' or ','. - s_waitcnt vmcnt(0) expcnt(0) lgkcmt(0) - ; Wait for vmcnt counter to be 1. - s_waitcnt vmcnt(1) +For full list of supported instructions, refer to "LDS/GDS instructions" in ISA Manual. -VOP1, VOP2, VOP3, VOPC Instructions ------------------------------------ +FLAT Instruction Examples +-------------------------- -All 32-bit and 64-bit encodings should work. +.. code-block:: nasm + + flat_load_dword v1, v[3:4] + flat_store_dwordx3 v[3:4], v[5:7] + flat_atomic_swap v1, v[3:4], v5 glc + flat_atomic_cmpswap v1, v[3:4], v[5:6] glc slc + flat_atomic_fmax_x2 v[1:2], v[3:4], v[5:6] glc -The assembler will automatically detect which encoding size to use for -VOP1, VOP2, and VOPC instructions based on the operands. If you want to force -a specific encoding size, you can add an _e32 (for 32-bit encoding) or -_e64 (for 64-bit encoding) suffix to the instruction. Most, but not all -instructions support an explicit suffix. These are all valid assembly -strings: +For full list of supported instructions, refer to "FLAT instructions" in ISA Manual. + +MUBUF Instruction Examples +--------------------------- .. code-block:: nasm - v_mul_i32_i24 v1, v2, v3 - v_mul_i32_i24_e32 v1, v2, v3 - v_mul_i32_i24_e64 v1, v2, v3 + buffer_load_dword v1, off, s[4:7], s1 + buffer_store_dwordx4 v[1:4], v2, ttmp[4:7], s1 offen offset:4 glc tfe + buffer_store_format_xy v[1:2], off, s[4:7], s1 + buffer_wbinvl1 + buffer_atomic_inc v1, v2, s[8:11], s4 idxen offset:4 slc + +For full list of supported instructions, refer to "MUBUF Instructions" in ISA Manual. + +SMRD/SMEM Instruction Examples +------------------------------- + +.. code-block:: nasm -Assembler Directives --------------------- + s_load_dword s1, s[2:3], 0xfc + s_load_dwordx8 s[8:15], s[2:3], s4 + s_load_dwordx16 s[88:103], s[2:3], s4 + s_dcache_inv_vol + s_memtime s[4:5] + +For full list of supported instructions, refer to "Scalar Memory Operations" in ISA Manual. + +SOP1 Instruction Examples +-------------------------- + +.. code-block:: nasm + + s_mov_b32 s1, s2 + s_mov_b64 s[0:1], 0x80000000 + s_cmov_b32 s1, 200 + s_wqm_b64 s[2:3], s[4:5] + s_bcnt0_i32_b64 s1, s[2:3] + s_swappc_b64 s[2:3], s[4:5] + s_cbranch_join s[4:5] + +For full list of supported instructions, refer to "SOP1 Instructions" in ISA Manual. + +SOP2 Instruction Examples +------------------------- + +.. code-block:: nasm + + s_add_u32 s1, s2, s3 + s_and_b64 s[2:3], s[4:5], s[6:7] + s_cselect_b32 s1, s2, s3 + s_andn2_b32 s2, s4, s6 + s_lshr_b64 s[2:3], s[4:5], s6 + s_ashr_i32 s2, s4, s6 + s_bfm_b64 s[2:3], s4, s6 + s_bfe_i64 s[2:3], s[4:5], s6 + s_cbranch_g_fork s[4:5], s[6:7] + +For full list of supported instructions, refer to "SOP2 Instructions" in ISA Manual. + +SOPC Instruction Examples +-------------------------- + +.. code-block:: nasm + + s_cmp_eq_i32 s1, s2 + s_bitcmp1_b32 s1, s2 + s_bitcmp0_b64 s[2:3], s4 + s_setvskip s3, s5 + +For full list of supported instructions, refer to "SOPC Instructions" in ISA Manual. + +SOPP Instruction Examples +-------------------------- + +.. code-block:: nasm + + s_barrier + s_nop 2 + s_endpgm + s_waitcnt 0 ; Wait for all counters to be 0 + s_waitcnt vmcnt(0) & expcnt(0) & lgkmcnt(0) ; Equivalent to above + s_waitcnt vmcnt(1) ; Wait for vmcnt counter to be 1. + s_sethalt 9 + s_sleep 10 + s_sendmsg 0x1 + s_sendmsg sendmsg(MSG_INTERRUPT) + s_trap 1 + +For full list of supported instructions, refer to "SOPP Instructions" in ISA Manual. + +Unless otherwise mentioned, little verification is performed on the operands +of SOPP Instrucitons, so it is up to the programmer to be familiar with the +range or acceptable values. + +Vector ALU Instruction Examples +------------------------------- + +For vector ALU instruction opcodes (VOP1, VOP2, VOP3, VOPC, VOP_DPP, VOP_SDWA), +the assembler will automatically use optimal encoding based on its operands. +To force specific encoding, one can add a suffix to the opcode of the instruction: + +* _e32 for 32-bit VOP1/VOP2/VOPC +* _e64 for 64-bit VOP3 +* _dpp for VOP_DPP +* _sdwa for VOP_SDWA + +VOP1/VOP2/VOP3/VOPC examples: + +.. code-block:: nasm + + v_mov_b32 v1, v2 + v_mov_b32_e32 v1, v2 + v_nop + v_cvt_f64_i32_e32 v[1:2], v2 + v_floor_f32_e32 v1, v2 + v_bfrev_b32_e32 v1, v2 + v_add_f32_e32 v1, v2, v3 + v_mul_i32_i24_e64 v1, v2, 3 + v_mul_i32_i24_e32 v1, -3, v3 + v_mul_i32_i24_e32 v1, -100, v3 + v_addc_u32 v1, s[0:1], v2, v3, s[2:3] + v_max_f16_e32 v1, v2, v3 + +VOP_DPP examples: + +.. code-block:: nasm + + v_mov_b32 v0, v0 quad_perm:[0,2,1,1] + v_sin_f32 v0, v0 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 + v_mov_b32 v0, v0 wave_shl:1 + v_mov_b32 v0, v0 row_mirror + v_mov_b32 v0, v0 row_bcast:31 + v_mov_b32 v0, v0 quad_perm:[1,3,0,1] row_mask:0xa bank_mask:0x1 bound_ctrl:0 + v_add_f32 v0, v0, |v0| row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 + v_max_f16 v1, v2, v3 row_shl:1 row_mask:0xa bank_mask:0x1 bound_ctrl:0 + +VOP_SDWA examples: + +.. code-block:: nasm + + v_mov_b32 v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD + v_min_u32 v200, v200, v1 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:BYTE_1 src1_sel:DWORD + v_sin_f32 v0, v0 dst_unused:UNUSED_PAD src0_sel:WORD_1 + v_fract_f32 v0, |v0| dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 + v_cmpx_le_u32 vcc, v1, v2 src0_sel:BYTE_2 src1_sel:WORD_0 + +For full list of supported instructions, refer to "Vector ALU instructions". + +HSA Code Object Directives +-------------------------- + +AMDGPU ABI defines auxiliary data in output code object. In assembly source, +one can specify them with assembler directives. .hsa_code_object_version major, minor ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ *major* and *minor* are integers that specify the version of the HSA code -object that will be generated by the assembler. This value will be stored -in an entry of the .note section. +object that will be generated by the assembler. .hsa_code_object_isa [major, minor, stepping, vendor, arch] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -135,12 +289,14 @@ *vendor* and *arch* are quoted strings. *vendor* should always be equal to "AMD" and *arch* should always be equal to "AMDGPU". -If no arguments are specified, then the assembler will derive the ISA version, -*vendor*, and *arch* from the value of the -mcpu option that is passed to the -assembler. +By default, the assembler will derive the ISA version, *vendor*, and *arch* +from the value of the -mcpu option that is passed to the assembler. + +.amdgpu_hsa_kernel (name) +^^^^^^^^^^^^^^^^^^^^^^^^^ -ISA version, *vendor*, and *arch* will all be stored in a single entry of the -.note section. +This directives specifies that the symbol with given name is a kernel entry point +(label) and the object should contain corresponding symbol of type STT_AMDGPU_HSA_KERNEL. .amd_kernel_code_t ^^^^^^^^^^^^^^^^^^ @@ -165,9 +321,8 @@ The *.amd_kernel_code_t* directive must be placed immediately after the function label and before any instructions. -For a full list of amd_kernel_code_t keys, see the examples in -test/CodeGen/AMDGPU/hsa.s. For an explanation of the meanings of the different -keys, see the comments in lib/Target/AMDGPU/AmdKernelCodeT.h +For a full list of amd_kernel_code_t keys, refer to AMDGPU ABI document, +comments in lib/Target/AMDGPU/AmdKernelCodeT.h and test/CodeGen/AMDGPU/hsa.s. Here is an example of a minimal amd_kernel_code_t specification: