diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -698,7 +698,7 @@ can be used. For GFX7-GFX8 these are available in the :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For - GFX9-GFX10 the aperture base addresses are directly available as inline + GFX9-GFX11 the aperture base addresses are directly available as inline constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64-bit address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32 which makes it easier to convert from flat to segment or @@ -768,7 +768,7 @@ instructions with the scratch buffer descriptor and per wavefront scratch offset, by the scratch instructions, or by flat instructions. Multi-dword access is not supported except by flat and scratch instructions in - GFX9-GFX10. + GFX9-GFX11. **Constant 32-bit** *TODO* @@ -2536,10 +2536,10 @@ AMDGPU does not use a segment selector so this is 0. ``minimum_instruction_length`` (ubyte) - For GFX9-GFX10 this is 4. + For GFX9-GFX11 this is 4. ``maximum_operations_per_instruction`` (ubyte) - For GFX9-GFX10 this is 1. + For GFX9-GFX11 this is 1. Source text for online-compiled programs (for example, those compiled by the OpenCL language runtime) may be embedded into the DWARF Version 5 line table. @@ -3022,7 +3022,7 @@ "NumSGPRs" integer Required Number of scalar registers used by a wavefront for - GFX6-GFX10. This + GFX6-GFX11. This includes the special SGPRs for VCC, Flat Scratch (GFX7-GFX10) @@ -3038,7 +3038,7 @@ "NumVGPRs" integer Required Number of vector registers used by each work-item for - GFX6-GFX10 + GFX6-GFX11 "MaxFlatWorkGroupSize" integer Required Maximum flat work-group size supported by the @@ -3793,10 +3793,10 @@ instructions, or by flat instructions. If each lane of a wavefront accesses the same private address, the interleaving results in adjacent dwords being accessed and hence requires fewer cache lines to be fetched. Multi-dword access is not -supported except by flat and scratch instructions in GFX9-GFX10. +supported except by flat and scratch instructions in GFX9-GFX11. The generic address space uses the hardware flat address support available in -GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and +GFX7-GFX11. This uses two fixed ranges of virtual addresses (the private and local apertures), that are outside the range of addressible global memory, to map from a flat address to a private or local address. @@ -3812,7 +3812,7 @@ apertures address can be used. For GFX7-GFX8 these are available in the :ref:`amdgpu-amdhsa-hsa-aql-queue` the address of which can be obtained with Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For -GFX9-GFX10 the aperture base addresses are directly available as inline constant +GFX9-GFX11 the aperture base addresses are directly available as inline constant registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64 bit address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32 which makes it easier to convert from flat to segment or segment to flat. @@ -3936,28 +3936,28 @@ configuration register. See :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. - GFX10 + GFX10-GFX11 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC3`` configuration register. See - :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`. 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC1`` configuration register. See - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC2`` configuration register. See - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. 458:448 7 bits *See separate bits below.* Enable the setup of the SGPR user data registers (see @@ -3991,7 +3991,7 @@ 457:455 3 bits Reserved, must be 0. 458 1 bit ENABLE_WAVEFRONT_SIZE32 GFX6-GFX9 Reserved, must be 0. - GFX10 + GFX10-GFX11 - If 0 execute in wavefront size 64 mode. - If 1 execute in @@ -4008,8 +4008,8 @@ .. - .. table:: compute_pgm_rsrc1 for GFX6-GFX10 - :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table + .. table:: compute_pgm_rsrc1 for GFX6-GFX11 + :name: amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table ======= ======= =============================== =========================================================================== Bits Size Field Name Description @@ -4027,10 +4027,10 @@ - vgprs_used = align(arch_vgprs, 4) + acc_vgprs - max(0, ceil(vgprs_used / 8) - 1) - GFX10 (wavefront size 64) + GFX10-GFX11 (wavefront size 64) - max_vgpr 1..256 - max(0, ceil(vgprs_used / 4) - 1) - GFX10 (wavefront size 32) + GFX10-GFX11 (wavefront size 32) - max_vgpr 1..256 - max(0, ceil(vgprs_used / 8) - 1) @@ -4064,7 +4064,7 @@ GFX9 - sgprs_used 0..112 - 2 * max(0, ceil(sgprs_used / 16) - 1) - GFX10 + GFX10-GFX11 Reserved, must be 0. (128 SGPRs always allocated.) @@ -4231,7 +4231,7 @@ ``COMPUTE_PGM_RSRC1.CDBG_USER``. 26 1 bit FP16_OVFL GFX6-GFX8 Reserved, must be 0. - GFX9-GFX10 + GFX9-GFX11 Wavefront starts execution with specified fp16 overflow mode. @@ -4250,7 +4250,7 @@ 28:27 2 bits Reserved, must be 0. 29 1 bit WGP_MODE GFX6-GFX9 Reserved, must be 0. - GFX10 + GFX10-GFX11 - If 0 execute work-groups in CU wavefront execution mode. - If 1 execute work-groups on @@ -4262,7 +4262,7 @@ ``COMPUTE_PGM_RSRC1.WGP_MODE``. 30 1 bit MEM_ORDERED GFX6-GFX9 Reserved, must be 0. - GFX10 + GFX10-GFX11 Controls the behavior of the s_waitcnt's vmcnt and vscnt counters. @@ -4285,7 +4285,7 @@ ``COMPUTE_PGM_RSRC1.MEM_ORDERED``. 31 1 bit FWD_PROGRESS GFX6-GFX9 Reserved, must be 0. - GFX10 + GFX10-GFX11 - If 0 execute SIMD wavefronts using oldest first policy. - If 1 execute SIMD wavefronts to @@ -4299,8 +4299,8 @@ .. - .. table:: compute_pgm_rsrc2 for GFX6-GFX10 - :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table + .. table:: compute_pgm_rsrc2 for GFX6-GFX11 + :name: amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table ======= ======= =============================== =========================================================================== Bits Size Field Name Description @@ -4437,7 +4437,7 @@ GFX6 roundup(lds-size / (64 * 4)) - GFX7-GFX10 + GFX7-GFX11 roundup(lds-size / (128 * 4)) 24 1 bit ENABLE_EXCEPTION_IEEE_754_FP Wavefront starts execution @@ -4493,8 +4493,8 @@ .. - .. table:: compute_pgm_rsrc3 for GFX10 - :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table + .. table:: compute_pgm_rsrc3 for GFX10-GFX11 + :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table ======= ======= =============================== =========================================================================== Bits Size Field Name Description @@ -4503,8 +4503,43 @@ wavefront size 64 the value is 0-15, representing 0-120 VGPRs (granularity of 8), such that (compute_pgm_rsrc1.vgprs +1)*4 + shared_vgpr_count*8 does not exceed 256. For wavefront size 32 shared_vgpr_count must be 0. - 31:4 28 Reserved, must be 0. - bits + 9:4 6 bits INST_PREF_SIZE GFX10 + Reserved, must be 0. + GFX11 + Number of instruction bytes to prefetch, starting at the kernel's entry + point instruction, before wavefront starts execution. The value is 0..63 + with a granularity of 128 bytes. + 10 1 bit TRAP_ON_START GFX10 + Reserved, must be 0. + GFX11 + Must be 0. + + If 1, wavefront starts execution by trapping into the trap handler. + + CP is responsible for filling in the trap on start bit in + ``COMPUTE_PGM_RSRC3.TRAP_ON_START`` according to what the runtime + requests. + 11 1 bit TRAP_ON_END GFX10 + Reserved, must be 0. + GFX11 + Must be 0. + + If 1, wavefront execution terminates by trapping into the trap handler. + + CP is responsible for filling in the trap on end bit in + ``COMPUTE_PGM_RSRC3.TRAP_ON_END`` according to what the runtime requests. + 30:12 19 bits Reserved, must be 0. + 31 1 bit IMAGE_OP GFX10 + Reserved, must be 0. + GFX11 + If 1, the kernel execution contains image instructions. If executed as + part of a graphics pipeline, image read instructions will stall waiting + for any necessary ``WAIT_SYNC`` fence to be performed in order to + indicate that earlier pipeline stages have completed writing to the + image. + + Not used for compute kernels that are not part of a graphics pipeline and + must be 0. 32 **Total size 4 bytes.** ======= =================================================================================================================== @@ -4630,7 +4665,7 @@ GFX7-GFX8 since it is the same value as the second SGPR of Flat Scratch Init. However, it - may be needed for GFX9-GFX10 which + may be needed for GFX9-GFX11 which changes the meaning of the Flat Scratch Init value. then Work-Group Id X 1 32-bit work-group id in X @@ -4732,8 +4767,8 @@ :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`. The global segment can be accessed either using buffer instructions (GFX6 which -has V# 64-bit address support), flat instructions (GFX7-GFX10), or global -instructions (GFX9-GFX10). +has V# 64-bit address support), flat instructions (GFX7-GFX11), or global +instructions (GFX9-GFX11). If buffer operations are used, then the compiler can generate a V# with the following properties: @@ -4779,7 +4814,7 @@ available in dispatch packet. For M0, it is also possible to use maximum possible value of LDS for given target (0x7FFF for GFX6 and 0xFFFF for GFX7-GFX8). -GFX9-GFX10 +GFX9-GFX11 The M0 register is not used for range checking LDS accesses and so does not need to be initialized in the prolog. @@ -4891,7 +4926,7 @@ specifies *Architected flat scratch*: If ENABLE_PRIVATE_SEGMENT is enabled in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table` then the FLAT_SCRATCH + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table` then the FLAT_SCRATCH register pair will be initialized to the 64-bit address of the base of scratch backing memory being managed by SPI for the queue executing the kernel dispatch plus the value of the wave's Scratch Wavefront Offset for use as the @@ -5000,7 +5035,7 @@ termed vector memory operations. Private address space uses ``buffer_load/store`` using the scratch V# -(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX10). Since only a single thread +(GFX6-GFX8), or ``scratch_load/store`` (GFX9-GFX11). Since only a single thread is accessing the memory, atomic memory orderings are not meaningful, and all accesses are treated as non-atomic. @@ -5067,7 +5102,7 @@ * :ref:`amdgpu-amdhsa-memory-model-gfx6-gfx9` * :ref:`amdgpu-amdhsa-memory-model-gfx90a` * :ref:`amdgpu-amdhsa-memory-model-gfx940` -* :ref:`amdgpu-amdhsa-memory-model-gfx10` +* :ref:`amdgpu-amdhsa-memory-model-gfx10-gfx11` .. _amdgpu-amdhsa-memory-model-gfx6-gfx9: @@ -11137,12 +11172,12 @@ - system for OpenCL.* ============ ============ ============== ========== ================================ -.. _amdgpu-amdhsa-memory-model-gfx10: +.. _amdgpu-amdhsa-memory-model-gfx10-gfx11: -Memory Model GFX10 -++++++++++++++++++ +Memory Model GFX10-GFX11 +++++++++++++++++++++++++ -For GFX10: +For GFX10-GFX11: * Each agent has multiple shader arrays (SA). * Each SA has multiple work-group processors (WGP). @@ -11251,7 +11286,7 @@ sample instructions. In this mode vmcnt reports completion of load, atomic with return and sample instructions in order, and the vscnt reports the completion of store and atomic without return in order. See ``MEM_ORDERED`` field in -:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. +:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. Wavefronts can be executed in WGP or CU wavefront execution mode: @@ -11267,18 +11302,18 @@ work-group synchronization. See ``WGP_MODE`` field in -:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table` and +:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table` and :ref:`amdgpu-target-features`. -The code sequences used to implement the memory model for GFX10 are defined in -table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-table`. +The code sequences used to implement the memory model for GFX10-GFX11 are defined in +table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table`. - .. table:: AMDHSA Memory Model Code Sequences GFX10 - :name: amdgpu-amdhsa-memory-model-code-sequences-gfx10-table + .. table:: AMDHSA Memory Model Code Sequences GFX10-GFX11 + :name: amdgpu-amdhsa-memory-model-code-sequences-gfx10-gfx11-table ============ ============ ============== ========== ================================ LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code - Ordering Sync Scope Address GFX10 + Ordering Sync Scope Address GFX10-GFX11 Space ============ ============ ============== ========== ================================ **Non-Atomic** @@ -13308,7 +13343,7 @@ :name: amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table =================== =============== ================ ================= ======================================= - Usage Code Sequence GFX6-GFX8 Inputs GFX9-GFX10 Inputs Description + Usage Code Sequence GFX6-GFX8 Inputs GFX9-GFX11 Inputs Description =================== =============== ================ ================= ======================================= reserved ``s_trap 0x00`` Reserved by hardware. debugger breakpoint ``s_trap 0x01`` *none* *none* Reserved for debugger to use for @@ -14206,7 +14241,7 @@ --------- AMDGPU backend has LLVM-MC based assembler which is currently in development. -It supports AMDGCN GFX6-GFX10. +It supports AMDGCN GFX6-GFX11. This section describes general syntax for instructions and operands. @@ -14766,121 +14801,119 @@ ======================================================== =================== ============ =================== Directive Default Supported On Description ======================================================== =================== ============ =================== - ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX10 Controls GROUP_SEGMENT_FIXED_SIZE in + ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX11 Controls GROUP_SEGMENT_FIXED_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX10 Controls PRIVATE_SEGMENT_FIXED_SIZE in + ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX11 Controls PRIVATE_SEGMENT_FIXED_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_kernarg_size`` 0 GFX6-GFX10 Controls KERNARG_SIZE in + ``.amdhsa_kernarg_size`` 0 GFX6-GFX11 Controls KERNARG_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX10 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2 - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table` + ``.amdhsa_user_sgpr_count`` 0 GFX6-GFX11 Controls USER_SGPR_COUNT in COMPUTE_PGM_RSRC2 + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table` ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in (except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. GFX940) - ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_PTR in + ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_PTR in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_QUEUE_PTR in + ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_QUEUE_PTR in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in + ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX11 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_ID in + ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX11 Controls ENABLE_SGPR_DISPATCH_ID in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in (except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. GFX940) - ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in + ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX11 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - ``.amdhsa_wavefront_size32`` Target GFX10 Controls ENABLE_WAVEFRONT_SIZE32 in + ``.amdhsa_wavefront_size32`` Target GFX10-GFX11 Controls ENABLE_WAVEFRONT_SIZE32 in Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. Specific (wavefrontsize64) ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in - (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. GFX940) - ``.amdhsa_enable_private_segment`` 0 GFX940 Controls ENABLE_PRIVATE_SEGMENT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_X in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Y in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_Z in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_INFO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX10 Controls ENABLE_VGPR_WORKITEM_ID in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in + GFX11 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_X in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Y in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_ID_Z in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX11 Controls ENABLE_SGPR_WORKGROUP_INFO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX11 Controls ENABLE_VGPR_WORKITEM_ID in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. Possible values are defined in :ref:`amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table`. - ``.amdhsa_next_free_vgpr`` Required GFX6-GFX10 Maximum VGPR number explicitly referenced, plus one. + ``.amdhsa_next_free_vgpr`` Required GFX6-GFX11 Maximum VGPR number explicitly referenced, plus one. Used to calculate GRANULATED_WORKITEM_VGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - ``.amdhsa_next_free_sgpr`` Required GFX6-GFX10 Maximum SGPR number explicitly referenced, plus one. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. + ``.amdhsa_next_free_sgpr`` Required GFX6-GFX11 Maximum SGPR number explicitly referenced, plus one. Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. ``.amdhsa_accum_offset`` Required GFX90A, Offset of a first AccVGPR in the unified register file. GFX940 Used to calculate ACCUM_OFFSET in :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. - ``.amdhsa_reserve_vcc`` 1 GFX6-GFX10 Whether the kernel may use the special VCC SGPR. + ``.amdhsa_reserve_vcc`` 1 GFX6-GFX11 Whether the kernel may use the special VCC SGPR. Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. ``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access (except scratch memory. Used to calculate GFX940) GRANULATED_WAVEFRONT_SGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay. Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in - Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. (xnack) - ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. Possible values are defined in :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. - ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_16_64 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX11 Controls FLOAT_ROUND_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. Possible values are defined in :ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`. - ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX10 Controls FLOAT_DENORM_MODE_32 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX11 Controls FLOAT_DENORM_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. Possible values are defined in :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. - ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX10 Controls FLOAT_DENORM_MODE_16_64 in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX11 Controls FLOAT_DENORM_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. Possible values are defined in :ref:`amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table`. - ``.amdhsa_dx10_clamp`` 1 GFX6-GFX10 Controls ENABLE_DX10_CLAMP in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - ``.amdhsa_ieee_mode`` 1 GFX6-GFX10 Controls ENABLE_IEEE_MODE in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - ``.amdhsa_fp16_overflow`` 0 GFX9-GFX10 Controls FP16_OVFL in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. + ``.amdhsa_dx10_clamp`` 1 GFX6-GFX11 Controls ENABLE_DX10_CLAMP in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. + ``.amdhsa_ieee_mode`` 1 GFX6-GFX11 Controls ENABLE_IEEE_MODE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. + ``.amdhsa_fp16_overflow`` 0 GFX9-GFX11 Controls FP16_OVFL in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. ``.amdhsa_tg_split`` Target GFX90A, Controls TG_SPLIT in - Feature GFX940 :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. - Specific + Feature GFX940, :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. + Specific GFX11 (tgsplit) - ``.amdhsa_workgroup_processor_mode`` Target GFX10 Controls ENABLE_WGP_MODE in + ``.amdhsa_workgroup_processor_mode`` Target GFX10-GFX11 Controls ENABLE_WGP_MODE in Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. Specific (cumode) - ``.amdhsa_memory_ordered`` 1 GFX10 Controls MEM_ORDERED in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - ``.amdhsa_forward_progress`` 0 GFX10 Controls FWD_PROGRESS in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - ``.amdhsa_shared_vgpr_count`` 0 GFX10 Controls SHARED_VGPR_COUNT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-table`. - ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. - ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX10 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in - :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. + ``.amdhsa_memory_ordered`` 1 GFX10-GFX11 Controls MEM_ORDERED in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. + ``.amdhsa_forward_progress`` 0 GFX10-GFX11 Controls FWD_PROGRESS in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. + ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX11 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx11-table`. ======================================================== =================== ============ =================== .amdgpu_metadata