diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -8459,140 +8459,250 @@
 ------
 
 This section provides code conventions used when the target triple OS is
-``amdpal`` (see :ref:`amdgpu-target-triples`) for passing runtime parameters
-from the application/runtime to each invocation of a hardware shader. These
-parameters include both generic, application-controlled parameters called
-*user data* as well as system-generated parameters that are a product of the
-draw or dispatch execution.
+``amdpal`` (see :ref:`amdgpu-target-triples`).
 
-User Data
-~~~~~~~~~
+.. _amdgpu-amdpal-code-object-metadata-section:
 
-Each hardware stage has a set of 32-bit *user data registers* which can be
-written from a command buffer and then loaded into SGPRs when waves are launched
-via a subsequent dispatch or draw operation. This is the way most arguments are
-passed from the application/runtime to a hardware shader.
+Code Object Metadata
+~~~~~~~~~~~~~~~~~~~~
 
-Compute User Data
-~~~~~~~~~~~~~~~~~
+.. note::
 
-Compute shader user data mappings are simpler than graphics shaders and have a
-fixed mapping.
+  The metadata is currently in development and is subject to major
+  changes.
 
-Note that there are always 10 available *user data entries* in registers -
-entries beyond that limit must be fetched from memory (via the spill table
-pointer) by the shader.
+Code object metadata is specified by the ``NT_AMDGPU_METADATA`` note
+record (see :ref:`amdgpu-note-records-v3-v4`).
 
-  .. table:: PAL Compute Shader User Data Registers
-     :name: pal-compute-user-data-registers
+The metadata is represented as Message Pack formatted binary data (see
+[MsgPack]_). The top level is a Message Pack map that includes the keys
+defined in table :ref:`amdgpu-amdpal-code-object-metadata-map-table`
+and referenced tables.
 
-     ============= ================================
-     User Register Description
-     ============= ================================
-     0             Global Internal Table (32-bit pointer)
-     1             Per-Shader Internal Table (32-bit pointer)
-     2 - 11        Application-Controlled User Data (10 32-bit values)
-     12            Spill Table (32-bit pointer)
-     13 - 14       Thread Group Count (64-bit pointer)
-     15            GDS Range
-     ============= ================================
+Additional information can be added to the maps. To avoid conflicts, any
+key names should be prefixed by "*vendor-name*." where ``vendor-name``
+can be the name of the vendor and specific vendor tool that generates the
+information. The prefix is abbreviated to simply "." when it appears
+within a map that has been added by the same *vendor-name*.
 
-Graphics User Data
-~~~~~~~~~~~~~~~~~~
+  .. table:: AMDPAL Code Object Metadata Map
+     :name: amdgpu-amdpal-code-object-metadata-map-table
 
-Graphics pipelines support a much more flexible user data mapping:
-
-  .. table:: PAL Graphics Shader User Data Registers
-     :name: pal-graphics-user-data-registers
-
-     ============= ================================
-     User Register Description
-     ============= ================================
-     0             Global Internal Table (32-bit pointer)
-     +             Per-Shader Internal Table (32-bit pointer)
-     + 1-15        Application Controlled User Data
-                   (1-15 Contiguous 32-bit Values in Registers)
-     +             Spill Table (32-bit pointer)
-     +             Draw Index (First Stage Only)
-     +             Vertex Offset (First Stage Only)
-     +             Instance Offset (First Stage Only)
-     ============= ================================
-
-  The placement of the global internal table remains fixed in the first *user
-  data SGPR register*. Otherwise all parameters are optional, and can be mapped
-  to any desired *user data SGPR register*, with the following restrictions:
-
-  * Draw Index, Vertex Offset, and Instance Offset can only be used by the first
-    active hardware stage in a graphics pipeline (i.e. where the API vertex
-    shader runs).
-
-  * Application-controlled user data must be mapped into a contiguous range of
-    user data registers.
-
-  * The application-controlled user data range supports compaction remapping, so
-    only *entries* that are actually consumed by the shader must be assigned to
-    corresponding *registers*. Note that in order to support an efficient runtime
-    implementation, the remapping must pack *registers* in the same order as
-    *entries*, with unused *entries* removed.
-
-.. _pal_global_internal_table:
-
-Global Internal Table
-~~~~~~~~~~~~~~~~~~~~~
-
-The global internal table is a table of *shader resource descriptors* (SRDs)
-that define how certain engine-wide, runtime-managed resources should be
-accessed from a shader. The majority of these resources have HW-defined formats,
-and it is up to the compiler to write/read data as required by the target
-hardware.
-
-The following table illustrates the required format:
-
-  .. table:: PAL Global Internal Table
-     :name: pal-git-table
-
-     ============= ================================
-     Offset        Description
-     ============= ================================
-     0-3           Graphics Scratch SRD
-     4-7           Compute Scratch SRD
-     8-11          ES/GS Ring Output SRD
-     12-15         ES/GS Ring Input SRD
-     16-19         GS/VS Ring Output #0
-     20-23         GS/VS Ring Output #1
-     24-27         GS/VS Ring Output #2
-     28-31         GS/VS Ring Output #3
-     32-35         GS/VS Ring Input SRD
-     36-39         Tessellation Factor Buffer SRD
-     40-43         Off-Chip LDS Buffer SRD
-     44-47         Off-Chip Param Cache Buffer SRD
-     48-51         Sample Position Buffer SRD
-     52            vaRange::ShadowDescriptorTable High Bits
-     ============= ================================
-
-  The pointer to the global internal table passed to the shader as user data
-  is a 32-bit pointer. The top 32 bits should be assumed to be the same as
-  the top 32 bits of the pipeline, so the shader may use the program
-  counter's top 32 bits.
-
-.. _pal_call-convention:
+     =================== ============== ========= ======================================================================
+     String Key          Value Type     Required? Description
+     =================== ============== ========= ======================================================================
+     "amdpal.version"    sequence of    Required  PAL code object metadata (major, minor) version.
+                         2 integers
+     "amdpal.pipelines"  sequence of    Required  Per-pipeline metadata. See
+                         map                      :ref:`amdgpu-amdpal-code-object-pipeline-metadata-map-table` for the
+                                                  definition of the keys included in that map.
+     =================== ============== ========= ======================================================================
 
-Call Convention
-~~~~~~~~~~~~~~~
+..
 
-For graphics use cases, the calling convention is `amdgpu_gfx`.
+  .. table:: AMDPAL Code Object Pipeline Metadata Map
+     :name: amdgpu-amdpal-code-object-pipeline-metadata-map-table
+
+     ====================================== ============== ========= ===================================================
+     String Key                             Value Type     Required? Description
+     ====================================== ============== ========= ===================================================
+     ".name"                                string                   Source name of the pipeline.
+     ".type"                                string                   Pipeline type, e.g. VsPs.
+     ".internal_pipeline_hash"              sequence of    Required  Internal compiler hash for this pipeline. Lower
+                                            2 integers               64 bits is the "stable" portion of the hash, used
+                                                                     for e.g. shader replacement lookup. Upper 64 bits
+                                                                     is the "unique" portion of the hash, used for
+                                                                     e.g. pipeline cache lookup.
+     ".shaders"                             map of map               Per-API shader metadata. See
+                                                                     :ref:`amdgpu-amdpal-code-object-api-shader-metadata-map-table`
+                                                                     for the definition of the keys included in that
+                                                                     map.
+     ".hardware_stages"                     map of map               Per-hardware stage metadata. See
+                                                                     :ref:`amdgpu-amdpal-code-object-hardware-stage-metadata-map-table`
+                                                                     for the definition of the keys included in that
+                                                                     map.
+     ".shader_functions"                    map of map               Per-shader function metadata.
+     ".registers"                           map            Required  Hardware register configuration. The map is
+                                                                     implemented as "reg offset" : "value", where the
+                                                                     driver is required to program each specified
+                                                                     register to the corresponding specified value
+                                                                     when executing this pipeline. However, reg
+                                                                     offsets specifying user data registers (e.g.,
+                                                                     COMPUTE_USER_DATA_0) need special treatment. See
+                                                                     :ref:`amdgpu-amdpal-code-object-user-data-section`
+                                                                     section for more information.
+     ".user_data_limit"                     integer                  Number of user data entries accessed by this
+                                                                     pipeline.
+     ".spill_threshold"                     integer                  The user data spill threshold. 0xFFFF for
+                                                                     NoUserDataSpilling.
+     ".uses_viewport_array_index"           boolean                  Indicates whether or not the pipeline uses the
+                                                                     viewport array index feature. Pipelines which use
+                                                                     this feature can render into all 16 viewports,
+                                                                     whereas pipelines which don't use it are
+                                                                     restricted to viewport #0.
+     ".es_gs_lds_size"                      integer                  Amount of LDS space used internally for handling
+                                                                     data-passing between the ES and GS shader stages.
+                                                                     This can be zero if the data is passed using
+                                                                     off-chip buffers. This value should be used to
+                                                                     program all user-SGPRs which have been marked
+                                                                     with "UserDataMapping::EsGsLdsSize" (typically
+                                                                     only the GS and VS HW stages will ever have a
+                                                                     user-SGPR so marked).
+     ".nggSubgroupSize"                     integer                  Explicit max subgroup size for NGG shaders (max
+                                                                     number of threads in a subgroup).
+     ".num_interpolants"                    integer                  Graphics only. Number of PS interpolants.
+     ".api"                                 string                   Name of the client graphics API.
+     ".api_create_info"                     binary                   Graphics API shader create info binary blob.
+     ====================================== ============== ========= ===================================================
 
-.. note::
+..
+
+  .. table:: AMDPAL Code Object API Shader Metadata Map
+     :name: amdgpu-amdpal-code-object-api-shader-metadata-map-table
+
+     ==================== ============== ========= =====================================================================
+     String Key           Value Type     Required? Description
+     ==================== ============== ========= =====================================================================
+     ".api_shader_hash"   sequence of    Required  Input shader hash, typically passed in from the client.
+                          2 integers
+     ".hardware_mapping"  sequence of    Required  Flags indicating the HW stages this API shader maps to.
+                          string
+     ==================== ============== ========= =====================================================================
+
+..
+
+  .. table:: AMDPAL Code Object Hardware Stage Metadata Map
+     :name: amdgpu-amdpal-code-object-hardware-stage-metadata-map-table
+
+     ========================== ============== ========= ===============================================================
+     String Key                 Value Type     Required? Description
+     ========================== ============== ========= ===============================================================
+     ".entry_point"             string                   The ELF symbol pointing to this pipeline's stage entry point.
+     ".scratch_memory_size"     integer                  Scratch memory size in bytes.
+     ".lds_size"                integer                  Local Data Share size in bytes.
+     ".perf_data_buffer_size"   integer                  Performance data buffer size in bytes.
+     ".vgpr_count"              integer                  Number of VGPRs used.
+     ".sgpr_count"              integer                  Number of SGPRs used.
+     ".vgpr_limit"              integer                  VGPR count upper limit (only set if different from HW
+                                                         default).
+     ".sgpr_limit"              integer                  SGPR count upper limit (only set if different from HW
+                                                         default).
+     ".threadgroup_dimensions"  sequence of              Compute only. Thread-group X/Y/Z dimensions.
+                                3 integers
+     ".wavefront_size"          integer                  Wavefront size (only set if different from HW default).
+     ".uses_uavs"               boolean                  The shader reads or writes UAV(s).
+     ".uses_rovs"               boolean                  The shader reads or writes ROV(s).
+     ".writes_uavs"             boolean                  The shader writes to one or more UAVs.
+     ".writes_depth"            boolean                  The shader writes out a depth value.
+     ".uses_append_consume"     boolean                  The shader uses append or consume ops.
+     ========================== ============== ========= ===============================================================
+
+.. _amdgpu-amdpal-code-object-user-data-section:
+
+User Data
++++++++++
 
-  `amdgpu_gfx` Function calls are currently in development and are
-  subject to major changes.
+Each hardware stage has a set of 32-bit physical SPI *user data registers*
+(either 16 or 32 based on graphics IP and the stage) which can be
+written from a command buffer and then loaded into SGPRs when waves are
+launched via a subsequent dispatch or draw operation. This is the way
+most arguments are passed from the application/runtime to a hardware
+shader.
+
+PAL abstracts this functionality by exposing a set of 128 *user data
+entries* per pipeline that a client can use to pass arguments from a command
+buffer to one or more shaders in that pipeline. The ELF code object must
+specify a mapping from virtualized *user data entries* to physical *user
+data registers*, and PAL is responsible for implementing that mapping,
+including spilling overflow *user data entries* to memory if needed.
+
+Since the *user data registers* are GRBM-accessible SPI registers, this
+mapping is actually embedded in the **.registers** metadata entry. For
+most registers, the value in that map is a literal 32-bit value that
+should be written to the register by the driver. However, when the
+register is a *user data register* (any USER_DATA register, e.g.,
+SPI_SHADER_USER_DATA_PS_5), the value is instead an encoding that tells
+the driver to write either a *user data entry* value or one of several
+driver-internal values to the register. This encoding is described in
+the following table:
+
+  .. table:: AMDPAL User Data Mapping
+     :name: amdgpu-amdpal-code-object-metadata-user-data-mapping-table
+
+     ================= ========== ===============================================================================
+     Name              Value      Description
+     ================= ========== ===============================================================================
+     User Data Entry   0..127     32-bit value of user_data_entry[N] as specified via *CmdSetUserData()*
+     GlobalTable       0x10000000 32-bit pointer to GPU memory containing the global internal table.
+     PerShaderTable    0x10000001 32-bit pointer to GPU memory containing the per-shader internal table. See
+                                  :ref:`amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section`
+                                  for more detail.
+     SpillTable        0x10000002 32-bit pointer to GPU memory containing the user data spill table. See
+                                  :ref:`amdgpu-amdpal-code-object-metadata-user-data-spill-table-section` for
+                                  more detail.
+     BaseVertex        0x10000003 Vertex offset (32-bit unsigned integer). Not needed if the pipeline doesn't
+                                  reference the draw index in the vertex shader. Only supported by the first
+                                  stage in a graphics pipeline.
+     BaseInstance      0x10000004 Instance offset (32-bit unsigned integer). Only supported by the first stage in
+                                  a graphics pipeline.
+     DrawIndex         0x10000005 Draw index (32-bit unsigned integer). Only supported by the first stage in a
+                                  graphics pipeline.
+     Workgroup         0x10000006 Thread group count (32-bit unsigned integer). Low half of a 64-bit address of
+                                  a buffer containing the grid dimensions for a Compute dispatch operation. The
+                                  high half of the address is stored in the next sequential user-SGPR. Only
+                                  supported by compute pipelines.
+     EsGsLdsSize       0x1000000A Indicates that PAL will program this user-SGPR to contain the amount of LDS
+                                  space used for the ES/GS pseudo-ring-buffer for passing data between shader
+                                  stages.
+     ViewId            0x1000000B View id (32-bit unsigned integer) identifies a view of graphics
+                                  pipeline instancing.
+     StreamOutTable    0x1000000C 32-bit pointer to GPU memory containing the stream out target SRD table. This
+                                  can only appear for one shader stage per pipeline.
+     PerShaderPerfData 0x1000000D 32-bit pointer to GPU memory containing the per-shader performance data buffer.
+     VertexBufferTable 0x1000000F 32-bit pointer to GPU memory containing the vertex buffer SRD table. This can
+                                  only appear for one shader stage per pipeline.
+     UavExportTable    0x10000010 32-bit pointer to GPU memory containing the UAV export SRD table. This can
+                                  only appear for one shader stage per pipeline (PS). These replace color targets
+                                  and are completely separate from any UAVs used by the shader. This is optional,
+                                  and only used by the PS when UAV exports are used to replace color-target
+                                  exports to optimize specific shaders.
+     NggCullingData    0x10000011 64-bit pointer to GPU memory containing the hardware register data needed by
+                                  some NGG pipelines to perform culling. This value contains the address of the
+                                  first of two consecutive registers which provide the full GPU address.
+     FetchShaderPtr    0x10000015 64-bit pointer to GPU memory containing the fetch shader subroutine.
+     ================= ========== ===============================================================================
+
+.. _amdgpu-amdpal-code-object-metadata-user-data-per-shader-table-section:
+
+Per Shader Table
+################
+
+Low 32 bits of the GPU address for an optional buffer in the **.data**
+section of the ELF. The high 32 bits of the address match the high 32 bits
+of the shader's program counter.
 
-This calling convention shares most properties with calling non-kernel
-functions (see
-:ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions`).
-Differences are:
+.. note::
 
-  - Currently there are none, differences will be listed here
+  Each shader's table in the **.data** section is referenced by the symbol
+  "_amdgpu_xs_shdr_intrl_data" where "xs" corresponds to the hardware
+  shader stage the data is for. E.g., **_amdgpu_cs_shdr_intrl_data** for
+  the CS hardware stage.
+
+.. _amdgpu-amdpal-code-object-metadata-user-data-spill-table-section:
+
+Spill Table
+###########
+
+It is possible for a hardware shader to need access to more user data
+entries than there are slots available in user data registers for one
+or more hardware shader stages. In that case, the PAL runtime expects
+the necessary *user data entries* to be spilled to GPU memory, with
+one user data register pointing to the spilled user data memory. The
+value of the *user data entry* must then represent the location where
+a shader expects to read the low 32 bits of the table's GPU virtual
+address. The *spill table* itself represents a set of 32-bit values
+managed by the PAL runtime in GPU-accessible memory that can be made
+indirectly accessible to a hardware shader.
 
 Unspecified OS
 --------------
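The vendor-prefix rule for metadata keys added by this patch ("." abbreviates the *vendor-name* prefix inside a map added by that vendor) can be illustrated with a short Python sketch. It assumes the ``NT_AMDGPU_METADATA`` payload has already been decoded from Message Pack into plain dicts and lists, and it reads "a map that has been added by the same *vendor-name*" as "a map stored under a key carrying that vendor's prefix". The function name ``qualify_keys``, the ``acme`` vendor and all sample values are hypothetical.

```python
from typing import Any, Optional


def qualify_keys(node: Any, vendor: Optional[str] = None) -> Any:
    """Return a copy of decoded AMDPAL metadata with "."-abbreviated keys expanded.

    Assumption: a nested map belongs to the vendor named by the prefix of the
    fully qualified key it is stored under (e.g. "amdpal.pipelines" -> "amdpal").
    """
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            child_vendor = vendor
            if isinstance(key, str) and key.startswith("."):
                if vendor is None:
                    raise ValueError(f"abbreviated key {key!r} outside any vendor map")
                key = vendor + key  # e.g. ".name" -> "amdpal.name"
            elif isinstance(key, str) and "." in key:
                child_vendor = key.split(".", 1)[0]  # "acme.tool" -> "acme"
            result[key] = qualify_keys(value, child_vendor)
        return result
    if isinstance(node, list):
        # Sequences such as the "amdpal.pipelines" array inherit the vendor.
        return [qualify_keys(item, vendor) for item in node]
    return node  # integers, strings, booleans and binary blobs are unchanged


# Made-up sample metadata, already decoded from Message Pack.
sample = {
    "amdpal.version": [1, 0],
    "amdpal.pipelines": [
        {
            ".name": "example",
            ".registers": {0x1234: 7},
            ".hardware_stages": {".cs": {".sgpr_count": 10}},
            "acme.tool": {".flag": True},
        }
    ],
}
qualified = qualify_keys(sample)
pipeline = qualified["amdpal.pipelines"][0]
print(pipeline["amdpal.hardware_stages"]["amdpal.cs"]["amdpal.sgpr_count"])  # 10
```

Non-string keys, such as the register offsets inside ``.registers``, pass through unchanged, so the same walk can be applied to the whole document before any key lookups.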
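The AMDPAL User Data Mapping table added above amounts to a small decoding step applied to the value stored for each *user data register* in **.registers**. The Python sketch below copies the codes from that table; ``decode_user_data_mapping`` and ``resolve_user_data_register`` are hypothetical names, and the resolver is only an illustration of the described driver behavior, not PAL's implementation.

```python
from typing import Dict, List

# Codes copied from the AMDPAL User Data Mapping table; 0..127 select an entry.
USER_DATA_ENTRY_COUNT = 128
INTERNAL_MAPPINGS: Dict[int, str] = {
    0x10000000: "GlobalTable",
    0x10000001: "PerShaderTable",
    0x10000002: "SpillTable",
    0x10000003: "BaseVertex",
    0x10000004: "BaseInstance",
    0x10000005: "DrawIndex",
    0x10000006: "Workgroup",
    0x1000000A: "EsGsLdsSize",
    0x1000000B: "ViewId",
    0x1000000C: "StreamOutTable",
    0x1000000D: "PerShaderPerfData",
    0x1000000F: "VertexBufferTable",
    0x10000010: "UavExportTable",
    0x10000011: "NggCullingData",
    0x10000015: "FetchShaderPtr",
}


def decode_user_data_mapping(value: int) -> str:
    """Name what the driver should write into a user data register."""
    if 0 <= value < USER_DATA_ENTRY_COUNT:
        return f"user_data_entry[{value}]"
    if value in INTERNAL_MAPPINGS:
        return INTERNAL_MAPPINGS[value]
    raise ValueError(f"unknown user data mapping 0x{value:08X}")


def resolve_user_data_register(value: int, entries: List[int],
                               internal: Dict[str, int]) -> int:
    """Hypothetical driver step: choose the 32-bit value to program."""
    if 0 <= value < USER_DATA_ENTRY_COUNT:
        return entries[value] & 0xFFFFFFFF  # value set via CmdSetUserData()
    return internal[decode_user_data_mapping(value)] & 0xFFFFFFFF


print(decode_user_data_mapping(5))           # user_data_entry[5]
print(decode_user_data_mapping(0x10000002))  # SpillTable
```

Rejecting unknown codes, rather than writing them through as literals, keeps a mismatch between compiler and runtime versions visible.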
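Several user data values in the patch carry only part of an address: the per-shader table pointer supplies the low 32 bits and takes the high 32 bits from the shader's program counter, while ``Workgroup`` keeps its high half in the next sequential user SGPR. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and applying the program-counter rule to pointers other than the per-shader table is an assumption the patch does not state.

```python
MASK32 = 0xFFFFFFFF


def extend_with_pc_high_bits(low32: int, program_counter: int) -> int:
    """Form a 64-bit address: high 32 bits from the PC, low 32 bits from user data."""
    return (((program_counter >> 32) & MASK32) << 32) | (low32 & MASK32)


def join_consecutive_sgprs(first_sgpr: int, next_sgpr: int) -> int:
    """Combine a low half with the high half held in the next sequential user SGPR."""
    return ((next_sgpr & MASK32) << 32) | (first_sgpr & MASK32)


def per_shader_table_symbol(stage: str) -> str:
    """ELF symbol of a stage's per-shader table, e.g. "cs" -> "_amdgpu_cs_shdr_intrl_data"."""
    return f"_amdgpu_{stage.lower()}_shdr_intrl_data"


pc = 0x0000_8000_1234_5678  # made-up program counter
print(hex(extend_with_pc_high_bits(0x00AB_CDEF, pc)))  # 0x800000abcdef
print(per_shader_table_symbol("CS"))  # _amdgpu_cs_shdr_intrl_data
```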
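The ``.threadgroup_dimensions`` and ``.wavefront_size`` hardware stage keys in the patch are enough to estimate how many waves one compute thread group occupies. The sketch below shows that calculation under stated assumptions: the function name is hypothetical, and the fallback of 64 threads per wave is only an assumed stand-in for the hardware default, which depends on the target and stage.

```python
from typing import Optional, Sequence

# Assumed fallback when ".wavefront_size" is absent; the real HW default
# depends on the target and stage, so callers should pass the right value.
ASSUMED_DEFAULT_WAVEFRONT_SIZE = 64


def waves_per_threadgroup(dims: Sequence[int], wavefront_size: Optional[int] = None,
                          default_wavefront_size: int = ASSUMED_DEFAULT_WAVEFRONT_SIZE) -> int:
    """Waves needed for one thread group with X/Y/Z dimensions ``dims``."""
    x, y, z = dims
    threads = x * y * z
    wave = wavefront_size if wavefront_size is not None else default_wavefront_size
    return -(-threads // wave)  # ceiling division: a partially filled wave still launches


print(waves_per_threadgroup([16, 16, 1]))      # 4
print(waves_per_threadgroup([16, 16, 1], 32))  # 8
```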