Index: docs/AMDGPUUsage.rst =================================================================== --- docs/AMDGPUUsage.rst +++ docs/AMDGPUUsage.rst @@ -632,23 +632,20 @@ ``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must be at least 4 to indicate at least 8 byte alignment. -The AMDGPU backend code object uses the following ELF note records in the -``.note`` section. The *Description* column specifies the layout of the note -record's ``desc`` field. All fields are consecutive bytes. Note records with -variable size strings have a corresponding ``*_size`` field that specifies the -number of bytes, including the terminating null character, in the string. The -string(s) come immediately after the preceding fields. +The AMDGPU backend code object uses the following ELF note record in the +``.note`` section. Additional note records can be present. .. table:: AMDGPU ELF Note Records :name: amdgpu-elf-note-records-table - ===== ============================== ====================================== - Name Type Description - ===== ============================== ====================================== - "AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` - ===== ============================== ====================================== + ======== ============================== ====================================== + Name Type Description + ======== ============================== ====================================== + "AMDGPU" ``NT_AMDGPU_METADATA`` Metadata in Message Pack [MsgPack]_ + binary format. + ======== ============================== ====================================== .. @@ -658,17 +655,15 @@ ============================== ===== Name Value ============================== ===== - *reserved* 0-9 - ``NT_AMD_AMDGPU_HSA_METADATA`` 10 - *reserved* 11 + *reserved* 0 + ``NT_AMDGPU_METADATA`` 1 ============================== ===== -``NT_AMD_AMDGPU_HSA_METADATA`` - Specifies extensible metadata associated with the code objects executed on HSA - [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when - the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See - :ref:`amdgpu-amdhsa-hsa-code-object-metadata` for the syntax of the code - object metadata string. +``NT_AMDGPU_METADATA`` + Specifies extensible metadata associated with an AMDGPU code + object. It is encoded as a map in the Message Pack [MsgPack]_ binary + data format. See :ref:`amdgpu-amdhsa-code-object-metadata` for the + map keys defined for the ``amdhsa`` OS. .. _amdgpu-symbols: @@ -925,7 +920,7 @@ This section provides code conventions used when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). -.. _amdgpu-amdhsa-hsa-code-object-metadata: +.. _amdgpu-amdhsa-code-object-target-identification: Code Object Target Identification ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -955,12 +950,14 @@ ``"amdgcn-amd-amdhsa--gfx902+xnack"`` +.. _amdgpu-amdhsa-code-object-metadata: + Code Object Metadata ~~~~~~~~~~~~~~~~~~~~ The code object metadata specifies extensible metadata associated with the code objects executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm -[AMD-ROCm]_. It is specified by the ``NT_AMD_AMDGPU_HSA_METADATA`` note record +[AMD-ROCm]_. It is specified by the ``NT_AMD_AMDGPU_METADATA`` note record (see :ref:`amdgpu-note-records`) and is required when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). It must contain the minimum information necessary to support the ROCM kernel queries. For example, the @@ -968,421 +965,405 @@ runtime may require other information to be included. For example, the AMD OpenCL runtime records kernel argument information. -The metadata is specified as a YAML formatted string (see [YAML]_ and -:doc:`YamlIO`). +The metadata is represented as Message Pack formatted binary data (see +[MsgPack]_). The top level is a Message Pack map that includes the +keys defined in table +:ref:`amdgpu-amdhsa-code-object-metadata-map-table` and referenced +tables. -.. TODO - Is the string null terminated? It probably should not if YAML allows it to - contain null characters, otherwise it should be. - -The metadata is represented as a single YAML document comprised of the mapping -defined in table :ref:`amdgpu-amdhsa-code-object-metadata-mapping-table` and -referenced tables. - -For boolean values, the string values of ``false`` and ``true`` are used for -false and true respectively. - -Additional information can be added to the mappings. To avoid conflicts, any -non-AMD key names should be prefixed by "*vendor-name*.". - - .. table:: AMDHSA Code Object Metadata Mapping - :name: amdgpu-amdhsa-code-object-metadata-mapping-table - - ========== ============== ========= ======================================= - String Key Value Type Required? Description - ========== ============== ========= ======================================= - "Version" sequence of Required - The first integer is the major - 2 integers version. Currently 1. - - The second integer is the minor - version. Currently 0. - "Printf" sequence of Each string is encoded information - strings about a printf function call. The - encoded information is organized as - fields separated by colon (':'): - - ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString`` - - where: - - ``ID`` - A 32 bit integer as a unique id for - each printf function call - - ``N`` - A 32 bit integer equal to the number - of arguments of printf function call - minus 1 - - ``S[i]`` (where i = 0, 1, ... , N-1) - 32 bit integers for the size in bytes - of the i-th FormatString argument of - the printf function call - - FormatString - The format string passed to the - printf function call. - "Kernels" sequence of Required Sequence of the mappings for each - mapping kernel in the code object. See - :ref:`amdgpu-amdhsa-code-object-kernel-metadata-mapping-table` - for the definition of the mapping. - ========== ============== ========= ======================================= +Additional information can be added to the maps. To avoid conflicts, +any key names should be prefixed by "*vendor-name*." where +``vendor-name`` can be the the name of the vendor and specific vendor +tool that generates the information. The prefix is abbreviated to +simply "." when it appears within a map that has been added by the +same *vendor-name*. -.. + .. table:: AMDHSA Code Object Metadata Map + :name: amdgpu-amdhsa-code-object-metadata-map-table - .. table:: AMDHSA Code Object Kernel Metadata Mapping - :name: amdgpu-amdhsa-code-object-kernel-metadata-mapping-table - - ================= ============== ========= ================================ + ================= ============== ========= ======================================= String Key Value Type Required? Description - ================= ============== ========= ================================ - "Name" string Required Source name of the kernel. - "SymbolName" string Required Name of the kernel - descriptor ELF symbol. - "Language" string Source language of the kernel. - Values include: - - - "OpenCL C" - - "OpenCL C++" - - "HCC" - - "OpenMP" - - "LanguageVersion" sequence of - The first integer is the major - 2 integers version. - - The second integer is the - minor version. - "Attrs" mapping Mapping of kernel attributes. - See - :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table` - for the mapping definition. - "Args" sequence of Sequence of mappings of the - mapping kernel arguments. See - :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table` - for the definition of the mapping. - "CodeProps" mapping Mapping of properties related to - the kernel code. See - :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table` - for the mapping definition. - ================= ============== ========= ================================ + ================= ============== ========= ======================================= + "amdhsa.version" sequence of Required - The first integer is the major + 2 integers version. Currently 1. + - The second integer is the minor + version. Currently 0. + "amdhsa.printf" sequence of Each string is encoded information + strings about a printf function call. The + encoded information is organized as + fields separated by colon (':'): + + ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString`` + + where: + + ``ID`` + A 32 bit integer as a unique id for + each printf function call + + ``N`` + A 32 bit integer equal to the number + of arguments of printf function call + minus 1 + + ``S[i]`` (where i = 0, 1, ... , N-1) + 32 bit integers for the size in bytes + of the i-th FormatString argument of + the printf function call + + FormatString + The format string passed to the + printf function call. + "amdhsa.kernels" sequence of Required Sequence of the maps for each + map kernel in the code object. See + :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table` + for the definition of the keys included + in that map. + ================= ============== ========= ======================================= .. - .. table:: AMDHSA Code Object Kernel Attribute Metadata Mapping - :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table - - =================== ============== ========= ============================== - String Key Value Type Required? Description - =================== ============== ========= ============================== - "ReqdWorkGroupSize" sequence of If not 0, 0, 0 then all values - 3 integers must be >=1 and the dispatch - work-group size X, Y, Z must - correspond to the specified - values. Defaults to 0, 0, 0. - - Corresponds to the OpenCL - ``reqd_work_group_size`` - attribute. - "WorkGroupSizeHint" sequence of The dispatch work-group size - 3 integers X, Y, Z is likely to be the - specified values. - - Corresponds to the OpenCL - ``work_group_size_hint`` - attribute. - "VecTypeHint" string The name of a scalar or vector - type. - - Corresponds to the OpenCL - ``vec_type_hint`` attribute. - - "RuntimeHandle" string The external symbol name - associated with a kernel. - OpenCL runtime allocates a - global buffer for the symbol - and saves the kernel's address - to it, which is used for - device side enqueueing. Only - available for device side - enqueued kernels. - =================== ============== ========= ============================== - -.. - - .. table:: AMDHSA Code Object Kernel Argument Metadata Mapping - :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table - - ================= ============== ========= ================================ - String Key Value Type Required? Description - ================= ============== ========= ================================ - "Name" string Kernel argument name. - "TypeName" string Kernel argument type name. - "Size" integer Required Kernel argument size in bytes. - "Align" integer Required Kernel argument alignment in - bytes. Must be a power of two. - "ValueKind" string Required Kernel argument kind that - specifies how to set up the - corresponding argument. - Values include: - - "ByValue" - The argument is copied - directly into the kernarg. - - "GlobalBuffer" - A global address space pointer - to the buffer data is passed - in the kernarg. - - "DynamicSharedPointer" - A group address space pointer - to dynamically allocated LDS - is passed in the kernarg. - - "Sampler" - A global address space - pointer to a S# is passed in - the kernarg. - - "Image" - A global address space - pointer to a T# is passed in - the kernarg. - - "Pipe" - A global address space pointer - to an OpenCL pipe is passed in - the kernarg. - - "Queue" - A global address space pointer - to an OpenCL device enqueue - queue is passed in the - kernarg. - - "HiddenGlobalOffsetX" - The OpenCL grid dispatch - global offset for the X - dimension is passed in the - kernarg. - - "HiddenGlobalOffsetY" - The OpenCL grid dispatch - global offset for the Y - dimension is passed in the - kernarg. - - "HiddenGlobalOffsetZ" - The OpenCL grid dispatch - global offset for the Z - dimension is passed in the - kernarg. - - "HiddenNone" - An argument that is not used - by the kernel. Space needs to - be left for it, but it does - not need to be set up. - - "HiddenPrintfBuffer" - A global address space pointer - to the runtime printf buffer - is passed in kernarg. - - "HiddenDefaultQueue" - A global address space pointer - to the OpenCL device enqueue - queue that should be used by - the kernel by default is - passed in the kernarg. - - "HiddenCompletionAction" - A global address space pointer - to help link enqueued kernels into - the ancestor tree for determining - when the parent kernel has finished. - - "ValueType" string Required Kernel argument value type. Only - present if "ValueKind" is - "ByValue". For vector data - types, the value is for the - element type. Values include: - - - "Struct" - - "I8" - - "U8" - - "I16" - - "U16" - - "F16" - - "I32" - - "U32" - - "F32" - - "I64" - - "U64" - - "F64" - - .. TODO - How can it be determined if a - vector type, and what size - vector? - "PointeeAlign" integer Alignment in bytes of pointee - type for pointer type kernel - argument. Must be a power - of 2. Only present if - "ValueKind" is - "DynamicSharedPointer". - "AddrSpaceQual" string Kernel argument address space - qualifier. Only present if - "ValueKind" is "GlobalBuffer" or - "DynamicSharedPointer". Values - are: - - - "Private" - - "Global" - - "Constant" - - "Local" - - "Generic" - - "Region" - - .. TODO - Is GlobalBuffer only Global - or Constant? Is - DynamicSharedPointer always - Local? Can HCC allow Generic? - How can Private or Region - ever happen? - "AccQual" string Kernel argument access - qualifier. Only present if - "ValueKind" is "Image" or - "Pipe". Values - are: - - - "ReadOnly" - - "WriteOnly" - - "ReadWrite" - - .. TODO - Does this apply to - GlobalBuffer? - "ActualAccQual" string The actual memory accesses - performed by the kernel on the - kernel argument. Only present if - "ValueKind" is "GlobalBuffer", - "Image", or "Pipe". This may be - more restrictive than indicated - by "AccQual" to reflect what the - kernel actual does. If not - present then the runtime must - assume what is implied by - "AccQual" and "IsConst". Values - are: - - - "ReadOnly" - - "WriteOnly" - - "ReadWrite" - - "IsConst" boolean Indicates if the kernel argument - is const qualified. Only present - if "ValueKind" is - "GlobalBuffer". - - "IsRestrict" boolean Indicates if the kernel argument - is restrict qualified. Only - present if "ValueKind" is - "GlobalBuffer". - - "IsVolatile" boolean Indicates if the kernel argument - is volatile qualified. Only - present if "ValueKind" is - "GlobalBuffer". - - "IsPipe" boolean Indicates if the kernel argument - is pipe qualified. Only present - if "ValueKind" is "Pipe". - - .. TODO - Can GlobalBuffer be pipe - qualified? - ================= ============== ========= ================================ + .. table:: AMDHSA Code Object Kernel Metadata Map + :name: amdgpu-amdhsa-code-object-kernel-metadata-map-table + + =================================== ============== ========= ================================ + String Key Value Type Required? Description + =================================== ============== ========= ================================ + ".name" string Required Source name of the kernel. + ".symbol" string Required Name of the kernel + descriptor ELF symbol. + ".language" string Source language of the kernel. + Values include: + + - "OpenCL C" + - "OpenCL C++" + - "HCC" + - "HIP" + - "OpenMP" + - "Assembler" + + ".language_version" sequence of - The first integer is the major + 2 integers version. + - The second integer is the + minor version. + ".args" sequence of Sequence of maps of the + map kernel arguments. See + :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table` + for the definition of the keys + included in that map. + ".reqd_workgroup_size" sequence of If not 0, 0, 0 then all values + 3 integers must be >=1 and the dispatch + work-group size X, Y, Z must + correspond to the specified + values. Defaults to 0, 0, 0. + + Corresponds to the OpenCL + ``reqd_work_group_size`` + attribute. + ".workgroup_size_hint" sequence of The dispatch work-group size + 3 integers X, Y, Z is likely to be the + specified values. + + Corresponds to the OpenCL + ``work_group_size_hint`` + attribute. + ".vec_type_hint" string The name of a scalar or vector + type. + + Corresponds to the OpenCL + ``vec_type_hint`` attribute. + + ".device_enqueue_symbol" string The external symbol name + associated with a kernel. + OpenCL runtime allocates a + global buffer for the symbol + and saves the kernel's address + to it, which is used for + device side enqueueing. Only + available for device side + enqueued kernels. + ".kernarg_segment_size" integer Required The size in bytes of + the kernarg segment + that holds the values + of the arguments to + the kernel. + ".group_segment_fixed_size" integer Required The amount of group + segment memory + required by a + work-group in + bytes. This does not + include any + dynamically allocated + group segment memory + that may be added + when the kernel is + dispatched. + ".private_segment_fixed_size" integer Required The amount of fixed + private address space + memory required for a + work-item in + bytes. If the kernel + uses a dynamic call + stack then additional + space must be added + to this value for the + call stack. + ".kernarg_segment_align" integer Required The maximum byte + alignment of + arguments in the + kernarg segment. Must + be a power of 2. + ".wavefront_size" integer Required Wavefront size. Must + be a power of 2. + ".sgpr_count" integer Required Number of scalar + registers required by a + wavefront for + GFX6-GFX9. A register + is required if it is + used explicitly, or + if a higher numbered + register is used + explicitly. This + includes the special + SGPRs for VCC, Flat + Scratch (GFX7-GFX9) + and XNACK (for + GFX8-GFX9). It does + not include the 16 + SGPR added if a trap + handler is + enabled. It is not + rounded up to the + allocation + granularity. + ".vgpr_count" integer Required Number of vector + registers required by + each work-item for + GFX6-GFX9. A register + is required if it is + used explicitly, or + if a higher numbered + register is used + explicitly. + ".max_flat_workgroup_size" integer Required Maximum flat + work-group size + supported by the + kernel in work-items. + Must be >=1 and + consistent with + ReqdWorkGroupSize if + not 0, 0, 0. + ".sgpr_spill_count" integer Number of stores from + a scalar register to + a register allocator + created spill + location. + ".vgpr_spill_count" integer Number of stores from + a vector register to + a register allocator + created spill + location. + =================================== ============== ========= ================================ .. - .. table:: AMDHSA Code Object Kernel Code Properties Metadata Mapping - :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table - - ============================ ============== ========= ===================== - String Key Value Type Required? Description - ============================ ============== ========= ===================== - "KernargSegmentSize" integer Required The size in bytes of - the kernarg segment - that holds the values - of the arguments to - the kernel. - "GroupSegmentFixedSize" integer Required The amount of group - segment memory - required by a - work-group in - bytes. This does not - include any - dynamically allocated - group segment memory - that may be added - when the kernel is - dispatched. - "PrivateSegmentFixedSize" integer Required The amount of fixed - private address space - memory required for a - work-item in - bytes. If the kernel - uses a dynamic call - stack then additional - space must be added - to this value for the - call stack. - "KernargSegmentAlign" integer Required The maximum byte - alignment of - arguments in the - kernarg segment. Must - be a power of 2. - "WavefrontSize" integer Required Wavefront size. Must - be a power of 2. - "NumSGPRs" integer Required Number of scalar - registers used by a - wavefront for - GFX6-GFX9. This - includes the special - SGPRs for VCC, Flat - Scratch (GFX7-GFX9) - and XNACK (for - GFX8-GFX9). It does - not include the 16 - SGPR added if a trap - handler is - enabled. It is not - rounded up to the - allocation - granularity. - "NumVGPRs" integer Required Number of vector - registers used by - each work-item for - GFX6-GFX9 - "MaxFlatWorkGroupSize" integer Required Maximum flat - work-group size - supported by the - kernel in work-items. - Must be >=1 and - consistent with - ReqdWorkGroupSize if - not 0, 0, 0. - "NumSpilledSGPRs" integer Number of stores from - a scalar register to - a register allocator - created spill - location. - "NumSpilledVGPRs" integer Number of stores from - a vector register to - a register allocator - created spill - location. - ============================ ============== ========= ===================== + .. table:: AMDHSA Code Object Kernel Argument Metadata Map + :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table + + ====================== ============== ========= ================================ + String Key Value Type Required? Description + ====================== ============== ========= ================================ + ".name" string Kernel argument name. + ".type_name" string Kernel argument type name. + ".size" integer Required Kernel argument size in bytes. + ".offset" integer Required Kernel argument offset in + bytes. The offset must be a + multiple of the alignment + required by the argument. + ".value_kind" string Required Kernel argument kind that + specifies how to set up the + corresponding argument. + Values include: + + "by_value" + The argument is copied + directly into the kernarg. + + "global_buffer" + A global address space pointer + to the buffer data is passed + in the kernarg. + + "dynamic_shared_pointer" + A group address space pointer + to dynamically allocated LDS + is passed in the kernarg. + + "sampler" + A global address space + pointer to a S# is passed in + the kernarg. + + "image" + A global address space + pointer to a T# is passed in + the kernarg. + + "pipe" + A global address space pointer + to an OpenCL pipe is passed in + the kernarg. + + "queue" + A global address space pointer + to an OpenCL device enqueue + queue is passed in the + kernarg. + + "hidden_global_offset_x" + The OpenCL grid dispatch + global offset for the X + dimension is passed in the + kernarg. + + "hidden_global_offset_y" + The OpenCL grid dispatch + global offset for the Y + dimension is passed in the + kernarg. + + "hidden_global_offset_z" + The OpenCL grid dispatch + global offset for the Z + dimension is passed in the + kernarg. + + "hidden_none" + An argument that is not used + by the kernel. Space needs to + be left for it, but it does + not need to be set up. + + "hidden_printf_buffer" + A global address space pointer + to the runtime printf buffer + is passed in kernarg. + + "hidden_default_queue" + A global address space pointer + to the OpenCL device enqueue + queue that should be used by + the kernel by default is + passed in the kernarg. + + "hidden_completion_action" + A global address space pointer + to help link enqueued kernels into + the ancestor tree for determining + when the parent kernel has finished. + + ".value_type" string Required Kernel argument value type. Only + present if ".value_kind" is + "by_value". For vector data + types, the value is for the + element type. Values include: + + - "struct" + - "i8" + - "u8" + - "i16" + - "u16" + - "f16" + - "i32" + - "u32" + - "f32" + - "i64" + - "u64" + - "f64" + + .. TODO + How can it be determined if a + vector type, and what size + vector? + ".pointee_align" integer Alignment in bytes of pointee + type for pointer type kernel + argument. Must be a power + of 2. Only present if + ".value_kind" is + "dynamic_shared_pointer". + ".address_space" string Kernel argument address space + qualifier. Only present if + ".value_kind" is "global_buffer" or + "dynamic_shared_pointer". Values + are: + + - "private" + - "global" + - "constant" + - "local" + - "generic" + - "region" + + .. TODO + Is "global_buffer" only "global" + or "constant"? Is + "dynamic_shared_pointer" always + "local"? Can HCC allow "generic"? + How can "private" or "region" + ever happen? + ".access" string Kernel argument access + qualifier. Only present if + ".value_kind" is "image" or + "pipe". Values + are: + + - "read_only" + - "write_only" + - "read_write" + + .. TODO + Does this apply to + "global_buffer"? + ".actual_access" string The actual memory accesses + performed by the kernel on the + kernel argument. Only present if + ".value_kind" is "global_buffer", + "image", or "pipe". This may be + more restrictive than indicated + by ".access" to reflect what the + kernel actual does. If not + present then the runtime must + assume what is implied by + ".access" and ".is_const" . Values + are: + + - "read_only" + - "write_only" + - "read_write" + + ".is_const" boolean Indicates if the kernel argument + is const qualified. Only present + if ".value_kind" is + "global_buffer". + + ".is_restrict" boolean Indicates if the kernel argument + is restrict qualified. Only + present if ".value_kind" is + "global_buffer". + + ".is_volatile" boolean Indicates if the kernel argument + is volatile qualified. Only + present if ".value_kind" is + "global_buffer". + + ".is_pipe" boolean Indicates if the kernel argument + is pipe qualified. Only present + if ".value_kind" is "pipe". + + .. TODO + Can "global_buffer" be pipe + qualified? + ====================== ============== ========= ================================ .. @@ -1436,7 +1417,7 @@ such as grid and work-group size, together with information from the code object about the kernel, such as segment sizes. The ROCm runtime queries on the kernel symbol can be used to obtain the code object values which are - recorded in the :ref:`amdgpu-amdhsa-hsa-code-object-metadata`. + recorded in the :ref:`amdgpu-amdhsa-code-object-metadata`. 7. CP executes micro-code and is responsible for detecting and setting up the GPU to execute the wavefronts of a kernel dispatch. 8. CP ensures that when the a wavefront starts executing the kernel machine @@ -3971,7 +3952,7 @@ arguments for the AMDHSA OS (see :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`). 3. Additional metadata is generated - (see :ref:`amdgpu-amdhsa-hsa-code-object-metadata`). + (see :ref:`amdgpu-amdhsa-code-object-metadata`). .. table:: OpenCL kernel implicit arguments appended for AMDHSA OS :name: opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table @@ -4280,7 +4261,7 @@ - *wavefront_size* defaults to 6. - *kernarg_segment_alignment*, *group_segment_alignment*, and *private_segment_alignment* default to 4. Note that alignments are specified - as a power of two, so a value of **n** means an alignment of 2^ **n**. + as a power of 2, so a value of **n** means an alignment of 2^ **n**. The *.amd_kernel_code_t* directive must be placed immediately after the function label and before any instructions. @@ -4339,6 +4320,6 @@ .. [HSA] `Heterogeneous System Architecture (HSA) Foundation `__ .. [ELF] `Executable and Linkable Format (ELF) `__ .. [DWARF] `DWARF Debugging Information Format `__ -.. [YAML] `YAML Ain't Markup Language (YAML™) Version 1.2 `__ +.. [MsgPack] `Message Pack `__ .. [OpenCL] `The OpenCL Specification Version 2.0 `__ .. [HRF] `Heterogeneous-race-free Memory Models `__