Index: docs/AMDGPUUsage.rst
===================================================================
--- docs/AMDGPUUsage.rst
+++ docs/AMDGPUUsage.rst
@@ -679,17 +679,18 @@
``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the
``.note`` section must be at least 4 to indicate at least 8 byte alignment.
-The AMDGPU backend code object uses the following ELF note records in the
-``.note`` section. The *Description* column specifies the layout of the note
-record's ``desc`` field. All fields are consecutive bytes. Note records with
-variable size strings have a corresponding ``*_size`` field that specifies the
-number of bytes, including the terminating null character, in the string. The
-string(s) come immediately after the preceding fields.
+.. _amdgpu-note-records-v2:
+
+Code Object V2 Note Records (-mattr=-code-object-v3)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The AMDGPU backend code object uses the following ELF note record in the
+``.note`` section.
Additional note records can be present.
- .. table:: AMDGPU ELF Note Records
- :name: amdgpu-elf-note-records-table
+ .. table:: AMDGPU Code Object V2 ELF Note Records
+ :name: amdgpu-elf-note-records-table-v2
===== ============================== ======================================
Name Type Description
@@ -699,8 +700,8 @@
..
- .. table:: AMDGPU ELF Note Record Enumeration Values
- :name: amdgpu-elf-note-record-enumeration-values-table
+ .. table:: AMDGPU Code Object V2 ELF Note Record Enumeration Values
+ :name: amdgpu-elf-note-record-enumeration-values-table-v2
============================== =====
Name Value
@@ -714,9 +715,47 @@
Specifies extensible metadata associated with the code objects executed on HSA
[HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See
- :ref:`amdgpu-amdhsa-code-object-metadata` for the syntax of the code
+ :ref:`amdgpu-amdhsa-code-object-metadata-v2` for the syntax of the code
object metadata string.
+.. _amdgpu-note-records-v3:
+
+Code Object V3 Note Records (-mattr=+code-object-v3)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The AMDGPU backend code object uses the following ELF note record in the
+``.note`` section.
+
+Additional note records can be present.
+
+ .. table:: AMDGPU Code Object V3 ELF Note Records
+ :name: amdgpu-elf-note-records-table-v3
+
+ ======== ============================== ======================================
+ Name Type Description
+ ======== ============================== ======================================
+ "AMDGPU" ``NT_AMDGPU_METADATA`` Metadata in Message Pack [MsgPack]_
+ binary format.
+ ======== ============================== ======================================
+
+..
+
+ .. table:: AMDGPU Code Object V3 ELF Note Record Enumeration Values
+ :name: amdgpu-elf-note-record-enumeration-values-table-v3
+
+ ============================== =====
+ Name Value
+ ============================== =====
+ *reserved* 0-31
+ ``NT_AMDGPU_METADATA`` 32
+ ============================== =====
+
+``NT_AMDGPU_METADATA``
+ Specifies extensible metadata associated with an AMDGPU code
+ object. It is encoded as a map in the Message Pack [MsgPack]_ binary
+ data format. See :ref:`amdgpu-amdhsa-code-object-metadata-v3` for the
+ map keys defined for the ``amdhsa`` OS.
+
.. _amdgpu-symbols:
Symbols
@@ -1009,13 +1048,21 @@
The code object metadata specifies extensible metadata associated with the code
objects executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm
-[AMD-ROCm]_. It is specified by the ``NT_AMD_AMDGPU_HSA_METADATA`` note record
-(see :ref:`amdgpu-note-records`) and is required when the target triple OS is
-``amdhsa`` (see :ref:`amdgpu-target-triples`). It must contain the minimum
-information necessary to support the ROCM kernel queries. For example, the
-segment sizes needed in a dispatch packet. In addition, a high level language
-runtime may require other information to be included. For example, the AMD
-OpenCL runtime records kernel argument information.
+[AMD-ROCm]_. It is specified in a note record (see :ref:`amdgpu-note-records`)
+and is required when the target triple OS is ``amdhsa`` (see
+:ref:`amdgpu-target-triples`). It must contain the minimum information
+necessary to support the ROCM kernel queries. For example, the segment sizes
+needed in a dispatch packet. In addition, a high level language runtime may
+require other information to be included. For example, the AMD OpenCL runtime
+records kernel argument information.
+
+.. _amdgpu-amdhsa-code-object-metadata-v2:
+
+Code Object V2 Metadata (-mattr=-code-object-v3)
+++++++++++++++++++++++++++++++++++++++++++++++++
+
+Code object V2 metadata is specified by the ``NT_AMD_AMDGPU_METADATA`` note
+record (see :ref:`amdgpu-note-records-v2`).
The metadata is specified as a YAML formatted string (see [YAML]_ and
:doc:`YamlIO`).
@@ -1025,7 +1072,7 @@
contain null characters, otherwise it should be.
The metadata is represented as a single YAML document comprised of the mapping
-defined in table :ref:`amdgpu-amdhsa-code-object-metadata-mapping-table` and
+defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v2` and
referenced tables.
For boolean values, the string values of ``false`` and ``true`` are used for
@@ -1034,8 +1081,8 @@
Additional information can be added to the mappings. To avoid conflicts, any
non-AMD key names should be prefixed by "*vendor-name*.".
- .. table:: AMDHSA Code Object Metadata Mapping
- :name: amdgpu-amdhsa-code-object-metadata-mapping-table
+ .. table:: AMDHSA Code Object V2 Metadata Map
+ :name: amdgpu-amdhsa-code-object-metadata-map-table-v2
========== ============== ========= =======================================
String Key Value Type Required? Description
@@ -1072,14 +1119,14 @@
printf function call.
"Kernels" sequence of Required Sequence of the mappings for each
mapping kernel in the code object. See
- :ref:`amdgpu-amdhsa-code-object-kernel-metadata-mapping-table`
+ :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2`
for the definition of the mapping.
========== ============== ========= =======================================
..
- .. table:: AMDHSA Code Object Kernel Metadata Mapping
- :name: amdgpu-amdhsa-code-object-kernel-metadata-mapping-table
+ .. table:: AMDHSA Code Object V2 Kernel Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2
================= ============== ========= ================================
String Key Value Type Required? Description
@@ -1101,22 +1148,22 @@
minor version.
"Attrs" mapping Mapping of kernel attributes.
See
- :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table`
+ :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-table-v2`
for the mapping definition.
"Args" sequence of Sequence of mappings of the
mapping kernel arguments. See
- :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table`
+ :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v2`
for the definition of the mapping.
"CodeProps" mapping Mapping of properties related to
the kernel code. See
- :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
+ :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-table-v2`
for the mapping definition.
================= ============== ========= ================================
..
- .. table:: AMDHSA Code Object Kernel Attribute Metadata Mapping
- :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table
+ .. table:: AMDHSA Code Object V2 Kernel Attribute Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-table-v2
=================== ============== ========= ==============================
String Key Value Type Required? Description
@@ -1156,8 +1203,8 @@
..
- .. table:: AMDHSA Code Object Kernel Argument Metadata Mapping
- :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table
+ .. table:: AMDHSA Code Object V2 Kernel Argument Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v2
================= ============== ========= ================================
String Key Value Type Required? Description
@@ -1354,8 +1401,8 @@
..
- .. table:: AMDHSA Code Object Kernel Code Properties Metadata Mapping
- :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table
+ .. table:: AMDHSA Code Object V2 Kernel Code Properties Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-table-v2
============================ ============== ========= =====================
String Key Value Type Required? Description
@@ -1433,6 +1480,414 @@
location.
============================ ============== ========= =====================
+.. _amdgpu-amdhsa-code-object-metadata-v3:
+
+Code Object V3 Metadata (-mattr=+code-object-v3)
+++++++++++++++++++++++++++++++++++++++++++++++++
+
+Code object V3 metadata is specified by the ``NT_AMDGPU_METADATA`` note record
+(see :ref:`amdgpu-note-records-v3`).
+
+The metadata is represented as Message Pack formatted binary data (see
+[MsgPack]_). The top level is a Message Pack map that includes the
+keys defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table-v3` and referenced
+tables.
+
+Additional information can be added to the maps. To avoid conflicts,
+any key names should be prefixed by "*vendor-name*." where
+``vendor-name`` can be the the name of the vendor and specific vendor
+tool that generates the information. The prefix is abbreviated to
+simply "." when it appears within a map that has been added by the
+same *vendor-name*.
+
+ .. table:: AMDHSA Code Object V3 Metadata Map
+ :name: amdgpu-amdhsa-code-object-metadata-map-table-v3
+
+ ================= ============== ========= =======================================
+ String Key Value Type Required? Description
+ ================= ============== ========= =======================================
+ "amdhsa.version" sequence of Required - The first integer is the major
+ 2 integers version. Currently 1.
+ - The second integer is the minor
+ version. Currently 0.
+ "amdhsa.printf" sequence of Each string is encoded information
+ strings about a printf function call. The
+ encoded information is organized as
+ fields separated by colon (':'):
+
+ ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString``
+
+ where:
+
+ ``ID``
+ A 32 bit integer as a unique id for
+ each printf function call
+
+ ``N``
+ A 32 bit integer equal to the number
+ of arguments of printf function call
+ minus 1
+
+ ``S[i]`` (where i = 0, 1, ... , N-1)
+ 32 bit integers for the size in bytes
+ of the i-th FormatString argument of
+ the printf function call
+
+ FormatString
+ The format string passed to the
+ printf function call.
+ "amdhsa.kernels" sequence of Required Sequence of the maps for each
+ map kernel in the code object. See
+ :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table-v3`
+ for the definition of the keys included
+ in that map.
+ ================= ============== ========= =======================================
+
+..
+
+ .. table:: AMDHSA Code Object V3 Kernel Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-metadata-map-table-v3
+
+ =================================== ============== ========= ================================
+ String Key Value Type Required? Description
+ =================================== ============== ========= ================================
+ ".name" string Required Source name of the kernel.
+ ".symbol" string Required Name of the kernel
+ descriptor ELF symbol.
+ ".language" string Source language of the kernel.
+ Values include:
+
+ - "OpenCL C"
+ - "OpenCL C++"
+ - "HCC"
+ - "HIP"
+ - "OpenMP"
+ - "Assembler"
+
+ ".language_version" sequence of - The first integer is the major
+ 2 integers version.
+ - The second integer is the
+ minor version.
+ ".args" sequence of Sequence of maps of the
+ map kernel arguments. See
+ :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3`
+ for the definition of the keys
+ included in that map.
+ ".reqd_workgroup_size" sequence of If not 0, 0, 0 then all values
+ 3 integers must be >=1 and the dispatch
+ work-group size X, Y, Z must
+ correspond to the specified
+ values. Defaults to 0, 0, 0.
+
+ Corresponds to the OpenCL
+ ``reqd_work_group_size``
+ attribute.
+ ".workgroup_size_hint" sequence of The dispatch work-group size
+ 3 integers X, Y, Z is likely to be the
+ specified values.
+
+ Corresponds to the OpenCL
+ ``work_group_size_hint``
+ attribute.
+ ".vec_type_hint" string The name of a scalar or vector
+ type.
+
+ Corresponds to the OpenCL
+ ``vec_type_hint`` attribute.
+
+ ".device_enqueue_symbol" string The external symbol name
+ associated with a kernel.
+ OpenCL runtime allocates a
+ global buffer for the symbol
+ and saves the kernel's address
+ to it, which is used for
+ device side enqueueing. Only
+ available for device side
+ enqueued kernels.
+ ".kernarg_segment_size" integer Required The size in bytes of
+ the kernarg segment
+ that holds the values
+ of the arguments to
+ the kernel.
+ ".group_segment_fixed_size" integer Required The amount of group
+ segment memory
+ required by a
+ work-group in
+ bytes. This does not
+ include any
+ dynamically allocated
+ group segment memory
+ that may be added
+ when the kernel is
+ dispatched.
+ ".private_segment_fixed_size" integer Required The amount of fixed
+ private address space
+ memory required for a
+ work-item in
+ bytes. If the kernel
+ uses a dynamic call
+ stack then additional
+ space must be added
+ to this value for the
+ call stack.
+ ".kernarg_segment_align" integer Required The maximum byte
+ alignment of
+ arguments in the
+ kernarg segment. Must
+ be a power of 2.
+ ".wavefront_size" integer Required Wavefront size. Must
+ be a power of 2.
+ ".sgpr_count" integer Required Number of scalar
+ registers required by a
+ wavefront for
+ GFX6-GFX9. A register
+ is required if it is
+ used explicitly, or
+ if a higher numbered
+ register is used
+ explicitly. This
+ includes the special
+ SGPRs for VCC, Flat
+ Scratch (GFX7-GFX9)
+ and XNACK (for
+ GFX8-GFX9). It does
+ not include the 16
+ SGPR added if a trap
+ handler is
+ enabled. It is not
+ rounded up to the
+ allocation
+ granularity.
+ ".vgpr_count" integer Required Number of vector
+ registers required by
+ each work-item for
+ GFX6-GFX9. A register
+ is required if it is
+ used explicitly, or
+ if a higher numbered
+ register is used
+ explicitly.
+ ".max_flat_workgroup_size" integer Required Maximum flat
+ work-group size
+ supported by the
+ kernel in work-items.
+ Must be >=1 and
+ consistent with
+ ReqdWorkGroupSize if
+ not 0, 0, 0.
+ ".sgpr_spill_count" integer Number of stores from
+ a scalar register to
+ a register allocator
+ created spill
+ location.
+ ".vgpr_spill_count" integer Number of stores from
+ a vector register to
+ a register allocator
+ created spill
+ location.
+ =================================== ============== ========= ================================
+
+..
+
+ .. table:: AMDHSA Code Object V3 Kernel Argument Metadata Map
+ :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v3
+
+ ====================== ============== ========= ================================
+ String Key Value Type Required? Description
+ ====================== ============== ========= ================================
+ ".name" string Kernel argument name.
+ ".type_name" string Kernel argument type name.
+ ".size" integer Required Kernel argument size in bytes.
+ ".offset" integer Required Kernel argument offset in
+ bytes. The offset must be a
+ multiple of the alignment
+ required by the argument.
+ ".value_kind" string Required Kernel argument kind that
+ specifies how to set up the
+ corresponding argument.
+ Values include:
+
+ "by_value"
+ The argument is copied
+ directly into the kernarg.
+
+ "global_buffer"
+ A global address space pointer
+ to the buffer data is passed
+ in the kernarg.
+
+ "dynamic_shared_pointer"
+ A group address space pointer
+ to dynamically allocated LDS
+ is passed in the kernarg.
+
+ "sampler"
+ A global address space
+ pointer to a S# is passed in
+ the kernarg.
+
+ "image"
+ A global address space
+ pointer to a T# is passed in
+ the kernarg.
+
+ "pipe"
+ A global address space pointer
+ to an OpenCL pipe is passed in
+ the kernarg.
+
+ "queue"
+ A global address space pointer
+ to an OpenCL device enqueue
+ queue is passed in the
+ kernarg.
+
+ "hidden_global_offset_x"
+ The OpenCL grid dispatch
+ global offset for the X
+ dimension is passed in the
+ kernarg.
+
+ "hidden_global_offset_y"
+ The OpenCL grid dispatch
+ global offset for the Y
+ dimension is passed in the
+ kernarg.
+
+ "hidden_global_offset_z"
+ The OpenCL grid dispatch
+ global offset for the Z
+ dimension is passed in the
+ kernarg.
+
+ "hidden_none"
+ An argument that is not used
+ by the kernel. Space needs to
+ be left for it, but it does
+ not need to be set up.
+
+ "hidden_printf_buffer"
+ A global address space pointer
+ to the runtime printf buffer
+ is passed in kernarg.
+
+ "hidden_default_queue"
+ A global address space pointer
+ to the OpenCL device enqueue
+ queue that should be used by
+ the kernel by default is
+ passed in the kernarg.
+
+ "hidden_completion_action"
+ A global address space pointer
+ to help link enqueued kernels into
+ the ancestor tree for determining
+ when the parent kernel has finished.
+
+ ".value_type" string Required Kernel argument value type. Only
+ present if ".value_kind" is
+ "by_value". For vector data
+ types, the value is for the
+ element type. Values include:
+
+ - "struct"
+ - "i8"
+ - "u8"
+ - "i16"
+ - "u16"
+ - "f16"
+ - "i32"
+ - "u32"
+ - "f32"
+ - "i64"
+ - "u64"
+ - "f64"
+
+ .. TODO
+ How can it be determined if a
+ vector type, and what size
+ vector?
+ ".pointee_align" integer Alignment in bytes of pointee
+ type for pointer type kernel
+ argument. Must be a power
+ of 2. Only present if
+ ".value_kind" is
+ "dynamic_shared_pointer".
+ ".address_space" string Kernel argument address space
+ qualifier. Only present if
+ ".value_kind" is "global_buffer" or
+ "dynamic_shared_pointer". Values
+ are:
+
+ - "private"
+ - "global"
+ - "constant"
+ - "local"
+ - "generic"
+ - "region"
+
+ .. TODO
+ Is "global_buffer" only "global"
+ or "constant"? Is
+ "dynamic_shared_pointer" always
+ "local"? Can HCC allow "generic"?
+ How can "private" or "region"
+ ever happen?
+ ".access" string Kernel argument access
+ qualifier. Only present if
+ ".value_kind" is "image" or
+ "pipe". Values
+ are:
+
+ - "read_only"
+ - "write_only"
+ - "read_write"
+
+ .. TODO
+ Does this apply to
+ "global_buffer"?
+ ".actual_access" string The actual memory accesses
+ performed by the kernel on the
+ kernel argument. Only present if
+ ".value_kind" is "global_buffer",
+ "image", or "pipe". This may be
+ more restrictive than indicated
+ by ".access" to reflect what the
+ kernel actual does. If not
+ present then the runtime must
+ assume what is implied by
+ ".access" and ".is_const" . Values
+ are:
+
+ - "read_only"
+ - "write_only"
+ - "read_write"
+
+ ".is_const" boolean Indicates if the kernel argument
+ is const qualified. Only present
+ if ".value_kind" is
+ "global_buffer".
+
+ ".is_restrict" boolean Indicates if the kernel argument
+ is restrict qualified. Only
+ present if ".value_kind" is
+ "global_buffer".
+
+ ".is_volatile" boolean Indicates if the kernel argument
+ is volatile qualified. Only
+ present if ".value_kind" is
+ "global_buffer".
+
+ ".is_pipe" boolean Indicates if the kernel argument
+ is pipe qualified. Only present
+ if ".value_kind" is "pipe".
+
+ .. TODO
+ Can "global_buffer" be pipe
+ qualified?
+ ====================== ============== ========= ================================
+
..
Kernel Dispatch
@@ -4373,7 +4828,7 @@
- *wavefront_size* defaults to 6.
- *kernarg_segment_alignment*, *group_segment_alignment*, and
*private_segment_alignment* default to 4. Note that alignments are specified
- as a power of two, so a value of **n** means an alignment of 2^ **n**.
+ as a power of 2, so a value of **n** means an alignment of 2^ **n**.
The *.amd_kernel_code_t* directive must be placed immediately after the
function label and before any instructions.
@@ -4586,6 +5041,17 @@
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
======================================================== ================ ============ ===================
+.amdgpu_metadata
+++++++++++++++++
+
+Optional directive which declares the contents of the ``NT_AMDGPU_METADATA``
+note record (see :ref:`amdgpu-elf-note-records-table-v3`).
+
+The contents must be in the [YAML]_ markup format, with the same structure and
+semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3`.
+
+This directive is terminated by an ``.end_amdgpu_metadata`` directive.
+
Example HSA Source Code (-mattr=+code-object-v3)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4618,6 +5084,24 @@
.amdhsa_next_free_sgpr .amdgcn.next_free_sgpr
.end_amdhsa_kernel
+ .amdgpu_metadata
+ ---
+ amdhsa.version:
+ - 1
+ - 0
+ amdhsa.kernels:
+ - .name: hello_world
+ .symbol: hello_world.kd
+ .kernarg_segment_size: 48
+ .group_segment_fixed_size: 0
+ .private_segment_fixed_size: 0
+ .kernarg_segment_align: 4
+ .wavefront_size: 64
+ .sgpr_count: 2
+ .vgpr_count: 3
+ .max_flat_workgroup_size: 256
+ ...
+ .end_amdgpu_metadata
Additional Documentation
========================
@@ -4636,6 +5120,7 @@
.. [ELF] `Executable and Linkable Format (ELF) `__
.. [DWARF] `DWARF Debugging Information Format `__
.. [YAML] `YAML Ain't Markup Language (YAML™) Version 1.2 `__
+.. [MsgPack] `Message Pack `__
.. [OpenCL] `The OpenCL Specification Version 2.0 `__
.. [HRF] `Heterogeneous-race-free Memory Models `__
.. [CLANG-ATTR] `Attributes in Clang `__