Index: docs/AMDGPUUsage.rst
===================================================================
--- docs/AMDGPUUsage.rst
+++ docs/AMDGPUUsage.rst
@@ -632,23 +632,20 @@
 ``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the
 ``.note`` section must be at least 4 to indicate at least 8 byte alignment.
 
-The AMDGPU backend code object uses the following ELF note records in the
-``.note`` section. The *Description* column specifies the layout of the note
-record's ``desc`` field. All fields are consecutive bytes. Note records with
-variable size strings have a corresponding ``*_size`` field that specifies the
-number of bytes, including the terminating null character, in the string. The
-string(s) come immediately after the preceding fields.
+The AMDGPU backend code object uses the following ELF note record in the
+``.note`` section.
 
 Additional note records can be present.
 
   .. table:: AMDGPU ELF Note Records
      :name: amdgpu-elf-note-records-table
 
-     ===== ============================== ======================================
-     Name  Type                           Description
-     ===== ============================== ======================================
-     "AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` <metadata null terminated string>
-     ===== ============================== ======================================
+     ======== ============================== ======================================
+     Name     Type                           Description
+     ======== ============================== ======================================
+     "AMDGPU" ``NT_AMD_AMDGPU_METADATA``     Metadata in Message Pack [MsgPack]_
+                                             binary format.
+     ======== ============================== ======================================
 
 ..
 
@@ -658,17 +655,15 @@
      ============================== =====
      Name                           Value
      ============================== =====
-     *reserved*                       0-9
-     ``NT_AMD_AMDGPU_HSA_METADATA``    10
-     *reserved*                        11
+     *reserved*                     0
+     ``NT_AMD_AMDGPU_METADATA``     1
      ============================== =====
 
-``NT_AMD_AMDGPU_HSA_METADATA``
-  Specifies extensible metadata associated with the code objects executed on HSA
-  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
-  the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See
-  :ref:`amdgpu-amdhsa-hsa-code-object-metadata` for the syntax of the code
-  object metadata string.
+``NT_AMD_AMDGPU_METADATA``
+  Specifies extensible metadata associated with the code object. It is
+  encoded as a map in the Message Pack [MsgPack]_ binary data
+  format. See :ref:`amdgpu-amdhsa-code-object-metadata` for the map
+  keys defined for the ``amdhsa`` OS.
 
 .. _amdgpu-symbols:
 
@@ -925,7 +920,7 @@
 This section provides code conventions used when the target triple OS is
 ``amdhsa`` (see :ref:`amdgpu-target-triples`).
 
-.. _amdgpu-amdhsa-hsa-code-object-metadata:
+.. _amdgpu-amdhsa-code-object-target-identification:
 
 Code Object Target Identification
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -955,12 +950,14 @@
 
   ``"amdgcn-amd-amdhsa--gfx902+xnack"``
 
+.. _amdgpu-amdhsa-code-object-metadata:
+
 Code Object Metadata
 ~~~~~~~~~~~~~~~~~~~~
 
 The code object metadata specifies extensible metadata associated with the code
 objects executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm
-[AMD-ROCm]_. It is specified by the ``NT_AMD_AMDGPU_HSA_METADATA`` note record
+[AMD-ROCm]_. It is specified by the ``NT_AMD_AMDGPU_METADATA`` note record
 (see :ref:`amdgpu-note-records`) and is required when the target triple OS is
 ``amdhsa`` (see :ref:`amdgpu-target-triples`). It must contain the minimum
 information necessary to support the ROCM kernel queries. For example, the
@@ -968,421 +965,405 @@
 runtime may require other information to be included. For example, the AMD
 OpenCL runtime records kernel argument information.
 
-The metadata is specified as a YAML formatted string (see [YAML]_ and
-:doc:`YamlIO`).
+The metadata is represented as Message Pack formatted binary data (see
+[MsgPack]_). The top level is a Message Pack map that includes the
+keys defined in table
+:ref:`amdgpu-amdhsa-code-object-metadata-map-table` and referenced
+tables.
 
-.. TODO
-   Is the string null terminated? It probably should not if YAML allows it to
-   contain null characters, otherwise it should be.
-
-The metadata is represented as a single YAML document comprised of the mapping
-defined in table :ref:`amdgpu-amdhsa-code-object-metadata-mapping-table` and
-referenced tables.
-
-For boolean values, the string values of ``false`` and ``true`` are used for
-false and true respectively.
-
-Additional information can be added to the mappings. To avoid conflicts, any
-non-AMD key names should be prefixed by "*vendor-name*.".
-
-  .. table:: AMDHSA Code Object Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-metadata-mapping-table
-
-     ========== ============== ========= =======================================
-     String Key Value Type     Required? Description
-     ========== ============== ========= =======================================
-     "Version"  sequence of    Required  - The first integer is the major
-                2 integers                 version. Currently 1.
-                                         - The second integer is the minor
-                                           version. Currently 0.
-     "Printf"   sequence of              Each string is encoded information
-                strings                  about a printf function call. The
-                                         encoded information is organized as
-                                         fields separated by colon (':'):
-
-                                         ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString``
-
-                                         where:
-
-                                         ``ID``
-                                           A 32 bit integer as a unique id for
-                                           each printf function call
-
-                                         ``N``
-                                           A 32 bit integer equal to the number
-                                           of arguments of printf function call
-                                           minus 1
-
-                                         ``S[i]`` (where i = 0, 1, ... , N-1)
-                                           32 bit integers for the size in bytes
-                                           of the i-th FormatString argument of
-                                           the printf function call
-
-                                         FormatString
-                                           The format string passed to the
-                                           printf function call.
-     "Kernels"  sequence of    Required  Sequence of the mappings for each
-                mapping                  kernel in the code object. See
-                                         :ref:`amdgpu-amdhsa-code-object-kernel-metadata-mapping-table`
-                                         for the definition of the mapping.
-     ========== ============== ========= =======================================
+Additional information can be added to the maps. To avoid conflicts,
+any key names should be prefixed by "*vendor-name*." where
+``vendor-name`` can be the the name of the vendor and specific vendor
+tool that generates the information. The prefix is abbreviated to
+simply "." when it appears within a map that has been added by the
+same *vendor-name*.
 
-..
+  .. table:: AMDHSA Code Object Metadata Map
+     :name: amdgpu-amdhsa-code-object-metadata-map-table
 
-  .. table:: AMDHSA Code Object Kernel Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-metadata-mapping-table
-
-     ================= ============== ========= ================================
+     ================= ============== ========= =======================================
      String Key        Value Type     Required? Description
-     ================= ============== ========= ================================
-     "Name"            string         Required  Source name of the kernel.
-     "SymbolName"      string         Required  Name of the kernel
-                                                descriptor ELF symbol.
-     "Language"        string                   Source language of the kernel.
-                                                Values include:
-
-                                                - "OpenCL C"
-                                                - "OpenCL C++"
-                                                - "HCC"
-                                                - "OpenMP"
-
-     "LanguageVersion" sequence of              - The first integer is the major
-                       2 integers                 version.
-                                                - The second integer is the
-                                                  minor version.
-     "Attrs"           mapping                  Mapping of kernel attributes.
-                                                See
-                                                :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table`
-                                                for the mapping definition.
-     "Args"            sequence of              Sequence of mappings of the
-                       mapping                  kernel arguments. See
-                                                :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table`
-                                                for the definition of the mapping.
-     "CodeProps"       mapping                  Mapping of properties related to
-                                                the kernel code. See
-                                                :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
-                                                for the mapping definition.
-     ================= ============== ========= ================================
+     ================= ============== ========= =======================================
+     "amdhsa.version"  sequence of    Required  - The first integer is the major
+                       2 integers                 version. Currently 1.
+                                                - The second integer is the minor
+                                                  version. Currently 0.
+     "amdhsa.printf"   sequence of              Each string is encoded information
+                       strings                  about a printf function call. The
+                                                encoded information is organized as
+                                                fields separated by colon (':'):
+
+                                                ``ID:N:S[0]:S[1]:...:S[N-1]:FormatString``
+
+                                                where:
+
+                                                ``ID``
+                                                  A 32 bit integer as a unique id for
+                                                  each printf function call
+
+                                                ``N``
+                                                  A 32 bit integer equal to the number
+                                                  of arguments of printf function call
+                                                  minus 1
+
+                                                ``S[i]`` (where i = 0, 1, ... , N-1)
+                                                  32 bit integers for the size in bytes
+                                                  of the i-th FormatString argument of
+                                                  the printf function call
+
+                                                FormatString
+                                                  The format string passed to the
+                                                  printf function call.
+     "amdhsa.kernels"  sequence of    Required  Sequence of the maps for each
+                       map                      kernel in the code object. See
+                                                :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table`
+                                                for the definition of the keys included
+                                                in that map.
+     ================= ============== ========= =======================================
 
 ..
 
-  .. table:: AMDHSA Code Object Kernel Attribute Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-mapping-table
-
-     =================== ============== ========= ==============================
-     String Key          Value Type     Required? Description
-     =================== ============== ========= ==============================
-     "ReqdWorkGroupSize" sequence of              If not 0, 0, 0 then all values
-                         3 integers               must be >=1 and the dispatch
-                                                  work-group size X, Y, Z must
-                                                  correspond to the specified
-                                                  values. Defaults to 0, 0, 0.
-
-                                                  Corresponds to the OpenCL
-                                                  ``reqd_work_group_size``
-                                                  attribute.
-     "WorkGroupSizeHint" sequence of              The dispatch work-group size
-                         3 integers               X, Y, Z is likely to be the
-                                                  specified values.
-
-                                                  Corresponds to the OpenCL
-                                                  ``work_group_size_hint``
-                                                  attribute.
-     "VecTypeHint"       string                   The name of a scalar or vector
-                                                  type.
-
-                                                  Corresponds to the OpenCL
-                                                  ``vec_type_hint`` attribute.
-
-     "RuntimeHandle"     string                   The external symbol name
-                                                  associated with a kernel.
-                                                  OpenCL runtime allocates a
-                                                  global buffer for the symbol
-                                                  and saves the kernel's address
-                                                  to it, which is used for
-                                                  device side enqueueing. Only
-                                                  available for device side
-                                                  enqueued kernels.
-     =================== ============== ========= ==============================
-
-..
-
-  .. table:: AMDHSA Code Object Kernel Argument Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-mapping-table
-
-     ================= ============== ========= ================================
-     String Key        Value Type     Required? Description
-     ================= ============== ========= ================================
-     "Name"            string                   Kernel argument name.
-     "TypeName"        string                   Kernel argument type name.
-     "Size"            integer        Required  Kernel argument size in bytes.
-     "Align"           integer        Required  Kernel argument alignment in
-                                                bytes. Must be a power of two.
-     "ValueKind"       string         Required  Kernel argument kind that
-                                                specifies how to set up the
-                                                corresponding argument.
-                                                Values include:
-
-                                                "ByValue"
-                                                  The argument is copied
-                                                  directly into the kernarg.
-
-                                                "GlobalBuffer"
-                                                  A global address space pointer
-                                                  to the buffer data is passed
-                                                  in the kernarg.
-
-                                                "DynamicSharedPointer"
-                                                  A group address space pointer
-                                                  to dynamically allocated LDS
-                                                  is passed in the kernarg.
-
-                                                "Sampler"
-                                                  A global address space
-                                                  pointer to a S# is passed in
-                                                  the kernarg.
-
-                                                "Image"
-                                                  A global address space
-                                                  pointer to a T# is passed in
-                                                  the kernarg.
-
-                                                "Pipe"
-                                                  A global address space pointer
-                                                  to an OpenCL pipe is passed in
-                                                  the kernarg.
-
-                                                "Queue"
-                                                  A global address space pointer
-                                                  to an OpenCL device enqueue
-                                                  queue is passed in the
-                                                  kernarg.
-
-                                                "HiddenGlobalOffsetX"
-                                                  The OpenCL grid dispatch
-                                                  global offset for the X
-                                                  dimension is passed in the
-                                                  kernarg.
-
-                                                "HiddenGlobalOffsetY"
-                                                  The OpenCL grid dispatch
-                                                  global offset for the Y
-                                                  dimension is passed in the
-                                                  kernarg.
-
-                                                "HiddenGlobalOffsetZ"
-                                                  The OpenCL grid dispatch
-                                                  global offset for the Z
-                                                  dimension is passed in the
-                                                  kernarg.
-
-                                                "HiddenNone"
-                                                  An argument that is not used
-                                                  by the kernel. Space needs to
-                                                  be left for it, but it does
-                                                  not need to be set up.
-
-                                                "HiddenPrintfBuffer"
-                                                  A global address space pointer
-                                                  to the runtime printf buffer
-                                                  is passed in kernarg.
-
-                                                "HiddenDefaultQueue"
-                                                  A global address space pointer
-                                                  to the OpenCL device enqueue
-                                                  queue that should be used by
-                                                  the kernel by default is
-                                                  passed in the kernarg.
-
-                                                "HiddenCompletionAction"
-                                                  A global address space pointer
-                                                  to help link enqueued kernels into
-                                                  the ancestor tree for determining
-                                                  when the parent kernel has finished.
-
-     "ValueType"       string         Required  Kernel argument value type. Only
-                                                present if "ValueKind" is
-                                                "ByValue". For vector data
-                                                types, the value is for the
-                                                element type. Values include:
-
-                                                - "Struct"
-                                                - "I8"
-                                                - "U8"
-                                                - "I16"
-                                                - "U16"
-                                                - "F16"
-                                                - "I32"
-                                                - "U32"
-                                                - "F32"
-                                                - "I64"
-                                                - "U64"
-                                                - "F64"
-
-                                                .. TODO
-                                                   How can it be determined if a
-                                                   vector type, and what size
-                                                   vector?
-     "PointeeAlign"    integer                  Alignment in bytes of pointee
-                                                type for pointer type kernel
-                                                argument. Must be a power
-                                                of 2. Only present if
-                                                "ValueKind" is
-                                                "DynamicSharedPointer".
-     "AddrSpaceQual"   string                   Kernel argument address space
-                                                qualifier. Only present if
-                                                "ValueKind" is "GlobalBuffer" or
-                                                "DynamicSharedPointer". Values
-                                                are:
-
-                                                - "Private"
-                                                - "Global"
-                                                - "Constant"
-                                                - "Local"
-                                                - "Generic"
-                                                - "Region"
-
-                                                .. TODO
-                                                   Is GlobalBuffer only Global
-                                                   or Constant? Is
-                                                   DynamicSharedPointer always
-                                                   Local? Can HCC allow Generic?
-                                                   How can Private or Region
-                                                   ever happen?
-     "AccQual"         string                   Kernel argument access
-                                                qualifier. Only present if
-                                                "ValueKind" is "Image" or
-                                                "Pipe". Values
-                                                are:
-
-                                                - "ReadOnly"
-                                                - "WriteOnly"
-                                                - "ReadWrite"
-
-                                                .. TODO
-                                                   Does this apply to
-                                                   GlobalBuffer?
-     "ActualAccQual"   string                   The actual memory accesses
-                                                performed by the kernel on the
-                                                kernel argument. Only present if
-                                                "ValueKind" is "GlobalBuffer",
-                                                "Image", or "Pipe". This may be
-                                                more restrictive than indicated
-                                                by "AccQual" to reflect what the
-                                                kernel actual does. If not
-                                                present then the runtime must
-                                                assume what is implied by
-                                                "AccQual" and "IsConst". Values
-                                                are:
-
-                                                - "ReadOnly"
-                                                - "WriteOnly"
-                                                - "ReadWrite"
-
-     "IsConst"         boolean                  Indicates if the kernel argument
-                                                is const qualified. Only present
-                                                if "ValueKind" is
-                                                "GlobalBuffer".
-
-     "IsRestrict"      boolean                  Indicates if the kernel argument
-                                                is restrict qualified. Only
-                                                present if "ValueKind" is
-                                                "GlobalBuffer".
-
-     "IsVolatile"      boolean                  Indicates if the kernel argument
-                                                is volatile qualified. Only
-                                                present if "ValueKind" is
-                                                "GlobalBuffer".
-
-     "IsPipe"          boolean                  Indicates if the kernel argument
-                                                is pipe qualified. Only present
-                                                if "ValueKind" is "Pipe".
-
-                                                .. TODO
-                                                   Can GlobalBuffer be pipe
-                                                   qualified?
-     ================= ============== ========= ================================
+  .. table:: AMDHSA Code Object Kernel Metadata Map
+     :name: amdgpu-amdhsa-code-object-kernel-metadata-map-table
+
+     =================================== ============== ========= ================================
+     String Key                          Value Type     Required? Description
+     =================================== ============== ========= ================================
+     ".name"                             string         Required  Source name of the kernel.
+     ".symbol"                           string         Required  Name of the kernel
+                                                                  descriptor ELF symbol.
+     ".language"                         string                   Source language of the kernel.
+                                                                  Values include:
+
+                                                                  - "OpenCL C"
+                                                                  - "OpenCL C++"
+                                                                  - "HCC"
+                                                                  - "HIP"
+                                                                  - "OpenMP"
+                                                                  - "Assembler"
+
+     ".language_version"                 sequence of              - The first integer is the major
+                                         2 integers                 version.
+                                                                  - The second integer is the
+                                                                    minor version.
+     ".args"                             sequence of              Sequence of maps of the
+                                         map                      kernel arguments. See
+                                                                  :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table`
+                                                                  for the definition of the keys
+                                                                  included in that map.
+     ".reqd_workgroup_size"              sequence of              If not 0, 0, 0 then all values
+                                         3 integers               must be >=1 and the dispatch
+                                                                  work-group size X, Y, Z must
+                                                                  correspond to the specified
+                                                                  values. Defaults to 0, 0, 0.
+
+                                                                  Corresponds to the OpenCL
+                                                                  ``reqd_work_group_size``
+                                                                  attribute.
+     ".workgroup_size_hint"              sequence of              The dispatch work-group size
+                                         3 integers               X, Y, Z is likely to be the
+                                                                  specified values.
+
+                                                                  Corresponds to the OpenCL
+                                                                  ``work_group_size_hint``
+                                                                  attribute.
+     ".vec_type_hint"                    string                   The name of a scalar or vector
+                                                                  type.
+
+                                                                  Corresponds to the OpenCL
+                                                                  ``vec_type_hint`` attribute.
+
+     ".device_enqueue_symbol"            string                   The external symbol name
+                                                                  associated with a kernel.
+                                                                  OpenCL runtime allocates a
+                                                                  global buffer for the symbol
+                                                                  and saves the kernel's address
+                                                                  to it, which is used for
+                                                                  device side enqueueing. Only
+                                                                  available for device side
+                                                                  enqueued kernels.
+     ".kernarg_segment_size"             integer        Required  The size in bytes of
+                                                                  the kernarg segment
+                                                                  that holds the values
+                                                                  of the arguments to
+                                                                  the kernel.
+     ".group_segment_fixed_size"         integer        Required  The amount of group
+                                                                  segment memory
+                                                                  required by a
+                                                                  work-group in
+                                                                  bytes. This does not
+                                                                  include any
+                                                                  dynamically allocated
+                                                                  group segment memory
+                                                                  that may be added
+                                                                  when the kernel is
+                                                                  dispatched.
+     ".private_segment_fixed_size"       integer        Required  The amount of fixed
+                                                                  private address space
+                                                                  memory required for a
+                                                                  work-item in
+                                                                  bytes. If the kernel
+                                                                  uses a dynamic call
+                                                                  stack then additional
+                                                                  space must be added
+                                                                  to this value for the
+                                                                  call stack.
+     ".kernarg_segment_align"            integer        Required  The maximum byte
+                                                                  alignment of
+                                                                  arguments in the
+                                                                  kernarg segment. Must
+                                                                  be a power of 2.
+     ".wavefront_size"                   integer        Required  Wavefront size. Must
+                                                                  be a power of 2.
+     ".sgpr_count"                       integer        Required  Number of scalar
+                                                                  registers required by a
+                                                                  wavefront for
+                                                                  GFX6-GFX9. A register
+                                                                  is required if it is
+                                                                  used explicitly, or
+                                                                  if a higher numbered
+                                                                  register is used
+                                                                  explicitly. This
+                                                                  includes the special
+                                                                  SGPRs for VCC, Flat
+                                                                  Scratch (GFX7-GFX9)
+                                                                  and XNACK (for
+                                                                  GFX8-GFX9). It does
+                                                                  not include the 16
+                                                                  SGPR added if a trap
+                                                                  handler is
+                                                                  enabled. It is not
+                                                                  rounded up to the
+                                                                  allocation
+                                                                  granularity.
+     ".vgpr_count"                       integer        Required  Number of vector
+                                                                  registers required by
+                                                                  each work-item for
+                                                                  GFX6-GFX9. A register
+                                                                  is required if it is
+                                                                  used explicitly, or
+                                                                  if a higher numbered
+                                                                  register is used
+                                                                  explicitly. 
+     ".max_flat_workgroup_size"          integer        Required  Maximum flat
+                                                                  work-group size
+                                                                  supported by the
+                                                                  kernel in work-items.
+                                                                  Must be >=1 and
+                                                                  consistent with
+                                                                  ReqdWorkGroupSize if
+                                                                  not 0, 0, 0.
+     ".sgpr_spill_count"                 integer                  Number of stores from
+                                                                  a scalar register to
+                                                                  a register allocator
+                                                                  created spill
+                                                                  location.
+     ".vgpr_spill_count"                 integer                  Number of stores from
+                                                                  a vector register to
+                                                                  a register allocator
+                                                                  created spill
+                                                                  location.
+     =================================== ============== ========= ================================
 
 ..
 
-  .. table:: AMDHSA Code Object Kernel Code Properties Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table
-
-     ============================ ============== ========= =====================
-     String Key                   Value Type     Required? Description
-     ============================ ============== ========= =====================
-     "KernargSegmentSize"         integer        Required  The size in bytes of
-                                                           the kernarg segment
-                                                           that holds the values
-                                                           of the arguments to
-                                                           the kernel.
-     "GroupSegmentFixedSize"      integer        Required  The amount of group
-                                                           segment memory
-                                                           required by a
-                                                           work-group in
-                                                           bytes. This does not
-                                                           include any
-                                                           dynamically allocated
-                                                           group segment memory
-                                                           that may be added
-                                                           when the kernel is
-                                                           dispatched.
-     "PrivateSegmentFixedSize"    integer        Required  The amount of fixed
-                                                           private address space
-                                                           memory required for a
-                                                           work-item in
-                                                           bytes. If the kernel
-                                                           uses a dynamic call
-                                                           stack then additional
-                                                           space must be added
-                                                           to this value for the
-                                                           call stack.
-     "KernargSegmentAlign"        integer        Required  The maximum byte
-                                                           alignment of
-                                                           arguments in the
-                                                           kernarg segment. Must
-                                                           be a power of 2.
-     "WavefrontSize"              integer        Required  Wavefront size. Must
-                                                           be a power of 2.
-     "NumSGPRs"                   integer        Required  Number of scalar
-                                                           registers used by a
-                                                           wavefront for
-                                                           GFX6-GFX9. This
-                                                           includes the special
-                                                           SGPRs for VCC, Flat
-                                                           Scratch (GFX7-GFX9)
-                                                           and XNACK (for
-                                                           GFX8-GFX9). It does
-                                                           not include the 16
-                                                           SGPR added if a trap
-                                                           handler is
-                                                           enabled. It is not
-                                                           rounded up to the
-                                                           allocation
-                                                           granularity.
-     "NumVGPRs"                   integer        Required  Number of vector
-                                                           registers used by
-                                                           each work-item for
-                                                           GFX6-GFX9
-     "MaxFlatWorkGroupSize"       integer        Required  Maximum flat
-                                                           work-group size
-                                                           supported by the
-                                                           kernel in work-items.
-                                                           Must be >=1 and
-                                                           consistent with
-                                                           ReqdWorkGroupSize if
-                                                           not 0, 0, 0.
-     "NumSpilledSGPRs"            integer                  Number of stores from
-                                                           a scalar register to
-                                                           a register allocator
-                                                           created spill
-                                                           location.
-     "NumSpilledVGPRs"            integer                  Number of stores from
-                                                           a vector register to
-                                                           a register allocator
-                                                           created spill
-                                                           location.
-     ============================ ============== ========= =====================
+  .. table:: AMDHSA Code Object Kernel Argument Metadata Map
+     :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table
+
+     ====================== ============== ========= ================================
+     String Key             Value Type     Required? Description
+     ====================== ============== ========= ================================
+     ".name"                string                   Kernel argument name.
+     ".type_name"           string                   Kernel argument type name.
+     ".size"                integer        Required  Kernel argument size in bytes.
+     ".offset"              integer        Required  Kernel argument offset in
+                                                     bytes. The offset must be a
+                                                     multiple of the alignment
+                                                     required by the argument.
+     ".value_kind"          string         Required  Kernel argument kind that
+                                                     specifies how to set up the
+                                                     corresponding argument.
+                                                     Values include:
+
+                                                     "by_value"
+                                                       The argument is copied
+                                                       directly into the kernarg.
+
+                                                     "global_buffer"
+                                                       A global address space pointer
+                                                       to the buffer data is passed
+                                                       in the kernarg.
+
+                                                     "dynamic_shared_pointer"
+                                                       A group address space pointer
+                                                       to dynamically allocated LDS
+                                                       is passed in the kernarg.
+
+                                                     "sampler"
+                                                       A global address space
+                                                       pointer to a S# is passed in
+                                                       the kernarg.
+
+                                                     "image"
+                                                       A global address space
+                                                       pointer to a T# is passed in
+                                                       the kernarg.
+
+                                                     "pipe"
+                                                       A global address space pointer
+                                                       to an OpenCL pipe is passed in
+                                                       the kernarg.
+
+                                                     "queue"
+                                                       A global address space pointer
+                                                       to an OpenCL device enqueue
+                                                       queue is passed in the
+                                                       kernarg.
+
+                                                     "hidden_global_offset_x"
+                                                       The OpenCL grid dispatch
+                                                       global offset for the X
+                                                       dimension is passed in the
+                                                       kernarg.
+
+                                                     "hidden_global_offset_y"
+                                                       The OpenCL grid dispatch
+                                                       global offset for the Y
+                                                       dimension is passed in the
+                                                       kernarg.
+
+                                                     "hidden_global_offset_z"
+                                                       The OpenCL grid dispatch
+                                                       global offset for the Z
+                                                       dimension is passed in the
+                                                       kernarg.
+
+                                                     "hidden_none"
+                                                       An argument that is not used
+                                                       by the kernel. Space needs to
+                                                       be left for it, but it does
+                                                       not need to be set up.
+
+                                                     "hidden_printf_buffer"
+                                                       A global address space pointer
+                                                       to the runtime printf buffer
+                                                       is passed in kernarg.
+
+                                                     "hidden_default_queue"
+                                                       A global address space pointer
+                                                       to the OpenCL device enqueue
+                                                       queue that should be used by
+                                                       the kernel by default is
+                                                       passed in the kernarg.
+
+                                                     "hidden_completion_action"
+                                                       A global address space pointer
+                                                       to help link enqueued kernels into
+                                                       the ancestor tree for determining
+                                                       when the parent kernel has finished.
+
+     ".value_type"          string         Required  Kernel argument value type. Only
+                                                     present if ".value_kind" is
+                                                     "by_value". For vector data
+                                                     types, the value is for the
+                                                     element type. Values include:
+
+                                                     - "struct"
+                                                     - "i8"
+                                                     - "u8"
+                                                     - "i16"
+                                                     - "u16"
+                                                     - "f16"
+                                                     - "i32"
+                                                     - "u32"
+                                                     - "f32"
+                                                     - "i64"
+                                                     - "u64"
+                                                     - "f64"
+
+                                                     .. TODO
+                                                        How can it be determined if a
+                                                        vector type, and what size
+                                                        vector?
+     ".pointee_align"       integer                  Alignment in bytes of pointee
+                                                     type for pointer type kernel
+                                                     argument. Must be a power
+                                                     of 2. Only present if
+                                                     ".value_kind" is
+                                                     "dynamic_shared_pointer".
+     ".address_space"       string                   Kernel argument address space
+                                                     qualifier. Only present if
+                                                     ".value_kind" is "global_buffer" or
+                                                     "dynamic_shared_pointer". Values
+                                                     are:
+
+                                                     - "private"
+                                                     - "global"
+                                                     - "constant"
+                                                     - "local"
+                                                     - "generic"
+                                                     - "region"
+
+                                                     .. TODO
+                                                        Is "global_buffer" only "global"
+                                                        or "constant"? Is
+                                                        "dynamic_shared_pointer" always
+                                                        "local"? Can HCC allow "generic"?
+                                                        How can "private" or "region"
+                                                        ever happen?
+     ".access"              string                   Kernel argument access
+                                                     qualifier. Only present if
+                                                     ".value_kind" is "image" or
+                                                     "pipe". Values
+                                                     are:
+
+                                                     - "read_only"
+                                                     - "write_only"
+                                                     - "read_write"
+
+                                                     .. TODO
+                                                        Does this apply to
+                                                        "global_buffer"?
+     ".actual_access"       string                   The actual memory accesses
+                                                     performed by the kernel on the
+                                                     kernel argument. Only present if
+                                                     ".value_kind" is "global_buffer",
+                                                     "image", or "pipe". This may be
+                                                     more restrictive than indicated
+                                                     by ".access" to reflect what the
+                                                     kernel actual does. If not
+                                                     present then the runtime must
+                                                     assume what is implied by
+                                                     ".access" and ".is_const"      . Values
+                                                     are:
+
+                                                     - "read_only"
+                                                     - "write_only"
+                                                     - "read_write"
+
+     ".is_const"            boolean                  Indicates if the kernel argument
+                                                     is const qualified. Only present
+                                                     if ".value_kind" is
+                                                     "global_buffer".
+
+     ".is_restrict"         boolean                  Indicates if the kernel argument
+                                                     is restrict qualified. Only
+                                                     present if ".value_kind" is
+                                                     "global_buffer".
+
+     ".is_volatile"         boolean                  Indicates if the kernel argument
+                                                     is volatile qualified. Only
+                                                     present if ".value_kind" is
+                                                     "global_buffer".
+
+     ".is_pipe"             boolean                  Indicates if the kernel argument
+                                                     is pipe qualified. Only present
+                                                     if ".value_kind" is "pipe".
+
+                                                     .. TODO
+                                                        Can "global_buffer" be pipe
+                                                        qualified?
+     ====================== ============== ========= ================================
 
 ..
 
@@ -1436,7 +1417,7 @@
    such as grid and work-group size, together with information from the code
    object about the kernel, such as segment sizes. The ROCm runtime queries on
    the kernel symbol can be used to obtain the code object values which are
-   recorded in the :ref:`amdgpu-amdhsa-hsa-code-object-metadata`.
+   recorded in the :ref:`amdgpu-amdhsa-code-object-metadata`.
 7. CP executes micro-code and is responsible for detecting and setting up the
    GPU to execute the wavefronts of a kernel dispatch.
 8. CP ensures that when the a wavefront starts executing the kernel machine
@@ -3971,7 +3952,7 @@
    arguments for the AMDHSA OS (see
    :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
 3. Additional metadata is generated
-   (see :ref:`amdgpu-amdhsa-hsa-code-object-metadata`).
+   (see :ref:`amdgpu-amdhsa-code-object-metadata`).
 
   .. table:: OpenCL kernel implicit arguments appended for AMDHSA OS
      :name: opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table
@@ -4280,7 +4261,7 @@
 - *wavefront_size* defaults to 6.
 - *kernarg_segment_alignment*, *group_segment_alignment*, and
   *private_segment_alignment* default to 4. Note that alignments are specified
-  as a power of two, so a value of **n** means an alignment of 2^ **n**.
+  as a power of 2, so a value of **n** means an alignment of 2^ **n**.
 
 The *.amd_kernel_code_t* directive must be placed immediately after the
 function label and before any instructions.
@@ -4339,6 +4320,6 @@
 .. [HSA] `Heterogeneous System Architecture (HSA) Foundation <http://www.hsafoundation.com/>`__
 .. [ELF] `Executable and Linkable Format (ELF) <http://www.sco.com/developers/gabi/>`__
 .. [DWARF] `DWARF Debugging Information Format <http://dwarfstd.org/>`__
-.. [YAML] `YAML Ain't Markup Language (YAMLâ„¢) Version 1.2 <http://www.yaml.org/spec/1.2/spec.html>`__
+.. [MsgPack] `Message Pack <http://www.msgpack.org/>`__
 .. [OpenCL] `The OpenCL Specification Version 2.0 <http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf>`__
 .. [HRF] `Heterogeneous-race-free Memory Models <http://benedictgaster.org/wp-content/uploads/2014/01/asplos269-FINAL.pdf>`__