diff --git a/clang/docs/ClangOffloadBundler.rst b/clang/docs/ClangOffloadBundler.rst new file mode 100644 --- /dev/null +++ b/clang/docs/ClangOffloadBundler.rst @@ -0,0 +1,211 @@ +===================== +Clang Offload Bundler +===================== + +.. contents:: + :local: + +.. _clang-offload-bundler: + +Introduction +============ + +For heterogeneous single source programming languages, use one or more +``--offload-arch=`` Clang options to specify the target IDs of the +code to generate for the offload code regions. + +The tool chain may perform multiple compilations of a translation unit to +produce separate code objects for the host and potentially multiple offloaded +devices. The ``clang-offload-bundler`` tool may be used as part of the tool +chain to combine these multiple code objects into a single bundled code object. + +The tool chain may use a bundled code object as an intermediate step so that +each tool chain step consumes and produces a single file as in traditional +non-heterogeneous tool chains. The bundled code object contains the code objects +for the host and all the offload devices. + +A bundled code object may also be used to bundle just the offloaded code +objects, and embedded as data into the host code object. The host compilation +includes an ``init`` function that will use the runtime corresponding to the +offload kind (see :ref:`clang-offload-kind-table`) to load the offload code +objects appropriate to the devices present when the host program is executed. + +.. _clang-bundled-code-object-layout: + +Bundled Code Object Layout +========================== + +The layout of a bundled code object is defined by the following table: + + .. 
table:: Bundled Code Object Layout + :name: bundled-code-object-layout-table + + =================================== ======= ================ =============================== + Field Type Size in Bytes Description + =================================== ======= ================ =============================== + Magic String string 24 ``__CLANG_OFFLOAD_BUNDLE__`` + Number Of Code Objects integer 8 Number of bundled code objects. + 1st Bundle Entry Code Object Offset integer 8 Byte offset from beginning of + bundled code object to 1st code + object. + 1st Bundle Entry Code Object Size integer 8 Byte size of 1st code object. + 1st Bundle Entry ID Length integer 8 Character length of bundle + entry ID of 1st code object. + 1st Bundle Entry ID string 1st Bundle Entry Bundle entry ID of 1st code + ID Length object. This is not NUL + terminated. See + :ref:`clang-bundle-entry-id`. + \... + Nth Bundle Entry Code Object Offset integer 8 + Nth Bundle Entry Code Object Size integer 8 + Nth Bundle Entry ID Length integer 8 + Nth Bundle Entry ID string Nth Bundle Entry + ID Length + 1st Bundle Entry Code Object bytes 1st Bundle Entry + Code Object Size + \... + Nth Bundle Entry Code Object bytes Nth Bundle Entry + Code Object Size + =================================== ======= ================ =============================== + +.. _clang-bundle-entry-id: + +Bundle Entry ID +=============== + +Each entry in a bundled code object (see +:ref:`clang-bundled-code-object-layout`) has a bundle entry ID that indicates +the kind of the entry's code object and the runtime that manages it. + +Bundle entry ID syntax is defined by the following BNF syntax: + +.. code:: + + <bundle-entry-id> ::== <offload-kind> "-" <target-triple> [ "-" <target-id> ] + +Where: + +**offload-kind** + The runtime responsible for managing the bundled entry code object. See + :ref:`clang-offload-kind-table`. + + ..
table:: Bundled Code Object Offload Kind + :name: clang-offload-kind-table + + ============= ============================================================== + Offload Kind Description + ============= ============================================================== + host Host code object. ``clang-offload-bundler`` always includes + this entry as the first bundled code object entry. For an + embedded bundled code object this entry is not used by the + runtime and so is generally an empty code object. + + hip Offload code object for the HIP language. Used for all + HIP language offload code objects when the + ``clang-offload-bundler`` is used to bundle code objects as + intermediate steps of the tool chain. Also used for AMD GPU + code objects before ABI version V4 when the + ``clang-offload-bundler`` is used to create a *fat binary* + to be loaded by the HIP runtime. The fat binary can be + loaded directly from a file, or be embedded in the host code + object as a data section with the name ``.hip_fatbin``. + + hipv4 Offload code object for the HIP language. Used for AMD GPU + code objects with at least ABI version V4 when the + ``clang-offload-bundler`` is used to create a *fat binary* + to be loaded by the HIP runtime. The fat binary can be + loaded directly from a file, or be embedded in the host code + object as a data section with the name ``.hip_fatbin``. + + openmp Offload code object for the OpenMP language extension. + ============= ============================================================== + +**target-triple** + The target triple of the code object. + +**target-id** + The canonical target ID of the code object. Present only if the target + supports a target ID. See :ref:`clang-target-id`. + +Each entry of a bundled code object must have a different bundle entry ID. There +can be multiple entries for the same processor provided they differ in target +feature settings. 
If there is an entry with a target feature specified as *Any*, +then all entries must specify that target feature as *Any* for the same +processor. There may be additional target specific restrictions. + +.. _clang-target-id: + +Target ID +========= + +A target ID is used to indicate the processor and optionally its configuration, +expressed by a set of target features, that affect ISA generation. It is target +specific if a target ID is supported, or if the target triple alone is +sufficient to specify the ISA generation. + +It is used with the ``-mcpu=<target-id>`` and ``--offload-arch=<target-id>`` +Clang compilation options to specify the kind of code to generate. + +It is also used as part of the bundle entry ID to identify the code object. See +:ref:`clang-bundle-entry-id`. + +Target ID syntax is defined by the following BNF syntax: + +.. code:: + + <target-id> ::== <processor> ( ":" <target-feature> ( "+" | "-" ) )* + +Where: + +**processor** + Is the target specific processor or any alternative processor name. + +**target-feature** + Is a target feature name that is supported by the processor. Each target + feature must appear at most once in a target ID and can have one of three + values: + + *Any* + Specified by omitting the target feature from the target ID. + A code object compiled with a target ID specifying the default + value of a target feature can be loaded and executed on a processor + configured with the target feature on or off. + + *On* + Specified by ``+``, indicating the target feature is enabled. A code + object compiled with a target ID specifying a target feature on + can only be loaded on a processor configured with the target feature on. + + *Off* + Specified by ``-``, indicating the target feature is disabled. A code + object compiled with a target ID specifying a target feature off + can only be loaded on a processor configured with the target feature off.
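The target ID grammar and the *Any*/*On*/*Off* rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of Clang or any runtime: ``parse_target_id`` and ``can_load`` are hypothetical names, the processor comparison ignores alternative processor names, and a device that does not report a feature is assumed to be incompatible with a code object that pins that feature.

```python
def parse_target_id(target_id):
    """Split a target ID into (processor, {feature: True for '+', False for '-'}).

    Hypothetical helper: follows the BNF above, where each feature
    component is a name followed by '+' or '-'.
    """
    processor, *components = target_id.split(":")
    if not processor:
        raise ValueError("missing processor name")
    features = {}
    for comp in components:
        name, sign = comp[:-1], comp[-1:]
        if not name or sign not in ("+", "-"):
            raise ValueError(f"malformed target feature: {comp!r}")
        if name in features:
            # Each target feature must appear at most once in a target ID.
            raise ValueError(f"target feature given more than once: {name!r}")
        features[name] = (sign == "+")
    return processor, features


def can_load(target_id, device_processor, device_features):
    """Return True if a code object built for target_id may load on the device.

    device_features maps feature name -> bool (on/off) for the device.
    Assumption: processor names are compared exactly.
    """
    processor, features = parse_target_id(target_id)
    if processor != device_processor:
        return False
    # A feature omitted from the target ID has the value Any and places no
    # constraint; On/Off must match the device setting exactly.
    return all(device_features.get(name) == on for name, on in features.items())
```

For example, under these assumptions ``can_load("gfx908", "gfx908", {"xnack": True})`` is true because ``xnack`` is *Any*, while ``can_load("gfx908:xnack-", "gfx908", {"xnack": True})`` is false.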
+ +There are two forms of target ID: + +*Non-Canonical Form* + The non-canonical form is used as the input to user commands to allow the user + greater convenience. It allows both the primary and alternative processor name + to be used and the target features may be specified in any order. + +*Canonical Form* + The canonical form is used for all generated output to allow greater + convenience for tools that consume the information. It is also used for + internal passing of information between tools. Only the primary and not + alternative processor name is used and the target features are specified in + alphabetic order. Command line tools convert non-canonical form to canonical + form. + +Target Specific information +=========================== + +Target specific information is available for the following: + +*AMD GPU* + AMD GPU supports target ID and target features. See `User Guide for AMDGPU Backend + `_ which defines the `processors + `_ and `target + features `_ + supported. + +Most other targets do not support target IDs. \ No newline at end of file diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -38,8 +38,8 @@ Target Triples -------------- -Use the ``clang -target ---`` option to -specify the target triple: +Use the Clang option ``-target ---`` +to specify the target triple: .. table:: AMDGPU Architectures :name: amdgpu-architecture-table @@ -62,16 +62,22 @@ ============ ============================================================== .. table:: AMDGPU Operating Systems - :name: amdgpu-os-table + :name: amdgpu-os ============== ============================================================ OS Description ============== ============================================================ ** Defaults to the *unknown* OS. ``amdhsa`` Compute kernels executed on HSA [HSA]_ compatible runtimes - such as AMD's ROCm [AMD-ROCm]_. 
- ``amdpal`` Graphic shaders and compute kernels executed on AMD PAL - runtime. + such as: + + - AMD's ROCm runtime [AMD-ROCm]_ on Linux. See *AMD ROCm + Release Notes* [AMD-ROCm-Release-Notes]_ for supported + hardware and software. + - AMD's PAL runtime using the *amdhsa* loader on Windows. + + ``amdpal`` Graphic shaders and compute kernels executed on AMD's PAL + runtime using the *amdpal* loader on Windows. ``mesa3d`` Graphic shaders and compute kernels executed on Mesa 3D runtime. ============== ============================================================ @@ -90,248 +96,317 @@ Processors ---------- -Use the ``clang -mcpu `` option to specify the AMDGPU processor. The -names from both the *Processor* and *Alternative Processor* can be used. +Use the Clang options ``-mcpu=`` or ``--offload-arch=`` to +specify the AMDGPU processor together with optional target features. See +:ref:`amdgpu-target-id` and :ref:`amdgpu-target-features` for AMD GPU target +specific information. .. table:: AMDGPU Processors :name: amdgpu-processor-table - =========== =============== ============ ===== ============================= ======= ====================== - Processor Alternative Target dGPU/ Target ROCm Example - Processor Triple APU Features Support Products - Architecture Supported - [Default] - =========== =============== ============ ===== ============================= ======= ====================== + =========== =============== ============ ===== ================= =========== ============== ====================== + Processor Alternative Target dGPU/ Target Target OS Example + Processor Triple APU Features Properties Support Products + Architecture Supported () + *(see* + `amdgpu-os`_ + *and + corresponding + runtime + release notes + for current + information + and level of + support)* + =========== =============== ============ ===== ================= =========== ============== ====================== **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_ - 
----------------------------------------------------------------------------------------------------------- - ``r600`` ``r600`` dGPU - ``r630`` ``r600`` dGPU - ``rs880`` ``r600`` dGPU - ``rv670`` ``r600`` dGPU + ------------------------------------------------------------------------------------------------------------------ + ``r600`` ``r600`` dGPU - Does not + support + generic + address + space + ``r630`` ``r600`` dGPU - Does not + support + generic + address + space + ``rs880`` ``r600`` dGPU - Does not + support + generic + address + space + ``rv670`` ``r600`` dGPU - Does not + support + generic + address + space **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_ - ----------------------------------------------------------------------------------------------------------- - ``rv710`` ``r600`` dGPU - ``rv730`` ``r600`` dGPU - ``rv770`` ``r600`` dGPU + ------------------------------------------------------------------------------------------------------------------ + ``rv710`` ``r600`` dGPU - Does not + support + generic + address + space + ``rv730`` ``r600`` dGPU - Does not + support + generic + address + space + ``rv770`` ``r600`` dGPU - Does not + support + generic + address + space **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_ - ----------------------------------------------------------------------------------------------------------- - ``cedar`` ``r600`` dGPU - ``cypress`` ``r600`` dGPU - ``juniper`` ``r600`` dGPU - ``redwood`` ``r600`` dGPU - ``sumo`` ``r600`` dGPU + ------------------------------------------------------------------------------------------------------------------ + ``cedar`` ``r600`` dGPU - Does not + support + generic + address + space + ``cypress`` ``r600`` dGPU - Does not + support + generic + address + space + ``juniper`` ``r600`` dGPU - Does not + support + generic + address + space + ``redwood`` ``r600`` dGPU - Does not + support + generic + address + space + ``sumo`` ``r600`` dGPU - Does not + support + generic + 
address + space **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_ - ----------------------------------------------------------------------------------------------------------- - ``barts`` ``r600`` dGPU - ``caicos`` ``r600`` dGPU - ``cayman`` ``r600`` dGPU - ``turks`` ``r600`` dGPU + ------------------------------------------------------------------------------------------------------------------ + ``barts`` ``r600`` dGPU - Does not + support + generic + address + space + ``caicos`` ``r600`` dGPU - Does not + support + generic + address + space + ``cayman`` ``r600`` dGPU - Does not + support + generic + address + space + ``turks`` ``r600`` dGPU - Does not + support + generic + address + space **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_ - ----------------------------------------------------------------------------------------------------------- - ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - - ``verde`` - ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - - ``oland`` + ------------------------------------------------------------------------------------------------------------------ + ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU - Does not - AMD PAL + support + generic + address + space + ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - Does not - AMD PAL + - ``verde`` support + generic + address + space + ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - Does not - AMD PAL + - ``oland`` support + generic + address + space **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_ - ----------------------------------------------------------------------------------------------------------- - ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000 - - A6 Pro-7050B - - A8-7100 - - A8 Pro-7150B - - A10-7300 - - A10 Pro-7350B - - FX-7500 - - A8-7200P - - A10-7400P - - FX-7600P - ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100 - - FirePro W9100 - - FirePro S9150 - - FirePro S9170 - ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290 - - Radeon 
R9 290x - - Radeon R390 - - Radeon R390x - ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100 - - ``mullins`` - E1-2200 - - E1-2500 - - E2-3000 - - E2-3800 - - A4-5000 - - A4-5100 - - A6-5200 - - A4 Pro-3340B - ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790 - - Radeon HD 8770 - - R7 260 - - R7 260X - ``gfx705`` ``amdgcn`` APU *TBA* - - .. TODO:: - - Add product - names. + ------------------------------------------------------------------------------------------------------------------ + ``gfx700`` - ``kaveri`` ``amdgcn`` APU - AMD ROCm - A6-7000 + - AMD PAL - A6 Pro-7050B + - A8-7100 + - A8 Pro-7150B + - A10-7300 + - A10 Pro-7350B + - FX-7500 + - A8-7200P + - A10-7400P + - FX-7600P + ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU - AMD ROCm - FirePro W8100 + - AMD PAL - FirePro W9100 + - FirePro S9150 + - FirePro S9170 + ``gfx702`` ``amdgcn`` dGPU - AMD ROCm - Radeon R9 290 + - AMD PAL - Radeon R9 290x + - Radeon R390 + - Radeon R390x + ``gfx703`` - ``kabini`` ``amdgcn`` APU - AMD PAL - E1-2100 + - ``mullins`` - E1-2200 + - E1-2500 + - E2-3000 + - E2-3800 + - A4-5000 + - A4-5100 + - A6-5200 + - A4 Pro-3340B + ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - AMD PAL - Radeon HD 7790 + - Radeon HD 8770 + - R7 260 + - R7 260X + ``gfx705`` ``amdgcn`` APU - AMD PAL *TBA* + + .. TODO:: + + Add product + names. 
**GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_ - ----------------------------------------------------------------------------------------------------------- - ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P - [on] - Pro A6-8500B - - A8-8600P - - Pro A8-8600B - - FX-8800P - - Pro A12-8800B - \ ``amdgcn`` APU - xnack ROCm - A10-8700P - [on] - Pro A10-8700B - - A10-8780P - \ ``amdgcn`` APU - xnack - A10-9600P - [on] - A10-9630P - - A12-9700P - - A12-9730P - - FX-9800P - - FX-9830P - \ ``amdgcn`` APU - xnack - E2-9010 - [on] - A6-9210 - - A9-9410 - ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - Radeon R285 - - ``tonga`` [off] - Radeon R9 380 - - Radeon R9 385 - ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano - [off] - Radeon R9 Fury - - Radeon R9 FuryX - - Radeon Pro Duo - - FirePro S9300x2 - - Radeon Instinct MI8 - \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470 - [off] - Radeon RX 480 - - Radeon Instinct MI6 - \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460 - [off] - ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150 - [off] - FirePro S7100 - - FirePro W7100 - - Mobile FirePro - M7170 - ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack *TBA* - [on] - .. TODO:: - - Add product - names. 
+ ------------------------------------------------------------------------------------------------------------------ + ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - AMD ROCm - A6-8500P + - AMD PAL - Pro A6-8500B + - A8-8600P + - Pro A8-8600B + - FX-8800P + - Pro A12-8800B + - A10-8700P + - Pro A10-8700B + - A10-8780P + - A10-9600P + - A10-9630P + - A12-9700P + - A12-9730P + - FX-9800P + - FX-9830P + - E2-9010 + - A6-9210 + - A9-9410 + ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - AMD ROCm - Radeon R9 285 + - ``tonga`` - AMD PAL - Radeon R9 380 + - Radeon R9 385 + ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - AMD ROCm - Radeon R9 Nano + - AMD PAL - Radeon R9 Fury + - Radeon R9 FuryX + - Radeon Pro Duo + - FirePro S9300x2 + - Radeon Instinct MI8 + \ - ``polaris10`` ``amdgcn`` dGPU - AMD ROCm - Radeon RX 470 + - AMD PAL - Radeon RX 480 + - Radeon Instinct MI6 + \ - ``polaris11`` ``amdgcn`` dGPU - AMD ROCm - Radeon RX 460 + - AMD PAL + ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - AMD ROCm - FirePro S7150 + - AMD PAL - FirePro S7100 + - FirePro W7100 + - Mobile FirePro + M7170 + ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack - AMD ROCm *TBA* + - AMD PAL + .. TODO:: + + Add product + names. **GCN GFX9** [AMD-GCN-GFX9]_ - ----------------------------------------------------------------------------------------------------------- - ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega - [off] Frontier Edition - - Radeon RX Vega 56 - - Radeon RX Vega 64 - - Radeon RX Vega 64 - Liquid - - Radeon Instinct MI25 - ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G - [on] - Ryzen 5 2400G - ``gfx904`` ``amdgcn`` dGPU - xnack *TBA* - [off] - .. TODO:: - - Add product - names. - - ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50 - [off] - Radeon Instinct MI60 - - sram-ecc - Radeon VII - [off] - Radeon Pro VII - ``gfx908`` ``amdgcn`` dGPU - xnack *TBA* - [off] - - sram-ecc - [on] - .. TODO:: - - Add product - names. - - ``gfx909`` ``amdgcn`` APU - xnack *TBA* - [off] - .. 
TODO:: - - Add product - names. - - ``gfx90c`` ``amdgcn`` APU - xnack - Ryzen 7 4700G - [on] - Ryzen 7 4700GE - - Ryzen 7 4700G - - Ryzen 7 4700GE - - Ryzen 5 4600G - - Ryzen 5 4600GE - - Ryzen 3 4300G - - Ryzen 3 4300GE - - Ryzen Pro 4000G - - Ryzen 7 Pro 4700G - - Ryzen 7 Pro 4750GE - - Ryzen 5 Pro 4650G - - Ryzen 5 Pro 4650GE - - Ryzen 3 Pro 4350G - - Ryzen 3 Pro 4350GE + ------------------------------------------------------------------------------------------------------------------ + ``gfx900`` ``amdgcn`` dGPU - xnack - AMD ROCm - Radeon Vega + - AMD PAL Frontier Edition + - Radeon RX Vega 56 + - Radeon RX Vega 64 + - Radeon RX Vega 64 + Liquid + - Radeon Instinct MI25 + ``gfx902`` ``amdgcn`` APU - xnack - AMD ROCm - Ryzen 3 2200G + - AMD PAL - Ryzen 5 2400G + ``gfx904`` ``amdgcn`` dGPU - xnack - AMD ROCm *TBA* + - AMD PAL + .. TODO:: + + Add product + names. + + ``gfx906`` ``amdgcn`` dGPU - sramecc - AMD ROCm - Radeon Instinct MI50 + - xnack - AMD PAL - Radeon Instinct MI60 + - Radeon VII + - Radeon Pro VII + ``gfx908`` ``amdgcn`` dGPU - sramecc - AMD ROCm *TBA* + - xnack + .. TODO:: + + Add product + names. + + ``gfx909`` ``amdgcn`` APU - xnack - AMD PAL *TBA* + + .. TODO:: + + Add product + names. 
+ + ``gfx90c`` ``amdgcn`` APU - xnack - AMD PAL - Ryzen 7 4700G + - Ryzen 7 4700GE + - Ryzen 5 4600G + - Ryzen 5 4600GE + - Ryzen 3 4300G + - Ryzen 3 4300GE + - Ryzen Pro 4000G + - Ryzen 7 Pro 4700G + - Ryzen 7 Pro 4750GE + - Ryzen 5 Pro 4650G + - Ryzen 5 Pro 4650GE + - Ryzen 3 Pro 4350G + - Ryzen 3 Pro 4350GE **GCN GFX10** [AMD-GCN-GFX10]_ - ----------------------------------------------------------------------------------------------------------- - ``gfx1010`` ``amdgcn`` dGPU - xnack - Radeon RX 5700 - [off] - Radeon RX 5700 XT - - wavefrontsize64 - Radeon Pro 5600 XT - [off] - Radeon Pro 5600M - - cumode - [off] - ``gfx1011`` ``amdgcn`` dGPU - xnack *TBA* - [off] + ------------------------------------------------------------------------------------------------------------------ + ``gfx1010`` ``amdgcn`` dGPU - cumode - AMD ROCm - Radeon RX 5700 + - wavefrontsize64 - AMD PAL - Radeon RX 5700 XT + - xnack - Radeon Pro 5600 XT + - Radeon Pro 5600M + ``gfx1011`` ``amdgcn`` dGPU - cumode - AMD ROCm *TBA* + - wavefrontsize64 - AMD PAL + - xnack + .. TODO:: + + Add product + names. + + ``gfx1012`` ``amdgcn`` dGPU - cumode - AMD ROCm - Radeon RX 5500 + - wavefrontsize64 - AMD PAL - Radeon RX 5500 XT + - xnack + + ``gfx1030`` ``amdgcn`` dGPU - cumode - AMD ROCm *TBA* + - wavefrontsize64 - AMD PAL + .. TODO:: + + Add product + names. + + ``gfx1031`` ``amdgcn`` dGPU - cumode - AMD ROCm *TBA* + - wavefrontsize64 - AMD PAL + .. TODO:: + + Add product + names. + + ``gfx1032`` ``amdgcn`` dGPU - cumode - AMD PAL *TBA* - wavefrontsize64 - [off] - - cumode - [off] - .. TODO:: + .. TODO:: - Add product - names. + Add product + names. - ``gfx1012`` ``amdgcn`` dGPU - xnack - Radeon RX 5500 - [off] - Radeon RX 5500 XT + ``gfx1033`` ``amdgcn`` APU - cumode - AMD PAL *TBA* - wavefrontsize64 - [off] - - cumode - [off] - ``gfx1030`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* - [off] - - cumode - [off] - .. TODO:: - - Add product - names. 
- - ``gfx1031`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* - [off] - - cumode - [off] - .. TODO:: - - Add product - names. - - ``gfx1032`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* - [off] - - cumode - [off] - .. TODO:: - Add product - names. - ``gfx1033`` ``amdgcn`` APU - wavefrontsize64 *TBA* - [off] - - cumode - [off] - .. TODO:: - Add product - names. - - =========== =============== ============ ===== ============================= ======= ====================== + .. TODO:: + + Add product + names. + + =========== =============== ============ ===== ================= =========== ============== ====================== .. _amdgpu-target-features: @@ -345,55 +420,116 @@ generating the code. A mismatch of features may result in incorrect execution, or a reduction in performance. -The target features supported by each processor, and the default value -used if not specified explicitly, is listed in +The target features supported by each processor is listed in :ref:`amdgpu-processor-table`. -Use the ``clang -m[no-]`` option to specify the AMDGPU -target features. +Target features are controlled by exactly one of the following Clang +options: + +``-mcpu=`` or ``--offload-arch=`` + + The ``-mcpu`` and ``--offload-arch`` can specify the target feature as + optional components of the target ID. If omitted, the target feature has the + ``any`` value. See :ref:`amdgpu-target-id`. + +``-m[no-]`` + + Target features not specified by the target ID are specified using a + separate option. These target features can have an ``on`` or ``off`` + value. ``on`` is specified by omitting the ``no-`` prefix, and + ``off`` is specified by including the ``no-`` prefix. The default + if not specified is ``off``. For example: -``-mxnack`` +``-mcpu=gfx908:xnack+`` Enable the ``xnack`` feature. -``-mno-xnack`` +``-mcpu=gfx908:xnack-`` Disable the ``xnack`` feature. +``-mcumode`` + Enable the ``cumode`` feature. +``-mno-cumode`` + Disable the ``cumode`` feature. .. 
table:: AMDGPU Target Features - :name: amdgpu-target-feature-table - - ====================== ================================================== - Target Feature Description - ====================== ================================================== - -m[no-]xnack Enable/disable generating code that has - memory clauses that are compatible with - having XNACK replay enabled. - - This is used for demand paging and page - migration. If XNACK replay is enabled in - the device, then if a page fault occurs - the code may execute incorrectly if the - ``xnack`` feature is not enabled. Executing - code that has the feature enabled on a - device that does not have XNACK replay - enabled will execute correctly but may - be less performant than code with the - feature disabled. - - -m[no-]sram-ecc Enable/disable generating code that assumes SRAM - ECC is enabled/disabled. - - -m[no-]wavefrontsize64 Control the default wavefront size used when - generating code for kernels. When disabled - native wavefront size 32 is used, when enabled - wavefront size 64 is used. - - -m[no-]cumode Control the default wavefront execution mode used - when generating code for kernels. When disabled - native WGP wavefront execution mode is used, - when enabled CU wavefront execution mode is used - (see :ref:`amdgpu-amdhsa-memory-model`). - ====================== ================================================== + :name: amdgpu-target-features-table + + =============== ============================ ================================================== + Target Feature Clang Option to Control Description + Name + =============== ============================ ================================================== + cumode - ``-m[no-]cumode`` Control the wavefront execution mode used + when generating code for kernels. When disabled + native WGP wavefront execution mode is used, + when enabled CU wavefront execution mode is used + (see :ref:`amdgpu-amdhsa-memory-model`). 
+ + sramecc - ``-mcpu`` If specified, generate code that can only be + - ``--offload-arch`` loaded and executed in a process that has a + matching setting for SRAMECC. + + If not specified, generate code that can be + loaded and executed in a process with either + setting of SRAMECC. + + wavefrontsize64 - ``-m[no-]wavefrontsize64`` Control the wavefront size used when + generating code for kernels. When disabled + native wavefront size 32 is used, when enabled + wavefront size 64 is used. + + xnack - ``-mcpu`` If specified, generate code that can only be + - ``--offload-arch`` loaded and executed in a process that has a + matching setting for XNACK replay. + + If not specified, generate code that can be + loaded and executed in a process with either + setting of XNACK replay. + + This is used for demand paging and page + migration. If XNACK replay is enabled in + the device, then if a page fault occurs + the code may execute incorrectly if the + ``xnack`` feature is not enabled. Executing + code that has the feature enabled on a + device that does not have XNACK replay + enabled will execute correctly but may + be less performant than code with the + feature disabled. + =============== ============================ ================================================== + +.. _amdgpu-target-id: + +Target ID +--------- + +AMDGPU supports target IDs. See `Clang Offload Bundler +`_ for a general description. The +AMDGPU target specific information is: + +**processor** + Is an AMDGPU processor or alternative processor name specified in + :ref:`amdgpu-processor-table`. The non-canonical form target ID allows both + the primary processor and alternative processor names. The canonical form + target ID only allows the primary processor name. + +**target-feature** + Is a target feature name specified in :ref:`amdgpu-target-features-table` that + is supported by the processor. The target features supported by each processor + are specified in :ref:`amdgpu-processor-table`.
Those that can be specified in + a target ID are marked as being controlled by ``-mcpu`` and + ``--offload-arch``. Each target feature must appear at most once in a target + ID. The non-canonical form target ID allows the target features to be + specified in any order. The canonical form target ID requires the target + features to be specified in alphabetic order. + +.. _amdgpu-embedding-bundled-objects: + +Embedding Bundled Code Objects +------------------------------ + +AMDGPU supports the HIP and OpenMP languages that perform code object embedding +as described in `Clang Offload Bundler +`_. .. _amdgpu-address-spaces: @@ -430,18 +566,21 @@ ================================= =============== =========== ================ ======= ============================ **Generic** - The generic address space uses the hardware flat address support available in - GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and - local apertures), that are outside the range of addressable global memory, to - map from a flat address to a private or local address. - - FLAT instructions can take a flat address and access global, private - (scratch), and group (LDS) memory depending on if the address is within one - of the aperture ranges. Flat access to scratch requires hardware aperture - setup and setup in the kernel prologue (see - :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). Flat access to LDS requires - hardware aperture setup and M0 (GFX7-GFX8) register setup (see - :ref:`amdgpu-amdhsa-kernel-prolog-m0`). + The generic address space is supported unless the *Target Properties* column + of :ref:`amdgpu-processor-table` specifies *Does not support generic address + space*. + + The generic address space uses the hardware flat address support for two fixed + ranges of virtual addresses (the private and local apertures), that are + outside the range of addressable global memory, to map from a flat address to + a private or local address.
This uses FLAT instructions that can take a flat + address and access global, private (scratch), and group (LDS) memory depending + on if the address is within one of the aperture ranges. + + Flat access to scratch requires hardware aperture setup and setup in the + kernel prologue (see :ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). Flat + access to LDS requires hardware aperture setup and M0 (GFX7-GFX8) register + setup (see :ref:`amdgpu-amdhsa-kernel-prolog-m0`). To convert between a private or group address space address (termed a segment address) and a flat address the base address of the corresponding aperture @@ -701,14 +840,18 @@ - ``ELFOSABI_AMDGPU_HSA`` - ``ELFOSABI_AMDGPU_PAL`` - ``ELFOSABI_AMDGPU_MESA3D`` - ``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA`` + ``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA_V2`` + - ``ELFABIVERSION_AMDGPU_HSA_V3`` + - ``ELFABIVERSION_AMDGPU_HSA_V4`` - ``ELFABIVERSION_AMDGPU_PAL`` - ``ELFABIVERSION_AMDGPU_MESA3D`` ``e_type`` - ``ET_REL`` - ``ET_DYN`` ``e_machine`` ``EM_AMDGPU`` ``e_entry`` 0 - ``e_flags`` See :ref:`amdgpu-elf-header-e_flags-table` + ``e_flags`` See :ref:`amdgpu-elf-header-e_flags-v2-table`, + :ref:`amdgpu-elf-header-e_flags-table-v3`, + and :ref:`amdgpu-elf-header-e_flags-table-v4` ========================== =============================== .. @@ -724,7 +867,9 @@ ``ELFOSABI_AMDGPU_HSA`` 64 ``ELFOSABI_AMDGPU_PAL`` 65 ``ELFOSABI_AMDGPU_MESA3D`` 66 - ``ELFABIVERSION_AMDGPU_HSA`` 1 + ``ELFABIVERSION_AMDGPU_HSA_V2`` 0 + ``ELFABIVERSION_AMDGPU_HSA_V3`` 1 + ``ELFABIVERSION_AMDGPU_HSA_V4`` 2 ``ELFABIVERSION_AMDGPU_PAL`` 0 ``ELFABIVERSION_AMDGPU_MESA3D`` 0 =============================== ===== @@ -742,7 +887,7 @@ ``e_ident[EI_OSABI]`` One of the following AMDGPU target architecture specific OS ABIs - (see :ref:`amdgpu-os-table`): + (see :ref:`amdgpu-os`): * ``ELFOSABI_NONE`` for *unknown* OS. 
@@ -756,8 +901,18 @@ The ABI version of the AMDGPU target architecture specific OS ABI to which the code object conforms: - * ``ELFABIVERSION_AMDGPU_HSA`` is used to specify the version of AMD HSA - runtime ABI. + * ``ELFABIVERSION_AMDGPU_HSA_V2`` is used to specify the version of AMD HSA + runtime ABI for code object V2. Specify using the Clang option + ``-mcode-object-version=2``. + + * ``ELFABIVERSION_AMDGPU_HSA_V3`` is used to specify the version of AMD HSA + runtime ABI for code object V3. Specify using the Clang option + ``-mcode-object-version=3``. + + * ``ELFABIVERSION_AMDGPU_HSA_V4`` is used to specify the version of AMD HSA + runtime ABI for code object V4. Specify using the Clang option + ``-mcode-object-version=4``. This is the default code object + version if not specified. * ``ELFABIVERSION_AMDGPU_PAL`` is used to specify the version of AMD PAL runtime ABI. @@ -782,8 +937,11 @@ The value ``EM_AMDGPU`` is used for the machine for all processors supported by the ``r600`` and ``amdgcn`` architectures (see :ref:`amdgpu-processor-table`). The specific processor is specified in the - ``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see - :ref:`amdgpu-elf-header-e_flags-table`). + ``NT_AMD_HSA_ISA_VERSION`` note record for code object V2 (see + :ref:`amdgpu-note-records-v2`) and in the ``EF_AMDGPU_MACH`` bit field of the + ``e_flags`` for code object V3 to V4 (see + :ref:`amdgpu-elf-header-e_flags-table-v3` and + :ref:`amdgpu-elf-header-e_flags-table-v4`). ``e_entry`` The entry point is 0 as the entry points for individual kernels must be @@ -792,42 +950,94 @@ ``e_flags`` The AMDGPU backend uses the following ELF header flags: - .. table:: AMDGPU ELF Header ``e_flags`` - :name: amdgpu-elf-header-e_flags-table - - ================================= ========== ============================= - Name Value Description - ================================= ========== ============================= - **AMDGPU Processor Flag** See :ref:`amdgpu-processor-table`. 
- -------------------------------------------- ----------------------------- - ``EF_AMDGPU_MACH`` 0x000000ff AMDGPU processor selection - mask for - ``EF_AMDGPU_MACH_xxx`` values - defined in - :ref:`amdgpu-ef-amdgpu-mach-table`. - ``EF_AMDGPU_XNACK`` 0x00000100 Indicates if the ``xnack`` - target feature is - enabled for all code - contained in the code object. - If the processor - does not support the - ``xnack`` target - feature then must - be 0. - See - :ref:`amdgpu-target-features`. - ``EF_AMDGPU_SRAM_ECC`` 0x00000200 Indicates if the ``sram-ecc`` - target feature is - enabled for all code - contained in the code object. - If the processor - does not support the - ``sram-ecc`` target - feature then must - be 0. - See - :ref:`amdgpu-target-features`. - ================================= ========== ============================= + .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V2 + :name: amdgpu-elf-header-e_flags-v2-table + + ===================================== ===== ============================= + Name Value Description + ===================================== ===== ============================= + ``EF_AMDGPU_FEATURE_XNACK_V2`` 0x01 Indicates if the ``xnack`` + target feature is + enabled for all code + contained in the code object. + If the processor + does not support the + ``xnack`` target + feature then must + be 0. + See + :ref:`amdgpu-target-features`. + ``EF_AMDGPU_FEATURE_TRAP_HANDLER_V2`` 0x02 Indicates if the trap + handler is enabled for all + code contained in the code + object. If the processor + does not support a trap + handler then must be 0. + See + :ref:`amdgpu-target-features`. + ===================================== ===== ============================= + + .. 
table:: AMDGPU ELF Header ``e_flags`` for Code Object V3 + :name: amdgpu-elf-header-e_flags-table-v3 + + ================================= ===== ============================= + Name Value Description + ================================= ===== ============================= + ``EF_AMDGPU_MACH`` 0x0ff AMDGPU processor selection + mask for + ``EF_AMDGPU_MACH_xxx`` values + defined in + :ref:`amdgpu-ef-amdgpu-mach-table`. + ``EF_AMDGPU_FEATURE_XNACK_V3`` 0x100 Indicates if the ``xnack`` + target feature is + enabled for all code + contained in the code object. + If the processor + does not support the + ``xnack`` target + feature then must + be 0. + See + :ref:`amdgpu-target-features`. + ``EF_AMDGPU_FEATURE_SRAMECC_V3`` 0x200 Indicates if the ``sramecc`` + target feature is + enabled for all code + contained in the code object. + If the processor + does not support the + ``sramecc`` target + feature then must + be 0. + See + :ref:`amdgpu-target-features`. + ================================= ===== ============================= + + .. table:: AMDGPU ELF Header ``e_flags`` for Code Object V4 + :name: amdgpu-elf-header-e_flags-table-v4 + + ============================================ ===== =================================== + Name Value Description + ============================================ ===== =================================== + ``EF_AMDGPU_MACH`` 0x0ff AMDGPU processor selection + mask for + ``EF_AMDGPU_MACH_xxx`` values + defined in + :ref:`amdgpu-ef-amdgpu-mach-table`. + ``EF_AMDGPU_FEATURE_XNACK_V4`` 0x300 XNACK selection mask for + ``EF_AMDGPU_FEATURE_XNACK_*_V4`` + values. + ``EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4`` 0x000 XNACK unsupported. + ``EF_AMDGPU_FEATURE_XNACK_ANY_V4`` 0x100 XNACK can have any value. + ``EF_AMDGPU_FEATURE_XNACK_OFF_V4`` 0x200 XNACK disabled. + ``EF_AMDGPU_FEATURE_XNACK_ON_V4`` 0x300 XNACK enabled. + ``EF_AMDGPU_FEATURE_SRAMECC_V4`` 0xc00 SRAMECC selection mask for + ``EF_AMDGPU_FEATURE_SRAMECC_*_V4`` + values.
+ ``EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4`` 0x000 SRAMECC unsupported. + ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4`` 0x400 SRAMECC can have any value. + ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4`` 0x800 SRAMECC disabled. + ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4`` 0xc00 SRAMECC enabled. + ============================================ ===== =================================== .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values :name: amdgpu-ef-amdgpu-mach-table @@ -953,7 +1163,7 @@ The AMDGPU backend code object contains ELF note records in the ``.note`` section. The set of generated notes and their semantics depend on the code object version; see :ref:`amdgpu-note-records-v2` and -:ref:`amdgpu-note-records-v3`. +:ref:`amdgpu-note-records-v3-v4`. As required by ``ELFCLASS32`` and ``ELFCLASS64``, minimal zero-byte padding must be generated after the ``name`` field to ensure the ``desc`` field is 4 @@ -964,63 +1174,186 @@ .. _amdgpu-note-records-v2: -Code Object V2 Note Records (--amdhsa-code-object-version=2) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V2 Note Records +~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: Code Object V2 is not the default code object version emitted by - this version of LLVM. For a description of the notes generated with the - default configuration (Code Object V3) see :ref:`amdgpu-note-records-v3`. +.. warning:: + Code object V2 is not the default code object version emitted by + this version of LLVM. The AMDGPU backend code object uses the following ELF note record in the -``.note`` section when compiling for Code Object V2 -(--amdhsa-code-object-version=2). +``.note`` section when compiling for code object V2. + +The note record vendor field is "AMD". Additional note records may be present, but any which are not documented here are deprecated and should not be used. .. 
table:: AMDGPU Code Object V2 ELF Note Records - :name: amdgpu-elf-note-records-table-v2 - - ===== ============================== ====================================== - Name Type Description - ===== ============================== ====================================== - "AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` - ===== ============================== ====================================== + :name: amdgpu-elf-note-records-v2-table + + ===== ===================================== ====================================== + Name Type Description + ===== ===================================== ====================================== + "AMD" ``NT_AMD_HSA_CODE_OBJECT_VERSION`` Code object version. + "AMD" ``NT_AMD_HSA_HSAIL`` HSAIL properties generated by the HSAIL + Finalizer and not the LLVM compiler. + "AMD" ``NT_AMD_HSA_ISA_VERSION`` Target ISA version. + "AMD" ``NT_AMD_HSA_METADATA`` Metadata null terminated string in + YAML [YAML]_ textual format. + "AMD" ``NT_AMD_HSA_ISA_NAME`` Target ISA name. + ===== ===================================== ====================================== .. .. table:: AMDGPU Code Object V2 ELF Note Record Enumeration Values - :name: amdgpu-elf-note-record-enumeration-values-table-v2 - - ============================== ===== - Name Value - ============================== ===== - *reserved* 0-9 - ``NT_AMD_AMDGPU_HSA_METADATA`` 10 - *reserved* 11 - ============================== ===== - -``NT_AMD_AMDGPU_HSA_METADATA`` + :name: amdgpu-elf-note-record-enumeration-values-v2-table + + ===================================== ===== + Name Value + ===================================== ===== + ``NT_AMD_HSA_CODE_OBJECT_VERSION`` 1 + ``NT_AMD_HSA_HSAIL`` 2 + ``NT_AMD_HSA_ISA_VERSION`` 3 + *reserved* 4-9 + ``NT_AMD_HSA_METADATA`` 10 + ``NT_AMD_HSA_ISA_NAME`` 11 + ===================================== ===== + +``NT_AMD_HSA_CODE_OBJECT_VERSION`` + Specifies the code object version number. The description field has the + following layout: + + .. 
code:: + + struct amdgpu_hsa_note_code_object_version_s { + uint32_t major_version; + uint32_t minor_version; + }; + + The ``major_version`` has a value less than or equal to 2. + +``NT_AMD_HSA_HSAIL`` + Specifies the HSAIL properties used by the HSAIL Finalizer. The description + field has the following layout: + + .. code:: + + struct amdgpu_hsa_note_hsail_s { + uint32_t hsail_major_version; + uint32_t hsail_minor_version; + uint8_t profile; + uint8_t machine_model; + uint8_t default_float_round; + }; + +``NT_AMD_HSA_ISA_VERSION`` + Specifies the target ISA version. The description field has the following layout: + + .. code:: + + struct amdgpu_hsa_note_isa_s { + uint16_t vendor_name_size; + uint16_t architecture_name_size; + uint32_t major; + uint32_t minor; + uint32_t stepping; + char vendor_and_architecture_name[1]; + }; + + ``vendor_name_size`` and ``architecture_name_size`` are the lengths of the + vendor and architecture names respectively, including the NUL character. + + ``vendor_and_architecture_name`` contains the NUL terminated string for the + vendor, immediately followed by the NUL terminated string for the + architecture. + + This note record is used by the HSA runtime loader. + + Code object V2 only supports a limited number of processors and has fixed + settings for target features. See + :ref:`amdgpu-elf-note-record-supported_processors-v2-table` for a list of + processors and the corresponding target ID. In the table the note record ISA + name is a concatenation of the vendor name, architecture name, major, minor, + and stepping separated by a ":". + + The target ID column shows the processor name and fixed target features used + by the LLVM compiler. The LLVM compiler does not generate a + ``NT_AMD_HSA_HSAIL`` note record. + + A code object generated by the Finalizer also uses code object V2 and always + generates a ``NT_AMD_HSA_HSAIL`` note record.
The processor name and + ``sramecc`` target feature are as shown in + :ref:`amdgpu-elf-note-record-supported_processors-v2-table` but the ``xnack`` + target feature is specified by the ``EF_AMDGPU_FEATURE_XNACK_V2`` ``e_flags`` + bit. + +``NT_AMD_HSA_ISA_NAME`` + Specifies the target ISA name as a non-NUL terminated string. + + This note record is not used by the HSA runtime loader. + + See the ``NT_AMD_HSA_ISA_VERSION`` note record description of the code object + V2's limited support of processors and fixed settings for target features. + + See :ref:`amdgpu-elf-note-record-supported_processors-v2-table` for a mapping + from the string to the corresponding target ID. If the ``xnack`` target + feature is supported and enabled, the string produced by the LLVM compiler + may have a ``+xnack`` appended. The Finalizer did not do the appending and + instead used the ``EF_AMDGPU_FEATURE_XNACK_V2`` ``e_flags`` bit. + +``NT_AMD_HSA_METADATA`` Specifies extensible metadata associated with the code objects executed on HSA - [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when - the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See - :ref:`amdgpu-amdhsa-code-object-metadata-v2` for the syntax of the code - object metadata string. - -.. _amdgpu-note-records-v3: - -Code Object V3 Note Records (--amdhsa-code-object-version=3) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + [HSA]_ compatible runtimes (see :ref:`amdgpu-os`). It is required when the + target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). See + :ref:`amdgpu-amdhsa-code-object-metadata-v2` for the syntax of the code object + metadata string. + + .. 
table:: AMDGPU Code Object V2 Supported Processors and Fixed Target Feature Settings + :name: amdgpu-elf-note-record-supported_processors-v2-table + + ==================== ========================== + Note Record ISA Name Target ID + ==================== ========================== + ``AMD:AMDGPU:6:0:0`` ``gfx600`` + ``AMD:AMDGPU:6:0:1`` ``gfx601`` + ``AMD:AMDGPU:6:0:2`` ``gfx602`` + ``AMD:AMDGPU:7:0:0`` ``gfx700`` + ``AMD:AMDGPU:7:0:1`` ``gfx701`` + ``AMD:AMDGPU:7:0:2`` ``gfx702`` + ``AMD:AMDGPU:7:0:3`` ``gfx703`` + ``AMD:AMDGPU:7:0:4`` ``gfx704`` + ``AMD:AMDGPU:7:0:5`` ``gfx705`` + ``AMD:AMDGPU:8:0:1`` ``gfx801:xnack+`` + ``AMD:AMDGPU:8:0:2`` ``gfx802`` + ``AMD:AMDGPU:8:0:3`` ``gfx803`` + ``AMD:AMDGPU:8:0:5`` ``gfx805`` + ``AMD:AMDGPU:8:1:0`` ``gfx810:xnack+`` + ``AMD:AMDGPU:9:0:0`` ``gfx900:xnack-`` + ``AMD:AMDGPU:9:0:1`` ``gfx900:xnack+`` + ``AMD:AMDGPU:9:0:2`` ``gfx902:xnack-`` + ``AMD:AMDGPU:9:0:3`` ``gfx902:xnack+`` + ``AMD:AMDGPU:9:0:4`` ``gfx904:xnack-`` + ``AMD:AMDGPU:9:0:5`` ``gfx904:xnack+`` + ``AMD:AMDGPU:9:0:6`` ``gfx906:sramecc-:xnack-`` + ``AMD:AMDGPU:9:0:7`` ``gfx906:sramecc-:xnack+`` + ==================== ========================== + +.. _amdgpu-note-records-v3-v4: + +Code Object V3 to V4 Note Records +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The AMDGPU backend code object uses the following ELF note record in the -``.note`` section when compiling for Code Object V3 -(--amdhsa-code-object-version=3). +``.note`` section when compiling for code object V3 to V4. + +The note record vendor field is "AMDGPU". Additional note records may be present, but any which are not documented here are deprecated and should not be used. - .. table:: AMDGPU Code Object V3 ELF Note Records - :name: amdgpu-elf-note-records-table-v3 + .. table:: AMDGPU Code Object V3 to V4 ELF Note Records + :name: amdgpu-elf-note-records-table-v3-v4 ======== ============================== ====================================== Name Type Description @@ -1031,8 +1364,8 @@ .. - .. 
table:: AMDGPU Code Object V3 ELF Note Record Enumeration Values - :name: amdgpu-elf-note-record-enumeration-values-table-v3 + .. table:: AMDGPU Code Object V3 to V4 ELF Note Record Enumeration Values + :name: amdgpu-elf-note-record-enumeration-values-table-v3-v4 ============================== ===== Name Value @@ -1042,10 +1375,11 @@ ============================== ===== ``NT_AMDGPU_METADATA`` - Specifies extensible metadata associated with an AMDGPU code - object. It is encoded as a map in the Message Pack [MsgPack]_ binary - data format. See :ref:`amdgpu-amdhsa-code-object-metadata-v3` for the - map keys defined for the ``amdhsa`` OS. + Specifies extensible metadata associated with an AMDGPU code object. It is + encoded as a map in the Message Pack [MsgPack]_ binary data format. See + :ref:`amdgpu-amdhsa-code-object-metadata-v3` and + :ref:`amdgpu-amdhsa-code-object-metadata-v4` for the map keys defined for the + ``amdhsa`` OS. .. _amdgpu-symbols: @@ -2080,67 +2414,37 @@ This section provides code conventions used when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). -.. _amdgpu-amdhsa-code-object-target-identification: - -Code Object Target Identification -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The AMDHSA OS uses the following syntax to specify the code object -target as a single string: - - ``----`` - -Where: - - - ````, ````, ```` and ```` - are the same as the *Target Triple* (see - :ref:`amdgpu-target-triples`). - - - ```` is the same as the *Processor* (see - :ref:`amdgpu-processors`). - - - ```` is a list of the enabled *Target Features* - (see :ref:`amdgpu-target-features`), each prefixed by a plus, that - apply to *Processor*. The list must be in the same order as listed - in the table :ref:`amdgpu-target-feature-table`. Note that *Target - Features* must be included in the list if they are enabled even if - that is the default for *Processor*. - -For example: - - ``"amdgcn-amd-amdhsa--gfx902+xnack"`` - .. 
_amdgpu-amdhsa-code-object-metadata: Code Object Metadata ~~~~~~~~~~~~~~~~~~~~ The code object metadata specifies extensible metadata associated with the code -objects executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm -[AMD-ROCm]_. The encoding and semantics of this metadata depends on the code -object version; see :ref:`amdgpu-amdhsa-code-object-metadata-v2` and -:ref:`amdgpu-amdhsa-code-object-metadata-v3`. +objects executed on HSA [HSA]_ compatible runtimes (see :ref:`amdgpu-os`). The +encoding and semantics of this metadata depends on the code object version; see +:ref:`amdgpu-amdhsa-code-object-metadata-v2`, +:ref:`amdgpu-amdhsa-code-object-metadata-v3`, and +:ref:`amdgpu-amdhsa-code-object-metadata-v4`. Code object metadata is specified in a note record (see :ref:`amdgpu-note-records`) and is required when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). It must contain the minimum -information necessary to support the ROCM kernel queries. For example, the -segment sizes needed in a dispatch packet. In addition, a high-level language -runtime may require other information to be included. For example, the AMD -OpenCL runtime records kernel argument information. +information necessary to support the HSA compatible runtime kernel queries. For +example, the segment sizes needed in a dispatch packet. In addition, a +high-level language runtime may require other information to be included. For +example, the AMD OpenCL runtime records kernel argument information. .. _amdgpu-amdhsa-code-object-metadata-v2: -Code Object V2 Metadata (--amdhsa-code-object-version=2) -++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +Code Object V2 Metadata ++++++++++++++++++++++++ -.. warning:: Code Object V2 is not the default code object version emitted by - this version of LLVM. For a description of the metadata generated with the - default configuration (Code Object V3) see - :ref:`amdgpu-amdhsa-code-object-metadata-v3`. +.. 
warning:: + Code object V2 is not the default code object version emitted by this version + of LLVM. -Code object V2 metadata is specified by the ``NT_AMD_AMDGPU_METADATA`` note -record (see :ref:`amdgpu-note-records-v2`). +Code object V2 metadata is specified by the ``NT_AMD_HSA_METADATA`` note record +(see :ref:`amdgpu-note-records-v2`). The metadata is specified as a YAML formatted string (see [YAML]_ and :doc:`YamlIO`). @@ -2151,7 +2455,7 @@ contain null characters, otherwise it should be. The metadata is represented as a single YAML document comprised of the mapping -defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v2` and +defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-v2-table` and referenced tables. For boolean values, the string values of ``false`` and ``true`` are used for @@ -2161,7 +2465,7 @@ non-AMD key names should be prefixed by "*vendor-name*.". .. table:: AMDHSA Code Object V2 Metadata Map - :name: amdgpu-amdhsa-code-object-metadata-map-table-v2 + :name: amdgpu-amdhsa-code-object-metadata-map-v2-table ========== ============== ========= ======================================= String Key Value Type Required? Description @@ -2198,14 +2502,14 @@ printf function call. "Kernels" sequence of Required Sequence of the mappings for each mapping kernel in the code object. See - :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2` + :ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-v2-table` for the definition of the mapping. ========== ============== ========= ======================================= .. .. table:: AMDHSA Code Object V2 Kernel Metadata Map - :name: amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2 + :name: amdgpu-amdhsa-code-object-kernel-metadata-map-v2-table ================= ============== ========= ================================ String Key Value Type Required? Description @@ -2227,22 +2531,22 @@ minor version. "Attrs" mapping Mapping of kernel attributes. 
See - :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-table-v2` + :ref:`amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-v2-table` for the mapping definition. "Args" sequence of Sequence of mappings of the mapping kernel arguments. See - :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v2` + :ref:`amdgpu-amdhsa-code-object-kernel-argument-metadata-map-v2-table` for the definition of the mapping. "CodeProps" mapping Mapping of properties related to the kernel code. See - :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-table-v2` + :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-v2-table` for the mapping definition. ================= ============== ========= ================================ .. .. table:: AMDHSA Code Object V2 Kernel Attribute Metadata Map - :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-table-v2 + :name: amdgpu-amdhsa-code-object-kernel-attribute-metadata-map-v2-table =================== ============== ========= ============================== String Key Value Type Required? Description @@ -2283,7 +2587,7 @@ .. .. table:: AMDHSA Code Object V2 Kernel Argument Metadata Map - :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-table-v2 + :name: amdgpu-amdhsa-code-object-kernel-argument-metadata-map-v2-table ================= ============== ========= ================================ String Key Value Type Required? Description @@ -2478,7 +2782,7 @@ .. .. table:: AMDHSA Code Object V2 Kernel Code Properties Metadata Map - :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-table-v2 + :name: amdgpu-amdhsa-code-object-kernel-code-properties-metadata-map-v2-table ============================ ============== ========= ===================== String Key Value Type Required? Description @@ -2558,11 +2862,15 @@ .. 
_amdgpu-amdhsa-code-object-metadata-v3: -Code Object V3 Metadata (--amdhsa-code-object-version=3) -++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +Code Object V3 Metadata ++++++++++++++++++++++++ + +.. warning:: + Code object V3 is not the default code object version emitted by this version + of LLVM. -Code object V3 metadata is specified by the ``NT_AMDGPU_METADATA`` note record -(see :ref:`amdgpu-note-records-v3`). +Code object V3 to V4 metadata is specified by the ``NT_AMDGPU_METADATA`` note +record (see :ref:`amdgpu-note-records-v3-v4`). The metadata is represented as Message Pack formatted binary data (see [MsgPack]_). The top level is a Message Pack map that includes the @@ -2960,6 +3268,36 @@ ====================== ============== ========= ================================ +.. _amdgpu-amdhsa-code-object-metadata-v4: + +Code Object V4 Metadata ++++++++++++++++++++++++ + +Code object V4 metadata is the same as +:ref:`amdgpu-amdhsa-code-object-metadata-v3` with the changes and additions +defined in table :ref:`amdgpu-amdhsa-code-object-metadata-map-table-v4`. + + .. table:: AMDHSA Code Object V4 Metadata Map Changes from :ref:`amdgpu-amdhsa-code-object-metadata-v3` + :name: amdgpu-amdhsa-code-object-metadata-map-table-v4 + + ================= ============== ========= ======================================= + String Key Value Type Required? Description + ================= ============== ========= ======================================= + "amdhsa.version" sequence of Required - The first integer is the major + 2 integers version. Currently 1. + - The second integer is the minor + version. Currently 1. + "amdhsa.target" string Required The target name of the code using the syntax: + + .. code:: + + [ "-" ] + + A canonical target ID must be + used. See :ref:`amdgpu-target-triples` + and :ref:`amdgpu-target-id`. + ================= ============== ========= ======================================= + .. 
Kernel Dispatch @@ -2967,10 +3305,10 @@ The HSA architected queuing language (AQL) defines a user space memory interface that can be used to control the dispatch of kernels, in an agent independent -way. An agent can have zero or more AQL queues created for it using the ROCm -runtime, in which AQL packets (all of which are 64 bytes) can be placed. See the -*HSA Platform System Architecture Specification* [HSA]_ for the AQL queue -mechanics and packet layouts. +way. An agent can have zero or more AQL queues created for it using an HSA +compatible runtime (see :ref:`amdgpu-os`), in which AQL packets (all of which +are 64 bytes) can be placed. See the *HSA Platform System Architecture +Specification* [HSA]_ for the AQL queue mechanics and packet layouts. The packet processor of a kernel agent is responsible for detecting and dispatching HSA kernels from the AQL queues associated with it. For AMD GPUs the @@ -2978,8 +3316,8 @@ asynchronous dispatch controller (ADC) and shader processor input controller (SPI). -The ROCm runtime can be used to allocate an AQL queue object. It uses the kernel -mode driver to initialize and register the AQL queue with CP. +An HSA compatible runtime can be used to allocate an AQL queue object. It uses +the kernel mode driver to initialize and register the AQL queue with CP. To dispatch a kernel the following actions are performed. This can occur in the CPU host program, or from an HSA kernel executing on a GPU. @@ -2989,30 +3327,30 @@ 2. A pointer to the kernel descriptor (see :ref:`amdgpu-amdhsa-kernel-descriptor`) of the kernel to execute is obtained. It must be for a kernel that is contained in a code object that that was - loaded by the ROCm runtime on the kernel agent with which the AQL queue is - associated. -3. Space is allocated for the kernel arguments using the ROCm runtime allocator - for a memory region with the kernarg property for the kernel agent that will - execute the kernel. It must be at least 16-byte aligned. 
+ loaded by an HSA compatible runtime on the kernel agent with which the AQL + queue is associated. +3. Space is allocated for the kernel arguments using the HSA compatible runtime + allocator for a memory region with the kernarg property for the kernel agent + that will execute the kernel. It must be at least 16-byte aligned. 4. Kernel argument values are assigned to the kernel argument memory allocation. The layout is defined in the *HSA Programmer's Language Reference* [HSA]_. For AMDGPU the kernel execution directly accesses the kernel argument memory in the same way constant memory is accessed. (Note that the HSA specification allows an implementation to copy the kernel argument contents to another location that is accessed by the kernel.) -5. An AQL kernel dispatch packet is created on the AQL queue. The ROCm runtime - api uses 64-bit atomic operations to reserve space in the AQL queue for the - packet. The packet must be set up, and the final write must use an atomic - store release to set the packet kind to ensure the packet contents are +5. An AQL kernel dispatch packet is created on the AQL queue. The HSA compatible + runtime api uses 64-bit atomic operations to reserve space in the AQL queue + for the packet. The packet must be set up, and the final write must use an + atomic store release to set the packet kind to ensure the packet contents are visible to the kernel agent. AQL defines a doorbell signal mechanism to notify the kernel agent that the AQL queue has been updated. These rules, and the layout of the AQL queue and kernel dispatch packet is defined in the *HSA System Architecture Specification* [HSA]_. 6. A kernel dispatch packet includes information about the actual dispatch, such as grid and work-group size, together with information from the code - object about the kernel, such as segment sizes. 
The ROCm runtime queries on - the kernel symbol can be used to obtain the code object values which are - recorded in the :ref:`amdgpu-amdhsa-code-object-metadata`. + object about the kernel, such as segment sizes. The HSA compatible runtime + queries on the kernel symbol can be used to obtain the code object values + which are recorded in the :ref:`amdgpu-amdhsa-code-object-metadata`. 7. CP executes micro-code and is responsible for detecting and setting up the GPU to execute the wavefronts of a kernel dispatch. 8. CP ensures that when the a wavefront starts executing the kernel machine @@ -3110,30 +3448,30 @@ Image and Samplers ~~~~~~~~~~~~~~~~~~ -Image and sample handles created by the ROCm runtime are 64-bit addresses of a -hardware 32-byte V# and 48 byte S# object respectively. In order to support the -HSA ``query_sampler`` operations two extra dwords are used to store the HSA BRIG -enumeration values for the queries that are not trivially deducible from the S# -representation. +Image and sample handles created by an HSA compatible runtime (see +:ref:`amdgpu-os`) are 64-bit addresses of a hardware 32-byte V# and 48 byte S# +object respectively. In order to support the HSA ``query_sampler`` operations +two extra dwords are used to store the HSA BRIG enumeration values for the +queries that are not trivially deducible from the S# representation. HSA Signals ~~~~~~~~~~~ -HSA signal handles created by the ROCm runtime are 64-bit addresses of a -structure allocated in memory accessible from both the CPU and GPU. The -structure is defined by the ROCm runtime and subject to change between releases -(see [AMD-ROCm-github]_). +HSA signal handles created by an HSA compatible runtime (see :ref:`amdgpu-os`) +are 64-bit addresses of a structure allocated in memory accessible from both the +CPU and GPU. The structure is defined by the runtime and subject to change +between releases. For example, see [AMD-ROCm-github]_. .. 
_amdgpu-amdhsa-hsa-aql-queue: HSA AQL Queue ~~~~~~~~~~~~~ -The HSA AQL queue structure is defined by the ROCm runtime and subject to change -between releases (see [AMD-ROCm-github]_). For some processors it contains -fields needed to implement certain language features such as the flat address -aperture bases. It also contains fields used by CP such as managing the -allocation of scratch memory. +The HSA AQL queue structure is defined by an HSA compatible runtime (see +:ref:`amdgpu-os`) and subject to change between releases. For example, see +[AMD-ROCm-github]_. For some processors it contains fields needed to implement +certain language features such as the flat address aperture bases. It also +contains fields used by CP such as managing the allocation of scratch memory. .. _amdgpu-amdhsa-kernel-descriptor: @@ -3144,17 +3482,17 @@ execution of a kernel, including the entry point address of the machine code that implements the kernel. -Code Object V3 Kernel Descriptor for GFX6-GFX10 (--amdhsa-code-object-version=3) -++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++=+++++++++ +Code Object V3 Kernel Descriptor +++++++++++++++++++++++++++++++++ CP microcode requires the Kernel descriptor to be allocated on 64-byte alignment. The fields used by CP for code objects before V3 also match those specified in -:ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. +:ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - .. table:: Code Object V3 Kernel Descriptor for GFX6-GFX10 - :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3 + .. table:: Code Object V3 Kernel Descriptor + :name: amdgpu-amdhsa-kernel-descriptor-v3-table ======= ======= =============================== ============================ Bits Size Field Name Description @@ -3171,12 +3509,42 @@ 63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed private address space memory required for a - work-item in bytes. 
If - is_dynamic_callstack is 1 - then additional space must - be added to this value for - the call stack. - 127:64 8 bytes Reserved, must be 0. + work-item in bytes. + Additional space may need to + be added to this value if + the call stack has + non-inlined function calls. + 95:64 4 bytes KERNARG_SIZE The size of the kernarg + memory pointed to by the + AQL dispatch packet. The + kernarg memory is used to + pass arguments to the + kernel. + + * If the kernarg pointer in + the dispatch packet is NULL + then there are no kernel + arguments. + * If the kernarg pointer in + the dispatch packet is + not NULL and this value + is 0 then the kernarg + memory size is + unspecified. + * If the kernarg pointer in + the dispatch packet is + not NULL and this value + is not 0 then the value + specifies the kernarg + memory size in bytes. It + is recommended to provide + a value as it may be used + by CP to optimize making + the kernarg memory + visible to the kernel + code. + + 127:96 4 bytes Reserved, must be 0. 191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly negative) from base address of kernel @@ -7554,54 +7922,134 @@ Trap Handler ABI ~~~~~~~~~~~~~~~~ -For code objects generated by AMDGPU backend for HSA [HSA]_ compatible runtimes -(such as ROCm [AMD-ROCm]_), the runtime installs a trap handler that supports -the ``s_trap`` instruction with the following usage: +For code objects generated by the AMDGPU backend for HSA [HSA]_ compatible +runtimes (see :ref:`amdgpu-os`), the runtime installs a trap handler that +supports the ``s_trap`` instruction. For usage see: + +- :ref:`amdgpu-trap-handler-for-amdhsa-os-v2-table` +- :ref:`amdgpu-trap-handler-for-amdhsa-os-v3-table` +- :ref:`amdgpu-trap-handler-for-amdhsa-os-v4-table` + + .. table:: AMDGPU Trap Handler for AMDHSA OS Code Object V2 + :name: amdgpu-trap-handler-for-amdhsa-os-v2-table - .. 
table:: AMDGPU Trap Handler for AMDHSA OS - :name: amdgpu-trap-handler-for-amdhsa-os-table + =================== =============== =============== ======================================= + Usage Code Sequence Trap Handler Description + Inputs + =================== =============== =============== ======================================= + reserved ``s_trap 0x00`` Reserved by hardware. + ``debugtrap(arg)`` ``s_trap 0x01`` ``SGPR0-1``: Reserved for Finalizer HSA ``debugtrap`` + ``queue_ptr`` intrinsic (not implemented). + ``VGPR0``: + ``arg`` + ``llvm.trap`` ``s_trap 0x02`` ``SGPR0-1``: Causes wave to be halted with the PC at + ``queue_ptr`` the trap instruction. The associated + queue is signalled to put it into the + error state. When the queue is put in + the error state, the waves executing + dispatches on the queue will be + terminated. + ``llvm.debugtrap`` ``s_trap 0x03`` *none* - If debugger not enabled then behaves + as a no-operation. The trap handler + is entered and immediately returns to + continue execution of the wavefront. + - If the debugger is enabled, causes + the debug trap to be reported by the + debugger and the wavefront is put in + the halt state with the PC at the + instruction. The debugger must + increment the PC and resume the wave. + reserved ``s_trap 0x04`` Reserved. + reserved ``s_trap 0x05`` Reserved. + reserved ``s_trap 0x06`` Reserved. + reserved ``s_trap 0x07`` Reserved. + reserved ``s_trap 0x08`` Reserved. + reserved ``s_trap 0xfe`` Reserved. + reserved ``s_trap 0xff`` Reserved. + =================== =============== =============== ======================================= + +.. + + .. 
table:: AMDGPU Trap Handler for AMDHSA OS Code Object V3 + :name: amdgpu-trap-handler-for-amdhsa-os-v3-table - =================== =============== =============== ======================= + =================== =============== =============== ======================================= Usage Code Sequence Trap Handler Description Inputs - =================== =============== =============== ======================= + =================== =============== =============== ======================================= reserved ``s_trap 0x00`` Reserved by hardware. - ``debugtrap(arg)`` ``s_trap 0x01`` ``SGPR0-1``: Reserved for HSA - ``queue_ptr`` ``debugtrap`` - ``VGPR0``: intrinsic (not - ``arg`` implemented). - ``llvm.trap`` ``s_trap 0x02`` ``SGPR0-1``: Causes dispatch to be - ``queue_ptr`` terminated and its - associated queue put - into the error state. - ``llvm.debugtrap`` ``s_trap 0x03`` - If debugger not - installed then - behaves as a - no-operation. The - trap handler is - entered and - immediately returns - to continue - execution of the - wavefront. - - If the debugger is - installed, causes - the debug trap to be - reported by the - debugger and the - wavefront is put in - the halt state until - resumed by the - debugger. + debugger breakpoint ``s_trap 0x01`` *none* Reserved for debugger to use for + breakpoints. Causes wave to be halted + with the PC at the trap instruction. + The debugger is responsible to resume + the wave, including the instruction + that the breakpoint overwrote. + ``llvm.trap`` ``s_trap 0x02`` ``SGPR0-1``: Causes wave to be halted with the PC at + ``queue_ptr`` the trap instruction. The associated + queue is signalled to put it into the + error state. When the queue is put in + the error state, the waves executing + dispatches on the queue will be + terminated. + ``llvm.debugtrap`` ``s_trap 0x03`` *none* - If debugger not enabled then behaves + as a no-operation. 
The trap handler + is entered and immediately returns to + continue execution of the wavefront. + - If the debugger is enabled, causes + the debug trap to be reported by the + debugger and the wavefront is put in + the halt state with the PC at the + instruction. The debugger must + increment the PC and resume the wave. reserved ``s_trap 0x04`` Reserved. reserved ``s_trap 0x05`` Reserved. reserved ``s_trap 0x06`` Reserved. - debugger breakpoint ``s_trap 0x07`` Reserved for debugger - breakpoints. + reserved ``s_trap 0x07`` Reserved. reserved ``s_trap 0x08`` Reserved. reserved ``s_trap 0xfe`` Reserved. reserved ``s_trap 0xff`` Reserved. - =================== =============== =============== ======================= + =================== =============== =============== ======================================= + +.. + + .. table:: AMDGPU Trap Handler for AMDHSA OS Code Object V4 + :name: amdgpu-trap-handler-for-amdhsa-os-v4-table + + =================== =============== =============== ============== ======================================= + Usage Code Sequence GFX6-8 Inputs GFX9-10 Inputs Description + =================== =============== =============== ============== ======================================= + reserved ``s_trap 0x00`` Reserved by hardware. + debugger breakpoint ``s_trap 0x01`` *none* *none* Reserved for debugger to use for + breakpoints. Causes wave to be halted + with the PC at the trap instruction. + The debugger is responsible to resume + the wave, including the instruction + that the breakpoint overwrote. + ``llvm.trap`` ``s_trap 0x02`` ``SGPR0-1``: *none* Causes wave to be halted with the PC at + ``queue_ptr`` the trap instruction. The associated + queue is signalled to put it into the + error state. When the queue is put in + the error state, the waves executing + dispatches on the queue will be + terminated. + ``llvm.debugtrap`` ``s_trap 0x03`` *none* *none* - If debugger not enabled then behaves + as a no-operation. 
The trap handler + is entered and immediately returns to + continue execution of the wavefront. + - If the debugger is enabled, causes + the debug trap to be reported by the + debugger and the wavefront is put in + the halt state with the PC at the + instruction. The debugger must + increment the PC and resume the wave. + reserved ``s_trap 0x04`` Reserved. + reserved ``s_trap 0x05`` Reserved. + reserved ``s_trap 0x06`` Reserved. + reserved ``s_trap 0x07`` Reserved. + reserved ``s_trap 0x08`` Reserved. + reserved ``s_trap 0xfe`` Reserved. + reserved ``s_trap 0xff`` Reserved. + =================== =============== =============== ============== ======================================= .. _amdgpu-amdhsa-function-call-convention: @@ -7837,7 +8285,7 @@ are undefined. The values come from the initial kernel execution state. See - :ref:`amdgpu-amdhsa-vgpr-register-set-up-order-table`. + :ref:`amdgpu-amdhsa-initial-kernel-execution-state`. .. table:: Work-item implicit argument layout :name: amdgpu-amdhsa-workitem-implicit-argument-layout-table @@ -7900,7 +8348,7 @@ .. TODO:: - Check the clang source code to decipher how function arguments and return + Check the Clang source code to decipher how function arguments and return results are handled. Also see the AMDGPU specific values used. * VGPR arguments are assigned to consecutive VGPRs starting at VGPR0 up to @@ -8463,19 +8911,14 @@ For full list of supported instructions, refer to "Vector ALU instructions". -.. TODO:: - - Remove once we switch to code object v3 by default. - .. _amdgpu-amdhsa-assembler-predefined-symbols-v2: -Code Object V2 Predefined Symbols (--amdhsa-code-object-version=2) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V2 Predefined Symbols +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: Code Object V2 is not the default code object version emitted by - this version of LLVM. 
For a description of the predefined symbols available - with the default configuration (Code Object V3) see - :ref:`amdgpu-amdhsa-assembler-predefined-symbols-v3`. +.. warning:: + Code object V2 is not the default code object version emitted by + this version of LLVM. The AMDGPU assembler defines and updates some symbols automatically. These symbols do not affect code generation. @@ -8526,13 +8969,12 @@ .. _amdgpu-amdhsa-assembler-directives-v2: -Code Object V2 Directives (--amdhsa-code-object-version=2) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V2 Directives +~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: Code Object V2 is not the default code object version emitted by - this version of LLVM. For a description of the directives supported with - the default configuration (Code Object V3) see - :ref:`amdgpu-amdhsa-assembler-directives-v3`. +.. warning:: + Code object V2 is not the default code object version emitted by + this version of LLVM. AMDGPU ABI defines auxiliary data in output code object. In assembly source, one can specify them with assembler directives. @@ -8601,13 +9043,12 @@ .. _amdgpu-amdhsa-assembler-example-v2: -Code Object V2 Example Source Code (--amdhsa-code-object-version=2) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V2 Example Source Code +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: Code Object V2 is not the default code object version emitted by - this version of LLVM. For a description of the directives supported with - the default configuration (Code Object V3) see - :ref:`amdgpu-amdhsa-assembler-example-v3`. +.. warning:: + Code object V2 is not the default code object version emitted by + this version of LLVM. Here is an example of a minimal assembly source file, defining one HSA kernel: @@ -8645,10 +9086,10 @@ .Lfunc_end0: .size hello_world, .Lfunc_end0-hello_world -.. _amdgpu-amdhsa-assembler-predefined-symbols-v3: +..
_amdgpu-amdhsa-assembler-predefined-symbols-v3-v4: -Code Object V3 Predefined Symbols (--amdhsa-code-object-version=3) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V3 to V4 Predefined Symbols +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The AMDGPU assembler defines and updates some symbols automatically. These symbols do not affect code generation. @@ -8707,10 +9148,10 @@ May be set at any time, e.g. manually set to zero at the start of each kernel. -.. _amdgpu-amdhsa-assembler-directives-v3: +.. _amdgpu-amdhsa-assembler-directives-v3-v4: -Code Object V3 Directives (--amdhsa-code-object-version=3) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V3 to V4 Directives +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Directives which begin with ``.amdgcn`` are valid for all ``amdgcn`` architecture processors, and are not OS-specific. Directives which begin with @@ -8718,14 +9159,14 @@ ``amdhsa`` OS is specified. See :ref:`amdgpu-target-triples` and :ref:`amdgpu-processors`. -.amdgcn_target -+++++++++++++++++++++++ +.amdgcn_target "-" +++++++++++++++++++++++++++++++++++++++++++++++ -Optional directive which declares the target supported by the containing -assembler source file. Valid values are described in -:ref:`amdgpu-amdhsa-code-object-target-identification`. Used by the assembler -to validate command-line options such as ``-triple``, ``-mcpu``, and those -which specify target features. +Optional directive which declares the ``-`` supported +by the containing assembler source file. Used by the assembler to validate +command-line options such as ``-triple``, ``-mcpu``, and +``--offload-arch=``. A non-canonical target ID is allowed. See +:ref:`amdgpu-target-triples` and :ref:`amdgpu-target-id`. 
.amdhsa_kernel +++++++++++++++++++++ @@ -8753,27 +9194,29 @@ Directive Default Supported On Description ======================================================== =================== ============ =================== ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX10 Controls GROUP_SEGMENT_FIXED_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX10 Controls PRIVATE_SEGMENT_FIXED_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. + ``.amdhsa_kernarg_size`` 0 GFX6-GFX10 Controls KERNARG_SIZE in + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_QUEUE_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX10 Controls ENABLE_SGPR_KERNARG_SEGMENT_PTR in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX10 Controls ENABLE_SGPR_DISPATCH_ID in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. 
``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in - :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_wavefront_size32`` Target GFX10 Controls ENABLE_WAVEFRONT_SIZE32 in - Feature :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. Specific - (-wavefrontsize64) + (wavefrontsize64) ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET in :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx10-table`. ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX10 Controls ENABLE_SGPR_WORKGROUP_ID_X in @@ -8804,7 +9247,7 @@ ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay. Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. - (+xnack) + (xnack) ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. Possible values are defined in @@ -8828,9 +9271,9 @@ ``.amdhsa_fp16_overflow`` 0 GFX9-GFX10 Controls FP16_OVFL in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. ``.amdhsa_workgroup_processor_mode`` Target GFX10 Controls ENABLE_WGP_MODE in - Feature :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx10-table-v3`. + Feature :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. Specific - (-cumode) + (cumode) ``.amdhsa_memory_ordered`` 1 GFX10 Controls MEM_ORDERED in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`. ``.amdhsa_forward_progress`` 0 GFX10 Controls FWD_PROGRESS in @@ -8855,17 +9298,18 @@ ++++++++++++++++ Optional directive which declares the contents of the ``NT_AMDGPU_METADATA`` -note record (see :ref:`amdgpu-elf-note-records-table-v3`). +note record (see :ref:`amdgpu-elf-note-records-table-v3-v4`). 
The contents must be in the [YAML]_ markup format, with the same structure and -semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3`. +semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3` or +:ref:`amdgpu-amdhsa-code-object-metadata-v4`. This directive is terminated by an ``.end_amdgpu_metadata`` directive. -.. _amdgpu-amdhsa-assembler-example-v3: +.. _amdgpu-amdhsa-assembler-example-v3-v4: -Code Object V3 Example Source Code (--amdhsa-code-object-version=3) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Code Object V3 to V4 Example Source Code +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here is an example of a minimal assembly source file, defining one HSA kernel: @@ -9003,8 +9447,9 @@ .. [AMD-RADEON-HD-4000] `AMD R7xx shader ISA `__ .. [AMD-RADEON-HD-5000] `AMD Evergreen shader ISA `__ .. [AMD-RADEON-HD-6000] `AMD Cayman/Trinity shader ISA `__ -.. [AMD-ROCm] `AMD ROCm Platform `__ -.. [AMD-ROCm-github] `ROCm github `__ +.. [AMD-ROCm] `AMD ROCm Platform `__ +.. [AMD-ROCm-github] `AMD ROCm github `__ +.. [AMD-ROCm-Release-Notes] `AMD ROCm Release Notes `__ .. [CLANG-ATTR] `Attributes in Clang `__ .. [DWARF] `DWARF Debugging Information Format `__ .. [ELF] `Executable and Linkable Format (ELF) `__