diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -96,195 +96,220 @@ .. table:: AMDGPU Processors :name: amdgpu-processor-table - =========== =============== ============ ===== ================= ======= ====================== - Processor Alternative Target dGPU/ Target ROCm Example - Processor Triple APU Features Support Products + =========== =============== ============ ===== ============================= ======= ====================== + Processor Alternative Target dGPU/ Target ROCm Example + Processor Triple APU Features Support Products Architecture Supported [Default] - =========== =============== ============ ===== ================= ======= ====================== + =========== =============== ============ ===== ============================= ======= ====================== **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_ - ----------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------- ``r600`` ``r600`` dGPU ``r630`` ``r600`` dGPU ``rs880`` ``r600`` dGPU ``rv670`` ``r600`` dGPU **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_ - ----------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------- ``rv710`` ``r600`` dGPU ``rv730`` ``r600`` dGPU ``rv770`` ``r600`` dGPU **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_ - ----------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------- ``cedar`` ``r600`` dGPU ``cypress`` ``r600`` dGPU ``juniper`` ``r600`` dGPU ``redwood`` ``r600`` dGPU ``sumo`` ``r600`` dGPU **Radeon HD 6000 Series 
(Northern Islands)** [AMD-RADEON-HD-6000]_ - ----------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------- ``barts`` ``r600`` dGPU ``caicos`` ``r600`` dGPU ``cayman`` ``r600`` dGPU ``turks`` ``r600`` dGPU **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_ - ----------------------------------------------------------------------------------------------- + ----------------------------------------------------------------------------------------------------------- ``gfx600`` - ``tahiti`` ``amdgcn`` dGPU ``gfx601`` - ``pitcairn`` ``amdgcn`` dGPU - ``verde`` ``gfx602`` - ``hainan`` ``amdgcn`` dGPU - ``oland`` **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_ - ----------------------------------------------------------------------------------------------- - ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000 - - A6 Pro-7050B - - A8-7100 - - A8 Pro-7150B - - A10-7300 - - A10 Pro-7350B - - FX-7500 - - A8-7200P - - A10-7400P - - FX-7600P - ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100 - - FirePro W9100 - - FirePro S9150 - - FirePro S9170 - ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290 - - Radeon R9 290x - - Radeon R390 - - Radeon R390x - ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100 - - ``mullins`` - E1-2200 - - E1-2500 - - E2-3000 - - E2-3800 - - A4-5000 - - A4-5100 - - A6-5200 - - A4 Pro-3340B - ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790 - - Radeon HD 8770 - - R7 260 - - R7 260X - ``gfx705`` ``amdgcn`` APU + ----------------------------------------------------------------------------------------------------------- + ``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000 + - A6 Pro-7050B + - A8-7100 + - A8 Pro-7150B + - A10-7300 + - A10 Pro-7350B + - FX-7500 + - A8-7200P + - A10-7400P + - FX-7600P + ``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100 + - FirePro W9100 + - FirePro S9150 + - FirePro 
S9170 + ``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290 + - Radeon R9 290x + - Radeon R390 + - Radeon R390x + ``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100 + - ``mullins`` - E1-2200 + - E1-2500 + - E2-3000 + - E2-3800 + - A4-5000 + - A4-5100 + - A6-5200 + - A4 Pro-3340B + ``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790 + - Radeon HD 8770 + - R7 260 + - R7 260X + ``gfx705`` ``amdgcn`` APU *TBA* + + .. TODO:: + + Add product + names. + **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_ - ----------------------------------------------------------------------------------------------- - ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P - [on] - Pro A6-8500B - - A8-8600P - - Pro A8-8600B - - FX-8800P - - Pro A12-8800B - \ ``amdgcn`` APU - xnack ROCm - A10-8700P - [on] - Pro A10-8700B - - A10-8780P - \ ``amdgcn`` APU - xnack - A10-9600P - [on] - A10-9630P - - A12-9700P - - A12-9730P - - FX-9800P - - FX-9830P - \ ``amdgcn`` APU - xnack - E2-9010 - [on] - A6-9210 - - A9-9410 - ``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - Radeon R285 - - ``tonga`` [off] - Radeon R9 380 - - Radeon R9 385 - ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano - [off] - Radeon R9 Fury - - Radeon R9 FuryX - - Radeon Pro Duo - - FirePro S9300x2 - - Radeon Instinct MI8 - \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470 - [off] - Radeon RX 480 - - Radeon Instinct MI6 - \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460 + ----------------------------------------------------------------------------------------------------------- + ``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P + [on] - Pro A6-8500B + - A8-8600P + - Pro A8-8600B + - FX-8800P + - Pro A12-8800B + \ ``amdgcn`` APU - xnack ROCm - A10-8700P + [on] - Pro A10-8700B + - A10-8780P + \ ``amdgcn`` APU - xnack - A10-9600P + [on] - A10-9630P + - A12-9700P + - A12-9730P + - FX-9800P + - FX-9830P + \ ``amdgcn`` APU - xnack - E2-9010 + [on] - A6-9210 + - A9-9410 + 
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - Radeon R285 + - ``tonga`` [off] - Radeon R9 380 + - Radeon R9 385 + ``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano + [off] - Radeon R9 Fury + - Radeon R9 FuryX + - Radeon Pro Duo + - FirePro S9300x2 + - Radeon Instinct MI8 + \ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470 + [off] - Radeon RX 480 + - Radeon Instinct MI6 + \ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460 [off] - ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150 - [off] - FirePro S7100 - - FirePro W7100 - - Mobile FirePro - M7170 - ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack + ``gfx805`` - ``tongapro`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150 + [off] - FirePro S7100 + - FirePro W7100 + - Mobile FirePro + M7170 + ``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack *TBA* [on] + .. TODO:: + + Add product + names. + **GCN GFX9** [AMD-GCN-GFX9]_ - ----------------------------------------------------------------------------------------------- - ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega - [off] Frontier Edition - - Radeon RX Vega 56 - - Radeon RX Vega 64 - - Radeon RX Vega 64 - Liquid - - Radeon Instinct MI25 - ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G - [on] - Ryzen 5 2400G - ``gfx904`` ``amdgcn`` dGPU - xnack *TBA* + ----------------------------------------------------------------------------------------------------------- + ``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega + [off] Frontier Edition + - Radeon RX Vega 56 + - Radeon RX Vega 64 + - Radeon RX Vega 64 + Liquid + - Radeon Instinct MI25 + ``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G + [on] - Ryzen 5 2400G + ``gfx904`` ``amdgcn`` dGPU - xnack *TBA* [off] - .. TODO:: - Add product - names. - ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50 - [off] - Radeon Instinct MI60 - - sram-ecc - Radeon VII - [off] - Radeon Pro VII - ``gfx908`` ``amdgcn`` dGPU - xnack *TBA* + .. 
TODO:: + + Add product + names. + + ``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50 + [off] - Radeon Instinct MI60 + - sram-ecc - Radeon VII + [off] - Radeon Pro VII + ``gfx908`` ``amdgcn`` dGPU - xnack *TBA* [off] - sram-ecc [on] - .. TODO:: - Add product - names. - ``gfx909`` ``amdgcn`` APU - xnack *TBA* - [on] - .. TODO:: - Add product - names. + .. TODO:: + + Add product + names. + + ``gfx909`` ``amdgcn`` APU - xnack *TBA* + [off] + .. TODO:: + + Add product + names. + **GCN GFX10** [AMD-GCN-GFX10]_ - ----------------------------------------------------------------------------------------------- - ``gfx1010`` ``amdgcn`` dGPU - xnack - Radeon RX 5700 - [off] - Radeon RX 5700 XT - - wavefrontsize64 - Radeon Pro 5600 XT - [off] - Radeon Pro 5600M + ----------------------------------------------------------------------------------------------------------- + ``gfx1010`` ``amdgcn`` dGPU - xnack - Radeon RX 5700 + [off] - Radeon RX 5700 XT + - wavefrontsize64 - Radeon Pro 5600 XT + [off] - Radeon Pro 5600M - cumode [off] - ``gfx1011`` ``amdgcn`` dGPU - xnack *TBA* + ``gfx1011`` ``amdgcn`` dGPU - xnack *TBA* [off] - wavefrontsize64 [off] - cumode [off] - .. TODO - Add product - names. - ``gfx1012`` ``amdgcn`` dGPU - xnack - Radeon RX 5500 - [off] - Radeon RX 5500 XT + .. TODO:: + + Add product + names. + + ``gfx1012`` ``amdgcn`` dGPU - xnack - Radeon RX 5500 + [off] - Radeon RX 5500 XT - wavefrontsize64 [off] - cumode [off] - ``gfx1030`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* + ``gfx1030`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* [off] - cumode [off] - .. TODO - Add product - names. - ``gfx1031`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* + .. TODO:: + + Add product + names. + + ``gfx1031`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* [off] - cumode [off] - .. TODO - Add product - names. - ``gfx1032`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* + .. TODO:: + + Add product + names. + + ``gfx1032`` ``amdgcn`` dGPU - wavefrontsize64 *TBA* [off] - cumode [off] - .. 
TODO - Add product - names. - =========== =============== ============ ===== ================= ======= ====================== + .. TODO:: + + Add product + names. + + =========== =============== ============ ===== ============================= ======= ====================== .. _amdgpu-target-features: @@ -782,59 +807,59 @@ .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values :name: amdgpu-ef-amdgpu-mach-table - ================================= ========== ============================= - Name Value Description (see - :ref:`amdgpu-processor-table`) - ================================= ========== ============================= - ``EF_AMDGPU_MACH_NONE`` 0x000 *not specified* - ``EF_AMDGPU_MACH_R600_R600`` 0x001 ``r600`` - ``EF_AMDGPU_MACH_R600_R630`` 0x002 ``r630`` - ``EF_AMDGPU_MACH_R600_RS880`` 0x003 ``rs880`` - ``EF_AMDGPU_MACH_R600_RV670`` 0x004 ``rv670`` - ``EF_AMDGPU_MACH_R600_RV710`` 0x005 ``rv710`` - ``EF_AMDGPU_MACH_R600_RV730`` 0x006 ``rv730`` - ``EF_AMDGPU_MACH_R600_RV770`` 0x007 ``rv770`` - ``EF_AMDGPU_MACH_R600_CEDAR`` 0x008 ``cedar`` - ``EF_AMDGPU_MACH_R600_CYPRESS`` 0x009 ``cypress`` - ``EF_AMDGPU_MACH_R600_JUNIPER`` 0x00a ``juniper`` - ``EF_AMDGPU_MACH_R600_REDWOOD`` 0x00b ``redwood`` - ``EF_AMDGPU_MACH_R600_SUMO`` 0x00c ``sumo`` - ``EF_AMDGPU_MACH_R600_BARTS`` 0x00d ``barts`` - ``EF_AMDGPU_MACH_R600_CAICOS`` 0x00e ``caicos`` - ``EF_AMDGPU_MACH_R600_CAYMAN`` 0x00f ``cayman`` - ``EF_AMDGPU_MACH_R600_TURKS`` 0x010 ``turks`` - *reserved* 0x011 - Reserved for ``r600`` - 0x01f architecture processors. - ``EF_AMDGPU_MACH_AMDGCN_GFX600`` 0x020 ``gfx600`` - ``EF_AMDGPU_MACH_AMDGCN_GFX601`` 0x021 ``gfx601`` - ``EF_AMDGPU_MACH_AMDGCN_GFX700`` 0x022 ``gfx700`` - ``EF_AMDGPU_MACH_AMDGCN_GFX701`` 0x023 ``gfx701`` - ``EF_AMDGPU_MACH_AMDGCN_GFX702`` 0x024 ``gfx702`` - ``EF_AMDGPU_MACH_AMDGCN_GFX703`` 0x025 ``gfx703`` - ``EF_AMDGPU_MACH_AMDGCN_GFX704`` 0x026 ``gfx704`` - *reserved* 0x027 Reserved. 
- ``EF_AMDGPU_MACH_AMDGCN_GFX801`` 0x028 ``gfx801`` - ``EF_AMDGPU_MACH_AMDGCN_GFX802`` 0x029 ``gfx802`` - ``EF_AMDGPU_MACH_AMDGCN_GFX803`` 0x02a ``gfx803`` - ``EF_AMDGPU_MACH_AMDGCN_GFX810`` 0x02b ``gfx810`` - ``EF_AMDGPU_MACH_AMDGCN_GFX900`` 0x02c ``gfx900`` - ``EF_AMDGPU_MACH_AMDGCN_GFX902`` 0x02d ``gfx902`` - ``EF_AMDGPU_MACH_AMDGCN_GFX904`` 0x02e ``gfx904`` - ``EF_AMDGPU_MACH_AMDGCN_GFX906`` 0x02f ``gfx906`` - ``EF_AMDGPU_MACH_AMDGCN_GFX908`` 0x030 ``gfx908`` - ``EF_AMDGPU_MACH_AMDGCN_GFX909`` 0x031 ``gfx909`` - *reserved* 0x032 Reserved. - ``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010`` - ``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011`` - ``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012`` - ``EF_AMDGPU_MACH_AMDGCN_GFX1030`` 0x036 ``gfx1030`` - ``EF_AMDGPU_MACH_AMDGCN_GFX1031`` 0x037 ``gfx1031`` - ``EF_AMDGPU_MACH_AMDGCN_GFX1032`` 0x038 ``gfx1032`` - *reserved* 0x039 Reserved. - ``EF_AMDGPU_MACH_AMDGCN_GFX602`` 0x03a ``gfx602`` - ``EF_AMDGPU_MACH_AMDGCN_GFX705`` 0x03b ``gfx705`` - ``EF_AMDGPU_MACH_AMDGCN_GFX805`` 0x03c ``gfx805`` - ================================= ========== ============================= + ==================================== ========== ============================= + Name Value Description (see + :ref:`amdgpu-processor-table`) + ==================================== ========== ============================= + ``EF_AMDGPU_MACH_NONE`` 0x000 *not specified* + ``EF_AMDGPU_MACH_R600_R600`` 0x001 ``r600`` + ``EF_AMDGPU_MACH_R600_R630`` 0x002 ``r630`` + ``EF_AMDGPU_MACH_R600_RS880`` 0x003 ``rs880`` + ``EF_AMDGPU_MACH_R600_RV670`` 0x004 ``rv670`` + ``EF_AMDGPU_MACH_R600_RV710`` 0x005 ``rv710`` + ``EF_AMDGPU_MACH_R600_RV730`` 0x006 ``rv730`` + ``EF_AMDGPU_MACH_R600_RV770`` 0x007 ``rv770`` + ``EF_AMDGPU_MACH_R600_CEDAR`` 0x008 ``cedar`` + ``EF_AMDGPU_MACH_R600_CYPRESS`` 0x009 ``cypress`` + ``EF_AMDGPU_MACH_R600_JUNIPER`` 0x00a ``juniper`` + ``EF_AMDGPU_MACH_R600_REDWOOD`` 0x00b ``redwood`` + ``EF_AMDGPU_MACH_R600_SUMO`` 0x00c ``sumo`` + 
``EF_AMDGPU_MACH_R600_BARTS`` 0x00d ``barts`` + ``EF_AMDGPU_MACH_R600_CAICOS`` 0x00e ``caicos`` + ``EF_AMDGPU_MACH_R600_CAYMAN`` 0x00f ``cayman`` + ``EF_AMDGPU_MACH_R600_TURKS`` 0x010 ``turks`` + *reserved* 0x011 - Reserved for ``r600`` + 0x01f architecture processors. + ``EF_AMDGPU_MACH_AMDGCN_GFX600`` 0x020 ``gfx600`` + ``EF_AMDGPU_MACH_AMDGCN_GFX601`` 0x021 ``gfx601`` + ``EF_AMDGPU_MACH_AMDGCN_GFX700`` 0x022 ``gfx700`` + ``EF_AMDGPU_MACH_AMDGCN_GFX701`` 0x023 ``gfx701`` + ``EF_AMDGPU_MACH_AMDGCN_GFX702`` 0x024 ``gfx702`` + ``EF_AMDGPU_MACH_AMDGCN_GFX703`` 0x025 ``gfx703`` + ``EF_AMDGPU_MACH_AMDGCN_GFX704`` 0x026 ``gfx704`` + *reserved* 0x027 Reserved. + ``EF_AMDGPU_MACH_AMDGCN_GFX801`` 0x028 ``gfx801`` + ``EF_AMDGPU_MACH_AMDGCN_GFX802`` 0x029 ``gfx802`` + ``EF_AMDGPU_MACH_AMDGCN_GFX803`` 0x02a ``gfx803`` + ``EF_AMDGPU_MACH_AMDGCN_GFX810`` 0x02b ``gfx810`` + ``EF_AMDGPU_MACH_AMDGCN_GFX900`` 0x02c ``gfx900`` + ``EF_AMDGPU_MACH_AMDGCN_GFX902`` 0x02d ``gfx902`` + ``EF_AMDGPU_MACH_AMDGCN_GFX904`` 0x02e ``gfx904`` + ``EF_AMDGPU_MACH_AMDGCN_GFX906`` 0x02f ``gfx906`` + ``EF_AMDGPU_MACH_AMDGCN_GFX908`` 0x030 ``gfx908`` + ``EF_AMDGPU_MACH_AMDGCN_GFX909`` 0x031 ``gfx909`` + *reserved* 0x032 Reserved. + ``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1030`` 0x036 ``gfx1030`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1031`` 0x037 ``gfx1031`` + ``EF_AMDGPU_MACH_AMDGCN_GFX1032`` 0x038 ``gfx1032`` + *reserved* 0x039 Reserved. + ``EF_AMDGPU_MACH_AMDGCN_GFX602`` 0x03a ``gfx602`` + ``EF_AMDGPU_MACH_AMDGCN_GFX705`` 0x03b ``gfx705`` + ``EF_AMDGPU_MACH_AMDGCN_GFX805`` 0x03c ``gfx805`` + ==================================== ========== ============================= Sections -------- @@ -922,8 +947,8 @@ default configuration (Code Object V3) see :ref:`amdgpu-note-records-v3`. 
The AMDGPU backend code object uses the following ELF note record in the -``.note`` section when compiling for Code Object -V2 (--amdhsa-code-object-version=2). +``.note`` section when compiling for Code Object V2 +(--amdhsa-code-object-version=2). Additional note records may be present, but any which are not documented here are deprecated and should not be used. @@ -2359,12 +2384,14 @@ - "Region" .. TODO:: + Is GlobalBuffer only Global or Constant? Is DynamicSharedPointer always Local? Can HCC allow Generic? How can Private or Region ever happen? + "AccQual" string Kernel argument access qualifier. Only present if "ValueKind" is "Image" or @@ -2376,8 +2403,10 @@ - "ReadWrite" .. TODO:: + Does this apply to GlobalBuffer? + "ActualAccQual" string The actual memory accesses performed by the kernel on the kernel argument. Only present if @@ -2415,8 +2444,10 @@ if "ValueKind" is "Pipe". .. TODO:: + Can GlobalBuffer be pipe qualified? + ================= ============== ========= ================================ .. @@ -2838,12 +2869,14 @@ - "region" .. TODO:: + Is "global_buffer" only "global" or "constant"? Is "dynamic_shared_pointer" always "local"? Can HCC allow "generic"? How can "private" or "region" ever happen? + ".access" string Kernel argument access qualifier. Only present if ".value_kind" is "image" or @@ -2855,8 +2888,10 @@ - "read_write" .. TODO:: + Does this apply to "global_buffer"? + ".actual_access" string The actual memory accesses performed by the kernel on the kernel argument. Only present if @@ -2894,8 +2929,10 @@ if ".value_kind" is "pipe". .. TODO:: + Can "global_buffer" be pipe qualified? + ====================== ============== ========= ================================ .. @@ -2903,12 +2940,12 @@ Kernel Dispatch ~~~~~~~~~~~~~~~ -The HSA architected queuing language (AQL) defines a user space memory -interface that can be used to control the dispatch of kernels, in an agent -independent way. 
An agent can have zero or more AQL queues created for it using -the ROCm runtime, in which AQL packets (all of which are 64 bytes) can be -placed. See the *HSA Platform System Architecture Specification* [HSA]_ for the -AQL queue mechanics and packet layouts. +The HSA architected queuing language (AQL) defines a user space memory interface +that can be used to control the dispatch of kernels, in an agent independent +way. An agent can have zero or more AQL queues created for it using the ROCm +runtime, in which AQL packets (all of which are 64 bytes) can be placed. See the +*HSA Platform System Architecture Specification* [HSA]_ for the AQL queue +mechanics and packet layouts. The packet processor of a kernel agent is responsible for detecting and dispatching HSA kernels from the AQL queues associated with it. For AMD GPUs the @@ -2965,6 +3002,86 @@ 10. When the kernel dispatch has completed execution, CP signals the completion signal specified in the kernel dispatch packet if not 0. +.. _amdgpu-amdhsa-memory-spaces: + +Memory Spaces +~~~~~~~~~~~~~ + +The memory space properties are: + + .. table:: AMDHSA Memory Spaces + :name: amdgpu-amdhsa-memory-spaces-table + + ================= =========== ======== ======= ================== + Memory Space Name HSA Segment Hardware Address NULL Value + Name Name Size + ================= =========== ======== ======= ================== + Private private scratch 32 0x00000000 + Local group LDS 32 0xFFFFFFFF + Global global global 64 0x0000000000000000 + Constant constant *same as 64 0x0000000000000000 + global* + Generic flat flat 64 0x0000000000000000 + Region N/A GDS 32 *not implemented + for AMDHSA* + ================= =========== ======== ======= ================== + +The global and constant memory spaces both use global virtual addresses, which +are the same virtual address space used by the CPU. However, some virtual +addresses may only be accessible to the CPU, some only accessible by the GPU, +and some by both. 
+ +Using the constant memory space indicates that the data will not change during +the execution of the kernel. This allows scalar read instructions to be +used. The vector and scalar L1 caches are invalidated of volatile data before +each kernel dispatch execution to allow constant memory to change values between +kernel dispatches. + +The local memory space uses the hardware Local Data Store (LDS) which is +automatically allocated when the hardware creates work-groups of wavefronts, and +freed when all the wavefronts of a work-group have terminated. The data store +(DS) instructions can be used to access it. + +The private memory space uses the hardware scratch memory support. If the kernel +uses scratch, then the hardware allocates memory that is accessed using +wavefront lane dword (4 byte) interleaving. The mapping used from private +address to physical address is: + + ``wavefront-scratch-base + + (private-address * wavefront-size * 4) + + (wavefront-lane-id * 4)`` + +There are different ways that the wavefront scratch base address is determined +by a wavefront (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). This +memory can be accessed in an interleaved manner using buffer instructions with +the scratch buffer descriptor and per wavefront scratch offset, by the scratch +instructions, or by flat instructions. If each lane of a wavefront accesses the +same private address, the interleaving results in adjacent dwords being accessed +and hence requires fewer cache lines to be fetched. Multi-dword access is not +supported except by flat and scratch instructions in GFX9-GFX10. + +The generic address space uses the hardware flat address support available in +GFX7-GFX10. This uses two fixed ranges of virtual addresses (the private and +local apertures), that are outside the range of addressable global memory, to +map from a flat address to a private or local address. 
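The private-to-physical mapping given above for scratch memory can be sketched in a few lines of Python. This is only an illustration of the formula as written in the text, not code from the backend or the hardware: the function names, the base address, and the wavefront size of 64 are assumptions chosen for the example, and ``private_address`` is used in whatever unit the formula itself implies.

```python
DWORD_BYTES = 4  # scratch is interleaved per lane in 4-byte dwords


def scratch_physical_address(wavefront_scratch_base, private_address,
                             wavefront_size, wavefront_lane_id):
    """Apply the private-to-physical mapping exactly as the text states it.

    The function name and parameter names are hypothetical; they mirror
    the terms of the formula one for one.
    """
    return (wavefront_scratch_base
            + private_address * wavefront_size * DWORD_BYTES
            + wavefront_lane_id * DWORD_BYTES)


def lane_addresses(wavefront_scratch_base, private_address, wavefront_size):
    """Physical addresses touched when every lane uses the same private address."""
    return [scratch_physical_address(wavefront_scratch_base, private_address,
                                     wavefront_size, lane)
            for lane in range(wavefront_size)]


if __name__ == "__main__":
    # Illustration values only: base 0x1000 and a 64-lane wavefront.
    addresses = lane_addresses(0x1000, 2, 64)
    print([hex(a) for a in addresses[:4]])
```

With every lane using the same private address, consecutive lanes land on consecutive dwords, which is the interleaving property the text relies on when it says fewer cache lines need to be fetched.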
+ +FLAT instructions can take a flat address and access global, private (scratch) +and group (LDS) memory depending on whether the address is within one of the +aperture ranges. Flat access to scratch requires hardware aperture setup and +setup in the kernel prologue (see +:ref:`amdgpu-amdhsa-kernel-prolog-flat-scratch`). Flat access to LDS requires +hardware aperture setup and M0 (GFX7-GFX8) register setup (see +:ref:`amdgpu-amdhsa-kernel-prolog-m0`). + +To convert between a segment address and a flat address the base addresses of the +apertures can be used. For GFX7-GFX8 these are available in the +:ref:`amdgpu-amdhsa-hsa-aql-queue`, the address of which can be obtained with +Queue Ptr SGPR (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). For +GFX9-GFX10 the aperture base addresses are directly available as inline constant +registers ``SRC_SHARED_BASE/LIMIT`` and ``SRC_PRIVATE_BASE/LIMIT``. In 64-bit +address mode the aperture sizes are 2^32 bytes and the base is aligned to 2^32 +which makes it easier to convert from flat to segment or segment to flat. + Image and Samplers ~~~~~~~~~~~~~~~~~~ @@ -3635,7 +3752,7 @@ First Private Segment Buffer 4 V# that can be used, together (enable_sgpr_private with Scratch Wavefront Offset _segment_buffer) as an offset, to access the - private address space using a + private memory space using a segment address. CP uses the value provided by @@ -3835,13 +3952,13 @@ (kernel descriptor enable of field) VGPRs ========== ========================== ====== ============================== - First Work-Item Id X 1 32-bit work item id in X + First Work-Item Id X 1 32-bit work-item id in X (Always initialized) dimension of work-group for wavefront lane. - then Work-Item Id Y 1 32-bit work item id in Y + then Work-Item Id Y 1 32-bit work-item id in Y (enable_vgpr_workitem_id dimension of work-group for > 0) wavefront lane. 
- then Work-Item Id Z 1 32-bit work item id in Z + then Work-Item Id Z 1 32-bit work-item id in Z (enable_vgpr_workitem_id dimension of work-group for > 1) wavefront lane. ========== ========================== ====== ============================== @@ -4100,7 +4217,7 @@ * The scalar memory operations access a scalar L1 cache shared by all wavefronts on a group of CUs. The scalar and vector L1 caches are not coherent. However, scalar operations are used in a restricted way so do not impact the memory - model. See :ref:`amdgpu-address-spaces`. + model. See :ref:`amdgpu-amdhsa-memory-spaces`. * The vector and scalar memory operations use an L2 cache shared by all CUs on the same agent. * The L2 cache has independent channels to service disjoint ranges of virtual @@ -4155,7 +4272,7 @@ * The scalar memory operations access a scalar L0 cache shared by all wavefronts on a WGP. The scalar and vector L0 caches are not coherent. However, scalar operations are used in a restricted way so do not impact the memory model. See - :ref:`amdgpu-address-spaces`. + :ref:`amdgpu-amdhsa-memory-spaces`. * The vector and scalar memory L0 caches use an L1 cache shared by all WGPs on the same SA. Therefore, no special action is required for coherence between the wavefronts of a single work-group. However, a ``BUFFER_GL1_INV`` is @@ -4220,7 +4337,7 @@ scalar L1 cache to ensure it is coherent with the vector L1 cache. The scalar and vector L1 caches are invalidated between kernel dispatches by CP since constant address space data may change between kernel dispatch executions. See -:ref:`amdgpu-address-spaces`. +:ref:`amdgpu-amdhsa-memory-spaces`. The one exception is if scalar writes are used to spill SGPR registers. In this case the AMDGPU backend ensures the memory location used to spill is never