Index: docs/AMDGPUUsage.rst =================================================================== --- docs/AMDGPUUsage.rst +++ docs/AMDGPUUsage.rst @@ -5021,6 +5021,8 @@ integer value "4". The possible GFX stepping generation numbers are presented in :ref:`amdgpu-processors`. +.. _amdgpu-amdhsa-assembler-symbol-next_free_vgpr: + .amdgcn.next_free_vgpr ++++++++++++++++++++++ @@ -5034,6 +5036,8 @@ May be set at any time, e.g. manually set to zero at the start of each kernel. +.. _amdgpu-amdhsa-assembler-symbol-next_free_sgpr: + .amdgcn.next_free_sgpr ++++++++++++++++++++++ @@ -5243,6 +5247,80 @@ ... .end_amdgpu_metadata +If an assembly source file contains multiple kernels and/or functions, the +:ref:`amdgpu-amdhsa-assembler-symbol-next_free_vgpr` and +:ref:`amdgpu-amdhsa-assembler-symbol-next_free_sgpr` symbols may be reset using +the ``.set , `` directive. For example, in the case of two +kernels, where ``function1`` is only called from ``kernel1`` it is sufficient +to group the function with the kernel that calls it and reset the symbols +between the two connected components: + +.. code-block:: none + + .amdgcn_target "amdgcn-amd-amdhsa--gfx900+xnack" // optional + + // gpr tracking symbols are implicitly set to zero + + .text + .globl kern0 + .p2align 8 + .type kern0,@function + kern0: + // ... + s_endpgm + .Lkern0_end: + .size kern0, .Lkern0_end-kern0 + + .rodata + .p2align 6 + .amdhsa_kernel kern0 + // ... + .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr + .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr + .end_amdhsa_kernel + + // reset symbols to begin tracking usage in func1 and kern1 + .set .amdgcn.next_free_vgpr, 0 + .set .amdgcn.next_free_sgpr, 0 + + .text + .hidden func1 + .global func1 + .p2align 2 + .type func1,@function + func1: + // ... + s_setpc_b64 s[30:31] + .Lfunc1_end: + .size func1, .Lfunc1_end-func1 + + .globl kern1 + .p2align 8 + .type kern1,@function + kern1: + // ... + s_getpc_b64 s[4:5] + s_add_u32 s4, s4, func1@rel32@lo+4 + s_addc_u32 s5, s5, func1@rel32@lo+4 + s_swappc_b64 s[30:31], s[4:5] + // ... + s_endpgm + .Lkern1_end: + .size kern1, .Lkern1_end-kern1 + + .rodata + .p2align 6 + .amdhsa_kernel kern1 + // ... + .amdhsa_next_free_vgpr .amdgcn.next_free_vgpr + .amdhsa_next_free_sgpr .amdgcn.next_free_sgpr + .end_amdhsa_kernel + +These symbols cannot identify connected components in order to automatically +track the usage for each kernel. However, in some cases careful organization of +the kernels and functions in the source file means there is minimal additional +effort required to accurately calculate GPR usage. + Additional Documentation ========================