Index: docs/AMDGPUUsage.rst =================================================================== --- docs/AMDGPUUsage.rst +++ docs/AMDGPUUsage.rst @@ -925,7 +925,7 @@ This section provides code conventions used when the target triple OS is ``amdhsa`` (see :ref:`amdgpu-target-triples`). -.. _amdgpu-amdhsa-hsa-code-object-metadata: +.. _amdgpu-code-object-target-identification: Code Object Target Identification ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -955,6 +955,8 @@ ``"amdgcn-amd-amdhsa--gfx902+xnack"`` +.. _amdgpu-amdhsa-hsa-code-object-metadata: + Code Object Metadata ~~~~~~~~~~~~~~~~~~~~ @@ -1567,10 +1569,26 @@ execution of a kernel, including the entry point address of the machine code that implements the kernel. +.. _amdgpu-amdhsa-kernel-descriptor-register-counts: + +SGPR and VGPR Counts +++++++++++++++++++++ + +For the purposes of calculating granulated register counts +(GRANULATED_WORKITEM_VGPR_COUNT and GRANULATED_WAVEFRONT_SGPR_COUNT in +:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`) the terms `vgpr_count` +and `sgpr_count` are defined, respectively, as being the highest VGPR and SGPR +numbers explicitly referenced, plus one. That is, in code where the highest +VGPR number referenced is v7 and the highest SGPR number referenced is s3, +`vgpr_count` would be 8, and `sgpr_count` would be 4. As a special case, either +or both counts may be 0; for the purposes of calculating the granularity +counts, a value of 0 should be treated as 1. + Kernel Descriptor for GFX6-GFX9 +++++++++++++++++++++++++++++++ -CP microcode requires the Kernel descritor to be allocated on 64 byte alignment. +CP microcode requires the Kernel descriptor to be allocated on 64 byte +alignment. .. table:: Kernel Descriptor for GFX6-GFX9 :name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table @@ -1664,9 +1682,11 @@ specific: GFX6-GFX9 - - max_vgpr 1..256 - - roundup((max_vgpg + 1) - / 4) - 1 + - vgpr_count 1..256 + - ceil(vgpr_count / 4) - 1 + + vgpr_count is defined in + :ref:`amdgpu-amdhsa-kernel-descriptor-register-counts` Used by CP to set up ``COMPUTE_PGM_RSRC1.VGPRS``. @@ -1676,13 +1696,11 @@ specific: GFX6-GFX8 - - max_sgpr 1..112 - - roundup((max_sgpg + 1) - / 8) - 1 + - sgpr_count 1..112 + - ceil(sgpr_count / 8) - 1 GFX9 - - max_sgpr 1..112 - - roundup((max_sgpg + 1) - / 16) - 1 + - sgpr_count 1..112 + - ceil(sgpr_count / 16) - 1 Includes the special SGPRs for VCC, Flat Scratch (for @@ -1692,6 +1710,9 @@ added if a trap handler is enabled. + sgpr_count is defined in + :ref:`amdgpu-amdhsa-kernel-descriptor-register-counts` + Used by CP to set up ``COMPUTE_PGM_RSRC1.SGPRS``. 11:10 2 bits PRIORITY Must be 0. @@ -4232,97 +4253,161 @@ For full list of supported instructions, refer to "Vector ALU instructions". -HSA Code Object Directives -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Predefined Symbols +~~~~~~~~~~~~~~~~~~ -AMDGPU ABI defines auxiliary data in output code object. In assembly source, -one can specify them with assembler directives. +The AMDGPU assembler defines and updates some symbols automatically. These +symbols do not affect code generation. -.hsa_code_object_version major, minor -+++++++++++++++++++++++++++++++++++++ +.amdgcn.machine_version_major ++++++++++++++++++++++++++++++ -*major* and *minor* are integers that specify the version of the HSA code -object that will be generated by the assembler. +Set to the major version number of the target being assembled for. For example, +when assembling for "gfx902" this will be set to the integer value "9". 
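+
+For illustration only (a sketch, assuming the standard ``.if``/``.endif``
+conditional assembly directives are available), this symbol could be used to
+guard target-specific code at assembly time:
+
+.. code-block:: none
+
+  .if .amdgcn.machine_version_major >= 9
+  // assembled only when targeting GFX9 or later
+  .endif
+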
-.hsa_code_object_isa [major, minor, stepping, vendor, arch] -+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +.amdgcn.machine_version_minor ++++++++++++++++++++++++++++++ +Set to the minor version number of the target being assembled for. For example, +when assembling for "gfx902" this will be set to the integer value "0". -*major*, *minor*, and *stepping* are all integers that describe the instruction -set architecture (ISA) version of the assembly program. +.amdgcn.machine_version_stepping +++++++++++++++++++++++++++++++++ -*vendor* and *arch* are quoted strings. *vendor* should always be equal to -"AMD" and *arch* should always be equal to "AMDGPU". +Set to the stepping version number of the target being assembled for. For +example, when assembling for "gfx902" this will be set to the integer value +"2". -By default, the assembler will derive the ISA version, *vendor*, and *arch* -from the value of the -mcpu option that is passed to the assembler. +.amdgcn.vgpr_count +++++++++++++++++++ -.amdgpu_hsa_kernel (name) -+++++++++++++++++++++++++ +Set to zero before assembly begins. At each instruction, if the current value +of this symbol is less than or equal to the maximum VGPR number explicitly +referenced within that instruction then the symbol value is updated to equal +that VGPR number plus one. -This directives specifies that the symbol with given name is a kernel entry point -(label) and the object should contain corresponding symbol of type STT_AMDGPU_HSA_KERNEL. +May be set at any time, e.g. manually set to zero after each kernel. -.amd_kernel_code_t +.amdgcn.sgpr_count ++++++++++++++++++ -This directive marks the beginning of a list of key / value pairs that are used -to specify the amd_kernel_code_t object that will be emitted by the assembler. -The list must be terminated by the *.end_amd_kernel_code_t* directive. For -any amd_kernel_code_t values that are unspecified a default value will be -used. The default value for all keys is 0, with the following exceptions: - -- *kernel_code_version_major* defaults to 1. -- *machine_kind* defaults to 1. -- *machine_version_major*, *machine_version_minor*, and - *machine_version_stepping* are derived from the value of the -mcpu option - that is passed to the assembler. -- *kernel_code_entry_byte_offset* defaults to 256. -- *wavefront_size* defaults to 6. -- *kernarg_segment_alignment*, *group_segment_alignment*, and - *private_segment_alignment* default to 4. Note that alignments are specified - as a power of two, so a value of **n** means an alignment of 2^ **n**. - -The *.amd_kernel_code_t* directive must be placed immediately after the -function label and before any instructions. - -For a full list of amd_kernel_code_t keys, refer to AMDGPU ABI document, -comments in lib/Target/AMDGPU/AmdKernelCodeT.h and test/CodeGen/AMDGPU/hsa.s. - -Here is an example of a minimal amd_kernel_code_t specification: - -.. 
code-block:: none - - .hsa_code_object_version 1,0 - .hsa_code_object_isa - - .hsatext - .globl hello_world - .p2align 8 - .amdgpu_hsa_kernel hello_world - - hello_world: - - .amd_kernel_code_t - enable_sgpr_kernarg_segment_ptr = 1 - is_ptr64 = 1 - compute_pgm_rsrc1_vgprs = 0 - compute_pgm_rsrc1_sgprs = 0 - compute_pgm_rsrc2_user_sgpr = 2 - kernarg_segment_byte_size = 8 - wavefront_sgpr_count = 2 - workitem_vgpr_count = 3 - .end_amd_kernel_code_t - - s_load_dwordx2 s[0:1], s[0:1] 0x0 - v_mov_b32 v0, 3.14159 - s_waitcnt lgkmcnt(0) - v_mov_b32 v1, s0 - v_mov_b32 v2, s1 - flat_store_dword v[1:2], v0 - s_endpgm - .Lfunc_end0: - .size hello_world, .Lfunc_end0-hello_world +Set to zero before assembly begins. At each instruction, if the current value +of this symbol is less than or equal the maximum SGPR number explicitly +referenced within that instruction then the symbol value is updated to equal +that SGPR number plus one. + +May be set at any time, e.g. manually set to zero after each kernel. + +Code Object Directives +~~~~~~~~~~~~~~~~~~~~~~ + +Directives which begin with ``.amdgcn`` are valid for all GCN targets, and are +not OS-specific. Directives which begin with ``.amdhsa`` are specific to +GCN targets with the ``amdhsa`` OS in their triple. + +.amdgcn_target ++++++++++++++++++++++++ + +Declares the target supported by the containing assembler source file. Valid +values are described in :ref:`amdgpu-code-object-target-identification`. Used +by the assembler to validate command-line options such as ``-triple``, +``-mcpu``, and those which specify target features. + +.amdhsa_kernel ++++++++++++++++++++++ + +Creates a correctly aligned AMDHSA kernel descriptor and a symbol, +``@kd``, in the current location of the current section. Only valid when +the target OS is ``amdhsa``. + +Marks the beginning of a list of directives used to generate the bytes of a +kernel descriptor, as described in :ref:`amdgpu-amdhsa-kernel-descriptor`. +Directives which may appear in this list are described in +:ref:`amdhsa-kernel-directives-table`. Directives may appear in any order, must +be valid for the target being assembled for, and cannot be repeated. Directives +support the range of values specified by the field they reference in +:ref:`amdgpu-amdhsa-kernel-descriptor`. If a directive is not specified, it is +assumed to have its default value, unless it is marked as "Required", in which +case it is an error to omit the directive. This list of directives is +terminated by an ``.end_amdhsa_kernel`` directive. + + .. 
table:: AMDHSA Kernel Assembler Directives + :name: amdhsa-kernel-directives-table + + ======================================================== ================ ============ =================== + Directive Default Supported On Description + ======================================================== ================ ============ =================== + ``.amdhsa_group_segment_fixed_size`` 0 GFX6-GFX9 Controls GroupSegmentFixedSize in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_private_segment_fixed_size`` 0 GFX6-GFX9 Controls PrivateSegmentFixedSize in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX9 Controls EnableSGPRPrivateSegmentBuffer in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX9 Controls EnableSGPRDispatchPtr in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX9 Controls EnableSGPRQueuePtr in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_kernarg_segment_ptr`` 0 GFX6-GFX9 Controls EnableSGPRKernargSegmentPtr in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_dispatch_id`` 0 GFX6-GFX9 Controls EnableSGPRDispatchID in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX9 Controls EnableSGPRFlatScratchInit in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX9 Controls EnableSGPRPrivateSegmentSize in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_grid_workgroup_count_x`` 0 GFX6-GFX9 Controls EnableSGPRGridWorkgroupCountX in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_grid_workgroup_count_y`` 0 GFX6-GFX9 Controls EnableSGPRGridWorkgroupCountY in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_user_sgpr_grid_workgroup_count_z`` 0 GFX6-GFX9 Controls EnableSGPRGridWorkgroupCountZ in + :ref:`amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table` + ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX9 Controls ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_system_sgpr_workgroup_id_x`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_X in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_system_sgpr_workgroup_id_y`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Y in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_system_sgpr_workgroup_id_z`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_ID_Z in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_system_sgpr_workgroup_info`` 0 GFX6-GFX9 Controls ENABLE_SGPR_WORKGROUP_INFO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_system_vgpr_workitem_id`` 0 GFX6-GFX9 Controls ENABLE_VGPR_WORKITEM_ID in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_vgpr_count`` Required GFX6-GFX9 Maximum VGPR number used, plus one. Used to calculate + GRANULATED_WORKITEM_VGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_sgpr_count`` Required GFX6-GFX9 Maximum SGPR number used, plus one. 
Used to calculate + GRANULATED_WAVEFRONT_SGPR_COUNT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_float_round_mode_32`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX9 Controls FLOAT_ROUND_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX9 Controls FLOAT_DENORM_MODE_32 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_float_denorm_mode_16_64`` 3 GFX6-GFX9 Controls FLOAT_DENORM_MODE_16_64 in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_dx10_clamp`` 1 GFX6-GFX9 Controls ENABLE_DX10_CLAMP in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_ieee_mode`` 1 GFX6-GFX9 Controls ENABLE_IEEE_MODE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_fp16_overflow`` 0 GFX9 Controls FP16_OVFL in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table` + ``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_fp_ieee_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_fp_ieee_overflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_fp_ieee_underflow`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_fp_ieee_inexact`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_IEEE_754_FP_INEXACT in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX9 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in + :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table` + ======================================================== ================ ============ =================== Additional Documentation ======================== Index: include/llvm/Support/AMDHSAKernelDescriptor.h =================================================================== --- include/llvm/Support/AMDHSAKernelDescriptor.h +++ include/llvm/Support/AMDHSAKernelDescriptor.h @@ -20,7 +20,7 @@ // Gets offset of specified member in specified type. #ifndef offsetof -#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER) +#define offsetof(TYPE, MEMBER) ((std::size_t) & ((TYPE *)0)->MEMBER) #endif // offsetof // Creates enumeration entries used for packing bits into integers. Enumeration @@ -150,6 +150,9 @@ uint8_t reserved2[6]; }; +/// Overwrite \p KD with default values. 
+void setDefaultKernelDescriptor(kernel_descriptor_t &KD); + static_assert( sizeof(kernel_descriptor_t) == 64, "invalid size for kernel_descriptor_t"); Index: lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp =================================================================== --- lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp +++ lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp @@ -218,7 +218,8 @@ SmallString<128> KernelName; getNameWithPrefix(KernelName, &MF->getFunction()); getTargetStreamer()->EmitAmdhsaKernelDescriptor( - KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo)); + KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo), + *getSTI()); } void AMDGPUAsmPrinter::EmitFunctionEntryLabel() { Index: lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp =================================================================== --- lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -42,6 +42,7 @@ #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/MC/MCSymbol.h" #include "llvm/Support/AMDGPUMetadata.h" +#include "llvm/Support/AMDHSAKernelDescriptor.h" #include "llvm/Support/Casting.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/ErrorHandling.h" @@ -61,6 +62,7 @@ using namespace llvm; using namespace llvm::AMDGPU; +using namespace llvm::amdhsa; namespace { @@ -845,6 +847,9 @@ private: bool ParseAsAbsoluteExpression(uint32_t &Ret); + bool OutOfRangeError(SMRange Range); + bool ParseDirectiveAMDGCNTarget(); + bool ParseDirectiveAMDHSAKernel(); bool ParseDirectiveMajorMinor(uint32_t &Major, uint32_t &Minor); bool ParseDirectiveHSACodeObjectVersion(); bool ParseDirectiveHSACodeObjectISA(); @@ -863,6 +868,10 @@ bool ParseAMDGPURegister(RegisterKind& RegKind, unsigned& Reg, unsigned& RegNum, unsigned& RegWidth, unsigned *DwordRegIndex); + Optional getGprCountSymbolName(RegisterKind RegKind); + void initializeGprCountSymbol(RegisterKind RegKind); + bool updateGprCountSymbols(RegisterKind RegKind, unsigned DwordRegIndex, + unsigned RegWidth); void cvtMubufImpl(MCInst &Inst, const OperandVector &Operands, bool IsAtomic, bool IsAtomicReturn, bool IsLds = false); void cvtDSImpl(MCInst &Inst, const OperandVector &Operands, @@ -896,15 +905,29 @@ AMDGPU::IsaInfo::IsaVersion ISA = AMDGPU::IsaInfo::getIsaVersion(getFeatureBits()); MCContext &Ctx = getContext(); - MCSymbol *Sym = - Ctx.getOrCreateSymbol(Twine(".option.machine_version_major")); - Sym->setVariableValue(MCConstantExpr::create(ISA.Major, Ctx)); - Sym = Ctx.getOrCreateSymbol(Twine(".option.machine_version_minor")); - Sym->setVariableValue(MCConstantExpr::create(ISA.Minor, Ctx)); - Sym = Ctx.getOrCreateSymbol(Twine(".option.machine_version_stepping")); - Sym->setVariableValue(MCConstantExpr::create(ISA.Stepping, Ctx)); + if (AMDGPU::IsaInfo::hasCodeObjectV3(getFeatureBits())) { + MCSymbol *Sym = + Ctx.getOrCreateSymbol(Twine(".amdgcn.machine_version_major")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Major, Ctx)); + Sym = Ctx.getOrCreateSymbol(Twine(".amdgcn.machine_version_minor")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Minor, Ctx)); + Sym = Ctx.getOrCreateSymbol(Twine(".amdgcn.machine_version_stepping")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Stepping, Ctx)); + } else { + MCSymbol *Sym = + Ctx.getOrCreateSymbol(Twine(".option.machine_version_major")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Major, Ctx)); + Sym = Ctx.getOrCreateSymbol(Twine(".option.machine_version_minor")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Minor, Ctx)); + Sym = 
Ctx.getOrCreateSymbol(Twine(".option.machine_version_stepping")); + Sym->setVariableValue(MCConstantExpr::create(ISA.Stepping, Ctx)); + } } - KernelScope.initialize(getContext()); + if (AMDGPU::IsaInfo::hasCodeObjectV3(getFeatureBits())) { + initializeGprCountSymbol(IS_VGPR); + initializeGprCountSymbol(IS_SGPR); + } else + KernelScope.initialize(getContext()); } bool hasXNACK() const { @@ -1776,6 +1799,50 @@ return true; } +Optional +AMDGPUAsmParser::getGprCountSymbolName(RegisterKind RegKind) { + switch (RegKind) { + case IS_VGPR: + return StringRef(".amdgcn.vgpr_count"); + case IS_SGPR: + return StringRef(".amdgcn.sgpr_count"); + default: + return None; + } +} + +void AMDGPUAsmParser::initializeGprCountSymbol(RegisterKind RegKind) { + auto SymbolName = getGprCountSymbolName(RegKind); + assert(SymbolName && "initializing invalid register kind"); + MCSymbol *Sym = getContext().getOrCreateSymbol(*SymbolName); + Sym->setVariableValue(MCConstantExpr::create(0, getContext())); +} + +bool AMDGPUAsmParser::updateGprCountSymbols(RegisterKind RegKind, + unsigned DwordRegIndex, + unsigned RegWidth) { + auto SymbolName = getGprCountSymbolName(RegKind); + if (!SymbolName) + return true; + MCSymbol *Sym = getContext().getOrCreateSymbol(*SymbolName); + + int64_t NewMax = DwordRegIndex + RegWidth - 1; + int64_t OldCount; + + if (!Sym->isVariable()) + return !Error(getParser().getTok().getLoc(), + ".amdgcn.{v,s}grp_count symbols must be variable"); + if (!Sym->getVariableValue(false)->evaluateAsAbsolute(OldCount)) + return !Error( + getParser().getTok().getLoc(), + ".amdgcn.{v,s}gpr_count symbols must be absolute expressions"); + + if (OldCount <= NewMax) + Sym->setVariableValue(MCConstantExpr::create(NewMax + 1, getContext())); + + return true; +} + std::unique_ptr AMDGPUAsmParser::parseRegister() { const auto &Tok = Parser.getTok(); SMLoc StartLoc = Tok.getLoc(); @@ -1786,7 +1853,11 @@ if (!ParseAMDGPURegister(RegKind, Reg, RegNum, RegWidth, &DwordRegIndex)) { return nullptr; } - KernelScope.usesRegister(RegKind, DwordRegIndex, RegWidth); + if (AMDGPU::IsaInfo::hasCodeObjectV3(getFeatureBits())) { + if (!updateGprCountSymbols(RegKind, DwordRegIndex, RegWidth)) + return nullptr; + } else + KernelScope.usesRegister(RegKind, DwordRegIndex, RegWidth); return AMDGPUOperand::CreateReg(this, Reg, StartLoc, EndLoc, false); } @@ -2542,6 +2613,245 @@ return false; } +bool AMDGPUAsmParser::ParseDirectiveAMDGCNTarget() { + if (getSTI().getTargetTriple().getArch() != Triple::amdgcn) + return TokError("directive only supported for amdgcn architecture"); + + std::string Target; + + SMLoc TargetStart = getTok().getLoc(); + if (getParser().parseEscapedString(Target)) + return true; + SMRange TargetRange = SMRange(TargetStart, getTok().getLoc()); + + std::string ExpectedTarget; + raw_string_ostream ExpectedTargetOS(ExpectedTarget); + IsaInfo::streamIsaVersion(&getSTI(), ExpectedTargetOS); + + // TODO(scott.linder): should this be in streamIsaVersion? 
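+  // Note: "+xnack" is appended so the expected string matches the form used
+  // by .amdgcn_target operands for xnack-enabled targets, e.g.
+  // "amdgcn-amd-amdhsa--gfx803+xnack".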
+ if (hasXNACK()) + ExpectedTargetOS << "+xnack"; + + if (Target != ExpectedTargetOS.str()) + return getParser().Error(TargetRange.Start, "target must match options", + TargetRange); + + getTargetStreamer().EmitDirectiveAMDGCNTarget(Target); + return false; +} + +bool AMDGPUAsmParser::OutOfRangeError(SMRange Range) { + return getParser().Error(Range.Start, "value out of range", Range); +} + +bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() { + if (getSTI().getTargetTriple().getOS() != Triple::AMDHSA) + return TokError("directive only supported for amdhsa OS"); + + StringRef SymbolName; + if (getParser().parseIdentifier(SymbolName)) + return true; + + kernel_descriptor_t KD = getDefaultAmdhsaKernelDescriptor(); + + StringSet<> Seen; + + while (true) { + while (getLexer().is(AsmToken::EndOfStatement)) + Lex(); + + if (getLexer().isNot(AsmToken::Identifier)) + return TokError("expected .amdhsa_ directive or .end_amdhsa_kernel"); + + StringRef ID = getTok().getIdentifier(); + SMRange IDRange = getTok().getLocRange(); + Lex(); + + if (ID == ".end_amdhsa_kernel") + break; + + if (Seen.find(ID) != Seen.end()) + return TokError(".amdhsa_ directives cannot be repeated"); + Seen.insert(ID); + + SMLoc ValStart = getTok().getLoc(); + int64_t IVal; + if (getParser().parseAbsoluteExpression(IVal)) + return true; + SMLoc ValEnd = getTok().getLoc(); + SMRange ValRange = SMRange(ValStart, ValEnd); + + if (IVal < 0) + return OutOfRangeError(ValRange); + + uint64_t Val = IVal; + +#define PARSE_BITS_ENTRY(FIELD, ENTRY, VALUE, RANGE) \ + if (!isUInt(VALUE)) \ + return OutOfRangeError(RANGE); \ + AMDHSA_BITS_SET(FIELD, ENTRY, VALUE); + + if (ID == ".amdhsa_group_segment_fixed_size") { + if (!isUInt(Val)) + return OutOfRangeError(ValRange); + KD.group_segment_fixed_size = Val; + } else if (ID == ".amdhsa_private_segment_fixed_size") { + if (!isUInt(Val)) + return OutOfRangeError(ValRange); + KD.private_segment_fixed_size = Val; + } else if (ID == ".amdhsa_user_sgpr_private_segment_buffer") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, + Val, ValRange); + } else if (ID == ".amdhsa_user_sgpr_dispatch_ptr") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR, Val, + ValRange); + } else if (ID == ".amdhsa_user_sgpr_queue_ptr") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR, Val, + ValRange); + } else if (ID == ".amdhsa_user_sgpr_kernarg_segment_ptr") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR, + Val, ValRange); + } else if (ID == ".amdhsa_user_sgpr_dispatch_id") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID, Val, + ValRange); + } else if (ID == ".amdhsa_user_sgpr_flat_scratch_init") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT, Val, + ValRange); + } else if (ID == ".amdhsa_user_sgpr_private_segment_size") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, + Val, ValRange); + } else if (ID == ".amdhsa_user_sgpr_grid_workgroup_count_x") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, + Val, ValRange); + } else if (ID == ".amdhsa_user_sgpr_grid_workgroup_count_y") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, + Val, ValRange); + } else 
if (ID == ".amdhsa_user_sgpr_grid_workgroup_count_z") { + PARSE_BITS_ENTRY(KD.kernel_code_properties, + KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, + Val, ValRange); + } else if (ID == ".amdhsa_system_sgpr_private_segment_wavefront_offset") { + PARSE_BITS_ENTRY( + KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, Val, + ValRange); + } else if (ID == ".amdhsa_system_sgpr_workgroup_id_x") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_X, Val, + ValRange); + } else if (ID == ".amdhsa_system_sgpr_workgroup_id_y") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Y, Val, + ValRange); + } else if (ID == ".amdhsa_system_sgpr_workgroup_id_z") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Z, Val, + ValRange); + } else if (ID == ".amdhsa_system_sgpr_workgroup_info") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_INFO, Val, + ValRange); + } else if (ID == ".amdhsa_system_vgpr_workitem_id") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_VGPR_WORKITEM_ID, Val, + ValRange); + } else if (ID == ".amdhsa_vgpr_count") { + if (Val == 0) + Val = 1; + Val = divideCeil(Val, IsaInfo::getVGPRAllocGranule(getFeatureBits())) - 1; + if (!isUInt(Val)) + return OutOfRangeError(ValRange); + AMDHSA_BITS_SET(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT, Val); + } else if (ID == ".amdhsa_sgpr_count") { + if (Val == 0) + Val = 1; + Val = divideCeil(Val, IsaInfo::getSGPRAllocGranule(getFeatureBits())) - 1; + if (!isUInt(Val)) + return OutOfRangeError(ValRange); + AMDHSA_BITS_SET(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT, Val); + } else if (ID == ".amdhsa_float_round_mode_32") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32, Val, ValRange); + } else if (ID == ".amdhsa_float_round_mode_16_64") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64, Val, ValRange); + } else if (ID == ".amdhsa_float_denorm_mode_32") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_32, Val, ValRange); + } else if (ID == ".amdhsa_float_denorm_mode_16_64") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64, Val, + ValRange); + } else if (ID == ".amdhsa_dx10_clamp") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, + COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP, Val, ValRange); + } else if (ID == ".amdhsa_ieee_mode") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE, + Val, ValRange); + } else if (ID == ".amdhsa_fp16_overflow") { + if (!isGFX9()) + return getParser().Error(IDRange.Start, "directive requires gfx9", + IDRange); + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, COMPUTE_PGM_RSRC1_FP16_OVFL, Val, + ValRange); + } else if (ID == ".amdhsa_exception_fp_ieee_invalid_op") { + PARSE_BITS_ENTRY( + KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, Val, + ValRange); + } else if (ID == ".amdhsa_exception_fp_denorm_src") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, + Val, ValRange); + } else if (ID == ".amdhsa_exception_fp_ieee_div_zero") { + PARSE_BITS_ENTRY( + KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, Val, + ValRange); + } else if (ID == ".amdhsa_exception_fp_ieee_overflow") { + 
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, + Val, ValRange); + } else if (ID == ".amdhsa_exception_fp_ieee_underflow") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, + Val, ValRange); + } else if (ID == ".amdhsa_exception_fp_ieee_inexact") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, + Val, ValRange); + } else if (ID == ".amdhsa_exception_int_div_zero") { + PARSE_BITS_ENTRY(KD.compute_pgm_rsrc2, + COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, + Val, ValRange); + } else { + return getParser().Error(IDRange.Start, + "unknown .amdhsa_kernel directive", IDRange); + } + } +#undef PARSE_BITS_ENTRY + + if (Seen.find(".amdhsa_vgpr_count") == Seen.end()) + return TokError(".amdhsa_vgpr_count directive is required"); + + if (Seen.find(".amdhsa_sgpr_count") == Seen.end()) + return TokError(".amdhsa_sgpr_count directive is required"); + + getTargetStreamer().EmitAmdhsaKernelDescriptor(SymbolName, KD, getSTI()); + return false; +} + bool AMDGPUAsmParser::ParseDirectiveHSACodeObjectVersion() { uint32_t Major; uint32_t Minor; @@ -2661,7 +2971,8 @@ getTargetStreamer().EmitAMDGPUSymbolType(KernelName, ELF::STT_AMDGPU_HSA_KERNEL); Lex(); - KernelScope.initialize(getContext()); + if (!AMDGPU::IsaInfo::hasCodeObjectV3(getFeatureBits())) + KernelScope.initialize(getContext()); return false; } @@ -2765,20 +3076,28 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) { StringRef IDVal = DirectiveID.getString(); - if (IDVal == ".hsa_code_object_version") - return ParseDirectiveHSACodeObjectVersion(); + if (AMDGPU::IsaInfo::hasCodeObjectV3(getFeatureBits())) { + if (IDVal == ".amdgcn_target") + return ParseDirectiveAMDGCNTarget(); - if (IDVal == ".hsa_code_object_isa") - return ParseDirectiveHSACodeObjectISA(); + if (IDVal == ".amdhsa_kernel") + return ParseDirectiveAMDHSAKernel(); + } else { + if (IDVal == ".hsa_code_object_version") + return ParseDirectiveHSACodeObjectVersion(); + + if (IDVal == ".hsa_code_object_isa") + return ParseDirectiveHSACodeObjectISA(); - if (IDVal == ".amd_kernel_code_t") - return ParseDirectiveAMDKernelCodeT(); + if (IDVal == ".amd_kernel_code_t") + return ParseDirectiveAMDKernelCodeT(); - if (IDVal == ".amdgpu_hsa_kernel") - return ParseDirectiveAMDGPUHsaKernel(); + if (IDVal == ".amdgpu_hsa_kernel") + return ParseDirectiveAMDGPUHsaKernel(); - if (IDVal == ".amd_amdgpu_isa") - return ParseDirectiveISAVersion(); + if (IDVal == ".amd_amdgpu_isa") + return ParseDirectiveISAVersion(); + } if (IDVal == AMDGPU::HSAMD::AssemblerDirectiveBegin) return ParseDirectiveHSAMetadata(); Index: lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h =================================================================== --- lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h +++ lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h @@ -40,6 +40,8 @@ AMDGPUTargetStreamer(MCStreamer &S) : MCTargetStreamer(S) {} + virtual void EmitDirectiveAMDGCNTarget(StringRef Target) = 0; + virtual void EmitDirectiveHSACodeObjectVersion(uint32_t Major, uint32_t Minor) = 0; @@ -65,14 +67,17 @@ virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0; virtual void EmitAmdhsaKernelDescriptor( - StringRef KernelName, - const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0; + StringRef KernelName, const amdhsa::kernel_descriptor_t &KernelDescriptor, + const MCSubtargetInfo &STI) = 0; }; class AMDGPUTargetAsmStreamer final 
: public AMDGPUTargetStreamer { formatted_raw_ostream &OS; public: AMDGPUTargetAsmStreamer(MCStreamer &S, formatted_raw_ostream &OS); + + void EmitDirectiveAMDGCNTarget(StringRef Target) override; + void EmitDirectiveHSACodeObjectVersion(uint32_t Major, uint32_t Minor) override; @@ -94,8 +99,8 @@ bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override; void EmitAmdhsaKernelDescriptor( - StringRef KernelName, - const amdhsa::kernel_descriptor_t &KernelDescriptor) override; + StringRef KernelName, const amdhsa::kernel_descriptor_t &KernelDescriptor, + const MCSubtargetInfo &STI) override; }; class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer { @@ -109,6 +114,8 @@ MCELFStreamer &getStreamer(); + void EmitDirectiveAMDGCNTarget(StringRef Target) override; + void EmitDirectiveHSACodeObjectVersion(uint32_t Major, uint32_t Minor) override; @@ -130,8 +137,8 @@ bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override; void EmitAmdhsaKernelDescriptor( - StringRef KernelName, - const amdhsa::kernel_descriptor_t &KernelDescriptor) override; + StringRef KernelName, const amdhsa::kernel_descriptor_t &KernelDescriptor, + const MCSubtargetInfo &STI) override; }; } Index: lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp =================================================================== --- lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp +++ lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp @@ -133,9 +133,13 @@ formatted_raw_ostream &OS) : AMDGPUTargetStreamer(S), OS(OS) { } -void -AMDGPUTargetAsmStreamer::EmitDirectiveHSACodeObjectVersion(uint32_t Major, - uint32_t Minor) { +void AMDGPUTargetAsmStreamer::EmitDirectiveAMDGCNTarget(StringRef Target) { + // TODO: quote/escape Target? + OS << "\t.amdgcn_target \"" << Target << "\"\n"; +} + +void AMDGPUTargetAsmStreamer::EmitDirectiveHSACodeObjectVersion( + uint32_t Major, uint32_t Minor) { OS << "\t.hsa_code_object_version " << Twine(Major) << "," << Twine(Minor) << '\n'; } @@ -197,9 +201,151 @@ } void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor( - StringRef KernelName, - const amdhsa::kernel_descriptor_t &KernelDescriptor) { - // FIXME: not supported yet. + StringRef KernelName, const amdhsa::kernel_descriptor_t &KD, + const MCSubtargetInfo &STI) { + amdhsa::kernel_descriptor_t DefaultKD = getDefaultAmdhsaKernelDescriptor(); + + uint32_t GranulatedVgpr = + AMDHSA_BITS_GET(KD.compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_GRANULATED_WORKITEM_VGPR_COUNT); + uint32_t VgprCount = + (GranulatedVgpr + 1) * + AMDGPU::IsaInfo::getVGPRAllocGranule(STI.getFeatureBits()); + uint32_t GranulatedSgpr = AMDHSA_BITS_GET( + KD.compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_GRANULATED_WAVEFRONT_SGPR_COUNT); + uint32_t SgprCount = + (GranulatedSgpr + 1) * + AMDGPU::IsaInfo::getSGPRAllocGranule(STI.getFeatureBits()); + + // TODO: quote/escape KernelName? 
+ OS << "\t.amdhsa_kernel " << KernelName << '\n'; + +#define PRINT_IF_NOT_DEFAULT(STREAM, DIRECTIVE, KERNEL_DESC, \ + DEFAULT_KERNEL_DESC, MEMBER_NAME, FIELD_NAME) \ + if (AMDHSA_BITS_GET(KERNEL_DESC.MEMBER_NAME, FIELD_NAME) != \ + AMDHSA_BITS_GET(DEFAULT_KERNEL_DESC.MEMBER_NAME, FIELD_NAME)) \ + STREAM << "\t\t" << DIRECTIVE << " " \ + << AMDHSA_BITS_GET(KERNEL_DESC.MEMBER_NAME, FIELD_NAME) << '\n'; + + if (KD.group_segment_fixed_size != DefaultKD.group_segment_fixed_size) + OS << "\t\t.amdhsa_group_segment_fixed_size " << KD.group_segment_fixed_size + << '\n'; + if (KD.private_segment_fixed_size != DefaultKD.private_segment_fixed_size) + OS << "\t\t.amdhsa_private_segment_fixed_size " + << KD.private_segment_fixed_size << '\n'; + + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_private_segment_buffer", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_user_sgpr_dispatch_ptr", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_user_sgpr_queue_ptr", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_kernarg_segment_ptr", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_user_sgpr_dispatch_id", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_flat_scratch_init", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_private_segment_size", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_grid_workgroup_count_x", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_grid_workgroup_count_y", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_user_sgpr_grid_workgroup_count_z", KD, DefaultKD, + kernel_code_properties, + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_system_sgpr_private_segment_wavefront_offset", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_system_sgpr_workgroup_id_x", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_X); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_system_sgpr_workgroup_id_y", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Y); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_system_sgpr_workgroup_id_z", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_ID_Z); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_system_sgpr_workgroup_info", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_SGPR_WORKGROUP_INFO); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_system_vgpr_workitem_id", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_VGPR_WORKITEM_ID); + + // These directives are required. 
+ OS << "\t\t.amdhsa_vgpr_count " << VgprCount << '\n'; + OS << "\t\t.amdhsa_sgpr_count " << SgprCount << '\n'; + + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_float_round_mode_32", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_32); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_float_round_mode_16_64", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FLOAT_ROUND_MODE_16_64); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_float_denorm_mode_32", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_32); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_float_denorm_mode_16_64", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_dx10_clamp", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP); + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_ieee_mode", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE); + if (isGFX9(STI)) + PRINT_IF_NOT_DEFAULT(OS, ".amdhsa_fp16_overflow", KD, DefaultKD, + compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FP16_OVFL); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_ieee_invalid_op", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_denorm_src", KD, DefaultKD, compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_FP_DENORMAL_SOURCE); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_ieee_div_zero", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_ieee_overflow", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_ieee_underflow", KD, DefaultKD, + compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_fp_ieee_inexact", KD, DefaultKD, compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_IEEE_754_FP_INEXACT); + PRINT_IF_NOT_DEFAULT( + OS, ".amdhsa_exception_int_div_zero", KD, DefaultKD, compute_pgm_rsrc2, + amdhsa::COMPUTE_PGM_RSRC2_ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO); +#undef PRINT_IF_NOT_DEFAULT + + OS << "\t.end_amdhsa_kernel\n"; } //===----------------------------------------------------------------------===// @@ -247,9 +393,10 @@ S.PopSection(); } -void -AMDGPUTargetELFStreamer::EmitDirectiveHSACodeObjectVersion(uint32_t Major, - uint32_t Minor) { +void AMDGPUTargetELFStreamer::EmitDirectiveAMDGCNTarget(StringRef Target) {} + +void AMDGPUTargetELFStreamer::EmitDirectiveHSACodeObjectVersion( + uint32_t Major, uint32_t Minor) { EmitAMDGPUNote( MCConstantExpr::create(8, getContext()), @@ -370,8 +517,8 @@ } void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor( - StringRef KernelName, - const amdhsa::kernel_descriptor_t &KernelDescriptor) { + StringRef KernelName, const amdhsa::kernel_descriptor_t &KernelDescriptor, + const MCSubtargetInfo &STI) { auto &Streamer = getStreamer(); auto &Context = Streamer.getContext(); auto &ObjectFileInfo = *Context.getObjectFileInfo(); Index: lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h =================================================================== --- lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h +++ lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h @@ -16,6 +16,7 @@ #include "llvm/ADT/StringRef.h" #include "llvm/IR/CallingConv.h" #include 
"llvm/MC/MCInstrDesc.h" +#include "llvm/Support/AMDHSAKernelDescriptor.h" #include "llvm/Support/Compiler.h" #include "llvm/Support/ErrorHandling.h" #include @@ -171,6 +172,8 @@ void initDefaultAMDKernelCodeT(amd_kernel_code_t &Header, const FeatureBitset &Features); +amdhsa::kernel_descriptor_t getDefaultAmdhsaKernelDescriptor(); + bool isGroupSegment(const GlobalValue *GV); bool isGlobalSegment(const GlobalValue *GV); bool isReadOnlySegment(const GlobalValue *GV); Index: lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp =================================================================== --- lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp +++ lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp @@ -446,6 +446,19 @@ Header.private_segment_alignment = 4; } +amdhsa::kernel_descriptor_t getDefaultAmdhsaKernelDescriptor() { + amdhsa::kernel_descriptor_t KD; + memset(&KD, 0, sizeof(KD)); + AMDHSA_BITS_SET(KD.compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_FLOAT_DENORM_MODE_16_64, + amdhsa::FLOAT_DENORM_MODE_FLUSH_NONE); + AMDHSA_BITS_SET(KD.compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_ENABLE_DX10_CLAMP, 1); + AMDHSA_BITS_SET(KD.compute_pgm_rsrc1, + amdhsa::COMPUTE_PGM_RSRC1_ENABLE_IEEE_MODE, 1); + return KD; +} + bool isGroupSegment(const GlobalValue *GV) { return GV->getType()->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS; } Index: test/MC/AMDGPU/hsa-diag-v3.s =================================================================== --- /dev/null +++ test/MC/AMDGPU/hsa-diag-v3.s @@ -0,0 +1,45 @@ +// RUN: not llvm-mc -mattr=+code-object-v3 -triple amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s +// RUN: not llvm-mc -mattr=+code-object-v3 -triple amdgcn-amd- -mcpu=gfx803 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefix=NOT-AMDHSA + +.text + +.amdgcn_target "amdgcn--amdhsa-gfx803+xnack" +// CHECK: error: target must match options + +.amdhsa_kernel +// CHECK: error: unknown directive +.end_amdhsa_kernel + +.amdhsa_kernel foo + .amdhsa_group_segment_fixed_size -1 + // CHECK: error: value out of range +.end_amdhsa_kernel + +.amdhsa_kernel foo + .amdhsa_group_segment_fixed_size 10000000000 + 1 + // CHECK: error: value out of range +.end_amdhsa_kernel + +.amdhsa_kernel foo + // NOT-AMDHSA: error: directive only supported for amdhsa OS +.end_amdhsa_kernel + +.amdhsa_kernel foo + .amdhsa_group_segment_fixed_size 1 + .amdhsa_group_segment_fixed_size 1 + // CHECK: error: .amdhsa_ directives cannot be repeated +.end_amdhsa_kernel + +.amdhsa_kernel foo + // CHECK: error: .amdhsa_vgpr_count directive is required +.end_amdhsa_kernel + +.amdhsa_kernel foo + .amdhsa_vgpr_count 0 + // CHECK: error: .amdhsa_sgpr_count directive is required +.end_amdhsa_kernel + +.amdhsa_kernel foo + 1 + // CHECK: error: expected .amdhsa_ directive or .end_amdhsa_kernel +.end_amdhsa_kernel Index: test/MC/AMDGPU/hsa-v3.s =================================================================== --- /dev/null +++ test/MC/AMDGPU/hsa-v3.s @@ -0,0 +1,90 @@ +// RUN: llvm-mc -mattr=+code-object-v3 -triple amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+xnack -show-encoding %s | FileCheck %s + +.amdgcn_target "amdgcn-amd-amdhsa--gfx803+xnack" +// CHECK: .amdgcn_target "amdgcn-amd-amdhsa--gfx803+xnack" + +.rodata +// CHECK: .rodata + +.amdhsa_kernel minimal + .amdhsa_vgpr_count 0 + .amdhsa_sgpr_count 0 +.end_amdhsa_kernel + +// CHECK: .amdhsa_kernel minimal +// CHECK-NEXT: .amdhsa_vgpr_count 4 +// CHECK-NEXT: .amdhsa_sgpr_count 16 +// CHECK-NEXT: .end_amdhsa_kernel + + +.amdhsa_kernel complete + 
.amdhsa_group_segment_fixed_size 1 + .amdhsa_private_segment_fixed_size 1 + .amdhsa_user_sgpr_private_segment_buffer 1 + .amdhsa_user_sgpr_dispatch_ptr 1 + .amdhsa_user_sgpr_queue_ptr 1 + .amdhsa_user_sgpr_kernarg_segment_ptr 1 + .amdhsa_user_sgpr_dispatch_id 1 + .amdhsa_user_sgpr_flat_scratch_init 1 + .amdhsa_user_sgpr_private_segment_size 1 + .amdhsa_user_sgpr_grid_workgroup_count_x 1 + .amdhsa_user_sgpr_grid_workgroup_count_y 1 + .amdhsa_user_sgpr_grid_workgroup_count_z 1 + .amdhsa_system_sgpr_private_segment_wavefront_offset 1 + .amdhsa_system_sgpr_workgroup_id_x 1 + .amdhsa_system_sgpr_workgroup_id_y 1 + .amdhsa_system_sgpr_workgroup_id_z 1 + .amdhsa_system_sgpr_workgroup_info 1 + .amdhsa_system_vgpr_workitem_id 1 + .amdhsa_vgpr_count 5 + .amdhsa_sgpr_count 17 + .amdhsa_float_round_mode_32 1 + .amdhsa_float_round_mode_16_64 1 + .amdhsa_float_denorm_mode_32 1 + .amdhsa_float_denorm_mode_16_64 0 + .amdhsa_dx10_clamp 0 + .amdhsa_ieee_mode 0 + .amdhsa_exception_fp_ieee_invalid_op 1 + .amdhsa_exception_fp_denorm_src 1 + .amdhsa_exception_fp_ieee_div_zero 1 + .amdhsa_exception_fp_ieee_overflow 1 + .amdhsa_exception_fp_ieee_underflow 1 + .amdhsa_exception_fp_ieee_inexact 1 + .amdhsa_exception_int_div_zero 1 +.end_amdhsa_kernel + +// CHECK: .amdhsa_kernel complete +// CHECK-NEXT: .amdhsa_group_segment_fixed_size 1 +// CHECK-NEXT: .amdhsa_private_segment_fixed_size 1 +// CHECK-NEXT: .amdhsa_user_sgpr_private_segment_buffer 1 +// CHECK-NEXT: .amdhsa_user_sgpr_dispatch_ptr 1 +// CHECK-NEXT: .amdhsa_user_sgpr_queue_ptr 1 +// CHECK-NEXT: .amdhsa_user_sgpr_kernarg_segment_ptr 1 +// CHECK-NEXT: .amdhsa_user_sgpr_dispatch_id 1 +// CHECK-NEXT: .amdhsa_user_sgpr_flat_scratch_init 1 +// CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 1 +// CHECK-NEXT: .amdhsa_user_sgpr_grid_workgroup_count_x 1 +// CHECK-NEXT: .amdhsa_user_sgpr_grid_workgroup_count_y 1 +// CHECK-NEXT: .amdhsa_user_sgpr_grid_workgroup_count_z 1 +// CHECK-NEXT: .amdhsa_system_sgpr_private_segment_wavefront_offset 1 +// CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_x 1 +// CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_y 1 +// CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_z 1 +// CHECK-NEXT: .amdhsa_system_sgpr_workgroup_info 1 +// CHECK-NEXT: .amdhsa_system_vgpr_workitem_id 1 +// CHECK-NEXT: .amdhsa_vgpr_count 8 +// CHECK-NEXT: .amdhsa_sgpr_count 32 +// CHECK-NEXT: .amdhsa_float_round_mode_32 1 +// CHECK-NEXT: .amdhsa_float_round_mode_16_64 1 +// CHECK-NEXT: .amdhsa_float_denorm_mode_32 1 +// CHECK-NEXT: .amdhsa_float_denorm_mode_16_64 0 +// CHECK-NEXT: .amdhsa_dx10_clamp 0 +// CHECK-NEXT: .amdhsa_ieee_mode 0 +// CHECK-NEXT: .amdhsa_exception_fp_ieee_invalid_op 1 +// CHECK-NEXT: .amdhsa_exception_fp_denorm_src 1 +// CHECK-NEXT: .amdhsa_exception_fp_ieee_div_zero 1 +// CHECK-NEXT: .amdhsa_exception_fp_ieee_overflow 1 +// CHECK-NEXT: .amdhsa_exception_fp_ieee_underflow 1 +// CHECK-NEXT: .amdhsa_exception_fp_ieee_inexact 1 +// CHECK-NEXT: .amdhsa_exception_int_div_zero 1 +// CHECK-NEXT: .end_amdhsa_kernel