Index: docs/AMDGPUUsage.rst =================================================================== --- docs/AMDGPUUsage.rst +++ docs/AMDGPUUsage.rst @@ -686,7 +686,7 @@ *link-name* ``STT_OBJECT`` - ``.data`` Global variable - ``.rodata`` - ``.bss`` - *link-name*\ ``@kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor + *link-name*\ ``.kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor *link-name* ``STT_FUNC`` - ``.text`` Kernel entry point ===================== ============== ============= ================== @@ -1578,7 +1578,7 @@ ======= ======= =============================== ============================ Bits Size Field Name Description ======= ======= =============================== ============================ - 31:0 4 bytes GroupSegmentFixedSize The amount of fixed local + 31:0 4 bytes GROUP_SEGMENT_FIXED_SIZE The amount of fixed local address space memory required for a work-group in bytes. This does not @@ -1587,7 +1587,7 @@ space memory that may be added when the kernel is dispatched. - 63:32 4 bytes PrivateSegmentFixedSize The amount of fixed + 63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed private address space memory required for a work-item in bytes. If @@ -1596,7 +1596,7 @@ be added to this value for the call stack. 127:64 8 bytes Reserved, must be 0. - 191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly + 191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly negative) from base address of kernel descriptor to kernel's @@ -1605,22 +1605,22 @@ aligned. 383:192 24 Reserved, must be 0. bytes - 415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS) + 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC1`` configuration register. See :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`. 
- 447:416 4 bytes ComputePgmRsrc2 Compute Shader (CS) + 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS) program settings used by CP to set up ``COMPUTE_PGM_RSRC2`` configuration register. See :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`. - 448 1 bit EnableSGPRPrivateSegmentBuffer Enable the setup of the - SGPR user data registers + 448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the + _BUFFER SGPR user data registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`). @@ -1631,18 +1631,19 @@ ``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``. Any requests beyond 16 will be ignored. - 449 1 bit EnableSGPRDispatchPtr *see above* - 450 1 bit EnableSGPRQueuePtr *see above* - 451 1 bit EnableSGPRKernargSegmentPtr *see above* - 452 1 bit EnableSGPRDispatchID *see above* - 453 1 bit EnableSGPRFlatScratchInit *see above* - 454 1 bit EnableSGPRPrivateSegmentSize *see above* - 455 1 bit EnableSGPRGridWorkgroupCountX Not implemented in CP and - should always be 0. - 456 1 bit EnableSGPRGridWorkgroupCountY Not implemented in CP and - should always be 0. - 457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and - should always be 0. + 449 1 bit ENABLE_SGPR_DISPATCH_PTR *see above* + 450 1 bit ENABLE_SGPR_QUEUE_PTR *see above* + 451 1 bit ENABLE_SGPR_KERNARG_SEGMENT_PTR *see above* + 452 1 bit ENABLE_SGPR_DISPATCH_ID *see above* + 453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above* + 454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above* + _SIZE + 455 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and + _COUNT_X should always be 0. + 456 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and + _COUNT_Y should always be 0. + 457 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and + _COUNT_Z should always be 0. 463:458 6 bits Reserved, must be 0. 511:464 6 Reserved, must be 0. 
bytes @@ -1996,10 +1997,10 @@ ====================================== ===== ============================== Enumeration Name Value Description ====================================== ===== ============================== - AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even - AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity - AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity - AMDGPU_FLOAT_ROUND_MODE_ZERO 3 Round Toward 0 + FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even + FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity + FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity + FLOAT_ROUND_MODE_ZERO 3 Round Toward 0 ====================================== ===== ============================== .. @@ -2010,11 +2011,11 @@ ====================================== ===== ============================== Enumeration Name Value Description ====================================== ===== ============================== - AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination + FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination Denorms - AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms - AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms - AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush + FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms + FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms + FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush ====================================== ===== ============================== .. @@ -2025,13 +2026,13 @@ ======================================== ===== ============================ Enumeration Name Value Description ======================================== ===== ============================ - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension + SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension ID. - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y + SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y dimensions ID. 
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z + SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z dimensions ID. - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined. + SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined. ======================================== ===== ============================ .. _amdgpu-amdhsa-initial-kernel-execution-state: Index: include/llvm/Support/AMDGPUKernelDescriptor.h =================================================================== --- include/llvm/Support/AMDGPUKernelDescriptor.h +++ /dev/null @@ -1,139 +0,0 @@ -//===--- AMDGPUKernelDescriptor.h -------------------------------*- C++ -*-===// -// -// The LLVM Compiler Infrastructure -// -// This file is distributed under the University of Illinois Open Source -// License. See LICENSE.TXT for details. -// -//===----------------------------------------------------------------------===// -// -/// \file -/// AMDGPU kernel descriptor definitions. For more information, visit -/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor-for-gfx6-gfx9 -// -//===----------------------------------------------------------------------===// - -#ifndef LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H -#define LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H - -#include - -// Creates enumeration entries used for packing bits into integers. Enumeration -// entries include bit shift amount, bit width, and bit mask. -#define AMDGPU_BITS_ENUM_ENTRY(name, shift, width) \ - name ## _SHIFT = (shift), \ - name ## _WIDTH = (width), \ - name = (((1 << (width)) - 1) << (shift)) \ - -// Gets bits for specified bit mask from specified source. -#define AMDGPU_BITS_GET(src, mask) \ - ((src & mask) >> mask ## _SHIFT) \ - -// Sets bits for specified bit mask in specified destination. -#define AMDGPU_BITS_SET(dst, mask, val) \ - dst &= (~(1 << mask ## _SHIFT) & ~mask); \ - dst |= (((val) << mask ## _SHIFT) & mask) \ - -namespace llvm { -namespace AMDGPU { -namespace HSAKD { - -/// Floating point rounding modes. 
-enum : uint8_t { - AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN = 0, - AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY = 1, - AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY = 2, - AMDGPU_FLOAT_ROUND_MODE_ZERO = 3, -}; - -/// Floating point denorm modes. -enum : uint8_t { - AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0, - AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST = 1, - AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC = 2, - AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE = 3, -}; - -/// System VGPR workitem IDs. -enum : uint8_t { - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X = 0, - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1, - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2, - AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3, -}; - -/// Compute program resource register one layout. -enum ComputePgmRsrc1 { - AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6), - AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4), - AMDGPU_BITS_ENUM_ENTRY(PRIORITY, 10, 2), - AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2), - AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_16_64, 14, 2), - AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_32, 16, 2), - AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_16_64, 18, 2), - AMDGPU_BITS_ENUM_ENTRY(PRIV, 20, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_DX10_CLAMP, 21, 1), - AMDGPU_BITS_ENUM_ENTRY(DEBUG_MODE, 22, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_IEEE_MODE, 23, 1), - AMDGPU_BITS_ENUM_ENTRY(BULKY, 24, 1), - AMDGPU_BITS_ENUM_ENTRY(CDBG_USER, 25, 1), - AMDGPU_BITS_ENUM_ENTRY(FP16_OVFL, 26, 1), - AMDGPU_BITS_ENUM_ENTRY(RESERVED0, 27, 5), -}; - -/// Compute program resource register two layout. 
-enum ComputePgmRsrc2 { - AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_OFFSET, 0, 1), - AMDGPU_BITS_ENUM_ENTRY(USER_SGPR_COUNT, 1, 5), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_TRAP_HANDLER, 6, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_INFO, 10, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_VGPR_WORKITEM_ID, 11, 2), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_MEMORY, 14, 1), - AMDGPU_BITS_ENUM_ENTRY(GRANULATED_LDS_SIZE, 15, 9), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1), - AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1), - AMDGPU_BITS_ENUM_ENTRY(RESERVED1, 31, 1), -}; - -/// Kernel descriptor layout. This layout should be kept backwards -/// compatible as it is consumed by the command processor. 
-struct KernelDescriptor final { - uint32_t GroupSegmentFixedSize; - uint32_t PrivateSegmentFixedSize; - uint32_t MaxFlatWorkGroupSize; - uint64_t IsDynamicCallStack : 1; - uint64_t IsXNACKEnabled : 1; - uint64_t Reserved0 : 30; - int64_t KernelCodeEntryByteOffset; - uint64_t Reserved1[3]; - uint32_t ComputePgmRsrc1; - uint32_t ComputePgmRsrc2; - uint64_t EnableSGPRPrivateSegmentBuffer : 1; - uint64_t EnableSGPRDispatchPtr : 1; - uint64_t EnableSGPRQueuePtr : 1; - uint64_t EnableSGPRKernargSegmentPtr : 1; - uint64_t EnableSGPRDispatchID : 1; - uint64_t EnableSGPRFlatScratchInit : 1; - uint64_t EnableSGPRPrivateSegmentSize : 1; - uint64_t EnableSGPRGridWorkgroupCountX : 1; - uint64_t EnableSGPRGridWorkgroupCountY : 1; - uint64_t EnableSGPRGridWorkgroupCountZ : 1; - uint64_t Reserved2 : 54; - - KernelDescriptor() = default; -}; - -} // end namespace HSAKD -} // end namespace AMDGPU -} // end namespace llvm - -#endif // LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H Index: include/llvm/Support/AMDHSAKernelDescriptor.h =================================================================== --- /dev/null +++ include/llvm/Support/AMDHSAKernelDescriptor.h @@ -0,0 +1,187 @@ +//===--- AMDHSAKernelDescriptor.h -----------------------------*- C++ -*---===// +// +// The LLVM Compiler Infrastructure +// +// This file is distributed under the University of Illinois Open Source +// License. See LICENSE.TXT for details. +// +//===----------------------------------------------------------------------===// +// +/// \file +/// AMDHSA kernel descriptor definitions. For more information, visit +/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H +#define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H + +#include + +// Gets offset of specified member in specified type. 
+#ifndef offsetof +#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER) +#endif // offsetof + +// Creates enumeration entries used for packing bits into integers. Enumeration +// entries include bit shift amount, bit width, and bit mask. +#ifndef AMDHSA_BITS_ENUM_ENTRY +#define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \ + NAME ## _SHIFT = (SHIFT), \ + NAME ## _WIDTH = (WIDTH), \ + NAME = (((1 << (WIDTH)) - 1) << (SHIFT)) +#endif // AMDHSA_BITS_ENUM_ENTRY + +// Gets bits for specified bit mask from specified source. +#ifndef AMDHSA_BITS_GET +#define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT) +#endif // AMDHSA_BITS_GET + +// Sets bits for specified bit mask in specified destination. +#ifndef AMDHSA_BITS_SET +#define AMDHSA_BITS_SET(DST, MSK, VAL) \ + DST &= ~MSK; \ + DST |= ((VAL << MSK ## _SHIFT) & MSK) +#endif // AMDHSA_BITS_SET + +namespace llvm { +namespace amdhsa { + +// Floating point rounding modes. Must be kept backwards compatible. +enum : uint8_t { + FLOAT_ROUND_MODE_NEAR_EVEN = 0, + FLOAT_ROUND_MODE_PLUS_INFINITY = 1, + FLOAT_ROUND_MODE_MINUS_INFINITY = 2, + FLOAT_ROUND_MODE_ZERO = 3, +}; + +// Floating point denorm modes. Must be kept backwards compatible. +enum : uint8_t { + FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0, + FLOAT_DENORM_MODE_FLUSH_DST = 1, + FLOAT_DENORM_MODE_FLUSH_SRC = 2, + FLOAT_DENORM_MODE_FLUSH_NONE = 3, +}; + +// System VGPR workitem IDs. Must be kept backwards compatible. +enum : uint8_t { + SYSTEM_VGPR_WORKITEM_ID_X = 0, + SYSTEM_VGPR_WORKITEM_ID_X_Y = 1, + SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2, + SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3, +}; + +// Compute program resource register 1. Must be kept backwards compatible. 
+#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \ + AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH) +enum : int32_t { + COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6), + COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4), + COMPUTE_PGM_RSRC1(PRIORITY, 10, 2), + COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2), + COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2), + COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2), + COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2), + COMPUTE_PGM_RSRC1(PRIV, 20, 1), + COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1), + COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1), + COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1), + COMPUTE_PGM_RSRC1(BULKY, 24, 1), + COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1), + COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1), + COMPUTE_PGM_RSRC1(RESERVED, 27, 5), +}; +#undef COMPUTE_PGM_RSRC1 + +// Compute program resource register 2. Must be kept backwards compatible. +#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \ + AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH) +enum : int32_t { + COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1), + COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5), + COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1), + COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1), + COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1), + COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1), + COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1), + COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1), + COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 
28, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1), + COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1), + COMPUTE_PGM_RSRC2(RESERVED, 31, 1), +}; +#undef COMPUTE_PGM_RSRC2 + +// Kernel code properties. Must be kept backwards compatible. +#define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \ + AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH) +enum : int32_t { + KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1), + KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1), + KERNEL_CODE_PROPERTY(RESERVED, 10, 6), +}; +#undef KERNEL_CODE_PROPERTY + +// Kernel descriptor. Must be kept backwards compatible. 
+struct kernel_descriptor_t { + uint32_t group_segment_fixed_size; + uint32_t private_segment_fixed_size; + uint8_t reserved0[8]; + int64_t kernel_code_entry_byte_offset; + uint8_t reserved1[24]; + uint32_t compute_pgm_rsrc1; + uint32_t compute_pgm_rsrc2; + uint16_t kernel_code_properties; + uint8_t reserved2[6]; +}; + +static_assert( + sizeof(kernel_descriptor_t) == 64, + "invalid size for kernel_descriptor_t"); +static_assert( + offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0, + "invalid offset for group_segment_fixed_size"); +static_assert( + offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4, + "invalid offset for private_segment_fixed_size"); +static_assert( + offsetof(kernel_descriptor_t, reserved0) == 8, + "invalid offset for reserved0"); +static_assert( + offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16, + "invalid offset for kernel_code_entry_byte_offset"); +static_assert( + offsetof(kernel_descriptor_t, reserved1) == 24, + "invalid offset for reserved1"); +static_assert( + offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48, + "invalid offset for compute_pgm_rsrc1"); +static_assert( + offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52, + "invalid offset for compute_pgm_rsrc2"); +static_assert( + offsetof(kernel_descriptor_t, kernel_code_properties) == 56, + "invalid offset for kernel_code_properties"); +static_assert( + offsetof(kernel_descriptor_t, reserved2) == 58, + "invalid offset for reserved2"); + +} // end namespace amdhsa +} // end namespace llvm + +#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H Index: lib/Target/AMDGPU/AMDGPUAsmPrinter.h =================================================================== --- lib/Target/AMDGPU/AMDGPUAsmPrinter.h +++ lib/Target/AMDGPU/AMDGPUAsmPrinter.h @@ -20,6 +20,7 @@ #include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h" #include "llvm/ADT/StringRef.h" #include "llvm/CodeGen/AsmPrinter.h" +#include "llvm/Support/AMDHSAKernelDescriptor.h" #include #include #include 
@@ -148,6 +149,13 @@ uint64_t CodeSize, const AMDGPUMachineFunction* MFI); + uint16_t getAmdhsaKernelCodeProperties( + const MachineFunction &MF) const; + + amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor( + const MachineFunction &MF, + const SIProgramInfo &PI) const; + public: explicit AMDGPUAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer); @@ -180,6 +188,8 @@ void EmitFunctionBodyStart() override; + void EmitFunctionBodyEnd() override; + void EmitFunctionEntryLabel() override; void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override; Index: lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp =================================================================== --- lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp +++ lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp @@ -116,6 +116,10 @@ } void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) { + if (IsaInfo::hasCodeObjectV3(getSTI()) && + TM.getTargetTriple().getOS() == Triple::AMDHSA) + return; + if (TM.getTargetTriple().getOS() != Triple::AMDHSA && TM.getTargetTriple().getOS() != Triple::AMDPAL) return; @@ -126,10 +130,6 @@ if (TM.getTargetTriple().getOS() == Triple::AMDPAL) readPALMetadata(M); - // Deprecated notes are not emitted for code object v3. - if (IsaInfo::hasCodeObjectV3(getSTI()->getFeatureBits())) - return; - // HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2. if (TM.getTargetTriple().getOS() == Triple::AMDHSA) getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1); @@ -141,6 +141,10 @@ } void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) { + // TODO: Add metadata to code object v3. + if (IsaInfo::hasCodeObjectV3(getSTI()) && + TM.getTargetTriple().getOS() == Triple::AMDHSA) + return; // Following code requires TargetStreamer to be present.
if (!getTargetStreamer()) @@ -186,8 +190,11 @@ } void AMDGPUAsmPrinter::EmitFunctionBodyStart() { - const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>(); - if (!MFI->isEntryFunction()) + const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>(); + if (!MFI.isEntryFunction()) + return; + if (IsaInfo::hasCodeObjectV3(getSTI()) && + TM.getTargetTriple().getOS() == Triple::AMDHSA) return; const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>(); @@ -205,7 +212,27 @@ getHSADebugProps(*MF, CurrentProgramInfo)); } +void AMDGPUAsmPrinter::EmitFunctionBodyEnd() { + const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>(); + if (!MFI.isEntryFunction()) + return; + if (!IsaInfo::hasCodeObjectV3(getSTI()) || + TM.getTargetTriple().getOS() != Triple::AMDHSA) + return; + + SmallString<128> KernelName; + getNameWithPrefix(KernelName, &MF->getFunction()); + getTargetStreamer()->EmitAmdhsaKernelDescriptor( + KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo)); +} + void AMDGPUAsmPrinter::EmitFunctionEntryLabel() { + if (IsaInfo::hasCodeObjectV3(getSTI()) && + TM.getTargetTriple().getOS() == Triple::AMDHSA) { + AsmPrinter::EmitFunctionEntryLabel(); + return; + } + const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>(); const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>(); if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) { @@ -288,6 +315,70 @@ false); } +uint16_t AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties( + const MachineFunction &MF) const { + const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>(); + uint16_t KernelCodeProperties = 0; + + if (MFI.hasPrivateSegmentBuffer()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER; + } + if (MFI.hasDispatchPtr()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR; + } + if (MFI.hasQueuePtr()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR; + } + if (MFI.hasKernargSegmentPtr()) { + KernelCodeProperties |= +
amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR; + } + if (MFI.hasDispatchID()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID; + } + if (MFI.hasFlatScratchInit()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT; + } + if (MFI.hasGridWorkgroupCountX()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X; + } + if (MFI.hasGridWorkgroupCountY()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y; + } + if (MFI.hasGridWorkgroupCountZ()) { + KernelCodeProperties |= + amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z; + } + + return KernelCodeProperties; +} + +amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor( + const MachineFunction &MF, + const SIProgramInfo &PI) const { + amdhsa::kernel_descriptor_t KernelDescriptor; + memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor)); + + assert(isUInt<32>(PI.ScratchSize)); + assert(isUInt<32>(PI.ComputePGMRSrc1)); + assert(isUInt<32>(PI.ComputePGMRSrc2)); + + KernelDescriptor.group_segment_fixed_size = PI.LDSSize; + KernelDescriptor.private_segment_fixed_size = PI.ScratchSize; + KernelDescriptor.compute_pgm_rsrc1 = PI.ComputePGMRSrc1; + KernelDescriptor.compute_pgm_rsrc2 = PI.ComputePGMRSrc2; + KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF); + + return KernelDescriptor; +} + bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) { CurrentProgramInfo = SIProgramInfo(); Index: lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h =================================================================== --- lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h +++ lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h @@ -14,6 +14,7 @@ #include "llvm/MC/MCStreamer.h" #include "llvm/MC/MCSubtargetInfo.h" #include "llvm/Support/AMDGPUMetadata.h" +#include 
"llvm/Support/AMDHSAKernelDescriptor.h" namespace llvm { #include "AMDGPUPTNote.h" @@ -62,6 +63,10 @@ /// \returns True on success, false on failure. virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0; + + virtual void EmitAmdhsaKernelDescriptor( + StringRef KernelName, + const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0; }; class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer { @@ -87,6 +92,10 @@ /// \returns True on success, false on failure. bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override; + + void EmitAmdhsaKernelDescriptor( + StringRef KernelName, + const amdhsa::kernel_descriptor_t &KernelDescriptor) override; }; class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer { @@ -119,6 +128,10 @@ /// \returns True on success, false on failure. bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override; + + void EmitAmdhsaKernelDescriptor( + StringRef KernelName, + const amdhsa::kernel_descriptor_t &KernelDescriptor) override; }; } Index: lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp =================================================================== --- lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp +++ lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp @@ -196,6 +196,12 @@ return true; } +void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor( + StringRef KernelName, + const amdhsa::kernel_descriptor_t &KernelDescriptor) { + // FIXME: not supported yet. 
+} + //===----------------------------------------------------------------------===// // AMDGPUTargetELFStreamer //===----------------------------------------------------------------------===// @@ -362,3 +368,57 @@ ); return true; } + +void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor( + StringRef KernelName, + const amdhsa::kernel_descriptor_t &KernelDescriptor) { + auto &Streamer = getStreamer(); + auto &Context = Streamer.getContext(); + auto &ObjectFileInfo = *Context.getObjectFileInfo(); + auto &ReadOnlySection = *ObjectFileInfo.getReadOnlySection(); + + Streamer.PushSection(); + Streamer.SwitchSection(&ReadOnlySection); + + // CP microcode requires the kernel descriptor to be allocated on 64 byte + // alignment. + Streamer.EmitValueToAlignment(64, 0, 1, 0); + if (ReadOnlySection.getAlignment() < 64) + ReadOnlySection.setAlignment(64); + + MCSymbolELF *KernelDescriptorSymbol = cast( + Context.getOrCreateSymbol(Twine(KernelName) + Twine(".kd"))); + KernelDescriptorSymbol->setBinding(ELF::STB_GLOBAL); + KernelDescriptorSymbol->setType(ELF::STT_OBJECT); + KernelDescriptorSymbol->setSize( + MCConstantExpr::create(sizeof(KernelDescriptor), Context)); + + MCSymbolELF *KernelCodeSymbol = cast( + Context.getOrCreateSymbol(Twine(KernelName))); + KernelCodeSymbol->setBinding(ELF::STB_LOCAL); + + Streamer.EmitLabel(KernelDescriptorSymbol); + Streamer.EmitBytes(StringRef( + (const char*)&(KernelDescriptor), + offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset))); + // FIXME: Remove the use of VK_AMDGPU_REL64 in the expression below. The + // expression being created is: + // (start of kernel code) - (start of kernel descriptor) + // It implies R_AMDGPU_REL64, but ends up being R_AMDGPU_ABS64. 
+ Streamer.EmitValue(MCBinaryExpr::createSub( + MCSymbolRefExpr::create( + KernelCodeSymbol, MCSymbolRefExpr::VK_AMDGPU_REL64, Context), + MCSymbolRefExpr::create( + KernelDescriptorSymbol, MCSymbolRefExpr::VK_None, Context), + Context), + sizeof(KernelDescriptor.kernel_code_entry_byte_offset)); + Streamer.EmitBytes(StringRef( + (const char*)&(KernelDescriptor) + + offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) + + sizeof(KernelDescriptor.kernel_code_entry_byte_offset), + sizeof(KernelDescriptor) - + offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) - + sizeof(KernelDescriptor.kernel_code_entry_byte_offset))); + + Streamer.PopSection(); +} Index: lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h =================================================================== --- lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h +++ lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h @@ -59,9 +59,9 @@ /// Streams isa version string for given subtarget \p STI into \p Stream. void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream); -/// \returns True if given subtarget \p Features support code object version 3, +/// \returns True if given subtarget \p STI supports code object version 3, /// false otherwise. -bool hasCodeObjectV3(const FeatureBitset &Features); +bool hasCodeObjectV3(const MCSubtargetInfo *STI); /// \returns Wavefront size for given subtarget \p Features. 
unsigned getWavefrontSize(const FeatureBitset &Features); Index: lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp =================================================================== --- lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp +++ lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp @@ -248,8 +248,8 @@ Stream.flush(); } -bool hasCodeObjectV3(const FeatureBitset &Features) { - return Features.test(FeatureCodeObjectV3); +bool hasCodeObjectV3(const MCSubtargetInfo *STI) { + return STI->getFeatureBits().test(FeatureCodeObjectV3); } unsigned getWavefrontSize(const FeatureBitset &Features) { Index: test/CodeGen/AMDGPU/code-object-v3.ll =================================================================== --- /dev/null +++ test/CodeGen/AMDGPU/code-object-v3.ll @@ -0,0 +1,48 @@ +; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | FileCheck --check-prefixes=ALL-ASM,OSABI-AMDHSA-ASM %s +; RUN: llc -filetype=obj -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | llvm-readobj -elf-output-style=GNU -notes -relocations -sections -symbols | FileCheck --check-prefixes=ALL-ELF,OSABI-AMDHSA-ELF %s + +; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_version +; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_isa +; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_isa +; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_hsa_metadata +; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_pal_metadata + +; OSABI-AMDHSA-ELF: Section Headers +; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256 +; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64 + +; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset +; OSABI-AMDHSA-ELF: 0000000000000010 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 10 +; OSABI-AMDHSA-ELF: 0000000000000050 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 110 + +; OSABI-AMDHSA-ELF: Symbol table '.symtab' contains {{[0-9]+}} entries +; OSABI-AMDHSA-ELF: 
{{[0-9]+}}: 0000000000000000 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fadd +; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000100 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fsub +; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fadd.kd +; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000040 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fsub.kd + +; OSABI-AMDHSA-ELF-NOT: Displaying notes found + +define amdgpu_kernel void @fadd( + float addrspace(1)* %r, + float addrspace(1)* %a, + float addrspace(1)* %b) { +entry: + %a.val = load float, float addrspace(1)* %a + %b.val = load float, float addrspace(1)* %b + %r.val = fadd float %a.val, %b.val + store float %r.val, float addrspace(1)* %r + ret void +} + +define amdgpu_kernel void @fsub( + float addrspace(1)* %r, + float addrspace(1)* %a, + float addrspace(1)* %b) { +entry: + %a.val = load float, float addrspace(1)* %a + %b.val = load float, float addrspace(1)* %b + %r.val = fsub float %a.val, %b.val + store float %r.val, float addrspace(1)* %r + ret void +} Index: test/CodeGen/AMDGPU/elf-notes.ll =================================================================== --- test/CodeGen/AMDGPU/elf-notes.ll +++ test/CodeGen/AMDGPU/elf-notes.ll @@ -1,13 +1,13 @@ -; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s -; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s -; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s -; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland 
-mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
-; RUN: llc -march=r600 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=R600 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 < %s | FileCheck
--check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
+; RUN: llc -march=r600 < %s | FileCheck --check-prefix=R600 %s

 ; OSABI-UNK-NOT: .hsa_code_object_version
 ; OSABI-UNK-NOT: .hsa_code_object_isa
@@ -25,17 +25,17 @@
 ; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
 ; OSABI-UNK-ELF-NOT: Unknown note type

-; OSABI-HSA-NOT: .hsa_code_object_version
-; OSABI-HSA-NOT: .hsa_code_object_isa
+; OSABI-HSA: .hsa_code_object_version
+; OSABI-HSA: .hsa_code_object_isa
 ; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"
 ; OSABI-HSA: .amd_amdgpu_hsa_metadata
 ; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata

-; OSABI-HSA-ELF-NOT: Unknown note type
+; OSABI-HSA-ELF: Unknown note type (0x00000001)
+; OSABI-HSA-ELF: Unknown note type (0x00000003)
 ; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
 ; OSABI-HSA-ELF: ISA Version:
 ; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802
-; OSABI-HSA-ELF-NOT: Unknown note type
 ; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
 ; OSABI-HSA-ELF: HSA Metadata:
 ; OSABI-HSA-ELF: ---
@@ -51,34 +51,29 @@
 ; OSABI-HSA-ELF: WavefrontSize: 64
 ; OSABI-HSA-ELF: NumSGPRs: 96
 ; OSABI-HSA-ELF: ...
-; OSABI-HSA-ELF-NOT: Unknown note type
 ; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
-; OSABI-HSA-ELF-NOT: Unknown note type

 ; OSABI-PAL-NOT: .hsa_code_object_version
-; OSABI-PAL-NOT: .hsa_code_object_isa
+; OSABI-PAL: .hsa_code_object_isa
 ; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"
 ; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata
 ; OSABI-PAL: .amd_amdgpu_pal_metadata

-; OSABI-PAL-ELF-NOT: Unknown note type
+; OSABI-PAL-ELF: Unknown note type (0x00000003)
 ; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
 ; OSABI-PAL-ELF: ISA Version:
 ; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802
-; OSABI-PAL-ELF-NOT: Unknown note type
 ; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
-; OSABI-PAL-ELF-NOT: Unknown note type
 ; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
 ; OSABI-PAL-ELF: PAL Metadata:
 ; TODO: Following check line fails on mips:
 ; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0
-; OSABI-PAL-ELF-NOT: Unknown note type

 ; R600-NOT: .hsa_code_object_version
 ; R600-NOT: .hsa_code_object_isa
 ; R600-NOT: .amd_amdgpu_isa
 ; R600-NOT: .amd_amdgpu_hsa_metadata
-; R600-NOT: .amd_amdgpu_pal_metadatas
+; R600-NOT: .amd_amdgpu_pal_metadata

 define amdgpu_kernel void @elf_notes() {
 ret void