This is an archive of the discontinued LLVM Phabricator instance.


Differential D47566

AMDHSA: Code object v3 updates
Closed · Public

Authored by kzhuravl on May 30 2018, 5:06 PM.


Details

Reviewers

arsenm
rampitec
tstellar
t-tye
nhaehnle

Summary

Do not emit the following assembler directives:
- .hsa_code_object_version
- .hsa_code_object_isa
- .amd_amdgpu_isa
- .amd_amdgpu_hsa_metadata
- .amd_amdgpu_pal_metadata
Do not emit .note entries.
Clean up the kernel descriptor header file and bring it in sync.
Emit the kernel descriptor into .rodata with the appropriate relocations and alignment.

Diff Detail

Event Timeline

kzhuravl created this revision. May 30 2018, 5:06 PM

Herald added subscribers: tpr, dstuttard, yaxunl and 2 others. May 30 2018, 5:06 PM

This change also requires a change in lld, which I plan to post tomorrow.

This only applies to AMDHSA for now
Update tests

kzhuravl added a parent revision: D47601: AMDGPU: Add 64-bit relative variant kind. May 31 2018, 12:31 PM

nhaehnle added inline comments. Jun 4 2018, 7:17 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	Isn't the REL64 here redundant? The way I see it, we should either have an absolute reference to `(start of kernel code) - (start of kernel descriptor)`, or a relative reference to `(start of kernel code) - offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset)`. I'm not deep enough in MC to say for certain which of these is really preferable.

Fix symbol bindings and rebase.

t-tye added inline comments. Jun 4 2018, 11:46 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	The field requires an offset to the entry point so an absolute relocation cannot be used. The Rel64 relocation record is what describes what is needed. Since the kernel descriptor and entry point are now in different sections, a static relocation record is needed so that it can be fixed up when the relocatable code object is linked to make a shared object. Previously the kernel descriptor was put in the same section as the code, and the offset was "hard-wired" when it was generated.
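
The relationship described above can be sketched as plain address arithmetic. This is only an illustration, assuming the offset is measured from the base address of the kernel descriptor (as the AMDGPUUsage table in this diff states); the helper names `entryPointAddress` and `entryByteOffset` are hypothetical and are not part of LLVM or the runtime.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): recover the entry point address from
// the address at which a kernel descriptor was loaded. The stored offset is
// signed, so the code may be placed before or after the descriptor.
constexpr std::uint64_t entryPointAddress(std::uint64_t DescriptorAddr,
                                          std::int64_t EntryByteOffset) {
  return DescriptorAddr + static_cast<std::uint64_t>(EntryByteOffset);
}

// Hypothetical helper: the value that has to end up in
// kernel_code_entry_byte_offset once both sections have final addresses.
// With the descriptor in .rodata and the code in .text this is only known at
// link time, which is why a relative relocation record is needed.
constexpr std::int64_t entryByteOffset(std::uint64_t DescriptorAddr,
                                       std::uint64_t EntryAddr) {
  return static_cast<std::int64_t>(EntryAddr - DescriptorAddr);
}
```

Because the offset is relative, the pair of sections can be loaded at any base address without patching the field again, as long as their distance does not change.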

scott.linder added a subscriber: scott.linder. Jun 4 2018, 12:11 PM

t-tye added inline comments. Jun 4 2018, 12:20 PM

include/llvm/Support/AMDHSAKernelDescriptor.h
13	Suggest just linking to "https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor" as this may support other targets in the future.
44	Should this be: ''' DST &= ~MSK; '''
51	What does "Must be kept backwards compatible." mean? Aren't these just the meaning of the values? Or is the issue that they may change value on different targets in the future?
78	uint32_t to match `uint32_t compute_pgm_rsrc1;`?
100	uint32_t to match `uint32_t compute_pgm_rsrc2;`?
126	Should this be uint16_t since the kernel descriptor field is `uint16_t kernel_code_properties;`?
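
For readers following the comments on the bit-packing macros, here is a minimal sketch of the same clear-then-set pattern written as functions instead of macros. It is not the code under review: `BitField`, `fieldMask`, `getBits` and `setBits` are hypothetical names, and the sketch uses an unsigned 32-bit type throughout, which is an assumption made for the example. The shift and width for `FloatRoundMode32` (12, 2) are taken from the COMPUTE_PGM_RSRC1 entries in this diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for a SHIFT/WIDTH pair from the header.
// Assumes Width < 32 so the shift below is well defined.
struct BitField {
  unsigned Shift;
  unsigned Width;
};

// Mask covering the field, e.g. {12, 2} -> 0x3000.
constexpr std::uint32_t fieldMask(BitField F) {
  return ((std::uint32_t{1} << F.Width) - 1u) << F.Shift;
}

// Extract the field value from Src.
constexpr std::uint32_t getBits(std::uint32_t Src, BitField F) {
  return (Src & fieldMask(F)) >> F.Shift;
}

// Clear the field in Dst (the "DST &= ~MSK" step), then merge in Val.
// Bits of Val that do not fit in the field are dropped.
constexpr std::uint32_t setBits(std::uint32_t Dst, BitField F,
                                std::uint32_t Val) {
  return (Dst & ~fieldMask(F)) | ((Val << F.Shift) & fieldMask(F));
}

// FLOAT_ROUND_MODE_32 occupies bits 13:12 of COMPUTE_PGM_RSRC1.
constexpr BitField FloatRoundMode32{12, 2};
```

Returning the new value instead of assigning through a macro argument avoids the two-statement expansion, which can misbehave when the macro is used as the body of an unbraced `if`.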

scott.linder added a child revision: D47736: AMDHSA Code Object v3 assembler syntax update. Jun 4 2018, 12:35 PM

nhaehnle added inline comments. Jun 4 2018, 1:26 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	Right, I agree that a REL64 relocation is ultimately needed. My point is more about how to express that fact inside LLVM using the MCExpr framework. I read the expression that is being created here as literally `(start of kernel code) - (start of kernel descriptor)`. The fact that it's a relative relocation really ought to be implied by that already; it's not clear to me why VK_AMDGPU_REL64 is passed to one of the constructors in addition to that.

kzhuravl added inline comments. Jun 4 2018, 3:24 PM

include/llvm/Support/AMDHSAKernelDescriptor.h
51	May change value in future.
78	I get a compiler warning if enum is from uint32_t.
lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	If we just put (start of kernel code) - (start of kernel descriptor), it ends up being R_AMDGPU_ABS64.

t-tye added inline comments. Jun 4 2018, 10:52 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	I agree with @nhaehnle that it seems strange that this is needed. Is that a limitation or bug in MCExpr handling? Perhaps it is being used in a way not seen before and so it mishandles this case, and ought to be fixed?

t-tye added a subscriber: b-sumner. Jun 4 2018, 10:55 PM

t-tye added inline comments.

docs/AMDGPUUsage.rst
689	@b-sumner does this seem to be a reasonable extension to use? Using @ conflicts with its use for symbol versioning, so it needs to change to something else.

b-sumner added inline comments. Jun 5 2018, 5:46 AM

docs/AMDGPUUsage.rst
689	Yes, it seems reasonable to me.

Address review feedback.

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	I put a fixme comment for now.

scott.linder added inline comments. Jun 7 2018, 12:31 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	I think the definition of VK_AMDGPU_REL64 got dropped in the most recent patch?

kzhuravl added inline comments. Jun 7 2018, 12:33 PM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
403–409	https://reviews.llvm.org/D47601

LGTM

include/llvm/Support/AMDHSAKernelDescriptor.h
51	Suggest: // Floating point rounding modes. Must match hardware definition.

This revision is now accepted and ready to land. Jun 7 2018, 2:51 PM

scott.linder added inline comments. Jun 12 2018, 9:08 AM

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
381	Should this switch be happening here? Shouldn't the assembly writer be able to put this descriptor in any section?

rL334519

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
381	rL334521

Revision Contents

Path	Size
docs/AMDGPUUsage.rst	65 lines
include/llvm/Support/AMDGPUKernelDescriptor.h
include/llvm/Support/AMDHSAKernelDescriptor.h	187 lines
lib/Target/AMDGPU/AMDGPUAsmPrinter.h	10 lines
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp	103 lines
lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h	13 lines
lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp	60 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h	4 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	4 lines
test/CodeGen/AMDGPU/code-object-v3.ll	48 lines
test/CodeGen/AMDGPU/elf-notes.ll	39 lines

Diff 150060

docs/AMDGPUUsage.rst

[680 unchanged lines not shown]
.. table:: AMDGPU ELF Symbols
:name: amdgpu-elf-symbols-table		:name: amdgpu-elf-symbols-table

===================== ============== ============= ==================		===================== ============== ============= ==================
Name Type Section Description		Name Type Section Description
===================== ============== ============= ==================		===================== ============== ============= ==================
link-name ``STT_OBJECT`` - ``.data`` Global variable		link-name ``STT_OBJECT`` - ``.data`` Global variable
- ``.rodata``		- ``.rodata``
- ``.bss``		- ``.bss``
link-name\ ``@kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor		link-name\ ``.kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
link-name ``STT_FUNC`` - ``.text`` Kernel entry point		link-name ``STT_FUNC`` - ``.text`` Kernel entry point
===================== ============== ============= ==================		===================== ============== ============= ==================

Global variable		Global variable
Global variables both used and defined by the compilation unit.		Global variables both used and defined by the compilation unit.

If the symbol is defined in the compilation unit then it is allocated in the		If the symbol is defined in the compilation unit then it is allocated in the
appropriate section according to if it has initialized data or is readonly.		appropriate section according to if it has initialized data or is readonly.
[875 unchanged lines not shown]
CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.		CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.

.. table:: Kernel Descriptor for GFX6-GFX9		.. table:: Kernel Descriptor for GFX6-GFX9
:name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table		:name: amdgpu-amdhsa-kernel-descriptor-gfx6-gfx9-table

======= ======= =============================== ============================		======= ======= =============================== ============================
Bits Size Field Name Description		Bits Size Field Name Description
======= ======= =============================== ============================		======= ======= =============================== ============================
31:0 4 bytes GroupSegmentFixedSize The amount of fixed local		31:0 4 bytes GROUP_SEGMENT_FIXED_SIZE The amount of fixed local
address space memory		address space memory
required for a work-group		required for a work-group
in bytes. This does not		in bytes. This does not
include any dynamically		include any dynamically
allocated local address		allocated local address
space memory that may be		space memory that may be
added when the kernel is		added when the kernel is
dispatched.		dispatched.
63:32 4 bytes PrivateSegmentFixedSize The amount of fixed		63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed
private address space		private address space
memory required for a		memory required for a
work-item in bytes. If		work-item in bytes. If
is_dynamic_callstack is 1		is_dynamic_callstack is 1
then additional space must		then additional space must
be added to this value for		be added to this value for
the call stack.		the call stack.
127:64 8 bytes Reserved, must be 0.		127:64 8 bytes Reserved, must be 0.
191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly		191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly
negative) from base		negative) from base
address of kernel		address of kernel
descriptor to kernel's		descriptor to kernel's
entry point instruction		entry point instruction
which must be 256 byte		which must be 256 byte
aligned.		aligned.
383:192 24 Reserved, must be 0.		383:192 24 Reserved, must be 0.
bytes		bytes
415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS)		415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by		program settings used by
CP to set up		CP to set up
``COMPUTE_PGM_RSRC1``		``COMPUTE_PGM_RSRC1``
configuration		configuration
register. See		register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
447:416 4 bytes ComputePgmRsrc2 Compute Shader (CS)		447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by		program settings used by
CP to set up		CP to set up
``COMPUTE_PGM_RSRC2``		``COMPUTE_PGM_RSRC2``
configuration		configuration
register. See		register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
448 1 bit EnableSGPRPrivateSegmentBuffer Enable the setup of the		448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
SGPR user data registers		_BUFFER SGPR user data registers
(see		(see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).		:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).

The total number of SGPR		The total number of SGPR
user data registers		user data registers
requested must not exceed		requested must not exceed
16 and match value in		16 and match value in
``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.		``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
Any requests beyond 16		Any requests beyond 16
will be ignored.		will be ignored.
449 1 bit EnableSGPRDispatchPtr see above		449 1 bit ENABLE_SGPR_DISPATCH_PTR see above
450 1 bit EnableSGPRQueuePtr see above		450 1 bit ENABLE_SGPR_QUEUE_PTR see above
451 1 bit EnableSGPRKernargSegmentPtr see above		451 1 bit ENABLE_SGPR_KERNARG_SEGMENT_PTR see above
452 1 bit EnableSGPRDispatchID see above		452 1 bit ENABLE_SGPR_DISPATCH_ID see above
453 1 bit EnableSGPRFlatScratchInit see above		453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT see above
454 1 bit EnableSGPRPrivateSegmentSize see above		454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT see above
455 1 bit EnableSGPRGridWorkgroupCountX Not implemented in CP and		_SIZE
should always be 0.		455 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
456 1 bit EnableSGPRGridWorkgroupCountY Not implemented in CP and		_COUNT_X should always be 0.
should always be 0.		456 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and		_COUNT_Y should always be 0.
should always be 0.		457 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
		_COUNT_Z should always be 0.
463:458 6 bits Reserved, must be 0.		463:458 6 bits Reserved, must be 0.
511:464 6 Reserved, must be 0.		511:464 6 Reserved, must be 0.
bytes		bytes
512 Total size 64 bytes.		512 Total size 64 bytes.
======= ====================================================================		======= ====================================================================

..		..

[337 unchanged lines not shown]
..		..

.. table:: Floating Point Rounding Mode Enumeration Values		.. table:: Floating Point Rounding Mode Enumeration Values
:name: amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table		:name: amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table

====================================== ===== ==============================		====================================== ===== ==============================
Enumeration Name Value Description		Enumeration Name Value Description
====================================== ===== ==============================		====================================== ===== ==============================
AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even		FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity		FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity		FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
AMDGPU_FLOAT_ROUND_MODE_ZERO 3 Round Toward 0		FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
====================================== ===== ==============================		====================================== ===== ==============================

..		..

.. table:: Floating Point Denorm Mode Enumeration Values		.. table:: Floating Point Denorm Mode Enumeration Values
:name: amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table		:name: amdgpu-amdhsa-floating-point-denorm-mode-enumeration-values-table

====================================== ===== ==============================		====================================== ===== ==============================
Enumeration Name Value Description		Enumeration Name Value Description
====================================== ===== ==============================		====================================== ===== ==============================
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination		FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
Denorms		Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms		FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms		FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush		FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
====================================== ===== ==============================		====================================== ===== ==============================

..		..

.. table:: System VGPR Work-Item ID Enumeration Values		.. table:: System VGPR Work-Item ID Enumeration Values
:name: amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table		:name: amdgpu-amdhsa-system-vgpr-work-item-id-enumeration-values-table

======================================== ===== ============================		======================================== ===== ============================
Enumeration Name Value Description		Enumeration Name Value Description
======================================== ===== ============================		======================================== ===== ============================
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension		SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
ID.		ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y		SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
dimensions ID.		dimensions ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z		SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
dimensions ID.		dimensions ID.
AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.		SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
======================================== ===== ============================		======================================== ===== ============================

.. _amdgpu-amdhsa-initial-kernel-execution-state:		.. _amdgpu-amdhsa-initial-kernel-execution-state:

Initial Kernel Execution State		Initial Kernel Execution State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section defines the register state that will be set up by the packet		This section defines the register state that will be set up by the packet
[2,302 unchanged lines not shown]

include/llvm/Support/AMDGPUKernelDescriptor.h

This file was deleted.

	//===--- AMDGPUKernelDescriptor.h -------------------------------- C++ --===//
	//
	// The LLVM Compiler Infrastructure
	//
	// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.
	//
	//===----------------------------------------------------------------------===//
	//
	/// \file
	/// AMDGPU kernel descriptor definitions. For more information, visit
	/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor-for-gfx6-gfx9
	//
	//===----------------------------------------------------------------------===//

	#ifndef LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
	#define LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H

	#include <cstdint>

	// Creates enumeration entries used for packing bits into integers. Enumeration
	// entries include bit shift amount, bit width, and bit mask.
	#define AMDGPU_BITS_ENUM_ENTRY(name, shift, width) \
	name ## _SHIFT = (shift), \
	name ## _WIDTH = (width), \
	name = (((1 << (width)) - 1) << (shift)) \

	// Gets bits for specified bit mask from specified source.
	#define AMDGPU_BITS_GET(src, mask) \
	((src & mask) >> mask ## _SHIFT) \

	// Sets bits for specified bit mask in specified destination.
	#define AMDGPU_BITS_SET(dst, mask, val) \
	dst &= (~(1 << mask ## _SHIFT) & ~mask); \
	dst |= (((val) << mask ## _SHIFT) & mask) \

	namespace llvm {
	namespace AMDGPU {
	namespace HSAKD {

	/// Floating point rounding modes.
	enum : uint8_t {
	AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN = 0,
	AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
	AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
	AMDGPU_FLOAT_ROUND_MODE_ZERO = 3,
	};

	/// Floating point denorm modes.
	enum : uint8_t {
	AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
	AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST = 1,
	AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC = 2,
	AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE = 3,
	};

	/// System VGPR workitem IDs.
	enum : uint8_t {
	AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X = 0,
	AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
	AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
	AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
	};

	/// Compute program resource register one layout.
	enum ComputePgmRsrc1 {
	AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
	AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
	AMDGPU_BITS_ENUM_ENTRY(PRIORITY, 10, 2),
	AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2),
	AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_16_64, 14, 2),
	AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_32, 16, 2),
	AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_16_64, 18, 2),
	AMDGPU_BITS_ENUM_ENTRY(PRIV, 20, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_DX10_CLAMP, 21, 1),
	AMDGPU_BITS_ENUM_ENTRY(DEBUG_MODE, 22, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_IEEE_MODE, 23, 1),
	AMDGPU_BITS_ENUM_ENTRY(BULKY, 24, 1),
	AMDGPU_BITS_ENUM_ENTRY(CDBG_USER, 25, 1),
	AMDGPU_BITS_ENUM_ENTRY(FP16_OVFL, 26, 1),
	AMDGPU_BITS_ENUM_ENTRY(RESERVED0, 27, 5),
	};

	/// Compute program resource register two layout.
	enum ComputePgmRsrc2 {
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_OFFSET, 0, 1),
	AMDGPU_BITS_ENUM_ENTRY(USER_SGPR_COUNT, 1, 5),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_TRAP_HANDLER, 6, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_VGPR_WORKITEM_ID, 11, 2),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_MEMORY, 14, 1),
	AMDGPU_BITS_ENUM_ENTRY(GRANULATED_LDS_SIZE, 15, 9),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
	AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
	AMDGPU_BITS_ENUM_ENTRY(RESERVED1, 31, 1),
	};

	/// Kernel descriptor layout. This layout should be kept backwards
	/// compatible as it is consumed by the command processor.
	struct KernelDescriptor final {
	uint32_t GroupSegmentFixedSize;
	uint32_t PrivateSegmentFixedSize;
	uint32_t MaxFlatWorkGroupSize;
	uint64_t IsDynamicCallStack : 1;
	uint64_t IsXNACKEnabled : 1;
	uint64_t Reserved0 : 30;
	int64_t KernelCodeEntryByteOffset;
	uint64_t Reserved1[3];
	uint32_t ComputePgmRsrc1;
	uint32_t ComputePgmRsrc2;
	uint64_t EnableSGPRPrivateSegmentBuffer : 1;
	uint64_t EnableSGPRDispatchPtr : 1;
	uint64_t EnableSGPRQueuePtr : 1;
	uint64_t EnableSGPRKernargSegmentPtr : 1;
	uint64_t EnableSGPRDispatchID : 1;
	uint64_t EnableSGPRFlatScratchInit : 1;
	uint64_t EnableSGPRPrivateSegmentSize : 1;
	uint64_t EnableSGPRGridWorkgroupCountX : 1;
	uint64_t EnableSGPRGridWorkgroupCountY : 1;
	uint64_t EnableSGPRGridWorkgroupCountZ : 1;
	uint64_t Reserved2 : 54;

	KernelDescriptor() = default;
	};

	} // end namespace HSAKD
	} // end namespace AMDGPU
	} // end namespace llvm

	#endif // LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H

include/llvm/Support/AMDHSAKernelDescriptor.h

This file was added.

				//===--- AMDHSAKernelDescriptor.h ------------------------------ C++ ----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				/// \file
				/// AMDHSA kernel descriptor definitions. For more information, visit
				/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
				#define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H

				#include <cstdint>

				// Gets offset of specified member in specified type.
				#ifndef offsetof
				#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER)
				#endif // offsetof

				// Creates enumeration entries used for packing bits into integers. Enumeration
				// entries include bit shift amount, bit width, and bit mask.
				#ifndef AMDHSA_BITS_ENUM_ENTRY
				#define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
				NAME ## _SHIFT = (SHIFT), \
				NAME ## _WIDTH = (WIDTH), \
				NAME = (((1 << (WIDTH)) - 1) << (SHIFT))
				#endif // AMDHSA_BITS_ENUM_ENTRY

				// Gets bits for specified bit mask from specified source.
				#ifndef AMDHSA_BITS_GET
				#define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT)
				#endif // AMDHSA_BITS_GET

				// Sets bits for specified bit mask in specified destination.
				#ifndef AMDHSA_BITS_SET
				#define AMDHSA_BITS_SET(DST, MSK, VAL) \
				DST &= ~MSK; \
				DST |= ((VAL << MSK ## _SHIFT) & MSK)
				#endif // AMDHSA_BITS_SET

				namespace llvm {
				namespace amdhsa {

				// Floating point rounding modes. Must be kept backwards compatible.
				enum : uint8_t {
				FLOAT_ROUND_MODE_NEAR_EVEN = 0,
				FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
				FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
				FLOAT_ROUND_MODE_ZERO = 3,
				};

				// Floating point denorm modes. Must be kept backwards compatible.
				enum : uint8_t {
				FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
				FLOAT_DENORM_MODE_FLUSH_DST = 1,
				FLOAT_DENORM_MODE_FLUSH_SRC = 2,
				FLOAT_DENORM_MODE_FLUSH_NONE = 3,
				};

				// System VGPR workitem IDs. Must be kept backwards compatible.
				enum : uint8_t {
				SYSTEM_VGPR_WORKITEM_ID_X = 0,
				SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
				SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
				SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
				};

				// Compute program resource register 1. Must be kept backwards compatible.
				#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \
				AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH)
				enum : int32_t {
				COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
				COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
				COMPUTE_PGM_RSRC1(PRIORITY, 10, 2),
				COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2),
				COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2),
				COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2),
				COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2),
				COMPUTE_PGM_RSRC1(PRIV, 20, 1),
				COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1),
				COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1),
				COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1),
				COMPUTE_PGM_RSRC1(BULKY, 24, 1),
				COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1),
				COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1),
				COMPUTE_PGM_RSRC1(RESERVED, 27, 5),
				};
				#undef COMPUTE_PGM_RSRC1

				// Compute program resource register 2. Must be kept backwards compatible.
				#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
				AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
				enum : int32_t {
				COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1),
				COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
				COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
				COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
				COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
				COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
				COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
				COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1),
				COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
				COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
				COMPUTE_PGM_RSRC2(RESERVED, 31, 1),
				};
				#undef COMPUTE_PGM_RSRC2

				// Kernel code properties. Must be kept backwards compatible.
				#define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \
				AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH)
				enum : int32_t {
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1),
				KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
				KERNEL_CODE_PROPERTY(RESERVED, 10, 6),
				};
				#undef KERNEL_CODE_PROPERTY

				// Kernel descriptor. Must be kept backwards compatible.
				struct kernel_descriptor_t {
				uint32_t group_segment_fixed_size;
				uint32_t private_segment_fixed_size;
				uint8_t reserved0[8];
				int64_t kernel_code_entry_byte_offset;
				uint8_t reserved1[24];
				uint32_t compute_pgm_rsrc1;
				uint32_t compute_pgm_rsrc2;
				uint16_t kernel_code_properties;
				uint8_t reserved2[6];
				};

				static_assert(
				sizeof(kernel_descriptor_t) == 64,
				"invalid size for kernel_descriptor_t");
				static_assert(
				offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
				"invalid offset for group_segment_fixed_size");
				static_assert(
				offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
				"invalid offset for private_segment_fixed_size");
				static_assert(
				offsetof(kernel_descriptor_t, reserved0) == 8,
				"invalid offset for reserved0");
				static_assert(
				offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,
				"invalid offset for kernel_code_entry_byte_offset");
				static_assert(
				offsetof(kernel_descriptor_t, reserved1) == 24,
				"invalid offset for reserved1");
				static_assert(
				offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,
				"invalid offset for compute_pgm_rsrc1");
				static_assert(
				offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,
				"invalid offset for compute_pgm_rsrc2");
				static_assert(
				offsetof(kernel_descriptor_t, kernel_code_properties) == 56,
				"invalid offset for kernel_code_properties");
				static_assert(
				offsetof(kernel_descriptor_t, reserved2) == 58,
				"invalid offset for reserved2");

				} // end namespace amdhsa
				} // end namespace llvm

				#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
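
The `KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH)` entries above describe bit fields packed into the 16-bit `kernel_code_properties` word. The macro body is not part of this excerpt, so the following is only a minimal sketch of how such entries could expand into a shift, a width, and a mask. The `SKETCH_KERNEL_CODE_PROPERTY` macro, the `sketch` namespace, the `int32_t` underlying type, and `getField` are all hypothetical and may not match the real header.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed expansion: each property yields NAME_SHIFT, NAME_WIDTH and a NAME
// mask. The real macro is defined earlier in the header and may differ.
#define SKETCH_KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH)                        \
  NAME##_SHIFT = (SHIFT), NAME##_WIDTH = (WIDTH),                              \
  NAME = (((1 << (WIDTH)) - 1) << (SHIFT))

// Only a few of the entries from the diff are repeated here.
enum : int32_t {
  SKETCH_KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
  SKETCH_KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1),
  SKETCH_KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
  SKETCH_KERNEL_CODE_PROPERTY(RESERVED, 10, 6),
};
#undef SKETCH_KERNEL_CODE_PROPERTY

// Extracts one field from the 16-bit kernel_code_properties word.
constexpr uint16_t getField(uint16_t Word, int32_t Mask, int32_t Shift) {
  return static_cast<uint16_t>((Word & Mask) >> Shift);
}

// Bits 0..9 are flags and bits 10..15 are reserved, so the fields fill
// exactly the 16 bits of the descriptor field.
static_assert(RESERVED_SHIFT + RESERVED_WIDTH == 16,
              "fields must fit kernel_code_properties");
static_assert(RESERVED == 0xFC00, "reserved mask covers bits 10..15");

} // namespace sketch
```

Under this assumed expansion every mask fits in 16 bits, which is the property the inline question about `uint16_t` is concerned with.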

lib/Target/AMDGPU/AMDGPUAsmPrinter.h

Show All 14 Lines
#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H		#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H		#define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDKernelCodeT.h"		#include "AMDKernelCodeT.h"
#include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h"		#include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/AsmPrinter.h"		#include "llvm/CodeGen/AsmPrinter.h"
		#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include <cstddef>		#include <cstddef>
#include <cstdint>		#include <cstdint>
#include <limits>		#include <limits>
#include <memory>		#include <memory>
#include <string>		#include <string>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	private:
void EmitPALMetadata(const MachineFunction &MF,		void EmitPALMetadata(const MachineFunction &MF,
const SIProgramInfo &KernelInfo);		const SIProgramInfo &KernelInfo);
void emitCommonFunctionComments(uint32_t NumVGPR,		void emitCommonFunctionComments(uint32_t NumVGPR,
uint32_t NumSGPR,		uint32_t NumSGPR,
uint64_t ScratchSize,		uint64_t ScratchSize,
uint64_t CodeSize,		uint64_t CodeSize,
const AMDGPUMachineFunction* MFI);		const AMDGPUMachineFunction* MFI);

		uint16_t getAmdhsaKernelCodeProperties(
		const MachineFunction &MF) const;

		amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor(
		const MachineFunction &MF,
		const SIProgramInfo &PI) const;

public:		public:
explicit AMDGPUAsmPrinter(TargetMachine &TM,		explicit AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer);		std::unique_ptr<MCStreamer> Streamer);

StringRef getPassName() const override;		StringRef getPassName() const override;

const MCSubtargetInfo* getSTI() const;		const MCSubtargetInfo* getSTI() const;

Show All 16 Lines	public:
bool emitPseudoExpansionLowering(MCStreamer &OutStreamer,		bool emitPseudoExpansionLowering(MCStreamer &OutStreamer,
const MachineInstr *MI);		const MachineInstr *MI);

/// Implemented in AMDGPUMCInstLower.cpp		/// Implemented in AMDGPUMCInstLower.cpp
void EmitInstruction(const MachineInstr *MI) override;		void EmitInstruction(const MachineInstr *MI) override;

void EmitFunctionBodyStart() override;		void EmitFunctionBodyStart() override;

		void EmitFunctionBodyEnd() override;

void EmitFunctionEntryLabel() override;		void EmitFunctionEntryLabel() override;

void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override;		void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override;

void EmitGlobalVariable(const GlobalVariable *GV) override;		void EmitGlobalVariable(const GlobalVariable *GV) override;

void EmitStartOfAsmFile(Module &M) override;		void EmitStartOfAsmFile(Module &M) override;

Show All 18 Lines

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp

Show First 20 Lines • Show All 110 Lines • ▼ Show 20 Lines

AMDGPUTargetStreamer* AMDGPUAsmPrinter::getTargetStreamer() const {		AMDGPUTargetStreamer* AMDGPUAsmPrinter::getTargetStreamer() const {
if (!OutStreamer)		if (!OutStreamer)
return nullptr;		return nullptr;
return static_cast<AMDGPUTargetStreamer*>(OutStreamer->getTargetStreamer());		return static_cast<AMDGPUTargetStreamer*>(OutStreamer->getTargetStreamer());
}		}

void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {		void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
		if (IsaInfo::hasCodeObjectV3(getSTI()) &&
		TM.getTargetTriple().getOS() == Triple::AMDHSA)
		return;

if (TM.getTargetTriple().getOS() != Triple::AMDHSA &&		if (TM.getTargetTriple().getOS() != Triple::AMDHSA &&
TM.getTargetTriple().getOS() != Triple::AMDPAL)		TM.getTargetTriple().getOS() != Triple::AMDPAL)
return;		return;

if (TM.getTargetTriple().getOS() == Triple::AMDHSA)		if (TM.getTargetTriple().getOS() == Triple::AMDHSA)
HSAMetadataStream.begin(M);		HSAMetadataStream.begin(M);

if (TM.getTargetTriple().getOS() == Triple::AMDPAL)		if (TM.getTargetTriple().getOS() == Triple::AMDPAL)
readPALMetadata(M);		readPALMetadata(M);

// Deprecated notes are not emitted for code object v3.
if (IsaInfo::hasCodeObjectV3(getSTI()->getFeatureBits()))
return;

// HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2.		// HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2.
if (TM.getTargetTriple().getOS() == Triple::AMDHSA)		if (TM.getTargetTriple().getOS() == Triple::AMDHSA)
getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1);		getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1);

// HSA and PAL emit NT_AMDGPU_HSA_ISA for code objects v2.		// HSA and PAL emit NT_AMDGPU_HSA_ISA for code objects v2.
IsaInfo::IsaVersion ISA = IsaInfo::getIsaVersion(getSTI()->getFeatureBits());		IsaInfo::IsaVersion ISA = IsaInfo::getIsaVersion(getSTI()->getFeatureBits());
getTargetStreamer()->EmitDirectiveHSACodeObjectISA(		getTargetStreamer()->EmitDirectiveHSACodeObjectISA(
ISA.Major, ISA.Minor, ISA.Stepping, "AMD", "AMDGPU");		ISA.Major, ISA.Minor, ISA.Stepping, "AMD", "AMDGPU");
}		}

void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) {		void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) {
		// TODO: Add metadata to code object v3.
		if (IsaInfo::hasCodeObjectV3(getSTI()) &&
		TM.getTargetTriple().getOS() == Triple::AMDHSA)
		return;

// Following code requires TargetStreamer to be present.		// Following code requires TargetStreamer to be present.
if (!getTargetStreamer())		if (!getTargetStreamer())
return;		return;

// Emit ISA Version (NT_AMD_AMDGPU_ISA).		// Emit ISA Version (NT_AMD_AMDGPU_ISA).
std::string ISAVersionString;		std::string ISAVersionString;
raw_string_ostream ISAVersionStream(ISAVersionString);		raw_string_ostream ISAVersionStream(ISAVersionString);
Show All 29 Lines	bool AMDGPUAsmPrinter::isBlockOnlyReachableByFallthrough(

// If this is a block implementing a long branch, an expression relative to		// If this is a block implementing a long branch, an expression relative to
// the start of the block is needed. to the start of the block.		// the start of the block is needed. to the start of the block.
// XXX - Is there a smarter way to check this?		// XXX - Is there a smarter way to check this?
return (MBB->back().getOpcode() != AMDGPU::S_SETPC_B64);		return (MBB->back().getOpcode() != AMDGPU::S_SETPC_B64);
}		}

void AMDGPUAsmPrinter::EmitFunctionBodyStart() {		void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();		const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
if (!MFI->isEntryFunction())		if (!MFI.isEntryFunction())
		return;
		if (IsaInfo::hasCodeObjectV3(getSTI()) &&
		TM.getTargetTriple().getOS() == Triple::AMDHSA)
return;		return;

const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
amd_kernel_code_t KernelCode;		amd_kernel_code_t KernelCode;
if (STM.isAmdCodeObjectV2(MF->getFunction())) {		if (STM.isAmdCodeObjectV2(MF->getFunction())) {
getAmdKernelCode(KernelCode, CurrentProgramInfo, *MF);		getAmdKernelCode(KernelCode, CurrentProgramInfo, *MF);
getTargetStreamer()->EmitAMDKernelCodeT(KernelCode);		getTargetStreamer()->EmitAMDKernelCodeT(KernelCode);
}		}

if (TM.getTargetTriple().getOS() != Triple::AMDHSA)		if (TM.getTargetTriple().getOS() != Triple::AMDHSA)
return;		return;

HSAMetadataStream.emitKernel(MF->getFunction(),		HSAMetadataStream.emitKernel(MF->getFunction(),
getHSACodeProps(*MF, CurrentProgramInfo),		getHSACodeProps(*MF, CurrentProgramInfo),
getHSADebugProps(*MF, CurrentProgramInfo));		getHSADebugProps(*MF, CurrentProgramInfo));
}		}

		void AMDGPUAsmPrinter::EmitFunctionBodyEnd() {
		const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
		if (!MFI.isEntryFunction())
		return;
		if (!IsaInfo::hasCodeObjectV3(getSTI()) ||
		TM.getTargetTriple().getOS() != Triple::AMDHSA)
		return;

		SmallString<128> KernelName;
		getNameWithPrefix(KernelName, &MF->getFunction());
		getTargetStreamer()->EmitAmdhsaKernelDescriptor(
		KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo));
		}

void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {		void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
		if (IsaInfo::hasCodeObjectV3(getSTI()) &&
		TM.getTargetTriple().getOS() == Triple::AMDHSA) {
		AsmPrinter::EmitFunctionEntryLabel();
		return;
		}

const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();		const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) {		if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) {
SmallString<128> SymbolName;		SmallString<128> SymbolName;
getNameWithPrefix(SymbolName, &MF->getFunction()),		getNameWithPrefix(SymbolName, &MF->getFunction()),
getTargetStreamer()->EmitAMDGPUSymbolType(		getTargetStreamer()->EmitAMDGPUSymbolType(
SymbolName, ELF::STT_AMDGPU_HSA_KERNEL);		SymbolName, ELF::STT_AMDGPU_HSA_KERNEL);
}		}
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	void AMDGPUAsmPrinter::emitCommonFunctionComments(
OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);		OutStreamer->emitRawComment(" codeLenInByte = " + Twine(CodeSize), false);
OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);		OutStreamer->emitRawComment(" NumSgprs: " + Twine(NumSGPR), false);
OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);		OutStreamer->emitRawComment(" NumVgprs: " + Twine(NumVGPR), false);
OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);		OutStreamer->emitRawComment(" ScratchSize: " + Twine(ScratchSize), false);
OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),		OutStreamer->emitRawComment(" MemoryBound: " + Twine(MFI->isMemoryBound()),
false);		false);
}		}

		uint16_t AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
		const MachineFunction &MF) const {
		const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
		uint16_t KernelCodeProperties = 0;

		if (MFI.hasPrivateSegmentBuffer()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER;
		}
		if (MFI.hasDispatchPtr()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR;
		}
		if (MFI.hasQueuePtr()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR;
		}
		if (MFI.hasKernargSegmentPtr()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR;
		}
		if (MFI.hasDispatchID()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID;
		}
		if (MFI.hasFlatScratchInit()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT;
		}
		if (MFI.hasGridWorkgroupCountX()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X;
		}
		if (MFI.hasGridWorkgroupCountY()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y;
		}
		if (MFI.hasGridWorkgroupCountZ()) {
		KernelCodeProperties |=
		amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z;
		}

		return KernelCodeProperties;
		}

		amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
		const MachineFunction &MF,
		const SIProgramInfo &PI) const {
		amdhsa::kernel_descriptor_t KernelDescriptor;
		memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor));

		assert(isUInt<32>(PI.ScratchSize));
		assert(isUInt<32>(PI.ComputePGMRSrc1));
		assert(isUInt<32>(PI.ComputePGMRSrc2));

		KernelDescriptor.group_segment_fixed_size = PI.LDSSize;
		KernelDescriptor.private_segment_fixed_size = PI.ScratchSize;
		KernelDescriptor.compute_pgm_rsrc1 = PI.ComputePGMRSrc1;
		KernelDescriptor.compute_pgm_rsrc2 = PI.ComputePGMRSrc2;
		KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF);

		return KernelDescriptor;
		}

bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {		bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
CurrentProgramInfo = SIProgramInfo();		CurrentProgramInfo = SIProgramInfo();

const AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();		const AMDGPUMachineFunction *MFI = MF.getInfo<AMDGPUMachineFunction>();

// The starting address of all shader programs must be 256 bytes aligned.		// The starting address of all shader programs must be 256 bytes aligned.
// Regular functions just need the basic required instruction alignment.		// Regular functions just need the basic required instruction alignment.
MF.setAlignment(MFI->isEntryFunction() ? 8 : 2);		MF.setAlignment(MFI->isEntryFunction() ? 8 : 2);
▲ Show 20 Lines • Show All 929 Lines • Show Last 20 Lines
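
Several hooks in this file gate on the same condition: code object v3 together with the AMDHSA OS. `EmitFunctionBodyStart` returns early when it holds, while `EmitFunctionBodyEnd` returns early when it does not. The sketch below restates that split with plain booleans standing in for `IsaInfo::hasCodeObjectV3(getSTI())`, the `Triple::AMDHSA` comparison, and `isEntryFunction()`. The helper names are hypothetical and the real hooks do more work after these checks.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for: IsaInfo::hasCodeObjectV3(getSTI()) &&
//               TM.getTargetTriple().getOS() == Triple::AMDHSA
inline bool usesV3OnAmdhsa(bool HasCodeObjectV3, bool IsAmdhsa) {
  return HasCodeObjectV3 && IsAmdhsa;
}

// EmitFunctionBodyStart gets past its early returns only for entry functions
// on the legacy (non-v3 or non-AMDHSA) path.
inline bool bodyStartContinues(bool IsEntry, bool HasV3, bool IsAmdhsa) {
  if (!IsEntry)
    return false;
  if (usesV3OnAmdhsa(HasV3, IsAmdhsa))
    return false;
  return true;
}

// EmitFunctionBodyEnd emits a kernel descriptor only for entry functions on
// the v3 AMDHSA path. Its early return uses the De Morgan dual of the
// predicate above: !HasV3 || !IsAmdhsa.
inline bool bodyEndEmitsDescriptor(bool IsEntry, bool HasV3, bool IsAmdhsa) {
  if (!IsEntry)
    return false;
  if (!HasV3 || !IsAmdhsa)
    return false;
  return true;
}

} // namespace sketch
```

For an entry function exactly one of the two helpers returns true, which mirrors the intent that a kernel gets either the legacy emission or a v3 kernel descriptor, never both.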

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h

//===-- AMDGPUTargetStreamer.h - AMDGPU Target Streamer --------- C++ ---===//		//===-- AMDGPUTargetStreamer.h - AMDGPU Target Streamer --------- C++ ---===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_MCTARGETDESC_AMDGPUTARGETSTREAMER_H		#ifndef LLVM_LIB_TARGET_AMDGPU_MCTARGETDESC_AMDGPUTARGETSTREAMER_H
#define LLVM_LIB_TARGET_AMDGPU_MCTARGETDESC_AMDGPUTARGETSTREAMER_H		#define LLVM_LIB_TARGET_AMDGPU_MCTARGETDESC_AMDGPUTARGETSTREAMER_H

#include "AMDKernelCodeT.h"		#include "AMDKernelCodeT.h"
#include "llvm/MC/MCStreamer.h"		#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/AMDGPUMetadata.h"		#include "llvm/Support/AMDGPUMetadata.h"
		#include "llvm/Support/AMDHSAKernelDescriptor.h"

namespace llvm {		namespace llvm {
#include "AMDGPUPTNote.h"		#include "AMDGPUPTNote.h"

class DataLayout;		class DataLayout;
class Function;		class Function;
class MCELFStreamer;		class MCELFStreamer;
class MCSymbol;		class MCSymbol;
Show All 32 Lines	public:
/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
virtual bool EmitHSAMetadata(StringRef HSAMetadataString);		virtual bool EmitHSAMetadata(StringRef HSAMetadataString);

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
virtual bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) = 0;		virtual bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) = 0;

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0;		virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0;

		virtual void EmitAmdhsaKernelDescriptor(
		StringRef KernelName,
		const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0;
};		};

class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {		class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
formatted_raw_ostream &OS;		formatted_raw_ostream &OS;
public:		public:
AMDGPUTargetAsmStreamer(MCStreamer &S, formatted_raw_ostream &OS);		AMDGPUTargetAsmStreamer(MCStreamer &S, formatted_raw_ostream &OS);
void EmitDirectiveHSACodeObjectVersion(uint32_t Major,		void EmitDirectiveHSACodeObjectVersion(uint32_t Major,
uint32_t Minor) override;		uint32_t Minor) override;
Show All 9 Lines	public:
/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitISAVersion(StringRef IsaVersionString) override;		bool EmitISAVersion(StringRef IsaVersionString) override;

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) override;		bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) override;

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;		bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;

		void EmitAmdhsaKernelDescriptor(
		StringRef KernelName,
		const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};		};

class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {		class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
MCStreamer &Streamer;		MCStreamer &Streamer;

void EmitAMDGPUNote(const MCExpr *DescSize, unsigned NoteType,		void EmitAMDGPUNote(const MCExpr *DescSize, unsigned NoteType,
function_ref<void(MCELFStreamer &)> EmitDesc);		function_ref<void(MCELFStreamer &)> EmitDesc);

Show All 16 Lines	public:
/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitISAVersion(StringRef IsaVersionString) override;		bool EmitISAVersion(StringRef IsaVersionString) override;

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) override;		bool EmitHSAMetadata(const AMDGPU::HSAMD::Metadata &HSAMetadata) override;

/// \returns True on success, false on failure.		/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;		bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;

		void EmitAmdhsaKernelDescriptor(
		StringRef KernelName,
		const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};		};

}		}
#endif		#endif

lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp

Show First 20 Lines • Show All 190 Lines • ▼ Show 20 Lines	bool AMDGPUTargetAsmStreamer::EmitPALMetadata(
std::string PALMetadataString;		std::string PALMetadataString;
if (PALMD::toString(PALMetadata, PALMetadataString))		if (PALMD::toString(PALMetadata, PALMetadataString))
return false;		return false;

OS << '\t' << PALMD::AssemblerDirective << PALMetadataString << '\n';		OS << '\t' << PALMD::AssemblerDirective << PALMetadataString << '\n';
return true;		return true;
}		}

		void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
		StringRef KernelName,
		const amdhsa::kernel_descriptor_t &KernelDescriptor) {
		// FIXME: not supported yet.
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer		// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

AMDGPUTargetELFStreamer::AMDGPUTargetELFStreamer(		AMDGPUTargetELFStreamer::AMDGPUTargetELFStreamer(
MCStreamer &S, const MCSubtargetInfo &STI)		MCStreamer &S, const MCSubtargetInfo &STI)
: AMDGPUTargetStreamer(S), Streamer(S) {		: AMDGPUTargetStreamer(S), Streamer(S) {
MCAssembler &MCA = getStreamer().getAssembler();		MCAssembler &MCA = getStreamer().getAssembler();
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	EmitAMDGPUNote(
ELF::NT_AMD_AMDGPU_PAL_METADATA,		ELF::NT_AMD_AMDGPU_PAL_METADATA,
[&](MCELFStreamer &OS){		[&](MCELFStreamer &OS){
for (auto I : PALMetadata)		for (auto I : PALMetadata)
OS.EmitIntValue(I, sizeof(uint32_t));		OS.EmitIntValue(I, sizeof(uint32_t));
}		}
);		);
return true;		return true;
}		}

		void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
		StringRef KernelName,
		const amdhsa::kernel_descriptor_t &KernelDescriptor) {
		auto &Streamer = getStreamer();
		auto &Context = Streamer.getContext();
		auto &ObjectFileInfo = *Context.getObjectFileInfo();
		auto &ReadOnlySection = *ObjectFileInfo.getReadOnlySection();

		Streamer.PushSection();
		Streamer.SwitchSection(&ReadOnlySection);
		scott.linder: Should this switch be happening here? Shouldn't the assembly writer be able to put this descriptor in any section?
		kzhuravl (Author): rL334521

		// CP microcode requires the kernel descriptor to be allocated on 64 byte
		// alignment.
		Streamer.EmitValueToAlignment(64, 0, 1, 0);
		if (ReadOnlySection.getAlignment() < 64)
		ReadOnlySection.setAlignment(64);

		MCSymbolELF *KernelDescriptorSymbol = cast<MCSymbolELF>(
		Context.getOrCreateSymbol(Twine(KernelName) + Twine(".kd")));
		KernelDescriptorSymbol->setBinding(ELF::STB_GLOBAL);
		KernelDescriptorSymbol->setType(ELF::STT_OBJECT);
		KernelDescriptorSymbol->setSize(
		MCConstantExpr::create(sizeof(KernelDescriptor), Context));

		MCSymbolELF *KernelCodeSymbol = cast<MCSymbolELF>(
		Context.getOrCreateSymbol(Twine(KernelName)));
		KernelCodeSymbol->setBinding(ELF::STB_LOCAL);

		Streamer.EmitLabel(KernelDescriptorSymbol);
		Streamer.EmitBytes(StringRef(
		(const char*)&(KernelDescriptor),
		offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset)));
		// FIXME: Remove the use of VK_AMDGPU_REL64 in the expression below. The
		// expression being created is:
		// (start of kernel code) - (start of kernel descriptor)
		// It implies R_AMDGPU_REL64, but ends up being R_AMDGPU_ABS64.
		Streamer.EmitValue(MCBinaryExpr::createSub(
		MCSymbolRefExpr::create(
		nhaehnle: Isn't the REL64 here redundant? The way I see it, we should either have an absolute reference to `(start of kernel code) - (start of kernel descriptor)`, or a relative reference to `(start of kernel code) - offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset)`. I'm not deep enough in MC to say for certain which of these is really preferable.
		t-tye: The field requires an offset to the entry point so an absolute relocation cannot be used. The Rel64 relocation record is what describes what is needed. Since the kernel descriptor and entry point are now in different sections, a static relocation record is needed so that it can be fixed up when the relocatable code object is linked to make a shared object. Previously the kernel descriptor was put in the same section as the code, and the offset was "hard-wired" when it was generated.
		nhaehnle: Right, I agree that a REL64 relocation is ultimately needed. My point is more about how to express that fact inside LLVM using the MCExpr framework. I read the expression that is being created here as literally `(start of kernel code) - (start of kernel descriptor)`. The fact that it's a relative relocation really ought to be implied by that already; it's not clear to me why VK_AMDGPU_REL64 is passed to one of the constructors in addition to that.
		kzhuravl (Author): If we just put (start of kernel code) - (start of kernel descriptor), it ends up being R_AMDGPU_ABS64.
		t-tye: I agree with @nhaehnle that it seems strange that this is needed. Is that a limitation or bug in MCExpr handling? Perhaps it is being used in a way not seen before and so it mishandles this case, and ought to be fixed?
		kzhuravl (Author): I put a fixme comment for now.
		scott.linder: I think the definition of VK_AMDGPU_REL64 got dropped in the most recent patch?
		kzhuravl (Author): https://reviews.llvm.org/D47601
		KernelCodeSymbol, MCSymbolRefExpr::VK_AMDGPU_REL64, Context),
		MCSymbolRefExpr::create(
		KernelDescriptorSymbol, MCSymbolRefExpr::VK_None, Context),
		Context),
		sizeof(KernelDescriptor.kernel_code_entry_byte_offset));
		Streamer.EmitBytes(StringRef(
		(const char*)&(KernelDescriptor) +
		offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) +
		sizeof(KernelDescriptor.kernel_code_entry_byte_offset),
		sizeof(KernelDescriptor) -
		offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) -
		sizeof(KernelDescriptor.kernel_code_entry_byte_offset)));

		Streamer.PopSection();
		}
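
`EmitAmdhsaKernelDescriptor` above writes the descriptor in three pieces: the raw bytes before `kernel_code_entry_byte_offset`, the relocated 8-byte field itself, and the raw bytes after it. The sketch below recomputes those byte ranges from a stand-in struct that mirrors the layout shown in this diff, and shows the value the field is meant to hold once resolved. The `sketch` namespace, `ByteRange`, and the helper functions are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Stand-in mirroring the layout of amdhsa::kernel_descriptor_t in this diff.
struct KernelDescriptor {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8];
  int64_t kernel_code_entry_byte_offset;
  uint8_t reserved1[24];
  uint32_t compute_pgm_rsrc1;
  uint32_t compute_pgm_rsrc2;
  uint16_t kernel_code_properties;
  uint8_t reserved2[6];
};
static_assert(sizeof(KernelDescriptor) == 64, "descriptor must be 64 bytes");

struct ByteRange {
  size_t Offset;
  size_t Size;
};

constexpr size_t FieldOffset =
    offsetof(KernelDescriptor, kernel_code_entry_byte_offset);
constexpr size_t FieldSize = sizeof(int64_t);

// Raw bytes emitted before the relocated field.
inline ByteRange prefixRange() { return {0, FieldOffset}; }

// Raw bytes emitted after the relocated field.
inline ByteRange suffixRange() {
  return {FieldOffset + FieldSize,
          sizeof(KernelDescriptor) - FieldOffset - FieldSize};
}

// Value the field should hold after relocation:
// (start of kernel code) - (start of kernel descriptor).
inline int64_t entryByteOffset(uint64_t CodeAddr, uint64_t DescriptorAddr) {
  return static_cast<int64_t>(CodeAddr - DescriptorAddr);
}

// How a consumer could recover the entry address from a descriptor address.
inline uint64_t entryAddress(uint64_t DescriptorAddr, int64_t Offset) {
  return DescriptorAddr + static_cast<uint64_t>(Offset);
}

} // namespace sketch
```

The three ranges cover bytes 0 to 15, 16 to 23, and 24 to 63, so together they account for the full 64-byte descriptor. Because the offset is relative to the descriptor itself, it can be negative when the code is placed at a lower address than `.rodata`.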

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

	Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
	};			};

	/// \returns Isa version for given subtarget \p Features.			/// \returns Isa version for given subtarget \p Features.
	IsaVersion getIsaVersion(const FeatureBitset &Features);			IsaVersion getIsaVersion(const FeatureBitset &Features);

	/// Streams isa version string for given subtarget \p STI into \p Stream.			/// Streams isa version string for given subtarget \p STI into \p Stream.
	void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream);			void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream);

	/// \returns True if given subtarget \p Features support code object version 3,			/// \returns True if given subtarget \p STI supports code object version 3,
	/// false otherwise.			/// false otherwise.
	bool hasCodeObjectV3(const FeatureBitset &Features);			bool hasCodeObjectV3(const MCSubtargetInfo *STI);

	/// \returns Wavefront size for given subtarget \p Features.			/// \returns Wavefront size for given subtarget \p Features.
	unsigned getWavefrontSize(const FeatureBitset &Features);			unsigned getWavefrontSize(const FeatureBitset &Features);

	/// \returns Local memory size in bytes for given subtarget \p Features.			/// \returns Local memory size in bytes for given subtarget \p Features.
	unsigned getLocalMemorySize(const FeatureBitset &Features);			unsigned getLocalMemorySize(const FeatureBitset &Features);

	/// \returns Number of execution units per compute unit for given subtarget \p			/// \returns Number of execution units per compute unit for given subtarget \p
	▲ Show 20 Lines • Show All 320 Lines • Show Last 20 Lines

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

Show First 20 Lines • Show All 242 Lines • ▼ Show 20 Lines	Stream << TargetTriple.getArchName() << '-'
<< TargetTriple.getEnvironmentName() << '-'		<< TargetTriple.getEnvironmentName() << '-'
<< "gfx"		<< "gfx"
<< ISAVersion.Major		<< ISAVersion.Major
<< ISAVersion.Minor		<< ISAVersion.Minor
<< ISAVersion.Stepping;		<< ISAVersion.Stepping;
Stream.flush();		Stream.flush();
}		}

bool hasCodeObjectV3(const FeatureBitset &Features) {		bool hasCodeObjectV3(const MCSubtargetInfo *STI) {
return Features.test(FeatureCodeObjectV3);		return STI->getFeatureBits().test(FeatureCodeObjectV3);
}		}

unsigned getWavefrontSize(const FeatureBitset &Features) {		unsigned getWavefrontSize(const FeatureBitset &Features) {
if (Features.test(FeatureWavefrontSize16))		if (Features.test(FeatureWavefrontSize16))
return 16;		return 16;
if (Features.test(FeatureWavefrontSize32))		if (Features.test(FeatureWavefrontSize32))
return 32;		return 32;

▲ Show 20 Lines • Show All 705 Lines • Show Last 20 Lines

test/CodeGen/AMDGPU/code-object-v3.ll

This file was added.

				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | FileCheck --check-prefixes=ALL-ASM,OSABI-AMDHSA-ASM %s
				; RUN: llc -filetype=obj -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | llvm-readobj -elf-output-style=GNU -notes -relocations -sections -symbols | FileCheck --check-prefixes=ALL-ELF,OSABI-AMDHSA-ELF %s

				; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_version
				; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_isa
				; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_isa
				; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_hsa_metadata
				; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_pal_metadata

				; OSABI-AMDHSA-ELF: Section Headers
				; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
				; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64

				; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
				; OSABI-AMDHSA-ELF: 0000000000000010 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 10
				; OSABI-AMDHSA-ELF: 0000000000000050 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 110

				; OSABI-AMDHSA-ELF: Symbol table '.symtab' contains {{[0-9]+}} entries
				; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fadd
				; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000100 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fsub
				; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fadd.kd
				; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000040 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fsub.kd

				; OSABI-AMDHSA-ELF-NOT: Displaying notes found

				define amdgpu_kernel void @fadd(
				float addrspace(1)* %r,
				float addrspace(1)* %a,
				float addrspace(1)* %b) {
				entry:
				%a.val = load float, float addrspace(1)* %a
				%b.val = load float, float addrspace(1)* %b
				%r.val = fadd float %a.val, %b.val
				store float %r.val, float addrspace(1)* %r
				ret void
				}

				define amdgpu_kernel void @fsub(
				float addrspace(1)* %r,
				float addrspace(1)* %a,
				float addrspace(1)* %b) {
				entry:
				%a.val = load float, float addrspace(1)* %a
				%b.val = load float, float addrspace(1)* %b
				%r.val = fsub float %a.val, %b.val
				store float %r.val, float addrspace(1)* %r
				ret void
				}
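
The two `R_AMDGPU_REL64` CHECK lines in this test can be verified by hand. Assuming the usual `S + A - P` calculation for this relocation type (symbol value plus addend minus the address being patched), each record should resolve to the distance from a `.kd` symbol to its kernel. The sketch below plugs in the offsets from the expected output. The section base addresses are arbitrary, hypothetical values chosen only to make the arithmetic concrete.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed relocation calculation for R_AMDGPU_REL64: S + A - P.
inline int64_t applyRel64(uint64_t S, int64_t A, uint64_t P) {
  return static_cast<int64_t>(S + static_cast<uint64_t>(A) - P);
}

// Hypothetical load addresses, not taken from the test.
constexpr uint64_t TextBase = 0x10000;
constexpr uint64_t RodataBase = 0x4000;

// Symbol offsets from the expected symbol table in the test.
constexpr uint64_t FaddOffset = 0x0;     // fadd in .text
constexpr uint64_t FsubOffset = 0x100;   // fsub in .text
constexpr uint64_t FaddKdOffset = 0x0;   // fadd.kd in .rodata
constexpr uint64_t FsubKdOffset = 0x40;  // fsub.kd in .rodata

// First record: patched at .rodata + 0x10, against .text + 0x10.
inline int64_t faddEntryOffset() {
  return applyRel64(TextBase, 0x10, RodataBase + 0x10);
}

// Second record: patched at .rodata + 0x50, against .text + 0x110.
inline int64_t fsubEntryOffset() {
  return applyRel64(TextBase, 0x110, RodataBase + 0x50);
}

} // namespace sketch
```

In both records the addend exceeds the kernel's offset in `.text` by 0x10, which is the offset of `kernel_code_entry_byte_offset` inside the descriptor. That extra 0x10 cancels against the patched address, so the stored value ends up relative to the start of the descriptor rather than to the field.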

test/CodeGen/AMDGPU/elf-notes.ll

	; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
	; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s			; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
	; RUN: llc -march=r600 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=R600 %s			; RUN: llc -march=r600 < %s | FileCheck --check-prefix=R600 %s

	; OSABI-UNK-NOT: .hsa_code_object_version			; OSABI-UNK-NOT: .hsa_code_object_version
	; OSABI-UNK-NOT: .hsa_code_object_isa			; OSABI-UNK-NOT: .hsa_code_object_isa
	; OSABI-UNK: .amd_amdgpu_isa "amdgcn-amd-unknown--gfx802"			; OSABI-UNK: .amd_amdgpu_isa "amdgcn-amd-unknown--gfx802"
	; OSABI-UNK-NOT: .amd_amdgpu_hsa_metadata			; OSABI-UNK-NOT: .amd_amdgpu_hsa_metadata
	; OSABI-UNK-NOT: .amd_amdgpu_pal_metadata			; OSABI-UNK-NOT: .amd_amdgpu_pal_metadata

	; OSABI-UNK-ELF-NOT: Unknown note type			; OSABI-UNK-ELF-NOT: Unknown note type
	; OSABI-UNK-ELF: NT_AMD_AMDGPU_ISA (ISA Version)			; OSABI-UNK-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
	; OSABI-UNK-ELF: ISA Version:			; OSABI-UNK-ELF: ISA Version:
	; OSABI-UNK-ELF: amdgcn-amd-unknown--gfx802			; OSABI-UNK-ELF: amdgcn-amd-unknown--gfx802
	; OSABI-UNK-ELF-NOT: Unknown note type			; OSABI-UNK-ELF-NOT: Unknown note type
	; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)			; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
	; OSABI-UNK-ELF-NOT: Unknown note type			; OSABI-UNK-ELF-NOT: Unknown note type
	; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)			; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
	; OSABI-UNK-ELF-NOT: Unknown note type			; OSABI-UNK-ELF-NOT: Unknown note type

	; OSABI-HSA-NOT: .hsa_code_object_version			; OSABI-HSA: .hsa_code_object_version
	; OSABI-HSA-NOT: .hsa_code_object_isa			; OSABI-HSA: .hsa_code_object_isa
	; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"			; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"
	; OSABI-HSA: .amd_amdgpu_hsa_metadata			; OSABI-HSA: .amd_amdgpu_hsa_metadata
	; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata			; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata

	; OSABI-HSA-ELF-NOT: Unknown note type			; OSABI-HSA-ELF: Unknown note type (0x00000001)
				; OSABI-HSA-ELF: Unknown note type (0x00000003)
	; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)			; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
	; OSABI-HSA-ELF: ISA Version:			; OSABI-HSA-ELF: ISA Version:
	; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802			; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802
	; OSABI-HSA-ELF-NOT: Unknown note type
	; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)			; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
	; OSABI-HSA-ELF: HSA Metadata:			; OSABI-HSA-ELF: HSA Metadata:
	; OSABI-HSA-ELF: ---			; OSABI-HSA-ELF: ---
	; OSABI-HSA-ELF: Version: [ 1, 0 ]			; OSABI-HSA-ELF: Version: [ 1, 0 ]
	; OSABI-HSA-ELF: Kernels:			; OSABI-HSA-ELF: Kernels:
	; OSABI-HSA-ELF: - Name: elf_notes			; OSABI-HSA-ELF: - Name: elf_notes
	; OSABI-HSA-ELF: SymbolName: 'elf_notes@kd'			; OSABI-HSA-ELF: SymbolName: 'elf_notes@kd'
	; OSABI-HSA-ELF: CodeProps:			; OSABI-HSA-ELF: CodeProps:
	; OSABI-HSA-ELF: KernargSegmentSize: 0			; OSABI-HSA-ELF: KernargSegmentSize: 0
	; OSABI-HSA-ELF: GroupSegmentFixedSize: 0			; OSABI-HSA-ELF: GroupSegmentFixedSize: 0
	; OSABI-HSA-ELF: PrivateSegmentFixedSize: 0			; OSABI-HSA-ELF: PrivateSegmentFixedSize: 0
	; OSABI-HSA-ELF: KernargSegmentAlign: 4			; OSABI-HSA-ELF: KernargSegmentAlign: 4
	; OSABI-HSA-ELF: WavefrontSize: 64			; OSABI-HSA-ELF: WavefrontSize: 64
	; OSABI-HSA-ELF: NumSGPRs: 96			; OSABI-HSA-ELF: NumSGPRs: 96
	; OSABI-HSA-ELF: ...			; OSABI-HSA-ELF: ...
	; OSABI-HSA-ELF-NOT: Unknown note type
	; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)			; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
	; OSABI-HSA-ELF-NOT: Unknown note type

	; OSABI-PAL-NOT: .hsa_code_object_version			; OSABI-PAL-NOT: .hsa_code_object_version
	; OSABI-PAL-NOT: .hsa_code_object_isa			; OSABI-PAL: .hsa_code_object_isa
	; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"			; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"
	; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata			; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata
	; OSABI-PAL: .amd_amdgpu_pal_metadata			; OSABI-PAL: .amd_amdgpu_pal_metadata

	; OSABI-PAL-ELF-NOT: Unknown note type			; OSABI-PAL-ELF: Unknown note type (0x00000003)
	; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)			; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
	; OSABI-PAL-ELF: ISA Version:			; OSABI-PAL-ELF: ISA Version:
	; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802			; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802
	; OSABI-PAL-ELF-NOT: Unknown note type
	; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)			; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
	; OSABI-PAL-ELF-NOT: Unknown note type
	; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)			; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
	; OSABI-PAL-ELF: PAL Metadata:			; OSABI-PAL-ELF: PAL Metadata:
	; TODO: Following check line fails on mips:			; TODO: Following check line fails on mips:
	; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0			; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0
	; OSABI-PAL-ELF-NOT: Unknown note type

	; R600-NOT: .hsa_code_object_version			; R600-NOT: .hsa_code_object_version
	; R600-NOT: .hsa_code_object_isa			; R600-NOT: .hsa_code_object_isa
	; R600-NOT: .amd_amdgpu_isa			; R600-NOT: .amd_amdgpu_isa
	; R600-NOT: .amd_amdgpu_hsa_metadata			; R600-NOT: .amd_amdgpu_hsa_metadata
	; R600-NOT: .amd_amdgpu_pal_metadata			; R600-NOT: .amd_amdgpu_pal_metadata

	define amdgpu_kernel void @elf_notes() {			define amdgpu_kernel void @elf_notes() {
	ret void			ret void
	}			}