This is an archive of the discontinued LLVM Phabricator instance.

llvm/docs/AMDGPUUsage.rst
1298	Isn't this a generic thing that is (to be) used by all targets, not just AMDGPU? Should we instead document it elsewhere and link it here? I guess the same could be said for the target id stuff but there people don't really use it elsewhere yet.

yaxunl marked an inline comment as done. Aug 5 2020, 5:39 AM

yaxunl added inline comments.

llvm/docs/AMDGPUUsage.rst
1298	There are two usages of `clang-offload-bundler`: (1) as a generic bundler for clang intermediate files, including preprocessor outputs, LLVM bitcode, and object files, where the consumer is clang; (2) as a code object bundler (a so-called fat binary), which bundles code objects for different GPUs so that they can be embedded in an executable or shared library, where the consumer is the HIP runtime. Here we only describe the second usage of `clang-offload-bundler`, which is only used by the AMDGPU target. As you can see, it refers to `code objects` and `target ID`. Therefore it is better kept in the AMDGPU documentation.

jsjodin added a subscriber: jsjodin. Aug 5 2020, 6:00 AM

JonChesterfield added a subscriber: JonChesterfield. Aug 5 2020, 7:01 AM

JonChesterfield added inline comments.

llvm/docs/AMDGPUUsage.rst
393	The on/off division seems likely to cause us problems later. How about 0/1 instead, so that we can later add 2, 3 etc when a feature is added that has more than two states? Or strings.

jdoerfert added inline comments. Aug 5 2020, 7:56 AM

llvm/docs/AMDGPUUsage.rst
1298	Why is this second usage restricted to HIP (in the upcoming future)? I mean, OpenMP offload already "bundles code objects for different GPU's so that it can be embedded in an executable or shared library" today. It just doesn't yet allow to do so on IR level, something AMD is in the process of changing. That said, what distinguishes usage 1 from usage 2 anymore?

JonChesterfield added inline comments. Aug 5 2020, 8:17 AM

llvm/docs/AMDGPUUsage.rst
1326	I'm not sure we want to encode artefacts of clang-offload-bundler in this spec

yaxunl marked 3 inline comments as done. Aug 5 2020, 8:41 AM

yaxunl added inline comments.

llvm/docs/AMDGPUUsage.rst
393	Using +/- is more concise and easier to parse. If we need multi-value for future attributes, it is not difficult to differentiate them by checking the last character.
1326	This is due to a restriction of `clang-offload-bundler`. Here we document the current situation. We plan to fix that by removing the artifact.

jdoerfert added inline comments. Aug 5 2020, 8:46 AM

llvm/docs/AMDGPUUsage.rst
393	I'm with @yaxunl. This literally duplicates something we already have in the IR, so let's not also diverge for no real reason.

separated ClangOffloadBundlerFileFormat

Herald added a subscriber: arphaman. Aug 5 2020, 12:03 PM

yaxunl marked an inline comment as done. Aug 5 2020, 12:04 PM

yaxunl added inline comments.

llvm/docs/AMDGPUUsage.rst
1298	I separated this part as ClangOffloadBundlerFileFormat.rst

t-tye added inline comments. Aug 5 2020, 12:51 PM

llvm/docs/AMDGPUUsage.rst
1306	Use the Sphinx :doc:`ClangOffloadBundlerFileFormat` syntax. I think there is an example elsewhere in the file.

LGTM except for minor :doc: reference comment.

This revision is now accepted and ready to land. Aug 5 2020, 7:12 PM

The commit does not have a message, contains unrelated parts, and is still not addressing my issues wrt. the description of common functionality as AMDGPU functionality.

clang/docs/ClangOffloadBundlerFileFormat.rst
89	This is not generic and ignores the fact that we use this tool for non-AMDGPU targets. The target-id description above does too.
llvm/docs/AMDGPUUsage.rst
264	None of these changes seem related to "Add target ID to AMDGPU documentation".

This revision now requires changes to proceed. Aug 5 2020, 8:28 PM

jdoerfert mentioned this in D84519: [llvm-objdump][AMDGPU] Detect CPU string. Aug 6 2020, 12:02 PM

Revised per Tony's and Johannes' comments.

clang/docs/ClangOffloadBundlerFileFormat.rst
89	fixed
llvm/docs/AMDGPUUsage.rst
264	xnack and sram-ecc are now set through target ID, and their default values are now changed to "default". As such, the default values of all features are now described in the "Target Features" section instead of in this table.

Can you add the proposed commit message to the phab diff as well? The commit subject by far does not cover changes like "Embedding Bundled Code Objects".

jdoerfert added inline comments. Aug 10 2020, 6:01 PM

clang/docs/ClangOffloadBundlerFileFormat.rst
89	It is still completely unclear how non-AMDGPU targets use this. The description is confusing IMHO. Specifically, the `<bundle_entry_id>` specifies `"-" <processor_or_target_id>` but it is unclear what that means for other targets, given that the `processor_or_target_id` description links to the AMDGPU docs. If you want to say other targets do not use the target_id at all, so there is nothing, please say that explicitly. If there is something, we have to say something about that as well, at least that there are alternatives to the AMDGPU way.

Revised per Johannes' comments.

yaxunl edited the summary of this revision. Aug 11 2020, 7:10 AM

In D84822#2208562, @jdoerfert wrote:

Can you add the proposed commit message to the phab diff as well? The commit subject by far does not cover changes like "Embedding Bundled Code Objects".

Johannes, did my update address the issue? Are any further changes needed? Thanks.

t-tye added inline comments. Aug 12 2020, 9:57 AM

llvm/docs/AMDGPUUsage.rst
211	This processor does support xnack so removing that from the Features column seems incorrect. That is true for the other processors below.

t-tye requested changes to this revision. Aug 12 2020, 9:57 AM

This revision now requires changes to proceed. Aug 12 2020, 9:57 AM

Fix xnack for gfx8

LGTM

t-tye added a reviewer: arsenm. Aug 13 2020, 11:02 AM

arsenm added inline comments. Aug 13 2020, 11:51 AM

llvm/docs/AMDGPUUsage.rst
387	Default is an extremely confusing name. This should reflect that it's the universally compatible mode. AnyMode? Either? Universal?

yaxunl marked an inline comment as done. Aug 13 2020, 3:19 PM

yaxunl added inline comments.

llvm/docs/AMDGPUUsage.rst
387	Another word I can think of is "General" or "Generic". @t-tye What do you think? Thanks.

t-tye added inline comments. Aug 13 2020, 6:34 PM

llvm/docs/AMDGPUUsage.rst
387	What about "any". It is short like "on" and "off" and captures what Matt suggests (a truncation of "anymode" which would imply "onmode" and "offmode" which seem a mouthful and no more obvious). Thoughts?

yaxunl marked an inline comment as done. Aug 14 2020, 9:18 AM

yaxunl added inline comments.

llvm/docs/AMDGPUUsage.rst
387	"Any" sounds perfect. I will update the patch. Thanks.

I still believe the language is confusing at best. We should not describe generic functionality by linking AMDGPU documentation if this is not the same for all targets. We can say that for AMDGPU this is how it is done, but as of now it does not state it this way. To me this reads as if the AMDGPU functionality is the only way these things are used, which is not the case as far as I can tell. Anyway, I get the feeling it might be simpler to go over this in a subsequent patch.

This revision is now accepted and ready to land. Aug 16 2020, 11:24 AM

t-tye added inline comments. Aug 17 2020, 7:51 PM

llvm/docs/AMDGPUUsage.rst
227	Change all occurrences of sram-ecc to sramecc to avoid problems, since the hyphen is used as a bundled code object entry ID separator.
2287	Add "TargetID" attribute that is a string that is the target ID for the module.
7678	<target> -> <target-id>
7681	target ID
7684–7685	and `-mcpu`.
7760–7761	This directive is being removed since the information is now present in the target ID that is provided by the .amdgcn_target directive.
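
To illustrate the hyphen concern raised for line 227, here is a minimal sketch of how a consumer might split a bundle entry ID of the form `<offload_kind>-<target_triple>-<processor_or_target_id>`. The function name and the example IDs are hypothetical, and the sketch assumes the last component is always present, as in the EBNF under review. Because the target triple itself contains hyphens, the sketch peels off the first and last components, which only works if the target ID contains no hyphen.

```python
def split_bundle_entry_id(entry_id):
    """Split '<offload_kind>-<target_triple>-<processor_or_target_id>'.

    Hypothetical helper: the triple may contain hyphens, so take the first
    component as the offload kind and the last as the processor/target ID.
    """
    offload_kind, sep, rest = entry_id.partition("-")
    if not sep:
        raise ValueError(f"no '-' separator in {entry_id!r}")
    target_triple, sep, last = rest.rpartition("-")
    if not sep or not target_triple:
        raise ValueError(f"missing target triple or processor in {entry_id!r}")
    return offload_kind, target_triple, last


# A feature name without a hyphen splits as intended.
good = split_bundle_entry_id("hip-amdgcn-amd-amdhsa-gfx906:sramecc+")

# A hyphenated feature name is cut in the wrong place.
bad = split_bundle_entry_id("hip-amdgcn-amd-amdhsa-gfx906:sram-ecc+")
```

With the hyphenated spelling, the last component comes out as `ecc+` and the rest of the target ID is absorbed into the triple, which is the kind of ambiguity the rename to `sramecc` avoids.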

t-tye requested changes to this revision. Aug 17 2020, 7:51 PM

This revision now requires changes to proceed. Aug 17 2020, 7:51 PM

t-tye added inline comments. Aug 17 2020, 8:28 PM

clang/docs/ClangOffloadBundlerFileFormat.rst
12	[ "-" <processor_or_target_id> ] This is optional as some targets do not support this.
26	(for AMD GPU see :doc:`amdgpu-target-ids`)
88–90	defined by the target (for AMD GPU see :doc:`amdgpu-embedding-bundled-objects`).
llvm/docs/AMDGPUUsage.rst
272	We should move the code object V3 into a separate page so we continue to define that ABI since old code objects still exist. Then these changes can define the code object V4.
2287	Or is this: <target_triple> "-" <target_id>
7678	Or is this: <target_triple> "-" <target_id>
7681	Or is this: <target_triple> "-" <target_id>

Revised per Tony's and Matt's comments.

clang/docs/ClangOffloadBundlerFileFormat.rst
12	If a target does not support target ID, <processor_or_target_id> will be processor, so this item is still there.
26	clang documentation is in different URL than llvm documentation.
llvm/docs/AMDGPUUsage.rst
227	done
272	can we do that with a separate patch? This patch already contains too many changes.
2287	This is V2. Are we sure we want to add target ID to V2?
7684–7685	-mcpu is already mentioned by the same sentence
7760–7761	done

t-tye added inline comments. Aug 18 2020, 2:39 PM

clang/docs/ClangOffloadBundlerFileFormat.rst
12	My understanding was that some devices do not need a processor to be specified at all. For example, for x86 isn't the target triple enough by itself? In OpenMP it would simply use a bundled code object entry ID that is "openmp-<target-triple>".
26	So LLVM has a mono-repo but the documentation does not have a single TOC tree? That is a shame and rather defeats the benefits of being in a mono repo.
llvm/docs/AMDGPUUsage.rst
272	My concern is that this patch will obliterate the V3 definition with this V4 one. But I guess you can always reach back to an earlier commit as that is what git is for:-)
2287	Sorry, I put the comment in the wrong section. I meant to put it in the V3 (now becoming V4) section.
7684–7685	My suggested replacement was to remove the "and those which specify target features." since those options are being removed. So the "and" needs to go before the -mcpu to become: Used by the assembler to validate command-line options such as ``-triple`` and ``-mcpu``.

How does one opt out of this scheme?

OpenMP already has a convention for finding code objects in the host binary, used by amdgpu and other targets, which I don't think matches the above.

Where's the corresponding implementation? I think it's the llc part I need to read to understand whether this proposal can be used outside of HIP and amd's OpenCL.

In D84822#2225968, @JonChesterfield wrote:

How does one opt out of this scheme?

OpenMP already has a convention for finding code objects in the host binary, used by amdgpu and other targets, which I don't think matches the above.

Where's the corresponding implementation? I think it's the llc part I need to read to understand whether this proposal can be used outside of HIP and amd's OpenCL.

If you do not care about turning on/off xnack and sram-ecc, you do not need to use it. The current processor name still works.

If you need to turn xnack and sram-ecc on or off, you need to use a target ID. From the user's point of view, it is an extension of the processor name. Target ID support is implemented in the clang driver and the backend (https://reviews.llvm.org/D60620). In the clang driver, it is accepted by the -mcpu option as an extension of the processor name. As long as you pass a target ID, e.g. gfx906:xnack+, to clang via -mcpu, you will get target ID support in the backend. My understanding is that since OpenMP only loads one GPU arch, only backend support is needed, therefore target ID support on OpenMP should be automatic.
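
As a rough illustration of the target ID form discussed here (a processor name optionally followed by feature settings ending in `+` or `-`), the following sketch parses a string such as `gfx906:xnack+`. The function name is hypothetical, and separating multiple features with `:` is an assumption not spelled out in this thread; it is not the clang driver's implementation.

```python
def parse_target_id(target_id):
    """Parse '<processor>[:<feature>(+|-)]...' into (processor, settings).

    Hypothetical sketch: a feature that is absent from the string is simply
    not in the returned dict, which a caller can treat as the "any" setting.
    """
    processor, *features = target_id.split(":")
    if not processor:
        raise ValueError("missing processor name")
    settings = {}
    for feature in features:
        # The on/off state is carried by the last character.
        if len(feature) < 2 or feature[-1] not in "+-":
            raise ValueError(f"feature {feature!r} must end in '+' or '-'")
        name = feature[:-1]
        if name in settings:
            raise ValueError(f"feature {name!r} given twice")
        settings[name] = feature[-1] == "+"
    return processor, settings


with_feature = parse_target_id("gfx906:xnack+")
plain = parse_target_id("gfx906")
```

A plain processor name parses to an empty settings dict, which matches the point above that the existing processor name keeps working for users who do not care about these features.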

Revised per Tony's comments.

clang/docs/ClangOffloadBundlerFileFormat.rst
12	fixed
llvm/docs/AMDGPUUsage.rst
2287	Moved to V3.
7684–7685	Sorry, I misunderstood. Fixed.

t-tye added inline comments. Aug 19 2020, 7:26 PM

clang/docs/ClangOffloadBundlerFileFormat.rst
26	for AMD GPU see
88–90	defined by the target (for AMD GPU see `AMDGPU Embedding Bundled Code Objects <https://llvm.org/docs/AMDGPUUsage.html#embedding-bundled-objects>`_).
llvm/docs/AMDGPUUsage.rst
90	We should move the code object V3 into a separate page so we continue to define that ABI since old code objects still exist. Then these changes can define the code object V4, with a reference to the pages for previous versions.
264	TBA .. TODO:: Add product names.
7679	Make +'s match length of above title.

In D84822#2226070, @yaxunl wrote:

If you do not care about turning on/off xnack and sram-ecc, you do not need to use it. The current processor name still works.

Nice. That makes me much less nervous about this support. Plus the change seems to have gone in last year without breaking openmp. Thanks for the link.

t-tye requested changes to this revision. Aug 20 2020, 11:04 AM

t-tye added inline comments.

llvm/docs/AMDGPUUsage.rst

262–265

Delete as gfx1031 does not support XNACK.

936–939

``EF_AMDGPU_FEATURE_XNACK_NOT_SUPPORTED_V4``   0x0
``EF_AMDGPU_FEATURE_XNACK_ANY_V4``             0x1
``EF_AMDGPU_FEATURE_XNACK_OFF_V4``             0x2
``EF_AMDGPU_FEATURE_XNACK_ON_V4``              0x3

948–951

``EF_AMDGPU_FEATURE_SRAMECC_NOT_SUPPORTED_V4``   0x0
``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4``             0x1
``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4``             0x2
``EF_AMDGPU_FEATURE_SRAMECC_ON_V4``              0x3
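
The four values proposed above fit in a two-bit field per feature. The sketch below shows one way such fields could be packed into and read back from an `e_flags` word. The bit positions (`XNACK_SHIFT`, `SRAMECC_SHIFT`) and all Python names are assumptions for illustration only; this comment gives only the field values, not where the fields sit.

```python
from enum import IntEnum


class FeatureState(IntEnum):
    """Two-bit feature setting, using the values proposed in the comment."""
    NOT_SUPPORTED = 0x0
    ANY = 0x1
    OFF = 0x2
    ON = 0x3


# Hypothetical bit positions; the review comment does not specify them.
XNACK_SHIFT = 8
SRAMECC_SHIFT = 10


def encode_features(xnack, sramecc):
    """Pack both two-bit fields into one integer."""
    return (int(xnack) << XNACK_SHIFT) | (int(sramecc) << SRAMECC_SHIFT)


def decode_feature(e_flags, shift):
    """Extract the two-bit field located at the given shift."""
    return FeatureState((e_flags >> shift) & 0x3)


flags = encode_features(FeatureState.ON, FeatureState.ANY)
xnack_state = decode_feature(flags, XNACK_SHIFT)
sramecc_state = decode_feature(flags, SRAMECC_SHIFT)
```

Using a zero value for "not supported" means a processor without the feature leaves the field clear, so flag words that predate the field decode to that state.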

This revision now requires changes to proceed. Aug 20 2020, 11:04 AM

Revised per Tony's comments.

Herald added a subscriber: jfb. Sep 2 2020, 5:16 AM

revised

kzhuravl requested changes to this revision. Sep 2 2020, 11:12 AM

kzhuravl added inline comments.

llvm/docs/AMDGPUUsage.rst
936–939	EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4
948–951	EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4

This revision now requires changes to proceed. Sep 2 2020, 11:12 AM

kzhuravl added inline comments. Sep 2 2020, 11:21 AM

llvm/docs/AMDGPUUsage.rst
831	EF_AMDGPU_FEATURE_XNACK_V3
842	EF_AMDGPU_FEATURE_SRAMECC_V3
868	EF_AMDGPU_FEATURE_XNACK_V4
868	EF_AMDGPU_FEATURE_XNACK_V3
872	EF_AMDGPU_FEATURE_SRAMECC_V4

kzhuravl added inline comments. Sep 2 2020, 11:31 AM

llvm/docs/AMDGPUUsage.rst
868	EF_AMDGPU_FEATURE_XNACK_V3 in the previous comment is wrong. It has to be EF_AMDGPU_FEATURE_XNACK_V4

t-tye added inline comments. Sep 3 2020, 8:57 AM

llvm/docs/AMDGPUUsage.rst
818–819	These tables need to be corrected to accurately represent the different layouts for e_flags in the code object versions. The code object versions are not the same as the ABI versions. It should be made clearer what the relationship is. Note that the code object V2 had the following definitions: EF_AMDGPU_FEATURE_XNACK_V2 = 0x00000001 and EF_AMDGPU_FEATURE_TRAP_HANDLER_V2 = 0x00000002.

Revised per Tony's and Konstantin's comments.

ashi1 added a subscriber: ashi1. Nov 2 2020, 8:08 AM

yaxunl abandoned this revision. Jan 15 2021, 1:03 PM

Revision Contents

Path                                          Size
clang/docs/ClangOffloadBundlerFileFormat.rst  89 lines
clang/docs/index.rst                          1 line
llvm/docs/AMDGPUUsage.rst                     546 lines

Diff 286397

clang/docs/ClangOffloadBundlerFileFormat.rst

This file was added.

				Bundled Device Binaries
				=======================

				The ``clang-offload-bundler`` tool can be used to combine multiple device
				binaries into a single bundled device binary file. The bundled device binary
				entries are identified by a bundle entry ID which is defined by the
				following EBNF syntax:

				.. code::

				  <bundle_entry_id> ::= <offload_kind> "-" <target_triple> "-" <processor_or_target_id>

				Where:

				offload_kind
				  The runtime responsible for managing the loading of the code object.
				  See :ref:`offload-kind-table`.

				target_triple
				  The target triple of the device binary.

				processor_or_target_id
				  If target ID is not supported by the target, this is the processor for which the device
				  binary is compiled. If target ID is supported by the target, this is the target ID for which the
				  device binary is compiled (see `Target ID Definition <https://llvm.org/docs/AMDGPUUsage.html#target-ids>`_).

				.. table:: Bundled Device Binary Offload Kind
				    :name: offload-kind-table

				    ============= ==============================================================
				    Offload Kind  Description
				    ============= ==============================================================
				    host          This offload kind is used for the first dummy empty entry
				                  in the header of the bundle, which is required by
				                  clang-offload-bundler, but is not used by language runtimes.

				    hip           Device binary loading is managed by the HIP language runtime.

				    openmp        Device binary loading is managed by the OpenMP language runtime.
				    ============= ==============================================================

				The format of a bundled device binary is defined by the following table:

				.. table:: Bundled Device Binary Memory Layout
				    :name: bundled-device-binary-fields-table

				    ========================= ======== ========================== ===============================
				    Field                     Type     Size in Bytes              Description
				    ========================= ======== ========================== ===============================
				    Magic String              string   24                         ``__CLANG_OFFLOAD_BUNDLE__``

				    Number Of Device Binaries integer  8                          Denoted as N in this table

				    Entry Offset 1            integer  8                          Byte offset from beginning of
				                                                                  bundled device binary to 1st device
				                                                                  binary.

				    Entry Size 1              integer  8                          Byte size of 1st code object.

				    Entry ID Length 1         integer  8                          Bundle entry ID character length
				                                                                  of 1st device binary

				    Entry ID 1                string   Byte size of entry ID 1    Bundle entry ID of 1st device
				                                                                  binary. This is not NUL
				                                                                  terminated.

				    ...

				    Entry Offset N            integer  8

				    Entry Size N              integer  8

				    Entry ID Length N         integer  8

				    Entry ID N                string   Byte size of entry ID N

				    1st Device Binary         bytes    Size Of 1st Device Binary

				    ...

				    N-th Device Binary        bytes    Size Of N-th Device Binary
				    ========================= ======== ========================== ===============================

				The ``clang-offload-bundler`` is used to bundle device binaries for different processor
				and feature settings.

				If target ID is supported, the rules of compatible offload targets in a single bundled device binary is defined
				in `AMDGPU Embedding Bundled Code Objects
				<https://llvm.org/docs/AMDGPUUsage.html#embedding-bundled-objects>`_.
				No newline at end of file
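
The memory layout table in the file above can be read as a fixed magic string, an entry count, one descriptor per entry, and then the payloads. The following sketch builds and parses such a blob in memory. It is not the `clang-offload-bundler` implementation: the function names and entry IDs are hypothetical, and it assumes little-endian 8-byte integers and payloads packed back to back with no alignment padding, neither of which the table states.

```python
import struct

MAGIC = b"__CLANG_OFFLOAD_BUNDLE__"  # 24-byte magic string from the table


def build_bundle(entries):
    """Serialize a list of (entry_id, payload_bytes) pairs (assumed layout)."""
    ids = [entry_id.encode("ascii") for entry_id, _ in entries]
    # Header: magic + count + per entry (offset, size, id length) + id bytes.
    header_size = len(MAGIC) + 8 + sum(24 + len(i) for i in ids)
    header = bytearray(MAGIC + struct.pack("<Q", len(entries)))
    offset = header_size
    for raw_id, (_, payload) in zip(ids, entries):
        header += struct.pack("<QQQ", offset, len(payload), len(raw_id)) + raw_id
        offset += len(payload)
    return bytes(header) + b"".join(payload for _, payload in entries)


def parse_bundle(data):
    """Return {entry_id: payload} for a blob produced by build_bundle."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic string")
    (count,) = struct.unpack_from("<Q", data, len(MAGIC))
    pos = len(MAGIC) + 8
    result = {}
    for _ in range(count):
        offset, size, id_len = struct.unpack_from("<QQQ", data, pos)
        pos += 24
        # The entry ID is length-prefixed, not NUL terminated.
        entry_id = data[pos:pos + id_len].decode("ascii")
        pos += id_len
        result[entry_id] = data[offset:offset + size]
    return result


# Hypothetical entry IDs and placeholder payloads.
entries = [
    ("hip-amdgcn-amd-amdhsa-gfx906", b"code-object-906"),
    ("hip-amdgcn-amd-amdhsa-gfx908", b"code-object-908"),
]
parsed = parse_bundle(build_bundle(entries))
```

Because every descriptor carries an absolute offset, a consumer can look up one entry ID and jump straight to its payload without reading the other device binaries.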

clang/docs/index.rst

	.. toctree::			.. toctree::
	:maxdepth: 1			:maxdepth: 1

	ClangTools			ClangTools
	ClangCheck			ClangCheck
	ClangFormat			ClangFormat
	ClangFormatStyleOptions			ClangFormatStyleOptions
	ClangFormattedStatus			ClangFormattedStatus
				ClangOffloadBundlerFileFormat

	Design Documents			Design Documents
	================			================

	.. toctree::			.. toctree::
	:maxdepth: 1			:maxdepth: 1

	InternalsManual			InternalsManual

llvm/docs/AMDGPUUsage.rst

.. table:: AMDGPU Environments		.. table:: AMDGPU Environments
============ ==============================================================		============ ==============================================================
Environment Description		Environment Description
============ ==============================================================		============ ==============================================================
<empty> Default.		<empty> Default.
============ ==============================================================		============ ==============================================================

.. _amdgpu-processors:		.. _amdgpu-processors:

Processors		Processors
----------		----------

Use the ``clang -mcpu <Processor>`` option to specify the AMDGPU processor. The		Use the ``clang -mcpu <Processor>`` or ``clang -mcpu <Target-ID>`` option to
names from both the Processor and Alternative Processor can be used.		specify the AMDGPU processor. The names from both the Processor and
		Alternative Processor can be used. Target ID is defined in :ref:`target-id`
		which includes the Processor and the Target Features.

.. table:: AMDGPU Processors		.. table:: AMDGPU Processors
:name: amdgpu-processor-table		:name: amdgpu-processor-table

=========== =============== ============ ===== ================= ======= ======================		=========== =============== ============ ===== ======================= ======= ======================
Processor Alternative Target dGPU/ Target ROCm Example		Processor Alternative Target dGPU/ Target ROCm Example
Processor Triple APU Features Support Products		Processor Triple APU Features Support Products
Architecture Supported		Architecture Supported
[Default]		=========== =============== ============ ===== ======================= ======= ======================
=========== =============== ============ ===== ================= ======= ======================
Radeon HD 2000/3000 Series (R600) [AMD-RADEON-HD-2000-3000]_		Radeon HD 2000/3000 Series (R600) [AMD-RADEON-HD-2000-3000]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``r600`` ``r600`` dGPU		``r600`` ``r600`` dGPU
``r630`` ``r600`` dGPU		``r630`` ``r600`` dGPU
``rs880`` ``r600`` dGPU		``rs880`` ``r600`` dGPU
``rv670`` ``r600`` dGPU		``rv670`` ``r600`` dGPU
Radeon HD 4000 Series (R700) [AMD-RADEON-HD-4000]_		Radeon HD 4000 Series (R700) [AMD-RADEON-HD-4000]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``rv710`` ``r600`` dGPU		``rv710`` ``r600`` dGPU
``rv730`` ``r600`` dGPU		``rv730`` ``r600`` dGPU
``rv770`` ``r600`` dGPU		``rv770`` ``r600`` dGPU
Radeon HD 5000 Series (Evergreen) [AMD-RADEON-HD-5000]_		Radeon HD 5000 Series (Evergreen) [AMD-RADEON-HD-5000]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``cedar`` ``r600`` dGPU		``cedar`` ``r600`` dGPU
``cypress`` ``r600`` dGPU		``cypress`` ``r600`` dGPU
``juniper`` ``r600`` dGPU		``juniper`` ``r600`` dGPU
``redwood`` ``r600`` dGPU		``redwood`` ``r600`` dGPU
``sumo`` ``r600`` dGPU		``sumo`` ``r600`` dGPU
Radeon HD 6000 Series (Northern Islands) [AMD-RADEON-HD-6000]_		Radeon HD 6000 Series (Northern Islands) [AMD-RADEON-HD-6000]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``barts`` ``r600`` dGPU		``barts`` ``r600`` dGPU
``caicos`` ``r600`` dGPU		``caicos`` ``r600`` dGPU
``cayman`` ``r600`` dGPU		``cayman`` ``r600`` dGPU
``turks`` ``r600`` dGPU		``turks`` ``r600`` dGPU
GCN GFX6 (Southern Islands (SI)) [AMD-GCN-GFX6]_		GCN GFX6 (Southern Islands (SI)) [AMD-GCN-GFX6]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``gfx600`` - ``tahiti`` ``amdgcn`` dGPU		``gfx600`` - ``tahiti`` ``amdgcn`` dGPU
``gfx601`` - ``hainan`` ``amdgcn`` dGPU		``gfx601`` - ``hainan`` ``amdgcn`` dGPU
- ``oland``		- ``oland``
- ``pitcairn``		- ``pitcairn``
- ``verde``		- ``verde``
GCN GFX7 (Sea Islands (CI)) [AMD-GCN-GFX7]_		GCN GFX7 (Sea Islands (CI)) [AMD-GCN-GFX7]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000		``gfx700`` - ``kaveri`` ``amdgcn`` APU - A6-7000
- A6 Pro-7050B		- A6 Pro-7050B
- A8-7100		- A8-7100
- A8 Pro-7150B		- A8 Pro-7150B
- A10-7300		- A10-7300
- A10 Pro-7350B		- A10 Pro-7350B
- FX-7500		- FX-7500
- A8-7200P		- A8-7200P
- A10-7400P		- A10-7400P
- FX-7600P		- FX-7600P
``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100		``gfx701`` - ``hawaii`` ``amdgcn`` dGPU ROCm - FirePro W8100
- FirePro W9100		- FirePro W9100
- FirePro S9150		- FirePro S9150
- FirePro S9170		- FirePro S9170
``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290		``gfx702`` ``amdgcn`` dGPU ROCm - Radeon R9 290
- Radeon R9 290x		- Radeon R9 290x
- Radeon R390		- Radeon R390
- Radeon R390x		- Radeon R390x
``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100		``gfx703`` - ``kabini`` ``amdgcn`` APU - E1-2100
- ``mullins`` - E1-2200		- ``mullins`` - E1-2200
- E1-2500		- E1-2500
- E2-3000		- E2-3000
- E2-3800		- E2-3800
- A4-5000		- A4-5000
- A4-5100		- A4-5100
- A6-5200		- A6-5200
- A4 Pro-3340B		- A4 Pro-3340B
``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790		``gfx704`` - ``bonaire`` ``amdgcn`` dGPU - Radeon HD 7790
- Radeon HD 8770		- Radeon HD 8770
- R7 260		- R7 260
- R7 260X		- R7 260X
GCN GFX8 (Volcanic Islands (VI)) [AMD-GCN-GFX8]_		GCN GFX8 (Volcanic Islands (VI)) [AMD-GCN-GFX8]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P		``gfx801`` - ``carrizo`` ``amdgcn`` APU - xnack - A6-8500P
[on] - Pro A6-8500B		- Pro A6-8500B
- A8-8600P		- A8-8600P
- Pro A8-8600B		- Pro A8-8600B
- FX-8800P		- FX-8800P
- Pro A12-8800B		- Pro A12-8800B
\ ``amdgcn`` APU - xnack ROCm - A10-8700P		\ ``amdgcn`` APU - xnack ROCm - A10-8700P
[on] - Pro A10-8700B		- Pro A10-8700B
- A10-8780P		- A10-8780P
\ ``amdgcn`` APU - xnack - A10-9600P		\ ``amdgcn`` APU - xnack - A10-9600P
[on] - A10-9630P		- A10-9630P
- A12-9700P		- A12-9700P
- A12-9730P		- A12-9730P
- FX-9800P		- FX-9800P
- FX-9830P		- FX-9830P
\ ``amdgcn`` APU - xnack - E2-9010		\ ``amdgcn`` APU - xnack - E2-9010
[on] - A6-9210		- A6-9210
- A9-9410		- A9-9410
``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150		``gfx802`` - ``iceland`` ``amdgcn`` dGPU - xnack ROCm - FirePro S7150
- ``tonga`` [off] - FirePro S7100		- ``tonga`` - FirePro S7100
- FirePro W7100		- FirePro W7100
- Radeon R285		- Radeon R285
- Radeon R9 380		- Radeon R9 380
- Radeon R9 385		- Radeon R9 385
- Mobile FirePro		- Mobile FirePro
M7170		M7170
``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano		``gfx803`` - ``fiji`` ``amdgcn`` dGPU - xnack ROCm - Radeon R9 Nano
[off] - Radeon R9 Fury		- Radeon R9 Fury
- Radeon R9 FuryX		- Radeon R9 FuryX
- Radeon Pro Duo		- Radeon Pro Duo
- FirePro S9300x2		- FirePro S9300x2
- Radeon Instinct MI8		- Radeon Instinct MI8
\ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470		\ - ``polaris10`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 470
[off] - Radeon RX 480		- Radeon RX 480
- Radeon Instinct MI6		- Radeon Instinct MI6
\ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460		\ - ``polaris11`` ``amdgcn`` dGPU - xnack ROCm - Radeon RX 460
[off]
``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack		``gfx810`` - ``stoney`` ``amdgcn`` APU - xnack
[on]
GCN GFX9 [AMD-GCN-GFX9]_		GCN GFX9 [AMD-GCN-GFX9]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega		``gfx900`` ``amdgcn`` dGPU - xnack ROCm - Radeon Vega
[off] Frontier Edition		Frontier Edition
- Radeon RX Vega 56		- Radeon RX Vega 56
- Radeon RX Vega 64		- Radeon RX Vega 64
- Radeon RX Vega 64		- Radeon RX Vega 64
Liquid		Liquid
- Radeon Instinct MI25		- Radeon Instinct MI25
``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G		``gfx902`` ``amdgcn`` APU - xnack - Ryzen 3 2200G
[on] - Ryzen 5 2400G		- Ryzen 5 2400G
``gfx904`` ``amdgcn`` dGPU - xnack TBA		``gfx904`` ``amdgcn`` dGPU - xnack TBA
[off]
.. TODO::		.. TODO::
Add product		Add product
names.		names.
``gfx906`` ``amdgcn`` dGPU - xnack - Radeon Instinct MI50		``gfx906`` ``amdgcn`` dGPU - sramecc - Radeon Instinct MI50
		t-tye: Change all occurrences of sram-ecc to sramecc to avoid problems, since the hyphen is used as a bundled code object entry ID separator.
		yaxunl: done
[off] - Radeon Instinct MI60		- xnack - Radeon Instinct MI60
- sram-ecc - Radeon VII		- Radeon VII
[off] - Radeon Pro VII		- Radeon Pro VII
``gfx908`` ``amdgcn`` dGPU - xnack TBA		``gfx908`` ``amdgcn`` dGPU - sramecc TBA
[off]		- xnack
- sram-ecc
[on]
.. TODO::		.. TODO::
Add product		Add product
names.		names.
``gfx909`` ``amdgcn`` APU - xnack TBA		``gfx909`` ``amdgcn`` APU - xnack TBA
[on]
.. TODO::		.. TODO::
Add product		Add product
names.		names.
GCN GFX10 [AMD-GCN-GFX10]_		GCN GFX10 [AMD-GCN-GFX10]_
-----------------------------------------------------------------------------------------------		-----------------------------------------------------------------------------------------------------
``gfx1010`` ``amdgcn`` dGPU - xnack - Radeon RX 5700		``gfx1010`` ``amdgcn`` dGPU - cumode - Radeon RX 5700
[off] - Radeon RX 5700 XT		- wavefrontsize64 - Radeon RX 5700 XT
- wavefrontsize64 - Radeon Pro 5600 XT		- xnack - Radeon Pro 5600 XT
[off]
- cumode		``gfx1011`` ``amdgcn`` dGPU - cumode - Radeon Pro 5600M
[off]
``gfx1011`` ``amdgcn`` dGPU - xnack - Radeon Pro 5600M
[off]
- wavefrontsize64		- wavefrontsize64
[off]		- xnack
- cumode
[off]		``gfx1012`` ``amdgcn`` dGPU - cumode - Radeon RX 5500
``gfx1012`` ``amdgcn`` dGPU - xnack - Radeon RX 5500		- wavefrontsize64 - Radeon RX 5500 XT
[off] - Radeon RX 5500 XT		- xnack

		``gfx1030`` ``amdgcn`` dGPU - cumode TBA
- wavefrontsize64		- wavefrontsize64
[off]
- cumode
[off]
``gfx1030`` ``amdgcn`` dGPU - wavefrontsize64 TBA
[off]
- cumode
jdoerfert: None of these changes seem related to "Add target ID to AMDGPU documentation".
yaxunl: xnack and sram-ecc are now set through target ID, and their default values are now changed to "default". As such, the default values of all features are now described in the "Target Features" section instead of in this table.
[off]
.. TODO		.. TODO
Add product		Add product
names.		names.
``gfx1031`` ``amdgcn`` dGPU - xnack TBA
[off]		``gfx1031`` ``amdgcn`` dGPU - cumode TBA
- wavefrontsize64		- wavefrontsize64
[off]		- xnack
- cumode
[off]
.. TODO		.. TODO
Add product		Add product
		t-tye: ``TBA .. TODO:: Add product names.``
names.		names.
		t-tye: Delete as gfx1031 does not support XNACK.
=========== =============== ============ ===== ================= ======= ======================		=========== =============== ============ ===== ================= ======= ======================

		See :ref:`amdgpu-target-features` and :ref:`amdgpu-target-ids` for more information on target features.

.. _amdgpu-target-features:		.. _amdgpu-target-features:

Target Features		Target Features
		t-tye: We should move the code object V3 into a separate page so we continue to define that ABI since old code objects still exist. Then these changes can define the code object V4.
		yaxunl: Can we do that with a separate patch? This patch already contains too many changes.
		t-tye: My concern is that this patch will obliterate the V3 definition with this V4 one. But I guess you can always reach back to an earlier commit as that is what git is for :-)
---------------		---------------

Target features control how code is generated to support certain		Target features control how code is generated to support certain
processor specific features. Not all target features are supported by		processor specific features. Not all target features are supported by
all processors. The runtime must ensure that the features supported by		all processors. The runtime must ensure that the features supported by
the device used to execute the code match the features enabled when		the device used to execute the code match the features enabled when
generating the code. A mismatch of features may result in incorrect		generating the code. A mismatch of features may result in incorrect
execution, or a reduction in performance.		execution, or a reduction in performance.

The target features supported by each processor, and the default value		The target features supported by each processor are listed in
used if not specified explicitly, is listed in
:ref:`amdgpu-processor-table`.		:ref:`amdgpu-processor-table`.

Use the ``clang -m[no-]<TargetFeature>`` option to specify the AMDGPU		Target features are controlled by exactly one of the following ``clang``
target features.		options:

		``-mcpu=<target-id>``

		The ``-mcpu`` option can specify target features as optional components
		of the target ID (see :ref:`amdgpu-target-ids`). If a target feature is
		omitted, it has the ``any`` value.

		``-m[no-]<target-feature>``

		Target features not specified by the target ID are specified using a
		separate option. These target features can have an ``on`` or ``off``
		value. ``on`` is specified by omitting the ``no-`` prefix, and
		``off`` is specified by including the ``no-`` prefix. The default
		if not specified is ``off``.

For example:		For example:

``-mxnack``		``-mcpu gfx908:xnack+``
Enable the ``xnack`` feature.		Enable the ``xnack`` feature.
``-mno-xnack``		``-mcpu gfx908:xnack-``
Disable the ``xnack`` feature.		Disable the ``xnack`` feature.
		``-mcumode``
		Enable the ``cumode`` feature.
		``-mno-cumode``
		Disable the ``cumode`` feature.

.. table:: AMDGPU Target Features		.. table:: AMDGPU Target Features
:name: amdgpu-target-feature-table		:name: amdgpu-target-feature-table

====================== ==================================================		====================== ======================= ==================================================
Target Feature Description		Target Feature ``Clang`` Option to Description
====================== ==================================================		Name Control
-m[no-]xnack Enable/disable generating code that has		====================== ======================= ==================================================
memory clauses that are compatible with		cumode -m[no-]cumode Control the wavefront execution mode used
having XNACK replay enabled.		when generating code for kernels. When disabled
		native WGP wavefront execution mode is used,
		when enabled CU wavefront execution mode is used
		(see :ref:`amdgpu-amdhsa-memory-model`).

		sramecc -mcpu If specified, generate code that can only be
		loaded and executed in a process that has a
		matching setting for SRAM ECC.

		If not specified, generate code that can be
		loaded and executed in a process with either
		setting of SRAM ECC.

		wavefrontsize64 -m[no-]wavefrontsize64 Control the wavefront size used when
		generating code for kernels. When disabled
		native wavefront size 32 is used, when enabled
		wavefront size 64 is used.

		xnack -mcpu If specified, generate code that can only be
		loaded and executed in a process that has a
		matching setting for XNACK replay.

		If not specified, generate code that can be
		loaded and executed in a process with either
		setting of XNACK replay.

This is used for demand paging and page		This is used for demand paging and page
migration. If XNACK replay is enabled in		migration. If XNACK replay is enabled in
the device, then if a page fault occurs		the device, then if a page fault occurs
the code may execute incorrectly if the		the code may execute incorrectly if the
``xnack`` feature is not enabled. Executing		``xnack`` feature is not enabled. Executing
code that has the feature enabled on a		code that has the feature enabled on a
device that does not have XNACK replay		device that does not have XNACK replay
enabled will execute correctly but may		enabled will execute correctly but may
be less performant than code with the		be less performant than code with the
feature disabled.		feature disabled.
		====================== ======================= ==================================================

-m[no-]sram-ecc Enable/disable generating code that assumes SRAM		.. _amdgpu-target-ids:
ECC is enabled/disabled.

-m[no-]wavefrontsize64 Control the default wavefront size used when		Target IDs
generating code for kernels. When disabled		----------
native wavefront size 32 is used, when enabled		A target ID is used to indicate the processor configuration a device binary is
wavefront size 64 is used.		compiled for. It can be treated as an extension of processor since the validity of a
		device binary depends not only on the processor but also its configuration
		which is represented by a set of target features. Target ID provides a way to
		represent processor configurations which affect ISA generation.

-m[no-]cumode Control the default wavefront execution mode used		Target ID syntax is defined by the following EBNF syntax:
when generating code for kernels. When disabled
native WGP wavefront execution mode is used,		.. code::
when enabled CU wavefront execution mode is used
(see :ref:`amdgpu-amdhsa-memory-model`).		<target_id> ::= <processor> ( ":" <target_feature> ( "+" \| "-" ) )*
====================== ==================================================
		Where:

		processor
		Is an AMDGPU processor or alternative processor name specified
		in :ref:`amdgpu-processor-table`.

		target_feature
		Is a target feature name specified in :ref:`amdgpu-target-feature-table` that is
		supported by the processor. The target features supported by each processor
		are specified in :ref:`amdgpu-processor-table`. Each target feature must
		appear at most once in a target ID and can have one of three values:

		Any
		arsenm: Default is an extremely confusing name. This should reflect that it's the universally compatible mode. AnyMode? Either? Universal?
		yaxunl: Another word I can think of is "General" or "Generic". @t-tye What do you think? Thanks.
		t-tye: What about "any". It is short like "on" and "off" and captures what Matt suggests (a truncation of "anymode" which would imply "onmode" and "offmode" which seem a mouthful and no more obvious). Thoughts?
		yaxunl: "Any" sounds perfect. I will update the patch. Thanks.
		Specified by omitting the target feature from the target ID.
		A code object compiled with a target ID specifying the ``any``
		value of a target feature can be loaded and executed on a processor
		configured with the target feature on or off.

		On
		JonChesterfield: The on/off division seems likely to cause us problems later. How about 0/1 instead, so that we can later add 2, 3 etc when a feature is added that has more than two states? Or strings.
		yaxunl: Using +/- is more concise and easier to parse. If we need multi-value for future attributes, it is not difficult to differentiate them by checking the last character.
		jdoerfert: I'm with @yaxunl. This is literally duplicating something else we have in the IR already, maybe not also diverge for no real reason.
		Specified by ``+``, indicating the target feature is enabled. A code
		object compiled with a target ID specifying a target feature on
		can only be loaded on a processor configured with the target feature on.

		Off
		Specified by ``-``, indicating the target feature is disabled. A code
		object compiled with a target ID specifying a target feature off
		can only be loaded on a processor configured with the target feature off.
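
The three values above imply a simple loading rule. The following is a minimal sketch of that rule, not the runtime's actual implementation: `can_load` and both dictionary shapes are hypothetical, and rejecting a feature the device does not report is an assumption rather than something the text states.

```python
def can_load(code_object_features, device_features):
    """Return True if a code object may be loaded on a device.

    code_object_features: feature name -> "+" (On) or "-" (Off).
        A feature missing from this dict has the Any value.
    device_features: feature name -> True (on) or False (off) for
        each feature the device is configured with.
    """
    for name, setting in code_object_features.items():
        if setting not in ("+", "-"):
            raise ValueError(f"bad setting for {name}: {setting!r}")
        if name not in device_features:
            # Assumption: a feature the device does not report cannot match.
            return False
        if (setting == "+") != device_features[name]:
            # On requires the device feature on; Off requires it off.
            return False
    # Every feature left at Any is compatible with either device setting.
    return True
```

For instance, `can_load({}, {"xnack": True})` holds because `xnack` is left at Any, while `can_load({"xnack": "-"}, {"xnack": True})` does not.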

		There are two forms of target ID:

		Non-Canonical Form
		The non-canonical form is used as the input to user commands to allow
		the user greater convenience. It allows both the primary and alternative
		processor name to be used (see :ref:`amdgpu-processors`) and the target
		features may be specified in any order (see :ref:`amdgpu-target-features`).

		Canonical Form
		The canonical form is used for all generated output to allow greater
		convenience for tools that consume the information. It is also used for
		internal passing of information between tools. Only the primary and not
		alternative processor name is used (see :ref:`amdgpu-processors`) and
		the target features are specified in alphabetic order
		(see :ref:`amdgpu-target-features`). Command line tools convert
		non-canonical form to canonical form.
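
As an illustration of the conversion from non-canonical to canonical form, here is a minimal sketch. The function name, the regular expression for feature names, and the small alias table (copied from a few processor table rows visible in this diff) are assumptions; it does not check that a feature is supported by the processor.

```python
import re

# Alternative -> primary processor names. Only a few rows from the
# processor table are listed here; a real tool would need the full table.
ALTERNATIVE_NAMES = {
    "iceland": "gfx802",
    "tonga": "gfx802",
    "fiji": "gfx803",
    "stoney": "gfx810",
}

# Assumption: feature names are alphanumeric and end in "+" or "-".
_FEATURE = re.compile(r"([A-Za-z0-9]+)([+-])")


def canonicalize(target_id):
    """Convert '<processor>(:<feature>(+|-))*' to canonical form."""
    processor, *parts = target_id.split(":")
    if not processor:
        raise ValueError("missing processor name")
    # Canonical form uses only the primary processor name.
    processor = ALTERNATIVE_NAMES.get(processor, processor)
    features = {}
    for part in parts:
        match = _FEATURE.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed target feature: {part!r}")
        name, sign = match.groups()
        if name in features:
            # Each target feature may appear at most once.
            raise ValueError(f"target feature given twice: {name}")
        features[name] = sign
    # Canonical form lists the features in alphabetic order.
    ordered = [name + features[name] for name in sorted(features)]
    return ":".join([processor] + ordered)
```

With this sketch, `canonicalize("fiji:xnack-")` yields `gfx803:xnack-`, and `gfx908:xnack+:sramecc-` is reordered to `gfx908:sramecc-:xnack+`.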

.. _amdgpu-address-spaces:		.. _amdgpu-address-spaces:

Address Spaces		Address Spaces
--------------		--------------

The AMDGPU architecture supports a number of memory address spaces. The address		The AMDGPU architecture supports a number of memory address spaces. The address
space names use the OpenCL standard names, with some additions.		space names use the OpenCL standard names, with some additions.
[295 lines not shown]	``e_ident[EI_OSABI]`` - ``ELFOSABI_NONE``
- ``ELFOSABI_AMDGPU_MESA3D``		- ``ELFOSABI_AMDGPU_MESA3D``
``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA``		``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA``
- ``ELFABIVERSION_AMDGPU_PAL``		- ``ELFABIVERSION_AMDGPU_PAL``
- ``ELFABIVERSION_AMDGPU_MESA3D``		- ``ELFABIVERSION_AMDGPU_MESA3D``
``e_type`` - ``ET_REL``		``e_type`` - ``ET_REL``
- ``ET_DYN``		- ``ET_DYN``
``e_machine`` ``EM_AMDGPU``		``e_machine`` ``EM_AMDGPU``
``e_entry`` 0		``e_entry`` 0
``e_flags`` See :ref:`amdgpu-elf-header-e_flags-table`		``e_flags`` See :ref:`amdgpu-elf-header-e_flags-table-v0_v1`
		and :ref:`amdgpu-elf-header-e_flags-table-v2`
========================== ===============================		========================== ===============================

..		..

.. table:: AMDGPU ELF Header Enumeration Values		.. table:: AMDGPU ELF Header Enumeration Values
:name: amdgpu-elf-header-enumeration-values-table		:name: amdgpu-elf-header-enumeration-values-table

=============================== =====		=============================== ======
Name Value		Name Value
=============================== =====		=============================== ======
``EM_AMDGPU`` 224		``EM_AMDGPU`` 224
``ELFOSABI_NONE`` 0		``ELFOSABI_NONE`` 0
``ELFOSABI_AMDGPU_HSA`` 64		``ELFOSABI_AMDGPU_HSA`` 64
``ELFOSABI_AMDGPU_PAL`` 65		``ELFOSABI_AMDGPU_PAL`` 65
``ELFOSABI_AMDGPU_MESA3D`` 66		``ELFOSABI_AMDGPU_MESA3D`` 66
``ELFABIVERSION_AMDGPU_HSA`` 1		``ELFABIVERSION_AMDGPU_HSA_V0`` 0
		``ELFABIVERSION_AMDGPU_HSA_V1`` 1
		``ELFABIVERSION_AMDGPU_HSA_V2`` 2
``ELFABIVERSION_AMDGPU_PAL`` 0		``ELFABIVERSION_AMDGPU_PAL`` 0
``ELFABIVERSION_AMDGPU_MESA3D`` 0		``ELFABIVERSION_AMDGPU_MESA3D`` 0
=============================== =====		=============================== ======

``e_ident[EI_CLASS]``		``e_ident[EI_CLASS]``
The ELF class is:		The ELF class is:

* ``ELFCLASS32`` for ``r600`` architecture.		* ``ELFCLASS32`` for ``r600`` architecture.

* ``ELFCLASS64`` for ``amdgcn`` architecture which only supports 64-bit		* ``ELFCLASS64`` for ``amdgcn`` architecture which only supports 64-bit
process address space applications.		process address space applications.
[39 lines not shown]	``e_type``

The AMD HSA runtime loader requires a ``ET_DYN`` code object.		The AMD HSA runtime loader requires a ``ET_DYN`` code object.

``e_machine``		``e_machine``
The value ``EM_AMDGPU`` is used for the machine for all processors supported		The value ``EM_AMDGPU`` is used for the machine for all processors supported
by the ``r600`` and ``amdgcn`` architectures (see		by the ``r600`` and ``amdgcn`` architectures (see
:ref:`amdgpu-processor-table`). The specific processor is specified in the		:ref:`amdgpu-processor-table`). The specific processor is specified in the
``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see		``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see
:ref:`amdgpu-elf-header-e_flags-table`).		:ref:`amdgpu-elf-header-e_flags-table-v0_v1` and
		:ref:`amdgpu-elf-header-e_flags-table-v2`).

``e_entry``		``e_entry``
The entry point is 0 as the entry points for individual kernels must be		The entry point is 0 as the entry points for individual kernels must be
selected in order to invoke them through AQL packets.		selected in order to invoke them through AQL packets.

``e_flags``		``e_flags``
The AMDGPU backend uses the following ELF header flags:		The AMDGPU backend uses the following ELF header flags:

.. table:: AMDGPU ELF Header ``e_flags``		.. table:: AMDGPU ELF Header ``e_flags`` (``EI_ABIVERSION_V0`` and ``EI_ABIVERSION_V1``)
:name: amdgpu-elf-header-e_flags-table		:name: amdgpu-elf-header-e_flags-table-v0_v1
		t-tye: These tables need to be corrected to accurately represent the different layouts for e_flags in the code object versions. The code object versions are not the same as the ABI versions. It should be made clearer what the relationship is. Note that the code object V2 had the following definitions: EF_AMDGPU_FEATURE_XNACK_V2 0x00000001, EF_AMDGPU_FEATURE_TRAP_HANDLER_V2 0x00000002

================================= ========== =============================		================================= ========== =============================
Name Value Description		Name Value Description
================================= ========== =============================		================================= ========== =============================
AMDGPU Processor Flag See :ref:`amdgpu-processor-table`.		AMDGPU Processor Flag See :ref:`amdgpu-processor-table`.
-------------------------------------------- -----------------------------		-------------------------------------------- -----------------------------
``EF_AMDGPU_MACH`` 0x000000ff AMDGPU processor selection		``EF_AMDGPU_MACH`` 0x000000ff AMDGPU processor selection
mask for		mask for
``EF_AMDGPU_MACH_xxx`` values		``EF_AMDGPU_MACH_xxx`` values
defined in		defined in
:ref:`amdgpu-ef-amdgpu-mach-table`.		:ref:`amdgpu-ef-amdgpu-mach-table`.
``EF_AMDGPU_XNACK`` 0x00000100 Indicates if the ``xnack``		``EF_AMDGPU_XNACK`` 0x00000100 Indicates if the ``xnack``
		kzhuravl: EF_AMDGPU_FEATURE_XNACK_V3
target feature is		target feature is
enabled for all code		enabled for all code
contained in the code object.		contained in the code object.
If the processor		If the processor
does not support the		does not support the
``xnack`` target		``xnack`` target
feature then must		feature then must
be 0.		be 0.
See		See
:ref:`amdgpu-target-features`.		:ref:`amdgpu-target-features`.
``EF_AMDGPU_SRAM_ECC`` 0x00000200 Indicates if the ``sram-ecc``		``EF_AMDGPU_SRAMECC`` 0x00000200 Indicates if the ``sramecc``
		kzhuravl: EF_AMDGPU_FEATURE_SRAMECC_V3
target feature is		target feature is
enabled for all code		enabled for all code
contained in the code object.		contained in the code object.
If the processor		If the processor
does not support the		does not support the
``sram-ecc`` target		``sramecc`` target
feature then must		feature then must
be 0.		be 0.
See		See
:ref:`amdgpu-target-features`.		:ref:`amdgpu-target-features`.
================================= ========== =============================		================================= ========== =============================

		.. table:: AMDGPU ELF Header ``e_flags`` (``EI_ABIVERSION_V2``)
		:name: amdgpu-elf-header-e_flags-table-v2

		================================= ========== ==========================================
		Name Value Description
		================================= ========== ==========================================
		AMDGPU Processor Flag See :ref:`amdgpu-processor-table`.
		-------------------------------------------- ------------------------------------------
		``EF_AMDGPU_MACH`` 0x000000ff AMDGPU processor selection
		mask for
		``EF_AMDGPU_MACH_xxx`` values
		defined in
		:ref:`amdgpu-ef-amdgpu-mach-table`.
		``EF_AMDGPU_FEATURE_XNACK`` 0x00000300 XNACK selection mask for
		kzhuravl: EF_AMDGPU_FEATURE_XNACK_V4
		kzhuravl: EF_AMDGPU_FEATURE_XNACK_V3
		kzhuravl: EF_AMDGPU_FEATURE_XNACK_V3 in the previous comment is wrong. It has to be EF_AMDGPU_FEATURE_XNACK_V4
		``EF_AMDGPU_FEATURE_XNACK_xxx`` values
		defined in
		:ref:`amdgpu-ef-amdgpu-feature-xnack-table`.
		``EF_AMDGPU_FEATURE_SRAMECC`` 0x00000c00 SRAMECC selection mask for
		kzhuravl: EF_AMDGPU_FEATURE_SRAMECC_V4
		``EF_AMDGPU_FEATURE_SRAMECC_xxx`` values
		defined in
		:ref:`amdgpu-ef-amdgpu-feature-sramecc-table`.
		================================= ========== ==========================================

.. table:: AMDGPU ``EF_AMDGPU_MACH`` Values		.. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
:name: amdgpu-ef-amdgpu-mach-table		:name: amdgpu-ef-amdgpu-mach-table

================================= ========== =============================		================================= ========== =============================
Name Value Description (see		Name Value Description (see
:ref:`amdgpu-processor-table`)		:ref:`amdgpu-processor-table`)
================================= ========== =============================		================================= ========== =============================
``EF_AMDGPU_MACH_NONE`` 0x000 not specified		``EF_AMDGPU_MACH_NONE`` 0x000 not specified
[36 lines not shown]	.. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
reserved 0x032 Reserved.		reserved 0x032 Reserved.
``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010``		``EF_AMDGPU_MACH_AMDGCN_GFX1010`` 0x033 ``gfx1010``
``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011``		``EF_AMDGPU_MACH_AMDGCN_GFX1011`` 0x034 ``gfx1011``
``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012``		``EF_AMDGPU_MACH_AMDGCN_GFX1012`` 0x035 ``gfx1012``
``EF_AMDGPU_MACH_AMDGCN_GFX1030`` 0x036 ``gfx1030``		``EF_AMDGPU_MACH_AMDGCN_GFX1030`` 0x036 ``gfx1030``
``EF_AMDGPU_MACH_AMDGCN_GFX1031`` 0x037 ``gfx1031``		``EF_AMDGPU_MACH_AMDGCN_GFX1031`` 0x037 ``gfx1031``
================================= ========== =============================		================================= ========== =============================

		.. table:: AMDGPU ``EF_AMDGPU_FEATURE_XNACK`` Values (see :ref:`amdgpu-target-features`)
		:name: amdgpu-ef-amdgpu-feature-xnack-table

		============================================ =====
		Name Value
		============================================ =====
		``EF_AMDGPU_FEATURE_XNACK_NOT_APPLICABLE`` 0x0
		``EF_AMDGPU_FEATURE_XNACK_DEFAULT`` 0x1
		``EF_AMDGPU_FEATURE_XNACK_OFF`` 0x2
		``EF_AMDGPU_FEATURE_XNACK_ON`` 0x3
		t-tye: ``EF_AMDGPU_FEATURE_XNACK_NOT_SUPPORTED_V4`` 0x0, ``EF_AMDGPU_FEATURE_XNACK_ANY_V4`` 0x1, ``EF_AMDGPU_FEATURE_XNACK_OFF_V4`` 0x2, ``EF_AMDGPU_FEATURE_XNACK_ON_V4`` 0x3
		kzhuravl: EF_AMDGPU_FEATURE_XNACK_UNSUPPORTED_V4
		============================================ =====

		.. table:: AMDGPU ``EF_AMDGPU_FEATURE_SRAMECC`` Values (see :ref:`amdgpu-target-features`)
		:name: amdgpu-ef-amdgpu-feature-sramecc-table

		============================================= =====
		Name Value
		============================================= =====
		``EF_AMDGPU_FEATURE_SRAMECC_NOT_APPLICABLE`` 0x0
		``EF_AMDGPU_FEATURE_SRAMECC_DEFAULT`` 0x1
		``EF_AMDGPU_FEATURE_SRAMECC_OFF`` 0x2
		``EF_AMDGPU_FEATURE_SRAMECC_ON`` 0x3
		t-tye: ``EF_AMDGPU_FEATURE_SRAMECC_NOT_SUPPORTED_V4`` 0x0, ``EF_AMDGPU_FEATURE_SRAMECC_ANY_V4`` 0x1, ``EF_AMDGPU_FEATURE_SRAMECC_OFF_V4`` 0x2, ``EF_AMDGPU_FEATURE_SRAMECC_ON_V4`` 0x3
		kzhuravl: EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4
		============================================= =====
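
To make the proposed ``EI_ABIVERSION_V2`` layout concrete, the sketch below decodes the three fields from an `e_flags` word. It uses the masks and value names as they appear in this revision of the patch (the reviewers suggest renaming some of them), and it assumes the listed values 0x0 to 0x3 are the field contents after the masked bits are shifted down. The function names are hypothetical.

```python
# Masks from the proposed e_flags table for EI_ABIVERSION_V2.
EF_AMDGPU_MACH = 0x000000FF
EF_AMDGPU_FEATURE_XNACK = 0x00000300
EF_AMDGPU_FEATURE_SRAMECC = 0x00000C00

# Field values as listed in this revision of the patch.
# Assumption: they are compared after shifting the field down to bit 0.
FEATURE_VALUES = {
    0x0: "not applicable",
    0x1: "default",
    0x2: "off",
    0x3: "on",
}


def _field(e_flags, mask):
    """Extract the bit field selected by mask, shifted down to bit 0."""
    shift = (mask & -mask).bit_length() - 1  # index of lowest set mask bit
    return (e_flags & mask) >> shift


def decode_e_flags_v2(e_flags):
    """Split an e_flags word into processor, xnack and sramecc fields."""
    return {
        "mach": _field(e_flags, EF_AMDGPU_MACH),
        "xnack": FEATURE_VALUES[_field(e_flags, EF_AMDGPU_FEATURE_XNACK)],
        "sramecc": FEATURE_VALUES[_field(e_flags, EF_AMDGPU_FEATURE_SRAMECC)],
    }
```

For example, an `e_flags` value of `0xB33` would decode to processor `0x033` (``gfx1010``) with `xnack` on and `sramecc` off under these assumptions.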

Sections		Sections
--------		--------

An AMDGPU target ELF code object has the standard ELF sections which include:		An AMDGPU target ELF code object has the standard ELF sections which include:

.. table:: AMDGPU ELF Sections		.. table:: AMDGPU ELF Sections
:name: amdgpu-elf-sections-table		:name: amdgpu-elf-sections-table

[325 lines not shown]
For example:		For example:

.. code::		.. code::

file:///dir1/dir2/file1		file:///dir1/dir2/file1
file:///dir3/dir4/file2#offset=0x2000&size=3000		file:///dir3/dir4/file2#offset=0x2000&size=3000
memory://1234#offset=0x20000&size=3000		memory://1234#offset=0x20000&size=3000
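
The example URIs above can be taken apart with a general-purpose URI parser. This is a sketch only: the URI grammar itself is in a part of the document not shown in this diff, so the handling of `offset` and `size` (hexadecimal or decimal, both optional) is inferred from the three examples, and `parse_code_object_uri` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def parse_code_object_uri(uri):
    """Return (scheme, location, offset, size) for a code object URI.

    offset and size are None when the URI does not carry them.
    """
    parts = urlsplit(uri)
    params = parse_qs(parts.fragment)

    def number(key):
        values = params.get(key)
        # Base 0 accepts both "0x2000" and "3000".
        return int(values[0], 0) if values else None

    # file:// URIs carry a path; memory:// URIs carry an id in the
    # authority position (inferred from the examples).
    location = parts.path if parts.scheme == "file" else parts.netloc
    return parts.scheme, location, number("offset"), number("size")
```

Applied to the second example, this returns the path `/dir3/dir4/file2` with offset 8192 and size 3000.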

		.. _amdgpu-embedding-bundled-objects:

		Embedding Bundled Code Objects
		==============================
		jdoerfert: Isn't this a generic thing that is (to be) used by all targets, not just AMDGPU? Should we instead document it elsewhere and link it here? I guess the same could be said for the target id stuff but there people don't really use it elsewhere yet.
		yaxunl: There are two usages of `clang-offload-bundler`: (1) As a generic bundler for clang intermediate files, including preprocessor outputs, LLVM bitcode, object files. The consumer is clang. (2) As a code object bundler (or so called fat binary) which bundles code objects for different GPUs so that it can be embedded in an executable or shared library. The consumer is HIP runtime. Here we only describe the second usage of `clang-offload-bundler`, which is only used by AMDGPU target. As you can see it refers to `code objects` and `target ID`. Therefore it is better kept in AMDGPU documentation.
		jdoerfert: Why is this second usage restricted to HIP (in the upcoming future)? I mean, OpenMP offload already "bundles code objects for different GPU's so that it can be embedded in an executable or shared library" today. It just doesn't yet allow to do so on IR level, something AMD is in the process of changing. That said, what distinguishes usage 1 from usage 2 anymore?
		yaxunl: I separated this part as ClangOffloadBundlerFileFormat.rst

		Use one or more ``--offload-arch=<target-id>`` clang options to specify the
		target IDs of the offload code regions of a single source programming language.

		The compiler will perform a separate compilation for the host and a separate
		compilation for the offload code regions for each specified target ID. The
		``clang-offload-bundler`` is used to bundle the offload code objects
		(see `ClangOffloadBundlerFileFormat<https://clang.llvm.org/docs/ClangOffloadBundlerFileFormat.html>`_).
		t-tye: Use the Sphinx doc:`ClangOffloadBundlerFileFormat` syntax. I think there is an example elsewhere in the file.
		The bundled code object is embedded in the host code object as a data section
		with the name ``.hip_fatbin``.

		The host compilation includes an ``init`` function that will use the runtime
		corresponding to the offload kind (see :ref:`amdgpu-offload-kind-table`) to
		load the offload code objects appropriate to the devices present when the
		host program is executed.

.. _amdgpu-dwarf-debug-information:		.. _amdgpu-dwarf-debug-information:

DWARF Debug Information		DWARF Debug Information
=======================		=======================

.. warning::		.. warning::

This section describes provisional support for AMDGPU DWARF [DWARF]_ that		This section describes provisional support for AMDGPU DWARF [DWARF]_ that
is not currently fully implemented and is subject to change.		is not currently fully implemented and is subject to change.

AMDGPU generates DWARF [DWARF]_ debugging information ELF sections (see		AMDGPU generates DWARF [DWARF]_ debugging information ELF sections (see
:ref:`amdgpu-elf-code-object`) which contain information that maps the code		:ref:`amdgpu-elf-code-object`) which contain information that maps the code
		JonChesterfield: I'm not sure we want to encode artefacts of clang-offload-bundler in this spec
		yaxunl: This is due to a restriction of `clang-offload-bundler`. Here we document the current situation. We plan to fix that by removing the artifact.
object executable code and data to the source language constructs. It can be		object executable code and data to the source language constructs. It can be
used by tools such as debuggers and profilers. It uses features defined in		used by tools such as debuggers and profilers. It uses features defined in
:doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` that are made available in		:doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` that are made available in
DWARF Version 4 and DWARF Version 5 as an LLVM vendor extension.		DWARF Version 4 and DWARF Version 5 as an LLVM vendor extension.

This section defines the AMDGPU target architecture specific DWARF mappings.		This section defines the AMDGPU target architecture specific DWARF mappings.

.. _amdgpu-dwarf-register-identifier:		.. _amdgpu-dwarf-register-identifier:
[943 lines not shown]	"Printf" sequence of Each string is encoded information

FormatString		FormatString
The format string passed to the		The format string passed to the
printf function call.		printf function call.
"Kernels" sequence of Required Sequence of the mappings for each		"Kernels" sequence of Required Sequence of the mappings for each
mapping kernel in the code object. See		mapping kernel in the code object. See
:ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2`		:ref:`amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2`
for the definition of the mapping.		for the definition of the mapping.
		"TargetID" string Required <target_triple> "-" <target_id>
========== ============== ========= =======================================		========== ============== ========= =======================================
		t-tye: Add "TargetID" attribute that is a string that is the target ID for the module.
		t-tye: Or is this: <target_triple> "-" <target_id>
		yaxunl (author): This is V2. Are we sure we want to add target ID to V2?
		t-tye: Sorry I put the comment in the wrong section. I meant to put it in the V3 (not becoming V4) section.
		yaxunl (author): Moved to V3.

..		..

.. table:: AMDHSA Code Object V2 Kernel Metadata Map		.. table:: AMDHSA Code Object V2 Kernel Metadata Map
:name: amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2		:name: amdgpu-amdhsa-code-object-kernel-metadata-map-table-v2

================= ============== ========= ================================		================= ============== ========= ================================
String Key Value Type Required? Description		String Key Value Type Required? Description
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~		~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Directives which begin with ``.amdgcn`` are valid for all ``amdgcn``		Directives which begin with ``.amdgcn`` are valid for all ``amdgcn``
architecture processors, and are not OS-specific. Directives which begin with		architecture processors, and are not OS-specific. Directives which begin with
``.amdhsa`` are specific to ``amdgcn`` architecture processors when the		``.amdhsa`` are specific to ``amdgcn`` architecture processors when the
``amdhsa`` OS is specified. See :ref:`amdgpu-target-triples` and		``amdhsa`` OS is specified. See :ref:`amdgpu-target-triples` and
:ref:`amdgpu-processors`.		:ref:`amdgpu-processors`.

.amdgcn_target <target>		.amdgcn_target <target_triple> "-" <target_id>
		t-tye: <target> -> <target-id>
		t-tye: Or is this: <target_triple> "-" <target_id>
+++++++++++++++++++++++		+++++++++++++++++++++++
		t-tye: Make +'s match length of above title.

Optional directive which declares the target supported by the containing		Optional directive which declares the <target_triple> "-" <target_id> supported by the containing
		t-tye: target ID
		t-tye: Or is this: <target_triple> "-" <target_id>
assembler source file. Valid values are described in		assembler source file. Valid values are described in
:ref:`amdgpu-amdhsa-code-object-target-identification`. Used by the assembler		:ref:`amdgpu-amdhsa-code-object-target-identification`. Used by the assembler
to validate command-line options such as ``-triple``, ``-mcpu``, and those		to validate command-line options such as ``-triple``, ``-mcpu``, and those
which specify target features.		which specify target features.
		t-tye: and ``-mcpu``.
		yaxunl (author): -mcpu is already mentioned by the same sentence
		t-tye: My suggested replacement was to remove the "and those which specify target features." since those options are being removed. So the "and" needs to go before the -mcpu to become: Used by the assembler to validate command-line options such as ``-triple`` and ``-mcpu``.
		yaxunl (author): Sorry, I misunderstood. Fixed.
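
Editor's sketch (not part of the revision): the ``<target_triple> "-" <target_id>`` form discussed above cannot be split on the first or last dash, because the triple itself contains dashes. The Python below is a minimal illustration under the assumption that the triple always has four dash-separated fields (arch, vendor, OS, environment, where the environment may be empty, as in ``amdgcn-amd-amdhsa--gfx908``). The function name ``split_target`` and the field count are hypothetical, and the target ID is treated as an opaque string.

```python
def split_target(value, triple_fields=4):
    """Split '<target_triple>-<target_id>' into (triple, target_id).

    Assumption: the triple has exactly `triple_fields` dash-separated
    fields; everything after them is the target ID, kept opaque.
    """
    # maxsplit keeps any dashes inside the target ID intact.
    parts = value.split("-", triple_fields)
    if len(parts) != triple_fields + 1 or not parts[-1]:
        raise ValueError(
            "expected <target_triple>-<target_id>, got %r" % (value,)
        )
    return "-".join(parts[:triple_fields]), parts[-1]


triple, target_id = split_target("amdgcn-amd-amdhsa--gfx908")
print(triple)
print(target_id)
```

An assembler-side check in the spirit of the directive text would then compare ``triple`` against ``-triple`` and the processor part of ``target_id`` against ``-mcpu``; that comparison is left out here because the exact target ID grammar was still under review.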

.amdhsa_kernel <name>		.amdhsa_kernel <name>
+++++++++++++++++++++		+++++++++++++++++++++

Creates a correctly aligned AMDHSA kernel descriptor and a symbol,		Creates a correctly aligned AMDHSA kernel descriptor and a symbol,
``<name>.kd``, in the current location of the current section. Only valid when		``<name>.kd``, in the current location of the current section. Only valid when
the OS is ``amdhsa``. ``<name>`` must be a symbol that labels the first		the OS is ``amdhsa``. ``<name>`` must be a symbol that labels the first
instruction to execute, and does not need to be previously defined.		instruction to execute, and does not need to be previously defined.
``.amdhsa_next_free_sgpr`` Required GFX6-GFX10 Maximum SGPR number explicitly referenced, plus one.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in		Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
``.amdhsa_reserve_vcc`` 1 GFX6-GFX10 Whether the kernel may use the special VCC SGPR.		``.amdhsa_reserve_vcc`` 1 GFX6-GFX10 Whether the kernel may use the special VCC SGPR.
Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in		Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access		``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access
scratch memory. Used to calculate		scratch memory. Used to calculate
GRANULATED_WAVEFRONT_SGPR_COUNT in		GRANULATED_WAVEFRONT_SGPR_COUNT in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay.
Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in
Specific :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
(+xnack)
``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in		``.amdhsa_float_round_mode_32`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_32 in
		t-tye: This directive is being removed since the information is now present in the target ID that is provided by the .amdgcn_target directive.
		yaxunl (author): done
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
Possible values are defined in		Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.		:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_16_64 in		``.amdhsa_float_round_mode_16_64`` 0 GFX6-GFX10 Controls FLOAT_ROUND_MODE_16_64 in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.		:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx10-table`.
Possible values are defined in		Possible values are defined in
:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.		:ref:`amdgpu-amdhsa-floating-point-rounding-mode-enumeration-values-table`.
``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX10 Controls FLOAT_DENORM_MODE_32 in		``.amdhsa_float_denorm_mode_32`` 0 GFX6-GFX10 Controls FLOAT_DENORM_MODE_32 in

Add documentation for target ID and ClangOffloadBundlerFormat
Abandoned Public

Revision Contents

Diff 286397

clang/docs/ClangOffloadBundlerFileFormat.rst

clang/docs/index.rst

llvm/docs/AMDGPUUsage.rst