This is an archive of the discontinued LLVM Phabricator instance.

foad added inline comments.

llvm/docs/AMDGPUUsage.rst
1111–1112	I feel like this is missing some context. What is "an llvm.amdgcn.cs.chain sequence in the function epilog"???
1113–1118	I'm not sure any of this belongs in a calling convention description. The dealloc_vgprs thing applies to all kernels regardless of calling convention, but probably doesn't need to be documented to the end user anyway.
1120–1121	So they're not launched by hardware? That feels like the biggest difference from amdgpu_cs, and should probably be mentioned first.

nhaehnle added a reviewer: t-tye. (Jun 6 2023, 4:47 AM)

nhaehnle added a subscriber: t-tye.

nhaehnle added inline comments.

llvm/docs/AMDGPUUsage.rst
1111–1112	It's a new intrinsic that belongs to the same set of changes, so while the context is missing from this patch, I think it's fair to let it be added implicitly by the change that adds the intrinsic.
1113–1118	It is relevant if anybody wanted to try writing compatible code via some non-LLVM mechanism. @t-tye may have opinions on this as well.

arsenm added a subscriber: arsenm. (Jun 6 2023, 5:33 AM)

arsenm added inline comments.

llvm/docs/AMDGPUUsage.rst
1127	Why? Entry points require 256 byte align but regular code is fine with 4

nhaehnle added inline comments. (Jun 6 2023, 11:28 AM)

llvm/docs/AMDGPUUsage.rst
1127	Entry points require 256 for HW reasons. The 64 bytes we discussed is a SW choice that is orthogonal to the base functionality of amdgpu_cs_chain (making LSBs of function pointers available to stuff metadata in -- which we also could have considered doing with amdgpu_gfx functions).
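The remark about "making LSBs of function pointers available to stuff metadata in" can be illustrated with a small model. An address aligned to 64 bytes has its six least-significant bits equal to zero, so those bits can carry a small tag. The following is a minimal Python sketch, assuming the 64-byte alignment from the discussion; the helper names `pack_tagged` and `unpack_tagged` are hypothetical and do not correspond to any LLVM or LLPC API.

```python
# Assumption: functions are aligned to 64 bytes (the software choice discussed above).
ALIGNMENT = 64
TAG_MASK = ALIGNMENT - 1  # 0b111111: the six low bits freed up by the alignment


def pack_tagged(address, metadata):
    """Store a small metadata value in the low bits of an aligned address (hypothetical helper)."""
    if address & TAG_MASK:
        raise ValueError("address is not aligned to %d bytes" % ALIGNMENT)
    if not 0 <= metadata <= TAG_MASK:
        raise ValueError("metadata does not fit in the low bits")
    return address | metadata


def unpack_tagged(tagged):
    """Split a tagged value back into (address, metadata) (hypothetical helper)."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK


tagged = pack_tagged(0x1000, 5)
address, metadata = unpack_tagged(tagged)
```

With a 4-byte alignment only two low bits would be free, which is why a larger alignment is attractive for this particular use.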

Address some of the previous comments and alphabetise.

rovka marked an inline comment as done. (Jun 8 2023, 1:02 AM)

rovka added inline comments.

llvm/docs/AMDGPUUsage.rst
1113–1118	Ok, I'll wait for more feedback on this.
1120–1121	Good point, fixed.
1127	I agree this might not be the most pertinent place to put it, but it feels like relevant information, like you said above. I don't feel very strongly about this, we could just omit it (or mention it elsewhere if you think there's a better place for it).

Harbormaster completed remote builds in B237437: Diff 529522. (Jun 8 2023, 1:41 AM)

nhaehnle added inline comments. (Jun 12 2023, 1:24 AM)

llvm/docs/AMDGPUUsage.rst
1127	Yeah, but the alignment is relevant information in the context of the specific use of this feature in LLPC. I talked with @ruiling and he's going to use Function::setAlignment to explicitly set the desired alignment. So I'd prefer to remove it here. Or perhaps rephrase it as an "FYI, there's this use of the feature which is going to set explicit alignments on the functions". But at least to me that just seems a little weird.

nhaehnle added inline comments. (Jun 12 2023, 1:26 AM)

llvm/docs/AMDGPUUsage.rst
1127	And I think one big reason why it feels a little weird is that there could be tons of little "FYIs" that we could add about this feature. Why is this one special? And adding all of them would just blow up this document...

foad added inline comments. (Jun 12 2023, 1:39 AM)

llvm/docs/AMDGPUUsage.rst
1113–1118	> It is relevant if anybody wanted to try writing compatible code via some non-LLVM mechanism.
Then I don't understand what the part about waitcnts, starting from "Waits for regular memory counters are not inserted", is trying to tell me. (And I speak as someone who is quite familiar with the dealloc vgprs issue!) It is written in the style "the compiler does X". Could you rephrase it more like "the required state at a function call boundary is Y"?

Remove the bits about function alignment and MSG_DEALLOC_VGPRs.

Harbormaster completed remote builds in B238455: Diff 530858. (Jun 13 2023, 6:09 AM)

nhaehnle accepted this revision. (Jun 19 2023, 9:13 AM)

This revision is now accepted and ready to land. (Jun 19 2023, 9:13 AM)

Closed by commit rG041bfe40a771: [AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC (authored by rovka). (Jun 20 2023, 1:47 AM)

This revision was automatically updated to reflect the committed changes.

rovka added a commit: rG041bfe40a771: [AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC.

Revision Contents

Path: llvm/docs/AMDGPUUsage.rst
Size: 43 lines

Diff 529522

llvm/docs/AMDGPUUsage.rst

.. table:: AMDGPU Calling Conventions
``ccc`` The C calling convention. Used by default.
See :ref:`amdgpu-amdhsa-function-call-convention-non-kernel-functions`
for more details.

``amdgpu_cs`` Used for Mesa/AMDPAL compute shaders.
..TODO::
Describe.

		``amdgpu_cs_chain`` Similar to ``amdgpu_cs``, with differences described below.

		Functions with this calling convention cannot be called directly. They must
		instead be launched via the ``llvm.amdgcn.cs.chain`` intrinsic.

		Arguments are passed in SGPRs, starting at s0, if they have the ``inreg``
		attribute, and in VGPRs otherwise, starting at v8. Using more SGPRs or VGPRs
		than available in the subtarget is not allowed. On subtargets that use
		a scratch buffer descriptor (as opposed to ``scratch_{load,store}_*`` instructions),
		the scratch buffer descriptor is passed in s[48:51]. This limits the
		SGPR / ``inreg`` arguments to the equivalent of 48 dwords; using more
		than that is not allowed.

		The return type must be void.
		Varargs, sret, byval, byref, inalloca, preallocated are not supported.

		Values in scalar registers as well as v0-v7 are not preserved. Values in
		VGPRs starting at v8 are not preserved for the active lanes, but must be
		saved by the callee for inactive lanes when using WWM.

		Wave scratch is "empty" at function boundaries. There is no stack pointer input
		or output value, but functions are free to use scratch starting from an initial
		stack pointer. Calls to ``amdgpu_gfx`` functions are allowed and behave like they
		do in ``amdgpu_cs`` functions.

		All counters (``lgkmcnt``, ``vmcnt``, ``storecnt``, etc.) are presumed in an
		unknown state at function entry. Waits for regular memory counters are not
		inserted as part of an ``llvm.amdgcn.cs.chain`` sequence in the function epilog.
		However, we add waits for errata / hardware workarounds in the epilog:

		* On gfx11+, the function epilog waits for any scratch stores to be confirmed. This
		works around the issue that we must wait for scratch stores before sending a
		``MSG_DEALLOC_VGPRS`` message.
		* Additional waits may be required (e.g. ``s_waitcnt_depctr``).

		A function may have multiple exits (e.g. one chain exit and one plain ``ret void``
		for when the wave ends), but all ``llvm.amdgcn.cs.chain`` exits must be in
		uniform control flow.

		Functions must be aligned to at least 64 bytes.

		``amdgpu_cs_chain_preserve`` Same as ``amdgpu_cs_chain``, but active lanes for VGPRs starting at v8 are preserved.

``amdgpu_es`` Used for AMDPAL shader stage before geometry shader if geometry is in
use. So either the domain (= tessellation evaluation) shader if
tessellation is in use, or otherwise the vertex shader.
..TODO::
Describe.

``amdgpu_gfx`` Used for AMD graphics targets. Functions with this calling convention
cannot be used as entry points.
..TODO::
Describe.

``amdgpu_gs`` Used for Mesa/AMDPAL geometry shaders.
..TODO::
Describe.

nhaehnle (inline comment on line 1127): This shouldn't be a property of the calling convention. We should just set the alignment attribute in the frontend to ensure this.

``amdgpu_hs`` Used for Mesa/AMDPAL hull shaders (= tessellation control shaders).
..TODO::
Describe.

``amdgpu_kernel`` See :ref:`_amdgpu-amdhsa-function-call-convention-kernel-functions`
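
The argument-passing rule in the proposed ``amdgpu_cs_chain`` text (``inreg`` arguments go to SGPRs starting at s0, all other arguments go to VGPRs starting at v8, and SGPR arguments are capped at 48 dwords when the scratch buffer descriptor lives in s[48:51]) can be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the function name `assign_chain_args` and its input format are hypothetical, arguments are assumed to be packed back to back in declaration order, and the subtarget-specific register limits mentioned in the text are not modeled. It is not the LLVM implementation.

```python
SCRATCH_DESC_FIRST_SGPR = 48  # the descriptor occupies s[48:51] per the proposed text
FIRST_ARG_VGPR = 8            # non-inreg arguments start at v8 per the proposed text


def _reg_name(prefix, first, dwords):
    """Format a register or register range, e.g. s4 or v[8:9]."""
    if dwords == 1:
        return "%s%d" % (prefix, first)
    return "%s[%d:%d]" % (prefix, first, first + dwords - 1)


def assign_chain_args(args, uses_scratch_descriptor=True):
    """Map arguments to registers (hypothetical model).

    args is a list of (name, size_in_dwords, is_inreg) tuples.
    Assumption: arguments are packed contiguously in declaration order.
    """
    next_sgpr = 0
    next_vgpr = FIRST_ARG_VGPR
    placement = {}
    for name, dwords, is_inreg in args:
        if is_inreg:
            if uses_scratch_descriptor and next_sgpr + dwords > SCRATCH_DESC_FIRST_SGPR:
                raise ValueError("inreg arguments exceed the equivalent of 48 dwords")
            placement[name] = _reg_name("s", next_sgpr, dwords)
            next_sgpr += dwords
        else:
            placement[name] = _reg_name("v", next_vgpr, dwords)
            next_vgpr += dwords
    return placement


example = assign_chain_args([
    ("resource", 4, True),   # inreg: lands in SGPRs
    ("index", 1, True),      # inreg: next free SGPR
    ("payload", 2, False),   # not inreg: lands in VGPRs from v8
])
```

Here `example` maps `resource` to `s[0:3]`, `index` to `s4`, and `payload` to `v[8:9]`.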

[AMDGPU] Document amdgpu_cs_chain[_preserve] CCs. NFC
Closed, Public