This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add some notes about amdgpu-flat-work-group-size
ClosedPublic

Authored by arsenm on Jun 30 2023, 6:01 AM.


Details

Reviewers

foad
nhaehnle
yaxunl

Group Reviewers

Restricted Project

Diff Detail

Event Timeline

arsenm created this revision. Jun 30 2023, 6:01 AM

Herald added a project: Restricted Project. Jun 30 2023, 6:01 AM

Herald added subscribers: StephenFan, kerbowa, tpr and 4 others.

arsenm requested review of this revision. Jun 30 2023, 6:01 AM

Herald added a project: Restricted Project. Jun 30 2023, 6:01 AM

Herald added a subscriber: wdng.

arsenm added inline comments. Jun 30 2023, 6:02 AM

llvm/docs/AMDGPUUsage.rst
1000	I suppose this first sentence could use clarification away from dispatched

Harbormaster completed remote builds in B242388: Diff 536200. Jun 30 2023, 7:00 AM

ping

yaxunl added inline comments. Jul 7 2023, 7:27 AM

llvm/docs/AMDGPUUsage.rst
1002–1007	Clang always adds this function attribute to the kernel. The implicit default value specified by Clang is 1,256 for OpenCL and 1,1024 for HIP.
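The precedence described in this comment can be sketched in a few lines of Python. This is only an illustration: the default values are copied from the comment above and from the documentation text under review, not read from Clang or LLVM, and the names `effective_bounds`, `parse_flat_work_group_size`, and `CLANG_IMPLICIT_DEFAULTS` are hypothetical helpers invented for this sketch.

```python
# Illustrative only: these values restate the defaults quoted in the review;
# they are not queried from Clang or the AMDGPU backend.
IR_IMPLIED_DEFAULT = (1, 1024)

# Hypothetical lookup of the per-language defaults mentioned in the comment.
CLANG_IMPLICIT_DEFAULTS = {
    "opencl": (1, 256),
    "hip": (1, 1024),
}


def parse_flat_work_group_size(value):
    """Parse a "min,max" attribute string into a (min, max) tuple of ints."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'min,max', got {value!r}")
    lo, hi = (int(p.strip()) for p in parts)
    if lo > hi:
        raise ValueError(f"min exceeds max in {value!r}")
    return lo, hi


def effective_bounds(attr_value=None, language=None):
    """Return the (min, max) bounds assumed for a kernel.

    An explicit attribute value wins. Without one, a known language default
    is used. With neither, the IR implied default applies.
    """
    if attr_value is not None:
        return parse_flat_work_group_size(attr_value)
    if language is not None and language.lower() in CLANG_IMPLICIT_DEFAULTS:
        return CLANG_IMPLICIT_DEFAULTS[language.lower()]
    return IR_IMPLIED_DEFAULT
```

For example, `effective_bounds(language="OpenCL")` yields the tighter OpenCL bounds, while a kernel whose IR carries no attribute at all falls back to the IR implied default.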

Clarify default is backend default

yaxunl added inline comments. Jul 7 2023, 8:48 AM

llvm/docs/AMDGPUUsage.rst
1004	if the actual block size or workgroup size exceeds the limit, the behaviour will be undefined. For example, even if there is only one active thread but the thread local id exceeds the limit, the behaviour is undefined.

Harbormaster completed remote builds in B243777: Diff 538140. Jul 7 2023, 9:07 AM

scchan added a subscriber: scchan. Jul 7 2023, 10:15 AM

scchan added inline comments.

llvm/docs/AMDGPUUsage.rst
1004	I agree, the nuance here is to refer to the actual work group size at execution time exceeding the limit rather than the number of logical active lanes.
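The distinction drawn here, between the work group size at execution time and the number of active lanes, can be made concrete with a small Python sketch. It is a model of the wording under review, not of anything the backend does: the helper names are hypothetical, the flat size is assumed to be the product of the per-dimension sizes, and flat local ids are assumed to be zero-based.

```python
from math import prod


def parse_bounds(value="1,1024"):
    """Split a "min,max" attribute string into two ints."""
    lo, hi = (int(p) for p in value.split(","))
    return lo, hi


def flat_work_group_size(dims):
    """Flat size of a work group, assumed to be the product of its dimensions."""
    return prod(dims)


def dispatch_within_bounds(attr_value, dims):
    """True if the flat size of the executed work group lies in [min, max]."""
    lo, hi = parse_bounds(attr_value)
    return lo <= flat_work_group_size(dims) <= hi


def lane_within_bounds(attr_value, flat_local_id):
    """True if a zero-based flat local id fits under the declared maximum.

    The number of active lanes does not matter here: a single lane whose id
    is at or above the maximum is already outside the declared range.
    """
    _, hi = parse_bounds(attr_value)
    return 0 <= flat_local_id < hi
```

Under these assumptions a 16x16x4 work group fits the bounds `1,1024` but not `1,256`, and a lone active lane with flat local id 256 is out of range for `1,256` even though only one lane is running.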

Reword again. I was trying to express that you can't do what Vulkan was doing and increase the set of active lanes beyond the bounds.

LGTM. Thanks.

This revision is now accepted and ready to land. Jul 7 2023, 12:55 PM

Harbormaster completed remote builds in B243840: Diff 538233. Jul 7 2023, 3:06 PM

30fd35f59ceb4c00a550b82af767a5b9cf9e252d

Revision Contents

Path: llvm/docs/AMDGPUUsage.rst

Size: 7 lines

Diff 538233

llvm/docs/AMDGPUUsage.rst


 The AMDGPU backend supports the following LLVM IR attributes.

   .. table:: AMDGPU LLVM IR Attributes
      :name: amdgpu-llvm-ir-attributes-table

      ======================================= ==========================================================
      LLVM Attribute                          Description
      ======================================= ==========================================================
      "amdgpu-flat-work-group-size"="min,max" Specify the minimum and maximum flat work group sizes that
                                              will be specified when the kernel is dispatched. Generated
                                              by the ``amdgpu_flat_work_group_size`` CLANG attribute [CLANG-ATTR]_.
-                                             The implied default value is 1,1024.
+                                             The IR implied default value is 1,1024. Clang may emit this attribute
+                                             with more restrictive bounds depending on language defaults.
+                                             If the actual block or workgroup size exceeds the limit at any point during
+                                             the execution, the behavior is undefined. For example, even if there is
+                                             only one active thread but the thread local id exceeds the limit, the
+                                             behavior is undefined.

      "amdgpu-implicitarg-num-bytes"="n"      Number of kernel argument bytes to add to the kernel
                                              argument block size for the implicit arguments. This
                                              varies by OS and language (for OpenCL see
                                              :ref:`opencl-kernel-implicit-arguments-appended-for-amdhsa-os-table`).
      "amdgpu-num-sgpr"="n"                   Specifies the number of SGPRs to use. Generated by
                                              the ``amdgpu_num_sgpr`` CLANG attribute [CLANG-ATTR]_.
      "amdgpu-num-vgpr"="n"                   Specifies the number of VGPRs to use. Generated by the