This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add some notes about amdgpu-flat-work-group-size
ClosedPublic

Authored by arsenm on Jun 30 2023, 6:01 AM.

Details

Reviewers
foad
nhaehnle
yaxunl
Group Reviewers
Restricted Project

Event Timeline

arsenm created this revision. Jun 30 2023, 6:01 AM
Herald added a project: Restricted Project. Jun 30 2023, 6:01 AM
arsenm requested review of this revision. Jun 30 2023, 6:01 AM
Herald added a subscriber: wdng.
arsenm added inline comments. Jun 30 2023, 6:02 AM
llvm/docs/AMDGPUUsage.rst
1000

I suppose this first sentence could be reworded to move away from "dispatched".

yaxunl added inline comments. Jul 7 2023, 7:27 AM
llvm/docs/AMDGPUUsage.rst
1002–1007

Clang always adds this function attribute to the kernel. The implicit default value specified by Clang is 1,256 for OpenCL and 1,1024 for HIP.
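
To make the "min,max" value format concrete, here is a minimal sketch. The helper names are hypothetical and are not part of Clang or LLVM; the default pairs are simply the values quoted in the comment above and have not been checked against the Clang source. Rejecting a range whose minimum exceeds its maximum is an assumption of this sketch.

```python
# Hypothetical helpers for illustration; not part of Clang or LLVM.

# Implicit defaults as quoted in the review comment above, as (min, max).
IMPLICIT_DEFAULTS = {"opencl": (1, 256), "hip": (1, 1024)}


def parse_flat_work_group_size(value):
    """Split a "min,max" attribute value into a pair of integers."""
    min_text, max_text = value.split(",")
    bounds = (int(min_text), int(max_text))
    # Assumption for this sketch: a range with min > max is rejected.
    if bounds[0] > bounds[1]:
        raise ValueError(f"min exceeds max in {value!r}")
    return bounds


def ir_attribute(language):
    """Render the LLVM IR string attribute for a language's implicit default."""
    lo, hi = IMPLICIT_DEFAULTS[language]
    return f'"amdgpu-flat-work-group-size"="{lo},{hi}"'


print(ir_attribute("opencl"))
print(parse_flat_work_group_size("1,1024"))
```

Read this way, "1,256" is a pair (minimum 1, maximum 256) rather than the number one thousand two hundred fifty-six.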

arsenm updated this revision to Diff 538140. Jul 7 2023, 7:33 AM

Clarify that the default is the backend default.

yaxunl added inline comments. Jul 7 2023, 8:48 AM
llvm/docs/AMDGPUUsage.rst
1004

If the actual block size or workgroup size exceeds the limit, the behaviour is undefined. For example, even if there is only one active thread, the behaviour is undefined if that thread's local ID exceeds the limit.

scchan added a subscriber: scchan. Jul 7 2023, 10:15 AM
scchan added inline comments.
llvm/docs/AMDGPUUsage.rst
1004

I agree, the nuance here is to refer to the actual work group size at execution time exceeding the limit rather than the number of logical active lanes.
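
The distinction drawn in this thread can be sketched as a check on the launched work-group size alone. This is an illustrative sketch, not backend code: the function names are hypothetical, and it assumes the flat work-group size is the product of the x, y, and z work-group dimensions. The number of active lanes is deliberately not an input.

```python
from math import prod


def flat_work_group_size(dims):
    """Number of work-items in a work-group: the product of its dimensions."""
    return prod(dims)


def launch_within_bounds(dims, bounds):
    """Check the launched work-group size against a (min, max) pair.

    Only the launched size is considered; how many lanes are active
    at any point during execution plays no part in the check.
    """
    lo, hi = bounds
    return lo <= flat_work_group_size(dims) <= hi


# A kernel carrying the bounds "1,256", the OpenCL default quoted earlier.
bounds = (1, 256)
print(launch_within_bounds((16, 16, 1), bounds))  # 256 work-items launched
print(launch_within_bounds((32, 32, 1), bounds))  # 1024 work-items launched
```

Under these assumptions, the second launch is out of bounds even if all but one of its work-items exit immediately, which is the undefined case the comments describe.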

arsenm updated this revision to Diff 538233. Jul 7 2023, 12:44 PM

Reword again. I was trying to express that you can't do what Vulkan was doing, which was increasing the set of active lanes beyond the bounds.

yaxunl accepted this revision. Jul 7 2023, 12:54 PM

LGTM. Thanks.

This revision is now accepted and ready to land. Jul 7 2023, 12:55 PM