This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Limit TID / wavefrontsize uniformness to 1D kernels
Closed, Public

Authored by rampitec on Aug 29 2022, 12:24 PM.

Details

Summary

If a kernel has uneven dimensions, the value of workitem-id-x divided by
the wavefront size can be non-uniform. For example, dimensions (65, 2)
will have workitems with IDs (64, 0) and (0, 1) packed into the same
wave, which gives 1 and 0 respectively after the division by 64.
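
The (65, 2) case can be checked by brute force. The following is a minimal
sketch, not code from the patch: it assumes workitems are linearized with X
varying fastest and packed into waves in consecutive groups of 64. The helper
names wave_quotients and nonuniform_waves are hypothetical.

```python
WAVE_SIZE = 64  # assumed wavefront size, matching the division by 64 above


def wave_quotients(dim_x, dim_y, wave_size=WAVE_SIZE):
    """Map each wave index to the set of (workitem-id-x // wave_size) values it holds."""
    waves = {}
    for y in range(dim_y):
        for x in range(dim_x):
            flat_id = y * dim_x + x      # assumption: X-fastest linearization
            wave = flat_id // wave_size  # assumption: consecutive packing into waves
            waves.setdefault(wave, set()).add(x // wave_size)
    return waves


def nonuniform_waves(dim_x, dim_y, wave_size=WAVE_SIZE):
    """Return the waves in which the quotient takes more than one value."""
    quotients = wave_quotients(dim_x, dim_y, wave_size)
    return sorted(w for w, q in quotients.items() if len(q) > 1)


# Wave 1 holds workitem (64, 0) with quotient 1 next to (0, 1) with quotient 0.
print(nonuniform_waves(65, 2))  # [1, 2]
```

Under these assumptions the last, partially filled wave is non-uniform as
well, since it holds workitems (63, 1) and (64, 1).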

Unfortunately, this limits the optimization to OpenCL only, and only if the
reqd_work_group_size attribute is set. This patch limits it to 1D kernels,
although it should also be possible to perform this optimization if the size
of the X dimension is a power of 2; we just do not currently have the
infrastructure to query it.
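
The 1D and power-of-2 cases can be sanity-checked the same way. This is a
hedged sketch rather than anything the patch implements: it assumes a
wavefront size of 64, X-fastest linearization, and consecutive packing of
workitems into waves, and the function name is_quotient_uniform is
hypothetical.

```python
def is_quotient_uniform(dim_x, dim_y, wave_size=64):
    """Check that workitem-id-x // wave_size is the same for all workitems of each wave."""
    seen = {}  # wave index -> first quotient observed in that wave
    for y in range(dim_y):
        for x in range(dim_x):
            wave = (y * dim_x + x) // wave_size  # assumption: X-fastest, consecutive packing
            quotient = x // wave_size
            if seen.setdefault(wave, quotient) != quotient:
                return False
    return True


# 1D kernel: the flat ID equals workitem-id-x, so the quotient is the wave index.
print(is_quotient_uniform(1000, 1))
# 2D kernels with a power-of-2 X size: rows never straddle a wave unevenly.
print(all(is_quotient_uniform(n, 4) for n in (16, 32, 64, 128, 256)))
# The uneven example from the summary.
print(is_quotient_uniform(65, 2))
```

The brute-force check only covers the sizes listed, so it illustrates the
claim rather than proving it.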

Note that the presence of the amdgpu-no-workitem-id-y attribute does not
help, as it only indicates the lack of a workitem-id-y query, not the absence
of an actual 2nd dimension, and therefore affects just the SGPR allocation.

Event Timeline

rampitec created this revision. Aug 29 2022, 12:24 PM
Herald added a project: Restricted Project. Aug 29 2022, 12:24 PM
rampitec requested review of this revision. Aug 29 2022, 12:24 PM
Herald added a subscriber: wdng.
bcahoon accepted this revision. Aug 30 2022, 12:19 PM

LGTM - thanks!

This revision is now accepted and ready to land. Aug 30 2022, 12:19 PM