This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Account workgroup size in LDS occupancy limits
ClosedPublic

Authored by rampitec on Feb 1 2017, 1:58 PM.

Details

Summary

Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.

In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Feb 1 2017, 1:58 PM
arsenm added inline comments.Feb 1 2017, 2:10 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
71–74 ↗(On Diff #86711)

This should never happen. These values are already defined by the subtarget feature which set these values. Are you seeing this happen? Is the feature missing for some reason from some processor definitions?

test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll
71 ↗(On Diff #86711)

Can you add the check for the other case too

rampitec added inline comments.Feb 1 2017, 2:13 PM
test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll
71 ↗(On Diff #86711)

Other checks will be CHECK-NOT, SI doe snot have enough LDS for promotion to happen. Do you want me ass SI-NOT?

arsenm added inline comments.Feb 1 2017, 2:15 PM
test/CodeGen/AMDGPU/large-work-group-promote-alloca.ll
71 ↗(On Diff #86711)

Yes, that would be preferable

rampitec added inline comments.Feb 1 2017, 2:19 PM
lib/Target/AMDGPU/AMDGPUSubtarget.cpp
71–74 ↗(On Diff #86711)

Thanks, I have overlooked that.

rampitec updated this revision to Diff 86724.Feb 1 2017, 2:40 PM

Removed LDS size initialization, it is present in features.
Updated test.

rampitec marked 5 inline comments as done.Feb 1 2017, 2:40 PM
arsenm accepted this revision.Feb 1 2017, 3:08 PM

LGTM

This revision is now accepted and ready to land.Feb 1 2017, 3:08 PM
This revision was automatically updated to reflect the committed changes.