This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Switch backend default max workgroup size to 1024
ClosedPublic

Authored by arsenm on Oct 30 2019, 10:50 PM.

Details

Summary

Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt-in to the more optimal 256
default maximum.

I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.

Diff Detail

Event Timeline

arsenm created this revision.Oct 30 2019, 10:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 30 2019, 10:50 PM

I do not think that deliberately introducing performance regression is a good way to force FE to do anything.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp
354

It currently returns 2048, not 1024 as far as I can see.

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll
15

And given that getMaxFlatWorkGroupSize() returns 2048 I do not understand how does it work.

llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
265

This needs to be investigated first I believe. There must be some wrong logic somewhere with LDS accounting for occupancy.

I do not think that deliberately introducing performance regression is a good way to force FE to do anything.

clang already emits the clamp to 256 if unspecified. The bugs from not being correct by default have come up many times

arsenm marked an inline comment as done.Oct 31 2019, 11:04 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll
15

D66812 changes this to 1024

I do not think that deliberately introducing performance regression is a good way to force FE to do anything.

clang already emits the clamp to 256 if unspecified. The bugs from not being correct by default have come up many times

OK, thanks.

llvm/test/CodeGen/AMDGPU/hsa-metadata-kernel-code-props-v3.ll
15

Ok

rampitec accepted this revision.Nov 12 2019, 12:13 PM

LGTM

llvm/test/CodeGen/AMDGPU/occupancy-levels.ll
265

I have recollected the logic behind LDS sharing and occupancy calculations. The increase seems to be right because LDS allocation is per WG and with a bigger WG you have less of them.

This revision is now accepted and ready to land.Nov 12 2019, 12:13 PM