This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] [amdgpu] Fix default setting of max flat workgroup size
ClosedPublic

Authored by dhruvachak on Jun 28 2021, 6:15 PM.

Details

Summary

When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.

Diff Detail

Event Timeline

dhruvachak created this revision.Jun 28 2021, 6:15 PM
dhruvachak requested review of this revision.Jun 28 2021, 6:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2021, 6:15 PM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1922

I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize.

We've had a few bugs in this area now. The function is a map from a few integers to a few integers containing a lot of domain knowledge about deriving good launch parameters. It's only tested from the OpenMP level, where we can't easily perturb these values, so we miss stuff.

I think it's time to move all the launch value manipulation logic into this function, add a boolean to disable printing, and exhaustively test that function at the unit level.

We can either do that by making the function constexpr and writing a bunch of static asserts in this file or by declaring the prototype in a header and adding a standalone file that runs a bunch of values through it. Either way, we'd end up with a clear list of the cases it's intended to handle and much better odds of eliminating bugs in the edge cases.

JonChesterfield accepted this revision.Jun 29 2021, 6:15 AM

Desire for tests notwithstanding, still want this point fix. Thanks!

This revision is now accepted and ready to land.Jun 29 2021, 6:15 AM

I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize.

The scenario I am fixing is what happens in the default case today. No flat max-work-group size is intended, so it starts off as 0 but then we set ConstWGSize to 256 if the former is 0 (line 1710 above). So getLaunchVals limits threadsPerGroup to 256 because ConstWGSize is 256. My fix allows setting threadsPerGroup beyond 256 but limited to 1024.

This revision was landed with ongoing or failed builds.Jun 29 2021, 1:48 PM
This revision was automatically updated to reflect the committed changes.