When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp | ||
---|---|---|
1922 | I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize. We've had a few bugs in this area now. The function is a map from a few integers to a few integers containing a lot of domain knowledge about deriving good launch parameters. It's only tested from the OpenMP level, where we can't easily perturb these values, so we miss stuff. I think it's time to move all the launch value manipulation logic into this function, add a boolean to disable printing, and exhaustively test that function at the unit level. We can either do that by making the function constexpr and writing a bunch of static asserts in this file or by declaring the prototype in a header and adding a standalone file that runs a bunch of values through it. Either way, we'd end up with a clear list of the cases it's intended to handle and much better odds of eliminating bugs in the edge cases. |
I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize.
The scenario I am fixing is what happens in the default case today. No flat max-work-group size is intended, so it starts off as 0 but then we set ConstWGSize to 256 if the former is 0 (line 1710 above). So getLaunchVals limits threadsPerGroup to 256 because ConstWGSize is 256. My fix allows setting threadsPerGroup beyond 256 but limited to 1024.
I guess this avoids a case where threadsPerGroup is set to zero by a zero ConstWGSize.
We've had a few bugs in this area now. The function is a map from a few integers to a few integers containing a lot of domain knowledge about deriving good launch parameters. It's only tested from the OpenMP level, where we can't easily perturb these values, so we miss stuff.
I think it's time to move all the launch value manipulation logic into this function, add a boolean to disable printing, and exhaustively test that function at the unit level.
We can either do that by making the function constexpr and writing a bunch of static asserts in this file or by declaring the prototype in a header and adding a standalone file that runs a bunch of values through it. Either way, we'd end up with a clear list of the cases it's intended to handle and much better odds of eliminating bugs in the edge cases.