D152014 introduced an optimization that favors more, smaller blocks over
fewer, larger blocks, even if the user sets thread_limit explicitly. This patch changes
the behavior to honor the user's value.
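For context, a minimal sketch of what "the user sets thread_limit explicitly" can look like in application code. The function name `sumSquares` and the value 128 are made up for illustration and do not come from this patch. The loop has a low trip count, which is the situation in which the earlier optimization could shrink the team size below the requested limit.

```cpp
#include <cassert>

// Hypothetical example: a small reduction offloaded with an explicit
// thread_limit clause. If the file is built without OpenMP support the
// pragma is ignored and the loop simply runs serially on the host.
long sumSquares(int N) {
  assert(N >= 0 && "trip count must be non-negative");
  long Sum = 0;
#pragma omp target teams distribute parallel for thread_limit(128) \
    reduction(+ : Sum) map(tofrom : Sum)
  for (int I = 0; I < N; ++I)
    Sum += static_cast<long>(I) * I;
  return Sum;
}
```

The result does not depend on the launch configuration; only the number of teams and the threads per team chosen by the runtime differ.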
Repository: rG LLVM Github Monorepo
Event Timeline
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
379 | Just check whether the user set anything (ThreadLimit[0] > 0). We may already have reduced the value earlier, but we should not reduce it further here. | |
448–449 | You don't need the changes above if you add || IsNumThreadsFromUser to the condition above. That makes it clear that we keep the value and just compute the number of teams from it, rather than carrying all these intermediate computations we do not use. | |
471 |
See comment. LG
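The suggestion in the inline comments can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in PluginInterface.cpp: `LaunchConfig`, `pickLaunchConfig`, and all default values are hypothetical. Only the `ThreadLimit[0] > 0` style check and the `|| IsNumThreadsFromUser` condition are taken from the review.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch.
struct LaunchConfig {
  uint32_t NumThreads;
  uint64_t NumBlocks;
};

// Hypothetical helper; the defaults stand in for device properties.
LaunchConfig pickLaunchConfig(uint64_t LoopTripCount, uint32_t UserThreadLimit,
                              uint32_t PreferredNumThreads = 256,
                              uint32_t MaxNumThreads = 1024,
                              uint32_t MinThreads = 32,
                              uint64_t DefaultNumBlocks = 64) {
  assert(LoopTripCount > 0 && "sketch only covers known trip counts");

  // "Just check whether the user set anything."
  const bool IsNumThreadsFromUser = UserThreadLimit > 0;

  // The value may still be reduced here, by clamping to the device maximum.
  uint32_t NumThreads = std::min(
      MaxNumThreads, IsNumThreadsFromUser ? UserThreadLimit
                                          : PreferredNumThreads);
  assert(NumThreads > 0 && "need at least one thread per block");

  if (LoopTripCount >= DefaultNumBlocks * NumThreads || IsNumThreadsFromUser) {
    // Enough work, or a user-provided value: keep NumThreads untouched.
  } else if (LoopTripCount >= DefaultNumBlocks * MinThreads) {
    // Low trip count: shrink blocks so that about DefaultNumBlocks launch.
    NumThreads = static_cast<uint32_t>(LoopTripCount / DefaultNumBlocks);
  } else {
    // Very low trip count: fall back to the minimum block size.
    NumThreads = std::min(MinThreads, NumThreads);
  }

  // One loop iteration per thread, rounded up to whole blocks.
  const uint64_t NumBlocks = (LoopTripCount - 1) / NumThreads + 1;
  return {NumThreads, NumBlocks};
}
```

With these made-up defaults, a trip count of 1000 and no user limit ends up with 32 threads per block, while the same trip count with a user limit of 128 keeps 128 threads and only derives the block count from it.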
openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp | ||
---|---|---|
471 | With the change in the condition above, the old comment was good. |
This broke the AMD GPU buildbot (https://lab.llvm.org/buildbot/#/builders/193/builds/37371)