Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
IMO, genericStateMachine should not be called by the extra warp. I think it would be more clear to add the guard in __kmpc_target_init.
- Applied the same fix to the custom state machine, as suggested by @jdoerfert privately, and extended the new test to cover it. For that test on the NVIDIA Pascals I tried, fixing the custom state machine didn't appear to be needed. Perhaps in that version, the master thread manages to be selected for execution before other threads in its warp. However, fixing the custom state machine did prove important for that test on an AMD GPU I tried. Maybe another test would prove it's important for Pascals too, but I haven't looked for one.
- Moved fix to callers of generic state machine functions, as suggested by @tianshilei1992.
- Pointed out this fix is also relevant to AMD GPUs, as suggested by @JonChesterfield.
I just noticed that a few LLVM :: Transforms/OpenMP test need to be updated. I'll do that soon.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
3463 | This doesn't quite work because now every thread in the last warp will execute the user code. if (InitCB <u BlockSize) return; and then whatever we had before. | |
3579 | -master +main | |
openmp/libomptarget/DeviceRTL/src/Kernel.cpp | ||
103 | -master +main |
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
3463 | Ah, you're right that's not what I meant to do. It managed to work for my test because it eliminated the divergence in the last warp. |
Sorry, I was too quick to associate the build failure with this patch, which introduced the test, but is not the cause of the recent failure.
This doesn't quite work because now every thread in the last warp will execute the user code.
I think the minimal addition is
and then whatever we had before.