This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix main thread barrier for Pascal and amdgpu
ClosedPublic

Authored by jdenny on Nov 10 2021, 12:25 PM.

Diff Detail

Event Timeline

jdenny created this revision.Nov 10 2021, 12:25 PM
jdenny requested review of this revision.Nov 10 2021, 12:25 PM
jdenny added a project: Restricted Project.Nov 10 2021, 12:38 PM
jdenny set the repository for this revision to rG LLVM Github Monorepo.
jdenny added a subscriber: openmp-commits.

Pretty good chance this is one of the problems on amdgpu as well.

tianshilei1992 added a comment.EditedNov 11 2021, 9:16 AM

IMO, genericStateMachine should not be called by the extra warp. I think it would be more clear to add the guard in __kmpc_target_init.

jdenny updated this revision to Diff 386680.Nov 11 2021, 4:51 PM
jdenny retitled this revision from [OpenMP] Fix master thread barrier for Pascal to [OpenMP] Fix master thread barrier for Pascal and amdgpu.
  • Applied the same fix to the custom state machine, as suggested by @jdoerfert privately, and extended the new test to cover it. For that test on the NVIDIA Pascals I tried, fixing the custom state machine didn't appear to be needed. Perhaps in that version, the master thread manages to be selected for execution before other threads in its warp. However, fixing the custom state machine did prove important for that test on an AMD GPU I tried. Maybe another test would prove it's important for Pascals too, but I haven't looked for one.
  • Moved fix to callers of generic state machine functions, as suggested by @tianshilei1992.
  • Pointed out this fix is also relevant to AMD GPUs, as suggested by @JonChesterfield.
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2021, 4:51 PM

I just noticed that a few LLVM :: Transforms/OpenMP test need to be updated. I'll do that soon.

jdoerfert added inline comments.Nov 11 2021, 5:40 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3463

This doesn't quite work because now every thread in the last warp will execute the user code.
I think the minimal addition is

if (InitCB <u BlockSize)
  return;

and then whatever we had before.

3579

-master +main

openmp/libomptarget/DeviceRTL/src/Kernel.cpp
103

-master +main

jdenny updated this revision to Diff 386722.Nov 11 2021, 8:06 PM
jdenny marked 2 inline comments as done.
  • Addressed @jdoerfert's comments.
  • Updated LLVM :: Transforms/OpenMP tests.
jdenny marked an inline comment as done.Nov 11 2021, 8:07 PM
jdenny added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
3463

Ah, you're right that's not what I meant to do. It managed to work for my test because it eliminated the divergence in the last warp.

jdenny marked an inline comment as done.Nov 11 2021, 8:07 PM
This revision is now accepted and ready to land.Nov 11 2021, 8:48 PM
jdenny retitled this revision from [OpenMP] Fix master thread barrier for Pascal and amdgpu to [OpenMP] Fix main thread barrier for Pascal and amdgpu.Nov 12 2021, 7:32 AM
This revision was landed with ongoing or failed builds.Nov 12 2021, 8:19 AM
This revision was automatically updated to reflect the committed changes.
This comment was removed by Meinersbur.

Sorry, I was too quick to associate the build failure with this patch, which introduced the test, but is not the cause of the recent failure.

Sorry, I was too quick to associate the build failure with this patch, which introduced the test, but is not the cause of the recent failure.

This patch introduced 2 problems and we only fixed one with D114802. @jhuber6 will fix the other one, same idea but different function, today.