This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget][amdgcn] Implement partial barrier
ClosedPublic

Authored by JonChesterfield on Sep 30 2020, 11:46 AM.

Details

Summary

named_sync is used to coordinate non-SPMD kernels. On nvptx it is implemented
with bar.sync. There is no corresponding ISA support on amdgcn, so there it is
implemented using one word of shared memory, initialized to zero.

Each wave increments the variable by one. Whichever wave is last is responsible
for resetting the variable to zero, at which point it and the others continue.

A wave can reach the barrier again before another wave has noticed that the
previous round was released. This race is handled with a generation counter,
packed into the same word as the arrival count.
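
The scheme described above can be sketched on the host roughly as follows. This is a minimal illustration, not the code in this patch: it uses std::thread and std::atomic in place of waves and shared memory, and the names PartialBarrier, kCountMask, kGenerationOne and count_violations, as well as the 16/16 bit split between count and generation, are assumptions made for the example.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed layout: low 16 bits hold the arrival count, high 16 bits hold the
// generation. The real patch may pack the word differently.
constexpr uint32_t kCountMask = 0xFFFFu;
constexpr uint32_t kGenerationOne = 1u << 16;

struct PartialBarrier {
  std::atomic<uint32_t> state{0}; // one word, initialized to zero

  // Blocks until num_waves participants have called wait() in this round.
  void wait(uint32_t num_waves) {
    // Each participant increments the word by one.
    uint32_t before = state.fetch_add(1, std::memory_order_acq_rel);
    uint32_t generation = before & ~kCountMask;
    uint32_t arrived = (before & kCountMask) + 1;
    if (arrived == num_waves) {
      // Last arrival: reset the count to zero and advance the generation
      // with a single store, which releases the waiters.
      state.store(generation + kGenerationOne, std::memory_order_release);
      return;
    }
    // Wait on the generation, not the count. A fast participant may already
    // have re-entered and incremented the count for the next round.
    while ((state.load(std::memory_order_acquire) & ~kCountMask) == generation)
      std::this_thread::yield();
  }
};

// Runs num_waves threads through `rounds` barrier rounds and counts how often
// a thread leaves the barrier before every thread has arrived in that round.
int count_violations(uint32_t num_waves, int rounds) {
  PartialBarrier barrier;
  std::vector<std::atomic<uint32_t>> arrivals(rounds);
  for (auto &a : arrivals)
    a.store(0);
  std::atomic<int> violations{0};

  std::vector<std::thread> threads;
  for (uint32_t t = 0; t < num_waves; ++t) {
    threads.emplace_back([&] {
      for (int r = 0; r < rounds; ++r) {
        arrivals[r].fetch_add(1, std::memory_order_relaxed);
        barrier.wait(num_waves);
        if (arrivals[r].load(std::memory_order_relaxed) != num_waves)
          violations.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto &th : threads)
    th.join();
  return violations.load();
}
```

Calling count_violations(4, 1000) exercises the re-entry case many times. Spinning on the count instead of the generation would let a slow waiter miss the reset when a fast participant has already started the next round.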

This uses a shared variable that is not needed on nvptx, so the patch introduces
a new hook, __kmpc_impl_target_init, to allow different targets to do extra
initialization.

Event Timeline

Herald added a project: Restricted Project. Sep 30 2020, 11:46 AM
JonChesterfield requested review of this revision. Sep 30 2020, 11:46 AM

This was written carefully and reviewed internally. It'll be horrendous to debug in the field if it's wrong so feedback is very welcome.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
187

This is formatted the way clang-format wants it.

Any objections? This has been through a bunch of internal testing now and seems to hang together. Hopefully the extra __kmpc_impl_target_init() function is acceptable.

jdoerfert accepted this revision. Oct 12 2020, 8:37 AM

One nit. At a high level, and after going over the code, this looks reasonable, so let's go with it.

openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
114

Add some (brief) documentation to the two methods please.

This revision is now accepted and ready to land. Oct 12 2020, 8:37 AM
This revision was landed with ongoing or failed builds. Oct 12 2020, 1:27 PM
This revision was automatically updated to reflect the committed changes.
openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.h
114

Good call. Added a brief comment to amdgcn and nvptx.