This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][libomptarget] Use shared memory variable for tracking parallel level
Closed, Public

Authored by gtbercea on Dec 17 2018, 8:44 AM.

Details

Summary

Replace the existing global-memory infrastructure for tracking the parallel level with a per-team shared-memory variable. This minimizes the overhead of tracking the parallel level in non-nested cases.
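
The idea in the summary can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the kernel name `trackLevelPerTeam`, the variable `parallelLevel`, the host helper, and the rule that team b enters b + 1 regions are all hypothetical, chosen only to show that a `__shared__` counter gives each team (thread block) its own copy without touching global memory for the bookkeeping.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: each team (thread block) tracks its own nesting level
// in a shared-memory counter instead of a global-memory table.
__global__ void trackLevelPerTeam(unsigned *levelSeen) {
  __shared__ unsigned parallelLevel; // one copy per team

  // A single thread initializes the counter; the barrier publishes it.
  if (threadIdx.x == 0)
    parallelLevel = 0;
  __syncthreads();

  // Assumption for illustration only: team b enters b + 1 nested regions.
  // blockIdx.x is uniform within a team, so every thread reaches each barrier.
  for (unsigned i = 0; i <= blockIdx.x; ++i) {
    __syncthreads();   // synchronize first...
    if (threadIdx.x == 0)
      ++parallelLevel; // ...then let one thread update the counter
  }
  __syncthreads();     // make the final value visible to the whole team

  levelSeen[blockIdx.x * blockDim.x + threadIdx.x] = parallelLevel;
}

// Returns true if every thread of team b observed level b + 1.
bool runPerTeamLevels(int teams = 4, int threads = 64) {
  const int n = teams * threads;
  unsigned *dev = nullptr;
  if (cudaMalloc(&dev, n * sizeof(unsigned)) != cudaSuccess)
    return false;

  trackLevelPerTeam<<<teams, threads>>>(dev);

  std::vector<unsigned> host(n);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(unsigned),
                               cudaMemcpyDeviceToHost);
  cudaFree(dev);
  if (err != cudaSuccess)
    return false;

  for (int i = 0; i < n; ++i)
    if (host[i] != static_cast<unsigned>(i / threads) + 1)
      return false;
  return true;
}

int main() {
  std::printf("%s\n", runPerTeamLevels() ? "ok" : "mismatch");
  return 0;
}
```

Because the counter lives in shared memory, teams cannot observe each other's level, and the non-nested case pays only for a shared-memory update and the barriers around it.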

Diff Detail

Repository
rOMP OpenMP

Event Timeline

gtbercea created this revision. Dec 17 2018, 8:44 AM
ABataev added inline comments. Jan 8 2019, 12:31 PM
libomptarget/deviceRTLs/nvptx/src/parallel.cu
345
  1. You need to synchronize the threads first, and only after that can you increment the counter.
  2. Use __SYNCTHREADS() instead of __syncthreads().
  3. Use preincrement.
386–389
  1. Use predecrement.
  2. Synchronize first, then decrement.
  3. Use __SYNCTHREADS().
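
The ordering requested in these comments (barrier first, then pre-increment on entry and pre-decrement on exit) can be sketched as below. This is a hedged illustration, not the reviewed code: the kernel and helper names are hypothetical, having thread 0 perform the update is an assumption, and plain `__syncthreads()` stands in for the runtime's `__SYNCTHREADS()` macro, whose definition is not shown in this thread. The barrier after each update is added here only so that the sketch's own reads are race-free.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel showing "synchronize, then update" around a region.
__global__ void enterAndLeaveRegion(unsigned *observed) {
  __shared__ unsigned parallelLevel;
  if (threadIdx.x == 0)
    parallelLevel = 0;
  __syncthreads();

  unsigned before = parallelLevel; // every thread reads the outer level

  // Entry: barrier first, so no thread is still reading the old value...
  __syncthreads();
  if (threadIdx.x == 0)
    ++parallelLevel;               // ...then pre-increment
  __syncthreads();                 // sketch only: publish before reading

  unsigned inside = parallelLevel; // level seen inside the region

  // Exit: same order, barrier first, then pre-decrement.
  __syncthreads();
  if (threadIdx.x == 0)
    --parallelLevel;
  __syncthreads();                 // sketch only: publish before reading

  unsigned after = parallelLevel;

  // Pack the three observations into one value per thread.
  observed[blockIdx.x * blockDim.x + threadIdx.x] =
      before * 100 + inside * 10 + after;
}

// Returns true if every thread saw level 0 outside and level 1 inside.
bool runEnterAndLeave(int teams = 2, int threads = 128) {
  const int n = teams * threads;
  unsigned *dev = nullptr;
  if (cudaMalloc(&dev, n * sizeof(unsigned)) != cudaSuccess)
    return false;

  enterAndLeaveRegion<<<teams, threads>>>(dev);

  std::vector<unsigned> host(n);
  cudaError_t err = cudaMemcpy(host.data(), dev, n * sizeof(unsigned),
                               cudaMemcpyDeviceToHost);
  cudaFree(dev);
  if (err != cudaSuccess)
    return false;

  for (unsigned v : host)
    if (v != 10) // before = 0, inside = 1, after = 0
      return false;
  return true;
}

int main() {
  std::printf("%s\n", runEnterAndLeave() ? "ok" : "mismatch");
  return 0;
}
```

If the update came before the barrier, a thread that had not yet read the outer level could observe the already-incremented value, which is the race the first comment guards against.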
gtbercea updated this revision to Diff 180731. Jan 8 2019, 1:49 PM
  • Use new sync.
gtbercea marked 2 inline comments as done. Jan 8 2019, 1:55 PM
ABataev added inline comments. Jan 9 2019, 6:22 AM
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
102–103
  1. I think it is better to do this initialization in a single thread, and you still need to synchronize the threads even in this situation to avoid a data race.
  2. Keep the initialization of usedSlotIdx; it is used in other places.
gtbercea marked an inline comment as done. Jan 9 2019, 6:46 AM
gtbercea added inline comments.
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
102–103

Increments and decrements of this variable are guarded by syncthreads, so I don't think we need any additional synchronization here. As for the slot, I think we can leave it at its default value in this case and rely on the compiler to optimize the remainder operations.

gtbercea updated this revision to Diff 180856. Jan 9 2019, 8:58 AM
gtbercea marked an inline comment as done.
  • Add slot initialization.
ABataev added inline comments. Jan 9 2019, 9:01 AM
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu
101–102

Better to put this initialization under the guard of the next if statement.

gtbercea marked an inline comment as done. Jan 9 2019, 9:09 AM
This revision is now accepted and ready to land. Jan 9 2019, 9:10 AM
This revision was automatically updated to reflect the committed changes.