Replace the existing global-memory infrastructure for tracking the parallel level with a per-team shared memory variable. This minimizes the overhead of tracking the parallel level in non-nested cases.
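As a rough illustration of the idea (not the code in this patch), the sketch below keeps one counter per team in `__shared__` memory, has a single thread update it, and uses `__syncthreads()` to publish each update to the rest of the team. The names `trackParallelLevel`, `parallelLevel`, `levelSeen`, and `runDemo` are hypothetical and exist only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel only: names are hypothetical, not the runtime's API.
__global__ void trackParallelLevel(unsigned *levelSeen) {
  // One counter per team (thread block), held in shared memory
  // instead of a global-memory array.
  __shared__ unsigned parallelLevel;

  // A single thread initializes the counter; the barrier publishes it.
  if (threadIdx.x == 0)
    parallelLevel = 0;
  __syncthreads();

  // Entering a parallel region: one thread increments, then all sync.
  if (threadIdx.x == 0)
    ++parallelLevel;
  __syncthreads();

  // Every thread in the team reads the level from shared memory.
  levelSeen[blockIdx.x * blockDim.x + threadIdx.x] = parallelLevel;

  // Make sure all reads finish before the level is decremented.
  __syncthreads();

  // Leaving the parallel region: one thread decrements, then all sync.
  if (threadIdx.x == 0)
    --parallelLevel;
  __syncthreads();
}

// Launches the kernel and checks that every thread observed level 1.
bool runDemo() {
  constexpr int teams = 2, threads = 64, n = teams * threads;
  unsigned *dSeen = nullptr;
  if (cudaMalloc(&dSeen, n * sizeof(unsigned)) != cudaSuccess)
    return false;

  trackParallelLevel<<<teams, threads>>>(dSeen);

  unsigned hSeen[n];
  // cudaMemcpy waits for the kernel to finish before copying.
  cudaError_t err =
      cudaMemcpy(hSeen, dSeen, sizeof(hSeen), cudaMemcpyDeviceToHost);
  cudaFree(dSeen);
  if (err != cudaSuccess)
    return false;

  for (int i = 0; i < n; ++i)
    if (hSeen[i] != 1)
      return false;
  return true;
}

int main() {
  bool ok = runDemo();
  std::printf("all threads saw level 1: %s\n", ok ? "yes" : "no");
  return ok ? 0 : 1;
}
```

Because the counter lives in shared memory, each team gets its own copy without any indexing into a global array, and the barriers around the single-thread updates are the only synchronization this sketch relies on.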
Details
- Reviewers
- ABataev, caomhin
- Commits
- rG1653633a1c5a: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level
- rL350747: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level
- rOMP350747: [OpenMP][libomptarget] Use shared memory variable for tracking parallel level
Diff Detail
- Repository
- rOMP OpenMP
Event Timeline
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu | ||
---|---|---|
102–103 | Increments and decrements of this variable are guarded by syncthreads, so I don't think we need any additional synchronization here. As for the slot, I think we can leave it at its default value in this case and rely on the compiler to optimize the remainder operations. |
libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.cu | ||
---|---|---|
101–102 | Better to put this initialization under the guard of the next if statement. |