Added the function void __kmpc_syncwarp(int32_t) and exposed it to the compiler. It is required to fix the problem with critical regions in CUDA 9.0+. A barrier cannot be used inside a critical region, but the threads of the warp still need to reconverge after it. This function makes that possible.
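To make the reconvergence requirement concrete, here is a host-side C++ model (not device code) of one way a compiler could lower a critical region with this function:

- The threads loop over thread IDs.
- Only the thread whose turn it is runs the body.
- Every thread synchronizes after each turn.

The loop shape is an assumption about the Clang-side lowering, not taken from this patch. ModelWarpSync and runSerializedCritical are hypothetical names. A mutex/condition-variable barrier over all threads stands in for __kmpc_syncwarp(Mask); on the GPU, only the lanes named in Mask would take part.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Host-side model only: ModelWarpSync stands in for __kmpc_syncwarp(Mask).
// It is a reusable barrier: each caller blocks until all NumThreads
// participants have arrived, then all of them are released together.
class ModelWarpSync {
public:
  explicit ModelWarpSync(std::size_t NumThreads) : Count(NumThreads) {
    assert(NumThreads > 0 && "a barrier needs at least one participant");
  }

  void sync() {
    std::unique_lock<std::mutex> Lock(M);
    const std::size_t Gen = Generation;
    if (++Arrived == Count) {
      // Last arrival: reset for the next round and wake the waiters.
      Arrived = 0;
      ++Generation;
      CV.notify_all();
    } else {
      // Wait until the last arrival bumps the generation counter.
      CV.wait(Lock, [&] { return Gen != Generation; });
    }
  }

private:
  std::mutex M;
  std::condition_variable CV;
  std::size_t Count;
  std::size_t Arrived = 0;
  std::size_t Generation = 0;
};

// Hypothetical lowering of a critical region: in round I only thread I runs
// the body, and every thread syncs afterwards, so no thread can start round
// I + 1 before thread I has finished its turn. Returns the order in which
// threads executed the body.
std::vector<int> runSerializedCritical(std::size_t NumThreads) {
  std::vector<int> Order;
  if (NumThreads == 0)
    return Order;
  ModelWarpSync Sync(NumThreads);
  std::vector<std::thread> Threads;
  for (std::size_t Tid = 0; Tid < NumThreads; ++Tid) {
    Threads.emplace_back([&Sync, &Order, Tid, NumThreads] {
      for (std::size_t I = 0; I < NumThreads; ++I) {
        if (I == Tid)
          Order.push_back(static_cast<int>(Tid)); // the "critical" body
        Sync.sync(); // reconverge, like __kmpc_syncwarp(Mask) after the body
      }
    });
  }
  for (std::thread &T : Threads)
    T.join();
  return Order;
}
```

In this model, runSerializedCritical(8) records the threads in the order 0, 1, ..., 7. If the Sync.sync() call is removed, thread I + 1 reaches its turn without waiting for thread I. The bodies then run concurrently, which here is also a data race on Order.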
Details
Diff Detail
- Repository: rOMP OpenMP
- Build Status: Buildable 37213, Build 37212: arc lint + arc unit
Event Timeline
I guess there is also a Clang-side patch that uses the newly exposed kmpc function? Can you add it to the description? Also, can you add a basic test with a critical region that doesn't work correctly without this patch?
The test is libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp. It currently fails with CUDA 9+.
The Clang patch is D66673. It is not complete yet because the tests have not been updated. The functional part is finished and has been tested with CUDA 9.2.
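For reference, a minimal test of the kind requested in the review might look like the following. This is a hypothetical sketch, not the contents of spmd_parallel_regions.cpp, and the function name countInCritical is made up. Each iteration of an offloaded parallel loop increments a shared counter inside #pragma omp critical, so lost updates would show up as a wrong count.

```cpp
#include <cassert>

// Hypothetical minimal test: every iteration of an offloaded parallel loop
// increments a shared counter inside an OpenMP critical region, so the final
// value must equal the iteration count. Built without OpenMP, the pragmas
// are ignored and the loop simply runs serially on the host.
int countInCritical(int N) {
  assert(N >= 0 && "iteration count must be non-negative");
  int Count = 0;
#pragma omp target parallel for map(tofrom : Count)
  for (int I = 0; I < N; ++I) {
#pragma omp critical
    ++Count;
  }
  return Count;
}
```

Built with an offloading compiler, the loop runs on the device instead, and the expected result is the same: countInCritical(64) should return 64.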
We already use __SYNCWARP in the library, right? Can we use the new function there instead, and maybe move it into the language-dependent part?
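One possible shape for this suggestion, sketched under assumptions: the CUDA-version check lives in a small target-dependent helper, and both the library's internal call sites and the exported __kmpc_syncwarp forward to it. The helper name kmpc_impl_syncwarp and the DEVICE macro are hypothetical. CUDA_VERSION is assumed to come from <cuda.h>. On a plain host build the helper compiles to a no-op, so the file can be checked without a GPU.

```cpp
#include <cstdint>

// Hypothetical portability macros: device-side qualifiers only when compiling
// as CUDA; CUDA_VERSION is assumed to come from <cuda.h>.
#if defined(__CUDACC__)
#include <cuda.h>
#define DEVICE __device__
#else
#define DEVICE
#endif

// Target-dependent helper (hypothetical name) that owns the version check.
#if defined(__CUDA_ARCH__) && defined(CUDA_VERSION) && CUDA_VERSION >= 9000
// CUDA 9.0+ device code: explicitly reconverge the lanes named in Mask.
inline DEVICE void kmpc_impl_syncwarp(std::uint32_t Mask) { __syncwarp(Mask); }
#else
// Older CUDA or a plain host build: no __syncwarp, so this is a no-op.
inline DEVICE void kmpc_impl_syncwarp(std::uint32_t) {}
#endif

// Exported entry point the compiler emits calls to; it only forwards.
extern "C" DEVICE void __kmpc_syncwarp(std::int32_t Mask) {
  kmpc_impl_syncwarp(static_cast<std::uint32_t>(Mask));
}
```

Keeping the #if in one helper means existing __SYNCWARP call sites and the compiler-facing entry point cannot drift apart. Only the helper changes if the target or CUDA version handling changes.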