Avoid using an atomic loop to wait for the data-sharing interface
initialization to complete. Instead, use __shfl_sync for communication
within the warp, so that the thread performing the initialization signals
the other threads in the warp that it has finished.
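A minimal CUDA sketch of the pattern described above. It assumes full 32-thread warps, and the names `Slot`, `warp_init_kernel` and `run_warp_init_demo` are hypothetical stand-ins for the RTL's data-sharing state, not code from the patch. Lane 0 does the initialization and then broadcasts its result with `__shfl_sync`. Because the other lanes can only receive lane 0's value after lane 0 has executed the shuffle, receiving it tells them initialization is complete, and no lane spins on an atomic flag.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-warp slot: stands in for the data-sharing state that a
// single lane must initialize before the rest of the warp may use it.
struct Slot {
  int owner_warp;
};

constexpr unsigned kFullMask = 0xffffffffu;  // assumes every warp is full

__global__ void warp_init_kernel(Slot* slots, int* next_slot, int* out) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int lane = threadIdx.x % warpSize;
  int warp_id = tid / warpSize;

  // Lane 0 acts as the warp master: it claims and initializes a slot.
  int idx = 0;
  if (lane == 0) {
    idx = atomicAdd(next_slot, 1);
    slots[idx].owner_warp = warp_id;
  }

  // No atomic polling loop: every lane gets the slot index from lane 0.
  // A lane can only receive lane 0's idx once lane 0 has executed this
  // shuffle, i.e. after it has finished the initialization above.
  idx = __shfl_sync(kFullMask, idx, 0);

  // Order lane 0's store to slots[idx] before the reads below.
  __syncwarp(kFullMask);
  out[tid] = slots[idx].owner_warp;
}

// Hypothetical host helper: launches the kernel and checks that every
// thread observed the slot initialized by its own warp's master lane.
bool run_warp_init_demo(int blocks, int threads_per_block) {
  if (threads_per_block % 32 != 0) return false;  // full warps only
  int n = blocks * threads_per_block;
  int warps = n / 32;
  Slot* d_slots = nullptr;
  int* d_next = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_slots, warps * sizeof(Slot));
  cudaMalloc(&d_next, sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));
  cudaMemset(d_next, 0, sizeof(int));

  warp_init_kernel<<<blocks, threads_per_block>>>(d_slots, d_next, d_out);
  bool ok = cudaDeviceSynchronize() == cudaSuccess;

  std::vector<int> h_out(n);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == i / 32);

  cudaFree(d_slots);
  cudaFree(d_next);
  cudaFree(d_out);
  return ok;
}
```

The shuffle carries the index itself, but its documented guarantees concern the exchanged value rather than ordinary memory writes. For that reason the sketch adds `__syncwarp`, which does guarantee memory ordering among the participating threads, before the other lanes read `slots[idx]`.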
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
libomptarget/deviceRTLs/nvptx/src/data_sharing.cu, line 449 (On Diff #179549)
- Isn't FrameP always 8 bytes long?
- What if the architecture is 32 bits?
- That would be true if we compiled the RTL on a 32-bit host system, but, at least at the moment, only 64-bit hosts are supported by libomptarget (x86_64, ppc64, aarch64). So the 32-bit case is unreachable, although admittedly it doesn't hurt to keep this size check here as a safeguard for the future. On the other hand, how likely is it that we will ever support 32-bit systems? Anyway, the patch looks good; I'm leaving it up to you to decide whether the 32-bit scenario should be considered here.
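The FrameP discussion concerns broadcasting a pointer whose width depends on the target. The sketch below is a minimal illustration under stated assumptions, not code from the patch. It defines a hypothetical helper `shfl_ptr` and a demo kernel, assumes full 32-thread warps, and keeps the size check the reviewers discuss. An 8-byte pointer travels as two 32-bit shuffles, while a 4-byte pointer needs only one. On today's 64-bit-only targets the 4-byte branch is dead code that the compiler can fold away. CUDA also provides 64-bit `__shfl_sync` overloads (for example for `unsigned long long`), so shuffling the pointer as a single 64-bit integer would be an equally valid choice. The explicit split is shown only because it makes the size discrimination visible.

```cuda
#include <cassert>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr unsigned kFullMask = 0xffffffffu;  // assumes every warp is full

// Hypothetical helper: broadcast a pointer from lane `src` to all lanes in
// `mask`. Each 32-bit shuffle moves one 32-bit word, so an 8-byte pointer is
// sent as two halves and a 4-byte pointer as one. Both branches compile for
// either pointer width; only one is live for a given target.
template <typename T>
__device__ T* shfl_ptr(unsigned mask, T* p, int src) {
  static_assert(sizeof(T*) == 4 || sizeof(T*) == 8, "unexpected pointer size");
  if (sizeof(T*) == 8) {
    unsigned long long v = reinterpret_cast<unsigned long long>(p);
    unsigned lo = __shfl_sync(mask, static_cast<unsigned>(v), src);
    unsigned hi = __shfl_sync(mask, static_cast<unsigned>(v >> 32), src);
    return reinterpret_cast<T*>((static_cast<unsigned long long>(hi) << 32) | lo);
  }
  unsigned v = static_cast<unsigned>(reinterpret_cast<uintptr_t>(p));
  return reinterpret_cast<T*>(static_cast<uintptr_t>(__shfl_sync(mask, v, src)));
}

// Hypothetical stand-in for FrameP: lane 0 of each warp fills a per-warp
// frame and shares its address with the rest of the warp.
__global__ void frame_broadcast_kernel(int* frames, int* out) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  int warp_id = tid / warpSize;
  int* frame = nullptr;
  if (threadIdx.x % warpSize == 0) {
    frames[warp_id] = 1000 + warp_id;
    frame = &frames[warp_id];
  }
  frame = shfl_ptr(kFullMask, frame, 0);
  __syncwarp(kFullMask);  // order lane 0's store before the warp's reads
  out[tid] = *frame;
}

// Hypothetical host helper: returns true if every thread read its own
// warp's frame through the broadcast pointer.
bool run_frame_demo(int blocks, int threads_per_block) {
  if (threads_per_block % 32 != 0) return false;  // full warps only
  int n = blocks * threads_per_block;
  int* d_frames = nullptr;
  int* d_out = nullptr;
  cudaMalloc(&d_frames, (n / 32) * sizeof(int));
  cudaMalloc(&d_out, n * sizeof(int));

  frame_broadcast_kernel<<<blocks, threads_per_block>>>(d_frames, d_out);
  bool ok = cudaDeviceSynchronize() == cudaSuccess;

  std::vector<int> h_out(n);
  cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 1000 + i / 32);

  cudaFree(d_frames);
  cudaFree(d_out);
  return ok;
}
```

The `static_assert` turns an unexpected pointer width into a compile-time error rather than a silent truncation. Keeping the 4-byte branch costs nothing at run time, which is the "safeguard for the future" trade-off raised in the review.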