This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics
ClosedPublic

Authored by JonChesterfield on Jan 23 2021, 12:28 PM.

Details

Summary

[libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Tested by diff of IR generated for target_impl.cu before and after. NFC. Part
of removing deviceRTL build time dependency on cuda SDK.

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Jan 23 2021, 12:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 23 2021, 12:28 PM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
142

what about using NVVM atomic intrinsics directly? We don't need the memory order then.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
30

__syncwarp is left. It can be also simply replaced by __nvvm_bar_warp_sync(mask).

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
30

Syncwarp is one of the cuda_version intrinsics, will handle it with the rest of them.

142

Exposing memory order is a feature. Makes it clear we're using the slow one, gives the hook to change that if we wish.

Also gives an option to use the same clang intrinsics on amdgpu and nvptx if we wish.

tianshilei1992 added inline comments.Jan 23 2021, 1:46 PM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
142

That sounds appealing. Maybe we could use this patch to move atomic operations back to common part, and create another patch to rewrite other CUDA intrinsics related functions.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
142

I'd rather land this separately to starting to merge parts of the two target_impl files.

I'm not sure whether seq_cst is the right order for nvptx. Amdgcn should probably be device scope + acquire/release, though it's presently seq_cst as that's what hip did by default. So we may want slightly different Intrinsics again later.

tianshilei1992 added inline comments.Jan 23 2021, 2:40 PM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
142

JFYI, I'm working on cleaning up code to remove target_impl.h now.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu
142

This is presently orthogonal to D95298 and removes about half of the remaining uses of the cuda sdk. I'd prefer land it independently of hashing out consensus on what the library structure should change to.

This revision is now accepted and ready to land.Jan 23 2021, 5:59 PM