This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Missing __syncthreads intrinsic in __clang_cuda_device_functions.h
Needs Review · Public

Authored by bstaletic on May 22 2020, 5:42 PM.

Details

Reviewers
aprantl
tra
Summary

It seems that __syncthreads is missing from clang/lib/Headers/__clang_cuda_device_functions.h. To be honest, I don't know much about CUDA. The issue was noticed by a YouCompleteMe user, who then opened a pull request:

https://github.com/ycm-core/ycmd/pull/1438

I did not create any tests, because a similar patch did not include tests:

https://reviews.llvm.org/D43602

Event Timeline

bstaletic created this revision. May 22 2020, 5:42 PM
bstaletic marked an inline comment as done. May 23 2020, 12:41 AM

This doesn't seem to actually compile:

In file included from <built-in>:1:
In file included from /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang/llvm-project/clang/test/Headers/../../lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h:29:
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang/llvm-project/build/lib/clang/11.0.0/include/__clang_cuda_device_functions.h:522:16: error: static declaration of '__syncthreads' follows non-static declaration
__DEVICE__ int __syncthreads(void) { return __nvvm_bar0(); }
               ^
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang/llvm-project/build/lib/clang/11.0.0/include/__clang_cuda_device_functions.h:522:16: note: previous implicit declaration is here
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang/llvm-project/build/lib/clang/11.0.0/include/__clang_cuda_device_functions.h:522:45: error: use of undeclared identifier '__nvvm_bar0'
__DEVICE__ int __syncthreads(void) { return __nvvm_bar0(); }
                                            ^

Looks like __nvvm_bar0 is not declared or defined anywhere. Grepping shows that __nvvm_bar0_and has the following two entries, while __nvvm_bar0 has no equivalent of either:

llvm/include/llvm/IR/IntrinsicsNVVM.td
1034:  def int_nvvm_barrier0_and : GCCBuiltin<"__nvvm_bar0_and">,

clang/include/clang/Basic/BuiltinsNVPTX.def
408:BUILTIN(__nvvm_bar0_and, "ii", "")

Should I add BUILTIN(__nvvm_bar0, "v", "") to BuiltinsNVPTX.def, along with whatever matching definition is needed in IntrinsicsNVVM.td?

tra added a comment. May 26 2020, 11:14 AM

__syncthreads is a clang builtin and as such should not be defined in any header file:
https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Basic/BuiltinsNVPTX.def#L406

My guess is that the root cause of this problem is that the source is being parsed as C++, not as CUDA, and that leads to all sorts of issues.
Also, even if the source is parsed as CUDA, the tool will need to point clang at a CUDA SDK installation for this to work, and that's another bit that's typically missing in various tools that use clang libraries.
A further source of problems is that a CUDA compilation is actually a set of compilations, so the tool will need to target the appropriate GPU in order to have some GPU-specific builtins available.

In short, I don't think this patch is needed. I'll chime in on the YCM pull request.
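
The three requirements above (parse as CUDA, locate the CUDA SDK, pick a GPU target) correspond to clang driver options. As a rough sketch only, and not something proposed in this review, a ycmd `.ycm_extra_conf.py` could pass them along the following lines. The SDK path, the GPU architecture, and the choice to key on the `.cu`/`.cuh` extensions are all assumptions made for illustration, and whether `--cuda-device-only` is appropriate for a given tool is also an assumption.

```python
# Hypothetical .ycm_extra_conf.py sketch. CUDA_PATH and GPU_ARCH are
# placeholder assumptions; adjust them to the local installation.
import os

CUDA_PATH = "/usr/local/cuda"  # assumed CUDA SDK location
GPU_ARCH = "sm_70"             # assumed target GPU architecture

CUDA_EXTENSIONS = (".cu", ".cuh")


def Settings(**kwargs):
    """Return compile flags for the file ycmd is asking about."""
    filename = kwargs.get("filename", "")
    extension = os.path.splitext(filename)[1]

    if extension in CUDA_EXTENSIONS:
        return {
            "flags": [
                "-x", "cuda",                   # parse as CUDA, not C++
                "--cuda-path=" + CUDA_PATH,     # where the CUDA SDK lives
                "--cuda-gpu-arch=" + GPU_ARCH,  # GPU whose builtins to expose
                "--cuda-device-only",           # only the device-side pass
            ]
        }

    # Everything else is treated as plain C++.
    return {"flags": ["-x", "c++"]}
```

With flags of this kind the source is parsed as CUDA, which is the condition under which the comment above expects builtins such as __syncthreads to be available without any header change.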

oToToT added a subscriber: oToToT. Mar 17 2021, 12:52 AM

I came across this patch by chance. I think D98128 is what you need to solve this problem.