A copy-paste error caused UB in the definition of the unsigned long long
versions of the shfl intrinsics. Reported and diagnosed by @trws.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Oops. Thank you for fixing this.
clang/test/CodeGenCUDA/shuffle_long_long.cu | ||
---|---|---|
53 | This crashes LLVM when we taget sm_70 where these instructions no longer exist. We should probably disable those sync wrappers when we compile for GPUs where they are not available, so we'd get a proper compiler error instead of a crash. Also, we should probably make non-sync instruction use conditional on SYNC. https://godbolt.org/z/7n4vsb41v |
The tests in the patch are running with -emit-llvm, so they are not actually lowering to NVPTX and that's where the failure happens. https://godbolt.org/z/cchaWxrhn
clang/lib/Headers/__clang_cuda_intrinsics.h | ||
---|---|---|
238–239 | Nit: this change is irrelevant to the patch and can be removed. | |
clang/test/CodeGenCUDA/shuffle_long_long.cu | ||
10 | This macro should not be set. If you do need something to be compiled for sm_30, you should've specified via -target-cpu sm_30. | |
15 | Nit: this should be <...> as we want the include to be found in compiler's include paths. |
The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?
clang/lib/Headers/__clang_cuda_intrinsics.h | ||
---|---|---|
238–239 | me running clang format on the file. I'll push it nfc before. |
The latest revision of the patch is fine in this regard. My comment pointing to compiler crash reproducer was only intended to address the "For me this passes fine" part.
The only remaining thing is the manual __CUDA_ARCH__ redifinition which looks suspect to me. Is there any reason not to use -target-cpu sm_30 instead?
Nit: this change is irrelevant to the patch and can be removed.