This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive
ClosedPublic

Authored by jdoerfert on Jul 11 2022, 7:44 PM.

Details

Summary

A copy-paste error caused UB in the definition of the unsigned long long
versions of the shfl intrinsics. Reported and diagnosed by @trws.

Diff Detail

Event Timeline

jdoerfert created this revision.Jul 11 2022, 7:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 7:44 PM
jdoerfert requested review of this revision.Jul 11 2022, 7:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 7:44 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
tra added a comment.Jul 12 2022, 12:27 PM

Oops. Thank you for fixing this.

clang/test/CodeGenCUDA/shuffle_long_long.cu
52

This crashes LLVM when we taget sm_70 where these instructions no longer exist. We should probably disable those sync wrappers when we compile for GPUs where they are not available, so we'd get a proper compiler error instead of a crash.

Also, we should probably make non-sync instruction use conditional on SYNC. https://godbolt.org/z/7n4vsb41v

Address comments

@tra, unsure about the crash. For me this passes fine (no gpu), is anything missing?

tra added a comment.Jul 20 2022, 10:54 AM

@tra, unsure about the crash. For me this passes fine (no gpu), is anything missing?

The tests in the patch are running with -emit-llvm, so they are not actually lowering to NVPTX and that's where the failure happens. https://godbolt.org/z/cchaWxrhn

clang/lib/Headers/__clang_cuda_intrinsics.h
237–239

Nit: this change is irrelevant to the patch and can be removed.

clang/test/CodeGenCUDA/shuffle_long_long.cu
10

This macro should not be set. If you do need something to be compiled for sm_30, you should've specified via -target-cpu sm_30.

15

Nit: this should be <...> as we want the include to be found in compiler's include paths.

@tra, unsure about the crash. For me this passes fine (no gpu), is anything missing?

The tests in the patch are running with -emit-llvm, so they are not actually lowering to NVPTX and that's where the failure happens. https://godbolt.org/z/cchaWxrhn

The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?

clang/lib/Headers/__clang_cuda_intrinsics.h
237–239

me running clang format on the file. I'll push it nfc before.

tra added a comment.Jul 20 2022, 3:12 PM

The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?

The latest revision of the patch is fine in this regard. My comment pointing to compiler crash reproducer was only intended to address the "For me this passes fine" part.

The only remaining thing is the manual __CUDA_ARCH__ redifinition which looks suspect to me. Is there any reason not to use -target-cpu sm_30 instead?

jdoerfert updated this revision to Diff 446285.Jul 20 2022, 3:37 PM

Address comments

The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?

The latest revision of the patch is fine in this regard. My comment pointing to compiler crash reproducer was only intended to address the "For me this passes fine" part.

The only remaining thing is the manual __CUDA_ARCH__ redifinition which looks suspect to me. Is there any reason not to use -target-cpu sm_30 instead?

No reason. Changed.

jdoerfert marked 4 inline comments as done.Jul 20 2022, 3:38 PM
tra accepted this revision.Jul 20 2022, 3:55 PM
This revision is now accepted and ready to land.Jul 20 2022, 3:55 PM
This revision was landed with ongoing or failed builds.Jul 21 2022, 10:41 AM
This revision was automatically updated to reflect the committed changes.