Currently, clang emits the shfl instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
about these calls and optimize across them.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
LGTM
test/CodeGen/NVPTX/shfl.ll | ||
---|---|---|
19 ↗ | (On Diff #60123) | I'm curious why {{.}}32 here? Do you expect return type to change? |
test/CodeGen/NVPTX/shfl.ll | ||
---|---|---|
19 ↗ | (On Diff #60123) | It's currently a b32, but there's no reason (afaict) that it couldn't be a u32 (or i32). I didn't want to tie this test to the current behavior, since I don't think it matters. |
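To illustrate the point about not tying the test to the current type suffix, here is a minimal sketch of how a single-character wildcard behaves. It uses `std::regex` as a stand-in for FileCheck, where `{{.}}` is a regex matching any one character. The full pattern `shfl.down.{{.}}32`, the helper name `matchesShflDown32`, and the PTX lines are assumptions made for illustration; only the `{{.}}32` fragment comes from the review.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Stand-in for a FileCheck pattern of the form `shfl.down.{{.}}32`
// (hypothetical; the review only quotes the `{{.}}32` part).
// Literal dots are escaped; the one unescaped `.` plays the role of `{{.}}`.
bool matchesShflDown32(const std::string &ptxLine) {
  static const std::regex pattern(R"(shfl\.down\..32)");
  return std::regex_search(ptxLine, pattern);
}

// Illustrative PTX lines; register names are made up for this sketch.
void checkExamples() {
  // Any single-character type prefix followed by 32 is accepted.
  assert(matchesShflDown32("shfl.down.b32 %r1, %r2, %r3, %r4;"));
  assert(matchesShflDown32("shfl.down.u32 %r1, %r2, %r3, %r4;"));
  // A different width is still rejected.
  assert(!matchesShflDown32("shfl.down.b64 %rd1, %rd2, %r3, %r4;"));
}
```

With a pattern like this, the test would keep passing if the backend switched from printing `.b32` to `.u32`, while still catching a change in operand width.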