This is particularly important because some convergent CUDA intrinsics
(e.g. __shfl_down) are implemented in terms of inline asm.
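
For illustration, here is a minimal sketch of a warp shuffle written as inline asm, and of the kind of transformation that convergent is meant to prevent. The names `shfl_down_asm` and `kernel` are hypothetical and this is not the actual CUDA header code. It also assumes the newer `shfl.sync` PTX form (PTX ISA 6.0 or later) with a full-warp member mask, whereas the legacy `__shfl_down` corresponds to the non-sync instruction.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper for illustration only; not the real CUDA header code.
// Assumes a full 32-lane warp: clamp value 0x1f, member mask 0xffffffff.
__device__ inline int shfl_down_asm(int var, unsigned delta) {
  int ret;
  asm volatile("shfl.sync.down.b32 %0, %1, %2, %3, %4;"
               : "=r"(ret)
               : "r"(var), "r"(delta), "r"(0x1f), "r"(0xffffffffu));
  return ret;
}

__global__ void kernel(const int *in, int *out) {
  int lane = threadIdx.x;
  // All 32 lanes named in the member mask must execute the shuffle.
  int v = shfl_down_asm(in[lane], 1);
  // If the asm were not treated as convergent, an optimizer could sink it
  // into this branch, making it control-dependent on the lane index so that
  // only the even lanes would execute the shuffle.
  if (lane % 2 == 0)
    out[lane] = v;
}

int main() {
  int *in, *out;
  cudaMallocManaged(&in, 32 * sizeof(int));
  cudaMallocManaged(&out, 32 * sizeof(int));
  for (int i = 0; i < 32; ++i) { in[i] = i; out[i] = -1; }
  kernel<<<1, 32>>>(in, out);  // a single warp
  cudaDeviceSynchronize();
  cudaFree(in);
  cudaFree(out);
  return 0;
}
```

Marking the asm as convergent tells the optimizer not to make the call control-dependent on additional conditions, so the shuffle stays outside the branch, where the whole warp reaches it.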
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Comment Actions
I guess we would not be able to remove convergent from inline asm automatically. Do we need a way to explicitly remove convergent from inline asm?
Comment Actions
We can think about it. Frankly, I'm not sure it will make a big difference. If this encourages people to write less inline asm, I'm on board with that. :)