Since PTX 6.2, vote.ballot can no longer be used to get the mask of active
threads; the activemask.b32 instruction must be used instead. This is
required for CUDA 10.
This is the LLVM part of the patches to fix PR43156.
Differential D67130
[NVPTX] Add activemask intrinsic.
Abandoned, Public. Authored by ABataev on Sep 3 2019, 3:25 PM.
Event Timeline
ABataev added inline comments.
Revision Contents
Diff 218548
include/llvm/IR/IntrinsicsNVVM.td
lib/Target/NVPTX/NVPTX.td
lib/Target/NVPTX/NVPTXInstrInfo.td
lib/Target/NVPTX/NVPTXIntrinsics.td
test/CodeGen/NVPTX/activemask.ll
Are these attributes sufficient to prevent the intrinsic call from being CSE'd out of divergent branches?
E.g. we must not allow transforming this:
into that:
It would be great to add a test for that.
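To make the concern concrete, here is a minimal host-side C++ model of one 32-lane warp. It is only a sketch: it is neither the snippets from the review nor real PTX semantics, and `LaneMask`, `model_activemask`, `per_branch`, and `hoisted` are hypothetical names introduced for illustration. It assumes that an active-mask query returns exactly the set of lanes executing at the point of the query.

```cpp
#include <cstdint>

// Hypothetical model: one bit per lane of a 32-lane warp.
using LaneMask = std::uint32_t;

// Stand-in for an activemask.b32 query: returns the lanes that are
// executing at the point of the call (an assumption of this model).
inline LaneMask model_activemask(LaneMask executing) { return executing; }

struct BranchMasks {
    LaneMask then_mask;  // value observed on the taken side
    LaneMask else_mask;  // value observed on the not-taken side
};

// Intended form: each side of a divergent branch issues its own query,
// so each side sees only the lanes that actually went that way.
inline BranchMasks per_branch(LaneMask entry, LaneMask cond) {
    LaneMask taken = entry & cond;       // lanes where the condition holds
    LaneMask not_taken = entry & ~cond;  // remaining lanes
    return {model_activemask(taken), model_activemask(not_taken)};
}

// Invalid form: the two queries merged into one and hoisted above the
// branch, so both sides observe the pre-branch mask.
inline BranchMasks hoisted(LaneMask entry, LaneMask /*cond*/) {
    LaneMask m = model_activemask(entry);
    return {m, m};
}
```

With a full warp on entry and the condition true for the low 16 lanes only, `per_branch` reports two disjoint half-warp masks while `hoisted` reports the full mask on both sides. A regression test for the intrinsic would want to pin down the first behavior.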