Even though a series of cmp/cndmask instructions can produce quite a
lot of code, that is still better than a loop. In the case of doubles
we would even produce two loops.
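For readers unfamiliar with the pattern, here is a minimal scalar sketch of a compare/select chain for a dynamic vector extract. It is not the code from this patch: the function name `extractDynamic` and the use of `std::array` are assumptions made purely for illustration. Each iteration stands in for one compare plus one conditional move, and because the trip count is a compile-time constant the chain can be fully unrolled into straight-line code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, for illustration only (not part of the patch).
// Models extracting Vec[Idx] with a chain of compares and selects
// instead of a loop over the distinct index values in a wavefront.
template <typename T, std::size_t N>
T extractDynamic(const std::array<T, N> &Vec, std::size_t Idx) {
  static_assert(N > 0, "vector must have at least one element");
  // Start from element 0, then conditionally replace the result for
  // every other candidate index. The trip count is the constant N,
  // so this is a fixed-length chain, one compare/select per element.
  T Result = Vec[0];
  for (std::size_t I = 1; I < N; ++I)
    Result = (Idx == I) ? Vec[I] : Result;
  // In this sketch an out-of-range Idx simply yields element 0.
  return Result;
}
```

The cost grows linearly with the number of elements, which is why a limit on vector size matters for deciding when this form is preferable.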
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
9549 | Yes, although that is a separate issue. GlobalISel also needs to work with non-power-of-two vectors for movrel. Yet another piece of work is to tune the limits; they seem to be suboptimal, at least for doubles. |
I would invert this and rename it. How about -amdgpu-use-divergent-register-indexing, default false?