This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Always expand ext/insertelement with divergent idx
ClosedPublic

Authored by rampitec on May 15 2020, 1:38 PM.

Details

Summary

Even though a series of cmp/cndmask instructions can produce quite
a lot of code, that is still better than a loop. In the case of
doubles we would even produce two loops.
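
To illustrate the trade-off described in the summary, here is a minimal scalar sketch of the compare/select expansion. It is not code from this patch and does not use LLVM APIs. The helper names extractBySelect and insertBySelect are hypothetical, and the behaviour for an out-of-range index (extract falls back to element 0, insert leaves the vector unchanged) is an assumption of the sketch. Each iteration models one compare plus one select per lane, so the sequence grows with the vector length but needs no loop over the distinct index values present in a wavefront.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of extractelement expanded into compare/select pairs.
// On the real target each step would roughly correspond to a v_cmp_eq_u32
// followed by a v_cndmask_b32; here it is plain scalar C++.
template <typename T, std::size_t N>
T extractBySelect(const std::array<T, N> &vec, std::uint32_t idx) {
  T result = vec[0];                        // assumed fallback for idx >= N
  for (std::size_t i = 1; i < N; ++i)
    result = (idx == i) ? vec[i] : result;  // compare, then select
  return result;
}

// Hypothetical model of insertelement expanded the same way: every element
// is either replaced by val or kept, depending on the compare result.
template <typename T, std::size_t N>
std::array<T, N> insertBySelect(std::array<T, N> vec, T val,
                                std::uint32_t idx) {
  for (std::size_t i = 0; i < N; ++i)
    vec[i] = (idx == i) ? val : vec[i];     // compare, then select
  return vec;
}
```

In this model the loop has a compile-time trip count and would be fully unrolled, which is why the expansion can be long for wide vectors while still being straight-line code.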

Event Timeline

rampitec created this revision. May 15 2020, 1:38 PM
Herald added a project: Restricted Project. May 15 2020, 1:38 PM
arsenm added inline comments. May 18 2020, 7:53 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
102–106

I would invert this and rename it. How about -amdgpu-use-divergent-register-indexing, default false?

9542–9545

GlobalISel needs the compare and select path implemented

rampitec updated this revision to Diff 264665. May 18 2020, 9:49 AM
rampitec marked 2 inline comments as done.

Inverted the option as suggested.

rampitec added inline comments. May 20 2020, 12:01 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
9542–9545

Yes, although that is a separate issue. GlobalISel also needs to work with non-power-of-two vectors for movrel. Yet another piece of work is to tune the limits; they seem to be suboptimal, at least for doubles.

arsenm accepted this revision. May 20 2020, 3:28 PM
This revision is now accepted and ready to land. May 20 2020, 3:28 PM
This revision was automatically updated to reflect the committed changes.