This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize vector_shuffles that are interleaving the lowest elements of two vectors.
ClosedPublic

Authored by craig.topper on Jan 19 2022, 8:40 PM.

Details

Summary

RISCV only has a unary shuffle that requires placing the indices in a
register. For interleaving two vectors this means we need at least
two vrgathers and a vmerge to do a shuffle of two vectors.

This patch teaches shuffle lowering to use a widening addu followed
by a widening vmaccu to implement the interleave. First we extract
the low half of both V1 and V2. Then we implement
(zext(V1) + zext(V2)) + (zext(V2) * zext(2^eltbits - 1)), which
simplifies to zext(V1) + (zext(V2) * 2^eltbits). This further
simplifies to zext(V1) + (zext(V2) << eltbits). Then we bitcast the
result back to the original type, splitting the wide elements in half.
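
The arithmetic above can be checked with a small scalar model. This is only a sketch, not the lowering code from the patch: it assumes 8-bit elements, models each vector as a Python list, and assumes that after the bitcast the low half of each wide element lands in the lower-numbered lane. The name `interleave_low` is made up for illustration.

```python
ELT_BITS = 8                              # assumed element width for this sketch
ELT_MASK = (1 << ELT_BITS) - 1            # 2^eltbits - 1
WIDE_MASK = (1 << (2 * ELT_BITS)) - 1     # mask for the double-width element


def interleave_low(v1, v2):
    """Interleave the low halves of v1 and v2 using only widening arithmetic."""
    half = len(v1) // 2
    out = []
    for a, b in zip(v1[:half], v2[:half]):
        # Widening unsigned add: zext(a) + zext(b).
        wide = (a + b) & WIDE_MASK
        # Widening unsigned multiply-accumulate with the constant 2^eltbits - 1.
        wide = (wide + b * ELT_MASK) & WIDE_MASK
        # The sum collapses to zext(a) + (zext(b) << eltbits).
        assert wide == a + (b << ELT_BITS)
        # "Bitcast" back to narrow elements: low half first, then high half.
        out.append(wide & ELT_MASK)
        out.append(wide >> ELT_BITS)
    return out


print(interleave_low([1, 2, 3, 4], [250, 251, 252, 253]))  # [1, 250, 2, 251]
```

Even with both inputs at their maximum (255 and 255), the sum is 510 + 255 * 255 = 65535, which still fits in the 16-bit wide element, so the model never overflows.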

We can only do this if we have a type with wider elements available.
Because we're using extends we also have to be careful with fractional
lmuls. Floating point types are supported by bitcasting to/from integer.

The tests cover a varied combination of LMULs split across VLEN>=128 and
VLEN>=512 tests. There are a few tests with shuffle indices commuted as well
as tests for undef indices. There's one test for a vXi64/vXf64 vector, which
we can't optimize; it verifies that we don't crash.
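
To make the mask shapes these tests exercise concrete, here is a hypothetical matcher. It is not the helper from the patch. It assumes the SelectionDAG convention that, for sources of length N, indices 0..N-1 select from the first source, N..2N-1 select from the second, and -1 marks an undef lane.

```python
UNDEF = -1  # undef lane marker, as in a SelectionDAG shuffle mask


def match_low_interleave(mask):
    """Return "v1-first", "v2-first", or None for a shuffle mask.

    The mask length n is also taken to be the length of each source, so
    index n + k refers to element k of the second source.
    """
    n = len(mask)
    if n % 2 != 0:
        return None
    for label, swapped in (("v1-first", False), ("v2-first", True)):
        ok = True
        for lane, idx in enumerate(mask):
            if idx == UNDEF:
                continue  # an undef lane is compatible with any pattern
            from_second = (lane % 2 == 1) != swapped
            expected = lane // 2 + (n if from_second else 0)
            if idx != expected:
                ok = False
                break
        if ok:
            return label
    return None


print(match_low_interleave([0, 4, 1, 5]))      # v1-first
print(match_low_interleave([4, 0, 5, 1]))      # v2-first (commuted)
print(match_low_interleave([0, UNDEF, 1, 5]))  # v1-first (undef lane)
print(match_low_interleave([0, 4, 2, 6]))      # None
```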

Event Timeline

craig.topper created this revision. Jan 19 2022, 8:40 PM
craig.topper requested review of this revision. Jan 19 2022, 8:40 PM
Herald added a project: Restricted Project. Jan 19 2022, 8:40 PM
Herald added subscribers: eopXD, MaskRay.
rogfer01 accepted this revision. Jan 20 2022, 8:50 AM

LGTM

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2348

I admit I was very confused by this because I assumed the Mask of the SDNode would be the mask operand of the IR more or less verbatim (e.g. 0, 2, 1, 3), but it seems it is adjusted by the length of the mask itself (i.e. the elements of the second source are offset by the length of the concatenated vector; I'm not sure my interpretation is correct after reading the SelectionDAG code, though).

This revision is now accepted and ready to land. Jan 20 2022, 8:50 AM
craig.topper added inline comments. Jan 20 2022, 9:01 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2347

consistenly -> consistently
source -> "same source"

2348

IR allows the sources and mask to have different lengths. SelectionDAG does not. There's a piece of code in SelectionDAGBuilder that tries a few heuristics for matching the lengths. I think it can fall back to a build_vector in the worst case.
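
The index adjustment discussed in this thread can be illustrated with a toy model. It is a sketch of one case only, not the SelectionDAGBuilder code. It assumes the mask is longer than the sources by a whole multiple, so that each source can be padded with undef elements up to the mask length. The function name `widen_mask` is made up.

```python
def widen_mask(ir_mask, src_len):
    """Rewrite an IR shuffle mask for sources padded to the mask length.

    Indices into the first source are unchanged. Indices into the second
    source move up by the amount of padding added to the first source.
    Undef lanes (-1) pass through untouched.
    """
    mask_len = len(ir_mask)
    assert mask_len >= src_len and mask_len % src_len == 0
    pad = mask_len - src_len
    return [idx if idx < src_len else idx + pad for idx in ir_mask]


# Two 2-element sources interleaved by a 4-element IR mask.
print(widen_mask([0, 2, 1, 3], 2))  # [0, 4, 1, 5]
```

In this model the IR mask 0, 2, 1, 3 becomes 0, 4, 1, 5 once both sources are treated as 4-element vectors, which matches the offset described in the comment above.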

Rebase. Fix comment.

This revision was landed with ongoing or failed builds. Jan 20 2022, 2:47 PM
This revision was automatically updated to reflect the committed changes.