This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve shuffle vector by using wider types
Closed, Public

Authored by wwei on Oct 12 2021, 1:46 AM.

Details

Summary

Try to widen the element type to get a new mask value for a better permutation
sequence, so that we can use NEON shuffle instructions such as ZIP1/2,
UZP1/2, TRN1/2, REV, INS, etc.
For example:

`shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>`

is equivalent to:

`shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>`

Finally, we can get:

`mov     v0.d[0], v1.d[1]`
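
The mask check behind the example above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code added by this patch: `tryWidenMask` is a hypothetical name, and the sketch assumes a plain integer mask and rejects undef (negative) lanes for simplicity. A pair of adjacent lanes can be merged only when the first index is even and the second is the next index up.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not the function added by this patch.
// Tries to express a shuffle mask over N narrow elements as a mask over
// N/2 elements of twice the width. Returns std::nullopt when some pair of
// adjacent lanes does not form one aligned wide element.
std::optional<std::vector<int>> tryWidenMask(const std::vector<int> &Mask) {
  if (Mask.empty() || Mask.size() % 2 != 0)
    return std::nullopt;

  std::vector<int> Wide;
  Wide.reserve(Mask.size() / 2);
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I];
    int Hi = Mask[I + 1];
    // The low half must start at an even index and the high half must be
    // the element immediately after it. Undef (negative) lanes are
    // rejected in this simplified sketch.
    if (Lo < 0 || Lo % 2 != 0 || Hi != Lo + 1)
      return std::nullopt;
    Wide.push_back(Lo / 2);
  }
  return Wide;
}
```

With the `<4 x i32>` mask `<6, 7, 2, 3>` from the summary, the pairs `(6, 7)` and `(2, 3)` both qualify, which gives the `<2 x i64>` mask `<3, 1>`. A mask such as `<0, 4, 1, 5>` does not widen, because its adjacent lanes come from different wide elements.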

Event Timeline

wwei created this revision. Oct 12 2021, 1:46 AM
wwei requested review of this revision. Oct 12 2021, 1:46 AM

Hello. This sounds like a nice idea.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9580

Perhaps add a comment explaining the function.

9632

Can this happen?

9639

A `> 32` check is probably enough (and `!= 1` is a good check too). If we are combining adjacent elements, the most we can combine is two i32s into an i64. I think it's the same thing due to legal types, but it is a little clearer.

The comment above could do with being reworded to be clearer too.
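
The size guard suggested here might look roughly like the sketch below. It is an illustration only: `canPairElements` is a hypothetical name, and the sketch assumes the `> 32` and `!= 1` checks both apply to the element width in bits, so that 1-bit (`i1`) elements are excluded and the merged element is never wider than 64 bits.

```cpp
// Hypothetical guard, assuming the argument is an element width in bits.
// Returns true when two adjacent elements of this width can be merged into
// one element that is at most 64 bits wide.
constexpr bool canPairElements(unsigned ElementSizeInBits) {
  // Anything wider than 32 bits would need a lane larger than i64, and
  // 1-bit (i1) elements are excluded as well.
  return ElementSizeInBits <= 32 && ElementSizeInBits != 1;
}

static_assert(canPairElements(32), "two i32s fit in one i64");
static_assert(!canPairElements(64), "two i64s do not fit in one 64-bit lane");
```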

9646

Perhaps make a variable for `ScalarVT.getFixedSizeInBits()` (or `VT.getScalarSizeInBits()`, which I think should be the same thing).

llvm/test/CodeGen/AArch64/neon-widen-shuffle.ll
2

Can you use the update_llc_test_checks script?

wwei updated this revision to Diff 379352. Oct 13 2021, 5:34 AM
wwei added inline comments. Oct 13 2021, 5:37 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9580

Added.

9632

Removed.

9639

Updated the comment.

9646

Added a new variable, `unsigned ElementSize`.

llvm/test/CodeGen/AArch64/neon-widen-shuffle.ll
2

Updated.

dmgreen accepted this revision. Oct 14 2021, 12:24 PM

Thanks. LGTM

This revision is now accepted and ready to land. Oct 14 2021, 12:24 PM
This revision was automatically updated to reflect the committed changes.