This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Custom ISelLowering for 256b `shuffle_vector v, undef, <1, 0, 1, 0>`
Needs Review · Public

Authored by cameron.mcinally on May 8 2023, 9:14 AM.

Details

Summary

Continuing from D149749, here is another neoverse-v1 VLS shuffle that could be lowered better...

%x = shufflevector <2 x double> %v, <2 x double> poison, <4 x i32> <i32 1, i32 0, i32 1, i32 0>

It could be lowered in a number of ways, but I chose:

zip1 z0.d, z0.d, z0.d
uzp1 z0.d, z0.d, z0.d
ext z0.b, z0.b, z0.b, #8
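
As a sanity check on the lane movement, here is a minimal Python sketch that models a 256-bit SVE register as four 64-bit lanes. The helpers `zip1`, `uzp1`, and `ext` are hypothetical stand-ins written from my reading of the instruction semantics; they are not LLVM or ACLE APIs, and the fixed 256-bit vector length is an assumption taken from the neoverse-v1 VLS setting above.

```python
# Model a 256-bit SVE register as a list of four 64-bit (.d) lanes.
# Assumption: the vector length is fixed at 256 bits (neoverse-v1 VLS).

def zip1(a, b):
    """Interleave the lanes from the low halves of a and b."""
    half = len(a) // 2
    out = []
    for i in range(half):
        out += [a[i], b[i]]
    return out

def uzp1(a, b):
    """Keep the even-numbered lanes of the concatenation a:b."""
    return (a + b)[0::2]

def ext(a, b, byte_offset, lane_bytes=8):
    """Extract len(a) lanes from a:b, starting at byte_offset."""
    start = byte_offset // lane_bytes
    return (a + b)[start:start + len(a)]

# Lanes 0 and 1 hold the <2 x double> input; the upper lanes are don't-care.
z0 = ["v0", "v1", "x", "x"]
z0 = zip1(z0, z0)   # zip1 z0.d, z0.d, z0.d
z0 = uzp1(z0, z0)   # uzp1 z0.d, z0.d, z0.d
z0 = ext(z0, z0, 8) # ext  z0.b, z0.b, z0.b, #8
result = z0
print(result)
```

Under these assumptions the don't-care lanes drop out after the `uzp1` step, and the final lane order matches the `<1, 0, 1, 0>` mask.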

The new lowering shows a 9% performance boost on 538.namd with our out-of-tree compiler.

Note that this solution takes 6 cycles, compared to a NEON sequence at 4 cycles. This is unfortunate, but I could not find a faster SVE sequence for this shuffle. [Maybe a better solution is to fall back to NEON for shuffles of this form?]

Diff Detail

Unit Tests: Failed

Event Timeline

Herald added a project: Restricted Project. · May 8 2023, 9:14 AM
cameron.mcinally requested review of this revision. · May 8 2023, 9:14 AM

Why not dup z0.q, z0.q[0]; rev z0.d, z0.d?
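
For reference, a minimal Python sketch of what this two-instruction sequence does to the lanes, again modelling a 256-bit register as four 64-bit lanes. `dup_q` and `rev` are hypothetical helpers based on my understanding of the indexed quadword `dup` and the element-wise `rev`; the 256-bit vector length is an assumption.

```python
# Model a 256-bit SVE register as a list of four 64-bit (.d) lanes.
# Assumption: the vector length is fixed at 256 bits.

def dup_q(a, index, lanes_per_q=2):
    """Broadcast the 128-bit quadword at `index` to every quadword."""
    q = a[index * lanes_per_q:(index + 1) * lanes_per_q]
    return q * (len(a) // lanes_per_q)

def rev(a):
    """Reverse the order of all lanes in the register."""
    return a[::-1]

# Lanes 0 and 1 hold the <2 x double> input; the upper lanes are don't-care.
z0 = ["v0", "v1", "x", "x"]
z0 = dup_q(z0, 0)  # dup z0.q, z0.q[0]
z0 = rev(z0)       # rev z0.d, z0.d
suggested = z0
print(suggested)
```

In this model the quadword splat overwrites the don't-care lanes and the reverse then produces the `<1, 0, 1, 0>` order in two instructions instead of three.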

Matt added a subscriber: Matt. · May 8 2023, 2:07 PM

Good call, @efriedma. I didn't realize there are quadword splats.