This is an archive of the discontinued LLVM Phabricator instance.

Add more efficient vector bitcast for v16i8 on AArch64.
Closed · Public

Authored by lawben on Jul 28 2023, 7:45 AM.

Details

Summary

We previously split the vector into two halves, performed a vector reduction on each half, and then combined the two results with a shift and a bitwise OR. Now we use NEON's zip1 to interleave the two halves, so that the bytes of the low half and of the high half land in the low and high bytes of each 16-bit lane, and then perform only a single vector reduction. Because vector reductions are relatively expensive instructions, removing one of them noticeably speeds up this short routine. The original discussion for this started in: https://reviews.llvm.org/D145301
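To make the trick concrete, here is a scalar C model of both sequences. It assumes the lanes arrive as 0x00/0xFF bytes and are first ANDed with the per-lane powers of two {1, 2, ..., 128, 1, 2, ..., 128}. The function names are hypothetical, and the mapping of each loop onto EXT, ZIP1 and ADDV in the comments is our reading of the technique, not code taken from the patch. The key property is that each half sums to at most 255, so packing the two halves into the low and high bytes of 16-bit lanes lets one reduction produce both result bytes without any carry between them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed input: each lane is 0x00 or 0xFF (a sign-extended i1).
   AND each lane with its power-of-two weight 1 << (i % 8). */
static void mask_lanes(const uint8_t in[16], uint8_t out[16]) {
    for (int i = 0; i < 16; i++)
        out[i] = in[i] & (uint8_t)(1u << (i % 8));
}

/* Previous approach: two 8-lane byte reductions, then shift and OR. */
static uint16_t bitcast_two_reductions(const uint8_t m[16]) {
    unsigned lo = 0, hi = 0;
    for (int i = 0; i < 8; i++) {
        lo += m[i];     /* models a reduction over the low 8 bytes */
        hi += m[i + 8]; /* models a reduction over the high 8 bytes */
    }
    /* Distinct powers of two: each half fits in a single byte. */
    assert(lo <= 0xFF && hi <= 0xFF);
    return (uint16_t)(lo | (hi << 8));
}

/* New approach: rotate by 8 bytes, interleave with zip1, then one
   reduction over the register viewed as 8 x i16. */
static uint16_t bitcast_zip1_one_reduction(const uint8_t m[16]) {
    uint8_t rot[16], zip[16];
    for (int i = 0; i < 16; i++)
        rot[i] = m[(i + 8) % 16];   /* models EXT v1, v0, v0, #8 */
    for (int k = 0; k < 8; k++) {
        zip[2 * k] = m[k];          /* ZIP1: even bytes from v0 */
        zip[2 * k + 1] = rot[k];    /* ZIP1: odd bytes from v1 */
    }
    uint16_t sum = 0;
    for (int k = 0; k < 8; k++) {
        /* Little-endian 16-bit lane k = m[k] | (m[k + 8] << 8). */
        uint16_t lane = (uint16_t)(zip[2 * k] | (zip[2 * k + 1] << 8));
        sum = (uint16_t)(sum + lane); /* models one 8 x i16 reduction */
    }
    return sum;
}

/* Exhaustively check both models against the expected bitmask. */
static int all_patterns_agree(void) {
    for (uint32_t bits = 0; bits <= 0xFFFF; bits++) {
        uint8_t in[16], m[16];
        for (int i = 0; i < 16; i++)
            in[i] = ((bits >> i) & 1u) ? 0xFF : 0x00;
        mask_lanes(in, m);
        if (bitcast_two_reductions(m) != bits ||
            bitcast_zip1_one_reduction(m) != bits)
            return 0;
    }
    return 1;
}
```

Calling `all_patterns_agree()` from a small driver checks all 65536 lane patterns and returns 1: the low-byte sum never exceeds 0xFF, so adding the packed 16-bit lanes never carries into the high byte, and the single reduction yields exactly `lo | (hi << 8)`.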


Event Timeline

lawben created this revision. Jul 28 2023, 7:45 AM
Herald added a project: Restricted Project. Jul 28 2023, 7:45 AM
lawben requested review of this revision. Jul 28 2023, 7:45 AM
dmgreen accepted this revision. Aug 7 2023, 2:41 AM

Looks like a nice improvement. LGTM.

This revision is now accepted and ready to land. Aug 7 2023, 2:41 AM
This revision was automatically updated to reflect the committed changes.