We previously split the vector into two halves and performed two vector reduce operations followed by bit shifting and bitwise or. Now, we use NEON's zip1 to concatenate
the halves in a smart way and then perform only a single vector reduce. This boosts performance quite a bit for this small routine, as vector reduce is a rather expensive
intruction. Original discussion for this started in: https://reviews.llvm.org/D145301
Details
Details
Diff Detail
Diff Detail
- Repository
- rG LLVM Github Monorepo