This adds some missing single source shuffle costs for AArch64, of i16 and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3 coming from the perfect shuffle tables. The larger vector sizes expand into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily chose 8 for the cost to be expensive but not too expensive.
Details
Details
Diff Detail
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
LGTM! The new costs look a lot more sensible. Not for this patch, but I do wonder why the v4i1,etc. costs are so high for reduce-xor.ll compared to reduce-or.ll?