These all expand to 1 or 2 UNPCK shuffle ops. On AVX1/AVX2 the expansion is sometimes a subvector-concat + permute pattern instead, but the costs turn out to be very similar, so move the entries from the AVX2 cost table to the SSE2 cost table.
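As a rough illustration of why a stride-2 interleaved store maps onto UNPCK ops, here is a scalar C++ sketch of the `<4 x i32>` case, assuming a plain 128-bit SSE2 target. The helpers `unpack_lo`, `unpack_hi` and `store_stride2_vf4` are hypothetical names that model the lane behaviour of PUNPCKLDQ/PUNPCKHDQ; they are not LLVM code and do not reflect the exact codegen discussed in this review.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four 32-bit lanes, standing in for one 128-bit SSE2 register.
using V4 = std::array<std::int32_t, 4>;

// Scalar model of PUNPCKLDQ: interleave the low two lanes of a and b.
V4 unpack_lo(const V4 &a, const V4 &b) { return {a[0], b[0], a[1], b[1]}; }

// Scalar model of PUNPCKHDQ: interleave the high two lanes of a and b.
V4 unpack_hi(const V4 &a, const V4 &b) { return {a[2], b[2], a[3], b[3]}; }

// Hypothetical helper: stride-2 interleaved store of two <4 x i32> vectors.
// The result a0 b0 a1 b1 a2 b2 a3 b3 takes two unpack shuffles in this model.
void store_stride2_vf4(const V4 &a, const V4 &b, std::int32_t *out) {
  const V4 lo = unpack_lo(a, b);
  const V4 hi = unpack_hi(a, b);
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = lo[i];      // first 128-bit store
    out[4 + i] = hi[i];  // second 128-bit store
  }
}
```

In this model, a case where the interleaved result fits in a single register needs only the low unpack, which is where the "1 or 2" comes from.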
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I suppose this is a better ballpark, but I'm not really sold on the i64/i32-vf4 part.
| File | Line | Diff | Comment |
|---|---|---|---|
| llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-2.ll | 13 | | LG |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll | 13 | (On Diff #380193) | @store_i32_stride2_vf4's codegen looks really different, I'm not sure this is right: |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll | 13 | (On Diff #380193) | All of @store_i64_stride2_vf2/@store_i64_stride2_vf4's codegen looks really different. |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-2.ll | 13 | | LG |
LG