These stride-2 interleaved stores all expand to 1 or 2 UNPCK shuffle ops. AVX1/AVX2 sometimes expands to a subvector-concat + permute pattern instead, but the costs turn out to be very similar, so move the entries from the AVX2 to the SSE2 cost table.
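To illustrate why one or two unpack ops are enough, here is a minimal scalar model of a stride-2 interleave of two 4 x i32 vectors. It is only a sketch: the names `V4`, `unpackLo`, `unpackHi`, and `interleaveStride2` are made up for this example and are not LLVM APIs, and the helpers merely mimic the lane order of the SSE2 PUNPCKLDQ/PUNPCKHDQ instructions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 128-bit register holding four i32 lanes.
using V4 = std::array<std::int32_t, 4>;

// Models PUNPCKLDQ: interleave the low two lanes of A and B.
inline V4 unpackLo(const V4 &A, const V4 &B) {
  return {A[0], B[0], A[1], B[1]};
}

// Models PUNPCKHDQ: interleave the high two lanes of A and B.
inline V4 unpackHi(const V4 &A, const V4 &B) {
  return {A[2], B[2], A[3], B[3]};
}

// Stride-2 interleave of two vf4 i32 vectors: two unpack ops produce the
// eight lanes that would then be stored contiguously.
inline std::array<std::int32_t, 8> interleaveStride2(const V4 &A,
                                                     const V4 &B) {
  const V4 Lo = unpackLo(A, B); // first shuffle
  const V4 Hi = unpackHi(A, B); // second shuffle
  return {Lo[0], Lo[1], Lo[2], Lo[3], Hi[0], Hi[1], Hi[2], Hi[3]};
}
```

When the interleaved result fits in a single 128-bit register (for example 2 x 8i8 into 16i8), only the low unpack is needed, which matches the "1 or 2" in the summary.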
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I suppose this is a better ballpark, but I'm not really sold on the i64/i32-vf4 part.
| File | Line | Comment |
|---|---|---|
| llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-2.ll | 13 | LG |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll | 13 | @store_i32_stride2_vf4's codegen looks really different, I'm not sure this is right: |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll | 13 | The codegen for both @store_i64_stride2_vf2 and @store_i64_stride2_vf4 looks really different. |
| llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-2.ll | 13 | LG |
clang-format: please reformat the code
- {2, MVT::v2i8, 1}, // interleave 2 x 2i8 into 4i8 (and store)
- {2, MVT::v4i8, 1}, // interleave 2 x 4i8 into 8i8 (and store)
- {2, MVT::v8i8, 1}, // interleave 2 x 8i8 into 16i8 (and store)
+ {2, MVT::v2i8, 1}, // interleave 2 x 2i8 into 4i8 (and store)
+ {2, MVT::v4i8, 1}, // interleave 2 x 4i8 into 8i8 (and store)
+ {2, MVT::v8i8, 1}, // interleave 2 x 8i8 into 16i8 (and store)
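For readers unfamiliar with entries of this shape, the sketch below shows one way a {factor, type, cost} table can be searched. It is a standalone model, not the code under review: `SimpleVT`, `InterleaveCostEntry`, `StoreTbl`, and `lookupInterleavedStoreCost` are hypothetical names, and only the three entries visible in the hunk above are included.

```cpp
#include <cassert>

// Hypothetical stand-in for the MVT values seen in the diff.
enum class SimpleVT { v2i8, v4i8, v8i8, v16i8 };

// One table row: interleave factor, element vector type, and cost.
struct InterleaveCostEntry {
  unsigned Factor;
  SimpleVT VT;
  int Cost;
};

// The three entries shown in the hunk (factor 2, cost 1 each).
static const InterleaveCostEntry StoreTbl[] = {
    {2, SimpleVT::v2i8, 1}, // interleave 2 x 2i8 into 4i8 (and store)
    {2, SimpleVT::v4i8, 1}, // interleave 2 x 4i8 into 8i8 (and store)
    {2, SimpleVT::v8i8, 1}, // interleave 2 x 8i8 into 16i8 (and store)
};

// Linear search; returns the cost, or -1 when the table has no matching
// entry so the caller can fall back to a generic estimate.
inline int lookupInterleavedStoreCost(unsigned Factor, SimpleVT VT) {
  for (const InterleaveCostEntry &E : StoreTbl)
    if (E.Factor == Factor && E.VT == VT)
      return E.Cost;
  return -1;
}
```

Under this model, moving entries from an AVX2 table to an SSE2 table means the same costs are found for any subtarget that consults the SSE2 table, rather than only for AVX2.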