add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
Repository: rG LLVM Github Monorepo
Event Timeline
The build at https://reviews.llvm.org/B72136 fails while making an HTTP request, but I don't have permission to restart it.
Besides, check-all passes locally.
Thanks for your comments, Simon. Following them, I found that my new test cases are useless, since they exercise Instruction::Select instead of Instruction::ShuffleVector (SK_Select).
Deleted some previously added test cases, since they exercise Instruction::Select instead of Instruction::ShuffleVector (SK_Select).
@yubing I've added shuffle-select.ll which should have better test coverage - please can you rebase and check?
Hi, Simon. Should we provide a more precise cost for v32i16 and v64i8 on avx512f? I think it should be at most 42, according to the following entries in AVX512ShuffleTbl:
{TTI::SK_PermuteTwoSrc, MVT::v32i16, 42},
{TTI::SK_PermuteTwoSrc, MVT::v64i8, 42},
With avx512f, the cost of SK_Select (v32i16 or v64i8) should be 3 (vmovdqa64 + vpternlogq).
The moves probably don't really count since they can be eliminated during register renaming. So only the vpternlog executes.
Eh, Craig, why is this related to register renaming? I thought vpternlog's third operand would be provided by a vmovdqa64.
Besides, we can observe the following asm for v32i16's SK_Select:
vmovdqa64 .LCPI0_0(%rip), %zmm0 # zmm0 = [0,0,0,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535,65535]
vpternlogq $202, 144(%rbp), %zmm4, %zmm0
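For readers unfamiliar with the immediate in the listing above: 202 is 0xCA, the truth table of a bitwise select, where each result bit is taken from the second source if the corresponding bit of the destination register is set and from the third source otherwise. The following is a minimal scalar sketch of that behavior on a single 64-bit lane, not LLVM code; the function names `ternlog` and `bitSelect` are made up for illustration, and the lane values in the usage note are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 64-bit lane of vpternlogq (illustrative only).
// For every bit position, the bits of a (destination / first source),
// b (second source) and c (third source) form the index a*4 + b*2 + c
// into the 8-bit truth table imm; the indexed bit is the result bit.
constexpr uint64_t ternlog(uint8_t imm, uint64_t a, uint64_t b, uint64_t c) {
  uint64_t result = 0;
  for (int bit = 0; bit < 64; ++bit) {
    unsigned idx = static_cast<unsigned>((((a >> bit) & 1) << 2) |
                                         (((b >> bit) & 1) << 1) |
                                         ((c >> bit) & 1));
    result |= static_cast<uint64_t>((imm >> idx) & 1) << bit;
  }
  return result;
}

// Reference bitwise select: take b where mask is set, c elsewhere.
constexpr uint64_t bitSelect(uint64_t mask, uint64_t b, uint64_t c) {
  return (mask & b) | (~mask & c);
}
```

With the first 64-bit lane of the constant from the listing (three 16-bit zeros followed by 65535, i.e. 0xFFFF000000000000) as the mask, `ternlog(202, mask, b, c)` keeps the top 16-bit element of `b` and the lower three elements of `c`, matching `bitSelect(mask, b, c)`.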
Sorry, I thought the vmovdqa you mentioned was due to vpternlogq reading 3 sources and clobbering one of them, so it sometimes needs a register-to-register move to preserve a register.
I'm not sure we usually cost the constant pool load, since it's loop invariant. Do we cost the load that vpermi2b/w/d/q would use for a 2-source permute?
You're right, we don't need to consider the cost of vmovdqa64. In AVX2ShuffleTbl, the shuffle index is also provided by a MOV, but we don't take it into account:
{TTI::SK_PermuteSingleSrc, MVT::v4f64, 1}, // vpermpd
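To make the role of these table entries concrete, here is a small self-contained sketch of a (shuffle kind, vector type) cost lookup. It is a stand-in and not the LLVM implementation: the types `ShuffleKind`, `VecType`, `ShuffleCostEntry` and the function `lookupShuffleCost` are hypothetical, and the table simply combines the three entries quoted in this thread (which live in two different tables in the real source).

```cpp
#include <cassert>

// Hypothetical stand-ins for the shuffle kinds and vector types discussed here.
enum class ShuffleKind { PermuteSingleSrc, PermuteTwoSrc, Select };
enum class VecType { v4f64, v32i16, v64i8 };

struct ShuffleCostEntry {
  ShuffleKind Kind;
  VecType Type;
  unsigned Cost;
};

// Only the entries quoted in the discussion; the costs count the shuffle
// instruction itself, not the load of its index or mask constant.
static const ShuffleCostEntry kShuffleCosts[] = {
    {ShuffleKind::PermuteTwoSrc, VecType::v32i16, 42},
    {ShuffleKind::PermuteTwoSrc, VecType::v64i8, 42},
    {ShuffleKind::PermuteSingleSrc, VecType::v4f64, 1}, // vpermpd
};

// Linear search for a matching entry; returns nullptr when the table has
// no entry for this kind/type pair, so the caller must pick a fallback.
inline const ShuffleCostEntry *lookupShuffleCost(ShuffleKind Kind, VecType Type) {
  for (const ShuffleCostEntry &Entry : kShuffleCosts)
    if (Entry.Kind == Kind && Entry.Type == Type)
      return &Entry;
  return nullptr;
}
```

In this sketch a query for `ShuffleKind::Select` on `VecType::v32i16` finds no entry, which is the kind of gap a dedicated SK_Select entry would close.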
LGTM with one minor
llvm/lib/Target/X86/X86TargetTransformInfo.cpp:1193
ignore clang-format - please can you add whitespace to align these columns? Otherwise it's very difficult to see the costs at a glance.