If a partial match is found and some other scalars must be inserted,
the cost model needs to account for the cost of the extractelements that
are transformed to shuffles and/or of the reused entries, and to correctly
calculate the cost of inserting constants into non-poison vectors.
Also fixes the cost calculation for the final gather/buildvector sequence.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
6984 | Yes, because the estimation there is somewhat strange: if HasUse == false, the cost of insertelement is 0. The original cost estimation puts the cost of the deleted extractelement instruction at 3 and the cost of the insertelement instruction at 0. Actually, it would be good to fix this problem in the AArch64 cost model. The insertelement should be considered free only if operand 0 is undef/poison; otherwise its cost is not zero. I'm working on another solution that should generate better shuffles; I hope it will fix the regression for AArch64 and improve the final emission for other targets. |
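The asymmetry described in the comment above can be illustrated with a toy model. This is only a sketch: `ToyCosts`, `insertCost`, `netDelta`, and the value 2 for a non-free insert are hypothetical and are not taken from the AArch64 cost model. Only the values 3 (deleted extractelement) and 0 (free insertelement) come from the discussion.

```cpp
#include <cassert>

// Hypothetical per-instruction costs. Only ExtractCost = 3 and the free
// insert (0) come from the review discussion; RealInsertCost is assumed.
struct ToyCosts {
  int ExtractCost = 3;    // cost credited for the deleted extractelement
  int RealInsertCost = 2; // assumed cost of inserting into a live vector
};

// The insert is treated as free only when the vector operand (operand 0)
// is undef/poison, which is the rule proposed in the comment.
int insertCost(const ToyCosts &C, bool Op0IsUndefOrPoison) {
  return Op0IsUndefOrPoison ? 0 : C.RealInsertCost;
}

// Net cost change of the transformation: cost of the new insertelement
// minus the cost of the extractelement that gets removed.
int netDelta(const ToyCosts &C, bool Op0IsUndefOrPoison) {
  return insertCost(C, Op0IsUndefOrPoison) - C.ExtractCost;
}
```

Under these assumed numbers, `netDelta(ToyCosts{}, true)` is -3 and `netDelta(ToyCosts{}, false)` is -1, so treating every insert as free overstates the benefit whenever the destination vector is not undef/poison.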
Restored the original extract cost calculation, reworked the estimation of the buildvector with a non-undef initial vector.
LGTM with one optional minor comment.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2275–2277 | Pull out the HasRealUse computation: bool HasRealUse = Opcode == Instruction::InsertElement && Op0 && !isa<UndefValue>(Op0); return getVectorInstrCostHelper(nullptr, Val, Index, HasRealUse); |
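The suggested refactoring can be tried in isolation with stand-in types. In this sketch `Opcode`, `ToyValue`, `toyVectorInstrCostHelper`, and `toyVectorInstrCost` are hypothetical placeholders rather than LLVM APIs; the real code uses `Instruction::InsertElement`, `isa<UndefValue>`, and `getVectorInstrCostHelper`. The cost values (0 and 2) are assumptions chosen only to make the behavior visible.

```cpp
#include <cassert>

// Stand-ins for the LLVM opcode and value classes (hypothetical).
enum class Opcode { InsertElement, ExtractElement };

struct ToyValue {
  bool IsUndef; // models isa<UndefValue>(V), which also covers poison
};

// Placeholder for the helper: lane 0 is free unless the insert writes
// into a vector that is actually used. The numbers are assumed.
int toyVectorInstrCostHelper(unsigned Index, bool HasRealUse) {
  if (Index == 0 && !HasRealUse)
    return 0;
  return 2;
}

int toyVectorInstrCost(Opcode Op, unsigned Index, const ToyValue *Op0) {
  // The condition is hoisted into a named local, as the review suggests,
  // instead of being spelled inline in the call.
  bool HasRealUse =
      Op == Opcode::InsertElement && Op0 && !Op0->IsUndef;
  return toyVectorInstrCostHelper(Index, HasRealUse);
}
```

Naming the predicate keeps the call site short and makes it clear that a null or undef operand 0, or any opcode other than an insert, falls back to the "no real use" path.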
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
6984 | Yeah that code has always been a bit off. I think once upon a time someone accidentally applied the "zero-lane insert/extract cost 0" to integers as well as floats, and since then it has happened to give better performance in many cases to keep the inaccuracy around. I will look into removing it if I can. |
Pull out the HasRealUse computation.