This patch adds a new ShuffleKind SK_Splice and then handle the cost in
getShuffleCost.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The cost for fixed vector with splice could be improved by changing
InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy)
to
InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy, int Index)
and the loop to be until index instead of all elements.
But depends if it is fine to create a new shuffle SK::Splice for exeperimental.vector.splice.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2041 | Separate from the type, I think we'll need to distinguish the costs based on the value of the index as well. Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>.
| |
2042–2054 | At the moment, the costs for these is actually quite high because they're expanded to two stores and one reload. That said, I'd prefer not to reflect that in the cost-model because this is not the desired code-gen and we should favour getting more scalable vectorization to get more testing coverage. | |
2055–2058 | The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated instruction, it should be fair to model the cost as that of two stores, a reload and a compare. | |
llvm/test/Analysis/CostModel/AArch64/splice.ll | ||
35 | nv? | |
60 | odd spaces. |
@RKSimon I have changed the RUN line to be accepted by update_analyze_test_checks.py
I did not run the script in sve-intrinsics.ll file because. But the CHECK's for splice is generated by update_analyze_test_checks.py
@sdesmalen The cost now takes into account the index and it is different when the scalar type is i1.
For negative index there is predicate mask and a compare and select instruction to choose the correct elements.
That is the reason it uses getCmpSelInstrCost.
For predicated there is a table that has the cost for promoting and truncating together.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1906 | is it even needed to pass a Kind or Mask in the first place, they seem unused. | |
1907 | This can be a switch statement instead? Also, how about giving the a name like getPromotedTypeForPredicate ? | |
1934 | This could just use getCastInstrCost instead of the custom table? | |
1944 | The compare is always an integer compare, i,e. cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx> | |
1954 | I think the cost has to find one, otherwise we have an unhandled/illegal type. |
Thanks for the changes, this is looking better! Just left a few more nits.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1902 | Can you move this out of the function into a separate static MVT getPromotedTypeForPredicate function? Perhaps we'll want to reuse this at a later point. | |
1938 | If above you write: std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp); MVT PromotedVT = LT.second.getScalarType() == MVT::i1 ? getPromotedTypeForPredicate(LT.second) : LT.second; Then you can drop the IsPredicated and instead inline PromotedVT.getScalarType() == MVT::i1 in the condition below. | |
1960 | s/Ilegal/Illegal/ | |
1962–1964 | If LT.first is unsigned, the if-condition is redundant, you can write return LegalizationCost * LT.first; directly. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1902 | Is this what you were suggesting? |
I spotted a few more things I missed in the last review, but I'm nearly happy.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1902 | it was, thanks! | |
1918 | nit: redundant whitespace. | |
1940 | nit: PromotedTy | |
1944 | I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not LegalVTy. | |
1946 | Should these two selects also be performed on Promoted? | |
1948 | nit: add newline after this, and maybe add a one line comment saying that this implements the cost of the operation being performed on a promoted type. |
- Replace the use of LegalVTy by PromotedVTy when computing the cost when Index<0
- Add comments about promoted cost
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1946 | I was considering use Promoted, but was not sure if it was correct. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
907 | nit: Whitespace | |
1902 | Just FYI there is actually already a function in AArch64ISelLowering.cpp that does something very similar: static inline EVT getPromotedVTForPredicate(EVT VT) { assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) && "Expected scalable predicate vector type!"); switch (VT.getVectorMinNumElements()) { default: llvm_unreachable("unexpected element count for vector"); case 2: return MVT::nxv2i64; case 4: return MVT::nxv4i32; case 8: return MVT::nxv8i16; case 16: return MVT::nxv16i8; } } I wonder if it's worth having a common routine in a header file? | |
1912 | I think at the point we call this function the type has been legalised and split into LT.first (a multiple of a legal type) and LT.second (a legal type). So I think I'd expect the default case to be unreachable here perhaps? | |
1953 | nit: Can you fix the formatting here please? Thanks! |
Hey @david-arm
So I knew about the getPromotedVTForPredicate, but was not sure it was a good idea to use outside the class.
But as you suggested in the review, then I believe there is no problem in making it public.
I think it is best to have only one place to check the promoted type too.
But if not let me know and I can revert the change and apply your suggestion in the previous function.
nit: Whitespace