This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CostModel] Add cost model for experimental.vector.splice
ClosedPublic

Authored by CarolineConcatto on Jun 21 2021, 3:37 AM.

Details

Summary

This patch adds a new ShuffleKind SK_Splice and then handle the cost in
getShuffleCost.

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Jun 21 2021, 3:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2021, 3:37 AM

The cost for fixed vector with splice could be improved by changing

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy)

to

InstructionCost getPermuteShuffleOverhead(FixedVectorType *VTy, int Index)

and the loop to be until index instead of all elements.
But depends if it is fine to create a new shuffle SK::Splice for exeperimental.vector.splice.

Matt added a subscriber: Matt.Jun 21 2021, 8:51 AM
sdesmalen added inline comments.Jun 22 2021, 1:43 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2041

Separate from the type, I think we'll need to distinguish the costs based on the value of the index as well.

Given two scalable vectors <x0, x1, x2, x3> and <y0, y1, y2, y3>.

  • For a positive offsets we can use SVE's EXT instruction. E.g. to splice at offset #1, the result of the splice will be <x1, x2, x3, y0>.
  • For a negative offset, we can't use EXT but we can instead use SPLICE which requires (generating) a predicate. For a negative offset of 1 we need a predicate of: <0, 0, 0, 1>. This means the operation can be done using whilelt+not+splice, so for negative offsets it would be more expensive.
2042–2054

At the moment, the costs for these is actually quite high because they're expanded to two stores and one reload. That said, I'd prefer not to reflect that in the cost-model because this is not the desired code-gen and we should favour getting more scalable vectorization to get more testing coverage.

2055–2058

The predicates require two stores, a reload and an additional compare operation. Since predicates don't have a dedicated instruction, it should be fair to model the cost as that of two stores, a reload and a compare.

llvm/test/Analysis/CostModel/AArch64/splice.ll
35

nv?

60

odd spaces.

Any reason you didn't use the update_analyze_test_checks.py script?

  • Improve cost for scalable vector

@RKSimon I have changed the RUN line to be accepted by update_analyze_test_checks.py
I did not run the script in sve-intrinsics.ll file because. But the CHECK's for splice is generated by update_analyze_test_checks.py

@sdesmalen The cost now takes into account the index and it is different when the scalar type is i1.
For negative index there is predicate mask and a compare and select instruction to choose the correct elements.
That is the reason it uses getCmpSelInstrCost.
For predicated there is a table that has the cost for promoting and truncating together.

sdesmalen added inline comments.Jun 25 2021, 6:32 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1906

is it even needed to pass a Kind or Mask in the first place, they seem unused.

1907

This can be a switch statement instead?

Also, how about giving the a name like getPromotedTypeForPredicate ?

1934

This could just use getCastInstrCost instead of the custom table?

1944

The compare is always an integer compare, i,e. cmp ge <0, 1, 2, 3, ... N-1>, <idx, idx, idx, ... idx>

1954

I think the cost has to find one, otherwise we have an unhandled/illegal type.
So instead of if, this should have an assert that Entry != nullptr.

CarolineConcatto marked 5 inline comments as done.
  • Address Sander's comment
CarolineConcatto marked 2 inline comments as done.Jun 28 2021, 2:24 AM
CarolineConcatto added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1906

Hey Sander,
You are correct about Mask, but Kind is needed.

1907

Compiler complains that MVN is not an integer for the switch.

-Use switch to implement promote type
-Remove parameter Kind from getSpliceCost

Thanks for the changes, this is looking better! Just left a few more nits.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902

Can you move this out of the function into a separate static MVT getPromotedTypeForPredicate function? Perhaps we'll want to reuse this at a later point.

1938

If above you write:

std::pair<InstructionCost, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
MVT PromotedVT = LT.second.getScalarType() == MVT::i1 ? getPromotedTypeForPredicate(LT.second) : LT.second;

Then you can drop the IsPredicated and instead inline PromotedVT.getScalarType() == MVT::i1 in the condition below.

1960

s/Ilegal/Illegal/

1962–1964

If LT.first is unsigned, the if-condition is redundant, you can write

return LegalizationCost * LT.first;

directly.

CarolineConcatto marked 2 inline comments as done.
  • Create static MVT getPromotedTypeForPredicate function
  • Create a MVT PromotedVT
CarolineConcatto marked 2 inline comments as done.Jun 29 2021, 7:53 AM
CarolineConcatto added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902

Is this what you were suggesting?

I spotted a few more things I missed in the last review, but I'm nearly happy.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1902

it was, thanks!

1918

nit: redundant whitespace.

1940

nit: PromotedTy

1944

I'm not sure if this matters, but for LegalVTy that's e.g. nxv16i8, the CondTy is nxv16i1, not LegalVTy.

1946

Should these two selects also be performed on Promoted?

1948

nit: add newline after this, and maybe add a one line comment saying that this implements the cost of the operation being performed on a promoted type.

CarolineConcatto marked 6 inline comments as done.
  • Replace the use of LegalVTy by PromotedVTy when computing the cost when Index<0
  • Add comments about promoted cost
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1946

I was considering use Promoted, but was not sure if it was correct.
I've changed now to use the promoted type.

david-arm added inline comments.Jul 1 2021, 1:17 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
907

nit: Whitespace

1902

Just FYI there is actually already a function in AArch64ISelLowering.cpp that does something very similar:

static inline EVT getPromotedVTForPredicate(EVT VT) {
  assert(VT.isScalableVector() && (VT.getVectorElementType() == MVT::i1) &&
         "Expected scalable predicate vector type!");
  switch (VT.getVectorMinNumElements()) {
  default:
    llvm_unreachable("unexpected element count for vector");
  case 2:
    return MVT::nxv2i64;
  case 4:
    return MVT::nxv4i32;
  case 8:
    return MVT::nxv8i16;
  case 16:
    return MVT::nxv16i8;
  }
}

I wonder if it's worth having a common routine in a header file?

1912

I think at the point we call this function the type has been legalised and split into LT.first (a multiple of a legal type) and LT.second (a legal type). So I think I'd expect the default case to be unreachable here perhaps?

1953

nit: Can you fix the formatting here please? Thanks!

CarolineConcatto marked an inline comment as done.
  • Use getPromotedVTForPredicate from AArch64ISelLowering to compute the promoted type
CarolineConcatto marked 2 inline comments as done.Jul 2 2021, 9:25 AM

Hey @david-arm
So I knew about the getPromotedVTForPredicate, but was not sure it was a good idea to use outside the class.
But as you suggested in the review, then I believe there is no problem in making it public.
I think it is best to have only one place to check the promoted type too.
But if not let me know and I can revert the change and apply your suggestion in the previous function.

This revision is now accepted and ready to land.Jul 5 2021, 4:00 AM
  • Fix format in line 1852: const auto *Entry = CostTableLookup
This revision was landed with ongoing or failed builds.Jul 5 2021, 6:30 AM
This revision was automatically updated to reflect the committed changes.