This is an archive of the discontinued LLVM Phabricator instance.

[COST] Improve shuffle kind detection if shuffle mask is provided.
ClosedPublic

Authored by ABataev on Apr 20 2021, 9:25 AM.

Details

Summary

Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Diff Detail

Event Timeline

ABataev created this revision.Apr 20 2021, 9:25 AM
ABataev requested review of this revision.Apr 20 2021, 9:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2021, 9:25 AM

Hello @ABataev,
Thank you for the patch and for adding me as a reviewer.
I did a comment on your patch, hope you don't mind.
Carol

llvm/include/llvm/CodeGen/BasicTTIImpl.h
797

Why are you changing this to broadcast I don't see that happening in SLPVectorizer.cpp.
Before your implementation

Kind == TTI::SK_PermuteSingleSrc

would call:

return getPermuteShuffleOverhead(cast<FixedVectorType>(Tp));

But now it calls:

return getBroadcastShuffleOverhead(cast<FixedVectorType>(Tp));

Is that correct?

In my view, some tests would fail because now it has another cost model being called.

Oddly, there is not test failure, despite, in my view and with my knowledge, it would make some tests fail because you are changing the code path for the cost model.

Do you mind adding some tests for that as well?

ABataev added inline comments.Apr 21 2021, 7:40 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
797

Yes, there was no such thing in SLP vectorizer, I just extended functionality. Will try to add the tests for it.

ABataev updated this revision to Diff 339372.Apr 21 2021, 1:43 PM

Added detection for broadcast case.

sdesmalen added inline comments.Apr 22 2021, 5:05 AM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
790

nit: maybe improveShuffleKindFromMask is more descriptive?

792–806

nit: This is probably just personal preference, but can this be implemented with a switch? i.e.

if (Mask.empty())
  return Kind;

switch (Kind) {
default:
  break;
case Kind == TTI::SK_PermuteSingleSrc:
  if (ShuffleVectorInst::isReverseMask(Mask))
    return TargetTransformInfo::SK_Reverse;
  if (ShuffleVectorInst::isZeroEltSplatMask(Mask))
    return TargetTransformInfo::SK_Broadcast;
  break;
case TTI::SK_PermuteTwoSrc:
  if (ShuffleVectorInst::isSelectMask(Mask))
    return TargetTransformInfo::SK_Select;
  if (ShuffleVectorInst::isTransposeMask(Mask))
    return TargetTransformInfo::SK_Transpose;
  break;
}
return Kind;
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4317

Is this change supposed to be in this patch?

ABataev added inline comments.Apr 22 2021, 5:08 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4317

Ye, without it SLP is unable to detect broadcast case. I can prepare a separate patch for it.

ABataev updated this revision to Diff 339619.Apr 22 2021, 7:16 AM

Address comments

dmgreen added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

Some of these Mask functions do not handle all sizes of mask. Do we need to be careful about shuffles that changesLength?

ABataev added inline comments.Apr 25 2021, 12:34 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

Here we don't have info about length changes, looks like all these functions use the size of the Mask array as vector factor.

dmgreen added inline comments.Apr 25 2021, 1:00 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

I may have the wrong impression of what is going on exactly, but I think it's about the elements being out of range for the mask size.
(Hopefully) Try this example:
https://godbolt.org/z/WWrPbPxcW

ABataev added inline comments.Apr 25 2021, 1:08 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

This function does not use 2 different length, it has just one, deduced out of Mask size. It is just impossible to have a change of size situation here, I assume. We do not work with actual shuffles, functions are static.
Also, could you provide more details about your example? What shall I expect as a result!

dmgreen added inline comments.Apr 25 2021, 1:16 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

Also, could you provide more details about your example? What shall I expect as a result!

It should not crash with this patch :-)

It says for me:

SLP: Adding cost 0 for bundle that starts with   %add.i33 = fadd fast half %21, %22.
SLP: Current total cost = 0
SLP: Adding cost 6 for bundle that starts with   %21 = extractelement <8 x half> %20, i32 0.
SLP: Current total cost = 6
clang: /work/llvm-project/llvm/lib/IR/Instructions.cpp:2122: bool isSingleSourceMaskImpl(llvm::ArrayRef<int>, int): Assertion `I
>= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed.
...
0.      Program arguments: bin/clang -target arm-arm-none-eabi -mcpu=cortex-m55 -Ofast tmp.c -S -o - -mllvm -debug -mllvm -print-
after-all
...
 #9 0x0000558afa5bc6de llvm::ShuffleVectorInst::isSingleSourceMask(llvm::ArrayRef<int>) (bin/clang+0x728e6de)
#10 0x0000558afa5bc911 llvm::ShuffleVectorInst::isSelectMask(llvm::ArrayRef<int>) (bin/clang+0x728e911)
#11 0x0000558af90f9c40 llvm::ARMTTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef
<int>, int, llvm::VectorType*) ARMTargetTransformInfo.cpp:0:0
#12 0x0000558af9fcac6a llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm:
:ArrayRef<int>, int, llvm::VectorType*) const (bin/clang+0x6c9cc6a)
#13 0x0000558afb0d4f50 llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry*) (bin/clang+0x7da6f50)
ABataev added inline comments.Apr 25 2021, 1:58 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
802

Also, could you provide more details about your example? What shall I expect as a result!

It should not crash with this patch :-)

It says for me:

SLP: Adding cost 0 for bundle that starts with   %add.i33 = fadd fast half %21, %22.
SLP: Current total cost = 0
SLP: Adding cost 6 for bundle that starts with   %21 = extractelement <8 x half> %20, i32 0.
SLP: Current total cost = 6
clang: /work/llvm-project/llvm/lib/IR/Instructions.cpp:2122: bool isSingleSourceMaskImpl(llvm::ArrayRef<int>, int): Assertion `I
>= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed.
...
0.      Program arguments: bin/clang -target arm-arm-none-eabi -mcpu=cortex-m55 -Ofast tmp.c -S -o - -mllvm -debug -mllvm -print-
after-all
...
 #9 0x0000558afa5bc6de llvm::ShuffleVectorInst::isSingleSourceMask(llvm::ArrayRef<int>) (bin/clang+0x728e6de)
#10 0x0000558afa5bc911 llvm::ShuffleVectorInst::isSelectMask(llvm::ArrayRef<int>) (bin/clang+0x728e911)
#11 0x0000558af90f9c40 llvm::ARMTTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef
<int>, int, llvm::VectorType*) ARMTargetTransformInfo.cpp:0:0
#12 0x0000558af9fcac6a llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm:
:ArrayRef<int>, int, llvm::VectorType*) const (bin/clang+0x6c9cc6a)
#13 0x0000558afb0d4f50 llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry*) (bin/clang+0x7da6f50)
802

Ok, I see. I added some checks in SLP vectorizer but looks like we need some extra checks in other places. Or relax asserts in isSingleSourceMaskImpl function (maybe, using an extra parameter)

ABataev updated this revision to Diff 340569.Apr 26 2021, 9:48 AM

Rebase + generalized a check for mask overflow

dmgreen accepted this revision.Apr 29 2021, 9:15 AM

Thanks. LGTM

This revision is now accepted and ready to land.Apr 29 2021, 9:15 AM
This revision was landed with ongoing or failed builds.Apr 29 2021, 9:43 AM
This revision was automatically updated to reflect the committed changes.