This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve isGatherShuffledEntry by looking deeper through the reused scalars.
ClosedPublic

Authored by ABataev on Jan 11 2023, 8:33 AM.

Details

Summary

The compiler may produce better results if it does not look for
constants, uses an extra analysis of phi nodes, looks through all tree
nodes without skipping the cases, where the very first set of nodes is
empty. Also, it tries to reshufle the nodes if it is profitable for
sure, i.e. at least 2 scalars are used for single node permutation and at
least 3 scalars are used for the permutation of 2 nodes.

Part of D110978

Diff Detail

Event Timeline

ABataev created this revision.Jan 11 2023, 8:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 8:33 AM
ABataev requested review of this revision.Jan 11 2023, 8:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 8:33 AM
vdmitrie added inline comments.Jan 13 2023, 10:23 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8123

please describe

8322–8323

IMO, the comment is not much helpful unless "compatible" is described.
We check for specific properties: incoming values are either both constants (undef and/or poison is probably considered a constant as well) or instructions with same opcode having same parent block.

8337

please add some description

8342

Please comment the code?

8346

please add some description

8352

Please comment the code?

8378–8397

Could you please add some comments describing the code?

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll
78–87 ↗(On Diff #488242)

Why this test changed behavior?
My expectation would be the original scalar code is likely noticeably faster than vectorized for this case.

ABataev updated this revision to Diff 489594.Jan 16 2023, 10:01 AM

Address comments

ABataev added inline comments.Jan 17 2023, 3:31 PM
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll
78–87 ↗(On Diff #488242)

Forgot to adjust the cost of the remaining buildvector elements, the problem of separating the code from the big patch, will fix.
The cost will be synchronized with the codegen once the main patch is committed.

ABataev updated this revision to Diff 489961.Jan 17 2023, 3:32 PM

Address comments

vdmitrie added inline comments.Jan 18 2023, 6:25 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6713–6729

I find this "if" condition quite difficult to follow.
I tried to simplify it by applying logical NOT into expressions in parens
and got the following (which, if I'm not mistaken, is equivalent):

if (E->getOpcode() != Instruction::Load || E->isAltShuffle() ||
    all_of(E->Scalars, [this](Value *V) { return getTreeEntry(V); }) ||
    isSplat(E->Scalars) ||
    (E->Scalars != GatheredScalars && GatheredScalars.size() <= 2))

where I still don't quite follow the second part.
Besides, what makes me nervous is use of isAltShuffle() interface here which was not originally designed for gather nodes. We can gather for multiple reasons including inability to schedule a bundle. We just leave this information not cleared when generate a gather node for such bundle. But originally we only used it to emulate binary operations not present in a hardware to vectorize a set of scalars.
As I understand you are trying to pre-filter gather nodes probably trying to find reshuffles of maximum two distinct scalar instructions before doing further analysis and use isAltShuffle() as an indicator for two ops specifically. But shouldn't we clear instructions state for irrelevant cases when we build a tree
if we are not going to use these nodes for further re-analysis?
We probably could write a reason why scalars are being gathered (directly into a node or as a sidecar data). That would likely simplify many places where we deal with re-analysis to restore that information after we have already built a tree.

6753

sound like this should be inside of getGatherCost().

8126–8131

Thanks. This is very helpful.

8384

typo: "If it is the first entry"

ABataev added inline comments.Jan 19 2023, 7:27 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6713–6729
  1. Even gather nodes may have isAltShuffle(). It may happen, if some vectorizable nodes with isAltShuffle() cannot be scheduled for some reason.

Not sure it should be cleared, we can do this, but actually it won't help with anything, just increase the codebase.

  1. This check here is just be safe with the loads nodes, which may be split into several vector loads, splat nodes, and for very small shuffles, which are definitely not profitable.
6753

No, later all this stuff will go away completely and will be replaced by common procedure for codegen/cost modelling. This is just a temp fix.

ABataev updated this revision to Diff 490524.Jan 19 2023, 8:07 AM

Simplify condition logic

vdmitrie added inline comments.Jan 19 2023, 9:16 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6713–6729

Thanks for explanation. w.r.t. isAltShuffle, yep, probably clearing out the data is not really necessary.
I just highlighted that use of the interface may be confusing. It is just a coincidence that it gives you the answer you'd like to have here, but it was not designed specifically for that. So if you add a short note about that as a comment that would definitely be helpful.

What the following part of the "if" condition targets?
E->Scalars != GatheredScalars && GatheredScalars.size() <= 2

The first part makes "true" if they turn out reordered, the second - when we have just two scalars (for one that would give a splat, for which we have check earlier). Is that reordering check is how you rule out unprofitable cases? I just do not understand why isAltShuffle() would not hit. Unprofitable cases still may have it set.

ABataev added inline comments.Jan 19 2023, 9:25 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6713–6729

isAltShuffle works only for instructions, it does not work for other cases like arguments, globals, etc.

6713–6729

And yes, it is another attempt to cut off unprofitable cases

vdmitrie added inline comments.Jan 19 2023, 10:03 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6713–6729

So these checks won't actually guarantee to cut off all unprofitable cases (as isAltShuffle hits on instructions which were gathered as unprofitable to vectorize). Okay. Thanks for explanation.

8309

why not use isSplat() instead?

ABataev updated this revision to Diff 490597.Jan 19 2023, 10:58 AM

Use existing functions instead of new lambda.

vdmitrie accepted this revision.Jan 19 2023, 11:47 AM

LG. Thanks.

This revision is now accepted and ready to land.Jan 19 2023, 11:47 AM

This breaks https://lab.llvm.org/buildbot/#/builders/37/builds/19641
Can you please take a look?

Sure, will do

Hello,

The following crashes with this patch:
opt -passes="slp-vectorizer" bbi-79327.ll -S -o /dev/null

It hits

opt: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:8267: std::optional<TargetTransformInfo::ShuffleKind> llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(const llvm::slpvectorizer::BoUpSLP::TreeEntry *, ArrayRef<llvm::Value *>, SmallVectorImpl<int> &, SmallVectorImpl<const llvm::slpvectorizer::BoUpSLP::TreeEntry *> &): Assertion `NodeUI && "Should only process reachable instructions"' failed.

"for.end5" in the input iis unreachable from entry but jumps to code that is reachable from entry. I suppose that is what trips the slp vectorizer up.

Hello,

The following crashes with this patch:
opt -passes="slp-vectorizer" bbi-79327.ll -S -o /dev/null

It hits

opt: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:8267: std::optional<TargetTransformInfo::ShuffleKind> llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(const llvm::slpvectorizer::BoUpSLP::TreeEntry *, ArrayRef<llvm::Value *>, SmallVectorImpl<int> &, SmallVectorImpl<const llvm::slpvectorizer::BoUpSLP::TreeEntry *> &): Assertion `NodeUI && "Should only process reachable instructions"' failed.

"for.end5" in the input iis unreachable from entry but jumps to code that is reachable from entry. I suppose that is what trips the slp vectorizer up.

Must be fixed in 5f928a223ef2c14701eab3077a534c1e49269a51

Hello,

The following crashes with this patch:
opt -passes="slp-vectorizer" bbi-79327.ll -S -o /dev/null

It hits

opt: ../lib/Transforms/Vectorize/SLPVectorizer.cpp:8267: std::optional<TargetTransformInfo::ShuffleKind> llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(const llvm::slpvectorizer::BoUpSLP::TreeEntry *, ArrayRef<llvm::Value *>, SmallVectorImpl<int> &, SmallVectorImpl<const llvm::slpvectorizer::BoUpSLP::TreeEntry *> &): Assertion `NodeUI && "Should only process reachable instructions"' failed.

"for.end5" in the input iis unreachable from entry but jumps to code that is reachable from entry. I suppose that is what trips the slp vectorizer up.

Must be fixed in 5f928a223ef2c14701eab3077a534c1e49269a51

Yep, thanks!