This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Improved support of partial tree vectorization.
ClosedPublic

Authored by ABataev on Oct 12 2016, 8:19 AM.

Details

Summary

Currently SLP vectorizer tries to vectorize a binary operation and dies immediately after unsuccessful the first unsuccessfull attempt. Patch tries to improve the situation, trying to vectorize all binary operations of all children nodes in the binop tree.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev updated this revision to Diff 74384.Oct 12 2016, 8:19 AM
ABataev retitled this revision from to [SLPVectorizer] Improved support of partial tree vectorization..
ABataev updated this object.
ABataev added reviewers: hfinkel, mkuper, mzolotukhin.
ABataev added subscribers: RKSimon, spatel, avt77.
avt77 added a comment.Oct 13 2016, 1:02 AM

Could you give some more comments about the suggested algorithm? Is it for reductions only?

Could you give some more comments about the suggested algorithm? Is it for reductions only?

Yes, it is for reductions. Previously, when the non-vectorizable node was found during horizontal reduction analysis, the whole vectorization process stopped without any further attempts for vectorization. Now I consider this node as a possible start for another possibly vectorizable instruction.
Also, patch does not require that the initial instruction is a binary operation. It scans all operands of the root instruction, trying to find all binops in the tree, which can be vectorized.
It allows to vectorize the туче code:

int foo(int *a) {
  int res = 0;
  for (int i = 0; i < 4; ++i)
    res += a[i];
  return res * 42;
}

Currently SLP vectorizer is unable to vectorize it, but this patch allows to do it.

ABataev updated this revision to Diff 76015.Oct 27 2016, 6:29 AM

Updated to latest version

ABataev updated this revision to Diff 78553.Nov 18 2016, 10:33 AM

Updated to latest version. Moved WeakVHWithLevel class to anonymous namespace.

hfinkel edited edge metadata.Nov 22 2016, 7:20 AM

Have you checked the compile-time impact of this change? I wonder if you need some additional cutoff because it looks like you might be increasing the worst-case complexity of the algorithm here.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4459 ↗(On Diff #78553)

The comment here should explain that the child level corresponds to the operand index of the instruction.

4461 ↗(On Diff #78553)
/// Is this the instruction that should be vectorized, or are we now processing children (i.e. operands of this instruction) for potential vectorization?
4499 ↗(On Diff #78553)

The comment here is out of date.

4547 ↗(On Diff #78553)

dyn_cast_or_null -> dyn_cast (here and below). BI->getOperand(0) should not return null.

Have you checked the compile-time impact of this change? I wonder if you need some additional cutoff because it looks like you might be increasing the worst-case complexity of the algorithm here.

Yes, I checked it using test suite + several programs and did not find any significant changes in compile time. Actually, I thought already about some additional cutoff and I think it worth adding just in case.

ABataev updated this revision to Diff 79314.Nov 25 2016, 11:06 AM
ABataev edited edge metadata.

Fixes after Hal's comments

hfinkel accepted this revision.Nov 28 2016, 4:23 PM
hfinkel edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 28 2016, 4:23 PM
This revision was automatically updated to reflect the committed changes.
anemet added a subscriber: anemet.May 10 2017, 10:35 AM

Hi Alexey,

I've just come across this commit. I don't think it is up to LLVM standards in terms of understandably. Please improve; I've provided some questions inline.

Thanks,
Adam

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4457

Tracks for what?

4499–4505

The name of the function and the comment mismatch. What is this function supposed to do?

4517–4573

This needs a description of the algorithm.

ABataev added inline comments.May 11 2017, 6:18 AM
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4457

Traks if the instruction is deleted (replaced by undef value) /replaced by the new instruction (the vector version, as the result of the whole vectorization process) + tracks the processing of the instruction operands

4499–4505

Yes, probably. This function checks if it is possible to vectorize the tree + performs the vectorization of horizontal reduction or, if the instruction is not the top instruction of the horizontal reduction and this is a binary operation, vectorizes the operands of this binary instruction.

4517–4573

The algorithm is the same as before, just exit criteria become a bit weaker.

Please write a new patch with all these improvements and add me as a reviewer. Thanks.

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4457

Improve the comment then, please.

4460

OperandIndexAnalyzed?

4464

IsAnalyzingOperands?

Or just merge these into a single state variable, Optional<unsigned> which if None means we're processing the instruction itself.

4499–4505

Then improve the comment please.

4517–4573

There is a non-trivial loop that wasn't there before!? Please explain what it does and what state the various new data structures represent.

anna added a subscriber: anna.Jun 30 2017, 7:10 AM

Have you checked the compile-time impact of this change? I wonder if you need some additional cutoff because it looks like you might be increasing the worst-case complexity of the algorithm here.

Just FYI: this patch was improved at here: https://reviews.llvm.org/D33320
Ongoing discussion about compile time impact is on that review thread. We do see huge compile time impact (internally) because of the change in this function (the processing has increased from a single node to a max of 2 ^ 12 nodes).