This is an archive of the discontinued LLVM Phabricator instance.

[SLP] General improvements of SLP vectorization process.
ClosedPublic

Authored by ABataev on Feb 10 2017, 7:25 AM.

Details

Summary

Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:

  1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
  2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.

Diff Detail

Repository
rL LLVM

Event Timeline

ABataev created this revision.Feb 10 2017, 7:25 AM
mkuper edited edge metadata.

Thanks, Alexey.
Could you explain why this is better than just changing the main iteration to go backwards? Wouldn't it solve the same problem, without adding the distinction between different kinds of root nodes?

(Adding Ayal and Gil to this one too.)

lib/Transforms/Vectorize/SLPVectorizer.cpp
4476 ↗(On Diff #87997)

We generally try not to have unbounded recursion. I guess you could add a bound here, but I think it would be better to just rewrite this iteratively.

4494 ↗(On Diff #87997)

Ah, I see, we already had unbounded recursion here. This is kind of unfortunate. :-\
I guess one of us can change this in a separate change-list, but I still wouldn't introduce another case.

4747 ↗(On Diff #87997)

Maybe split this out? It looks fairly large for a lambda. Especially since it captures "this".

4800 ↗(On Diff #87997)

I'm a bit confused. Why is this a good time to call ProcessInstructions()?

4832 ↗(On Diff #87997)

I'm not sure this is the right condition.
On the one hand, I'm tempted to just say that any instruction which has use_empty() would be a good candidate - but I'm not sure whether we want to vectorize dead code. Then again, there aren't really too many cases of side-effecting non-void instructions. The only other case I can think of off the top of my head is InvokeInst.

Thanks, Alexey.
Could you explain why this is better than just changing the main iteration to go backwards? Wouldn't it solve the same problem, without adding the distinction between different kinds of root nodes?

(Adding Ayal and Gil to this one too.)

I tried the reversed order of the instruction analysis. I don't remember exactly why, but I decided not to continue to work on this. IIRC, there were troubles with the handling of PHIs.

ABataev added inline comments.Feb 17 2017, 7:36 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4476 ↗(On Diff #87997)

Actually, it is s full copy of findBuildAggregate function. Ok, will rewrite it iteratively.

4747 ↗(On Diff #87997)

Agree, will add functions instead of this long lambda.

4800 ↗(On Diff #87997)

Because we ran into a key node, that was analyzed already, but we found some CmpInst, InsertElement or InsertValue instructions that were not processed yet (we're restarting the vectorization for the whole BB each time we succeded in one of our vectorization attempt). For example, it might be some new instructions, that were inserted and that may be optimized too.

4832 ↗(On Diff #87997)

Yes, I thought about InvokeInst, but thought that it is derived from CallInst. Will add it.
At least, we're not worse in this case than were before.

ABataev updated this revision to Diff 89121.Feb 20 2017, 7:29 AM

Update after review

ABataev updated this revision to Diff 89122.Feb 20 2017, 7:31 AM

Small update

anemet added a subscriber: anemet.Jun 1 2017, 12:39 PM
RKSimon added inline comments.Jul 31 2017, 9:14 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4152 ↗(On Diff #89122)

This BinaryOperator->Instruction change (along with associated changes in tryToVectorizeHorReductionOrInstOperands and vectorizeRootInstruction) look like they can pulled out and committed without any problem.

test/Transforms/SLPVectorizer/AArch64/gather-root.ll
1 ↗(On Diff #89122)

Regenerate the current trunk codegen with the script (it looks like you've edited below), commit that and rebase to show the diffs in this patch.

ABataev updated this revision to Diff 109355.Aug 2 2017, 8:08 AM

Update after review.

RKSimon edited edge metadata.Aug 3 2017, 4:22 AM

Does anyone else have any comments?

test/Transforms/SLPVectorizer/X86/horizontal.ll
834 ↗(On Diff #109355)

It's a bit annoying that the old scalar reduction hasn't completely disappeared.

876 ↗(On Diff #109355)

It's a bit annoying that the old scalar reduction hasn't completely disappeared.

ABataev added inline comments.Aug 3 2017, 6:10 AM
test/Transforms/SLPVectorizer/X86/horizontal.ll
834 ↗(On Diff #109355)

This is how SLPVectorizer currently works. There is an outstanding patch D29641 that fixes this (and other) problem(s).

Thanks @ABataev - one more minor comment

lib/Transforms/Vectorize/SLPVectorizer.cpp
5290 ↗(On Diff #109355)

(style) missing assert message

ABataev updated this revision to Diff 109577.Aug 3 2017, 9:30 AM

Update after review

RKSimon accepted this revision.Aug 7 2017, 3:06 AM

LGTM - thanks

This revision is now accepted and ready to land.Aug 7 2017, 3:06 AM
This revision was automatically updated to reflect the committed changes.