This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Simplify `tryToVectorizeSequence()`
Needs ReviewPublic

Authored by vporpo on May 10 2023, 11:41 AM.

Details

Summary

The original code attempted to vectorize the seed vector without first
separating the entries by Type, which complicated the code.
It would also try to slice the seed vector, but this seems to be redundant as
both vectorizeStores() and tryToVectorizeList() already do that internally.

With this patch the logic is quite simple:

  1. First separate the seeds based on their Type, then
  2. Sort the seeds of each Type using the Comparator
  3. And finally go over each Type and try to vectorize it.

Diff Detail

Event Timeline

vporpo created this revision.May 10 2023, 11:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2023, 11:41 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
vporpo requested review of this revision.May 10 2023, 11:41 AM
ABataev added inline comments.May 10 2023, 12:52 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Removal of this parameter causes regressions in the vectorization, better to restore it.

14329–14331

It increases memory consumption.

vporpo added inline comments.May 10 2023, 1:00 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Do you have some examples where this is needed?

14329–14331

The previous version was using a Candidates vector, this one is using a map, so the memory consumption should be fairly similar.
Also this is not a global data structure, it is being freed right away, so memory consumption shouldn't matter much in this context.

ABataev added inline comments.May 10 2023, 1:02 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

No, I don't, just from my experience when I added this parameter. Testing showed regressions in LTO mode with it, IIRC.

14329–14331

It is not quite map, it is mapvector, which is vector + map

ABataev added inline comments.May 10 2023, 1:05 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

IIRC, test/Transforms/SLPVectorizer/slp-max-phi-size.ll is one of the tests

vporpo added inline comments.May 10 2023, 1:06 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Is this in SPEC?

14329–14331

OK, so if I use an std::map would you be fine with this?

ABataev added inline comments.May 10 2023, 1:08 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Yes

14329–14331

DenseMap should be good here, yes, since the order of the types does not matter

vporpo added inline comments.May 10 2023, 1:13 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

OK, I will run SPEC lto to check.

IIRC, test/Transforms/SLPVectorizer/slp-max-phi-size.ll is one of the tests

Doesn't slp-max-phi-size.ll look OK though? As far as I understand the purpose of this test is to make sure that we don't vectorize wider than specified by the -slp-max-vf option.

14329–14331

Hmm is iterating over a DenseMap deterministic though?

ABataev added inline comments.May 10 2023, 1:19 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Probably. If I do remember correctly, it caused regresssions because of too early optimization of phi nodes and some other instructions.

14329–14331

Yes, you cannot iterate through it, so here you need to use MapVector.

vporpo added inline comments.May 17 2023, 10:31 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

I checked the C/C++ benchmarks of CPU2017 and I don't see any regressions. I also checked with our internal testing and there are no issues. So I think it is pretty safe.
In the unlikely case that it causes a regression, we can look into it again and add any missing tests. Wdyt?

ABataev added inline comments.May 17 2023, 10:38 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Did you try LTO?

vporpo added inline comments.May 17 2023, 11:53 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Ah let me check. I guess you mean -flto right?

ABataev added inline comments.May 17 2023, 11:59 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Yes. There were regressions because of too early phi nodes vectorization at compile time, which prevents some later optimizations at link time. We could fix this by checking if LTO is enabled and the stage is linking. In this case we can set the flag to true for phis and other nodes. It would be good if you could make a patch for this!

vporpo added inline comments.May 17 2023, 2:49 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

I checked an LTO build with and I didn't see any performance difference. It's all within the noise margin.
This is the list of test I tried:

500.perlbench_r
502.gcc_r
505.mcf_r
508.namd_r
511.povray_r
519.lbm_r
523.xalancbmk_r
525.x264_r
531.deepsjeng_r
538.imagick_r
544.nab_r
557.xz_r

However, my build configuration may be different than yours. Could you try out the patch with your configuration and point me to the right benchmark/build options?

Changing the functionality depending on whether we are in the linking stage, just to make a SPEC test run faster in some specific configuration does not sound a very good solution to me.

ABataev added inline comments.May 17 2023, 3:06 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

The solution is the same, just at link time we can ignore limits and do our best to vectorize even small sequences, which may cause regression at compile time.

ABataev added inline comments.May 17 2023, 3:33 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Also, try to compile everything with -march=native on something like Skylake

vporpo added inline comments.May 17 2023, 3:39 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Yeah I used native for the builds, and I ran it on a Xeon(R) Gold 6154.

just at link time we can ignore limits and do our best to vectorize even small sequences

So do you mean like setting MaxVFOnly to true only for the linking stage of LTO ?

ABataev added inline comments.May 18 2023, 6:01 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Yeah I used native for the builds, and I ran it on a Xeon(R) Gold 6154.

Then check with the default target also (SSE2). Check D103638, D108740, D92059 for more context about the changes.

just at link time we can ignore limits and do our best to vectorize even small sequences

So do you mean like setting MaxVFOnly to true only for the linking stage of LTO ?

Yes, something like this. Because currently we have to limit some vectorization to avoid regressions with LTO.

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
96 ↗(On Diff #521059)

Regression?

llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll
26

Regression?

vporpo added inline comments.May 18 2023, 2:54 PM
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
96 ↗(On Diff #521059)

These regressions are because we are not slicing the Seed vector into smaller sections. I can add this in.

vporpo updated this revision to Diff 523581.May 18 2023, 3:20 PM

If vectorizing the full Seeds vector fails we are also trying to vectorize Slices of it. This should remove the test regressions.

vporpo added inline comments.May 18 2023, 3:24 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

Yes, something like this. Because currently we have to limit some vectorization to avoid regressions with LTO.

But how am I going to test how well this will work? I am pretty sure this won't be the only phase-ordering issue in LTO builds if SLP is enabled in pre-link.

ABataev added inline comments.May 18 2023, 3:34 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14361–14365

Does it make sense to drop this check?

aeubanks added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

https://reviews.llvm.org/D148010 sounds like the proper solution for this sort of issue

vporpo added inline comments.May 18 2023, 4:34 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14361–14365

Isn't this already checked within TryToVectorizeHelper? If I am not mistaken both vectorizeStores() and tryToVectorizeList() already check for minimum VF.

ABataev added inline comments.May 18 2023, 4:56 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14327

I rather doubt it would be good solution for SLP. I tried something like this before, saw lots of regressions. Better to run some optimizations (at full register, for example) at compile time and the remaining ones at link time, otherwise there might be not enough resources. SLP vectorizer has some internal limits and optimizing at link time only may exhaust it too fast.

14339

Seeds.size() <= 1

14342

What I'm afraid is that it may prevent some full-register optimizations. You will early optimize part register tress and miss some full-register trees (probably, with alternate opcodes)

14361–14365

It just early skips not profitable cases, saves some compile time, checks that vector can be fed with the elements, otherwise it does not worth trying vectorization, it will be scalarized.

vporpo marked an inline comment as done.May 18 2023, 5:27 PM
vporpo added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14342

So would you prefer iterating over the SeedMap twice?
Once with MaxVFOnly = true and then once with MaxVFOnly = false?

14361–14365

It might save a bit of compilation time, but it is just duplicate code. I think it is probably better to move the checks in vectorizeStores() and tryToVectorizeList() as early as possible. Let me check if this is possible.

ABataev added inline comments.May 19 2023, 6:33 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14342

Yes, that's the idea

14361–14365

Sure, I'm fine with this too

vporpo added inline comments.May 19 2023, 1:00 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14342

The problem is that neither is clearly the better strategy. You can come up with examples where vectorizing for MaxVF first across all seeds leads to worse vectorization than trying to vectorize each seed multiple times starting from MaxVF all the way to MinVF. For example when you end up with a short wide tree versus a deep narrower tree.
Also if we are to follow a strategy MaxVF first, then it would probably make more sense if we visit seeds based on some specific order, e.g. from largest to smallest, and even then there would still be cases when the strategy won't work.

In such cases I am in favor of keeping the code as simple as possible because a more complicated solution won't necessarily pay off.

ABataev added inline comments.May 22 2023, 6:58 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
14342

I prefer to keep original implementation for the 2 reasons.

  1. I plan to add a new node entry, which may improve vectorization of nodes with the different operands.
  2. Need to try to improve the whole process by trying vectorization not only for tail values, but for other too.