This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Improve horizontal vectorization for non-power-of-2 number of instructions.
ClosedPublic

Authored by ABataev on Jan 20 2017, 10:34 AM.

Details

Summary

If the number of instructions in horizontal reduction list is not the power of 2 then only PowerOf2Floor(NumberOfInstructions) last elements are actually vectorized, other instructions remain scalar. Patch tries to vectorize the remaining elements either.

Diff Detail

Event Timeline

ABataev created this revision.Jan 20 2017, 10:34 AM
mkuper added inline comments.Jan 20 2017, 4:41 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4275

As long as you're changing this, maybe name this "Idx" (or leave it as "i" - this is against convention in the strict sense, but that's one of the common exceptions). But I'm also ok with "I" if you think it's better.

Also, this is exactly an example of what I was talking about on the other patch - you could commit the name change separately, as a preparatory patch, and then this patch would only include functional changes, which makes it much easier to review.

4291

Can you explain why you changed ReducedVals[i] to VL[0] here and below?
I'm not sure whether this is a correct change or not, but it seems orthogonal to this patch.

4310

This whole ReduxWidth stuff is fishy.
We use it internally in matchAssociativeReduction(), and then overwrite it just before calling tryToReduce(). Why does tryToReduce use this member variable, instead of having a local (and not having this silliness overwriting it just before calling tryToReduce())?

If this makes sense, then changing this can be another preparatory NFC patch before this one. If not, then why not?

test/Transforms/SLPVectorizer/X86/horizontal-list.ll
572

Could you add a test for something like width 31, so that we get 16 + 8 + 4 + 3?

ABataev marked 2 inline comments as done.Jan 23 2017, 3:29 AM
ABataev added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
4275

Reverted it back. clang-tidy made this change.

4291

Because ReducedVals[i] actually is the same as VL[0]. Take a look at line 4278. VL ArrayRef starts from ReducedVals[i]. I had to make this change because changed the logic of updating of i counter. But anyway, reworked this part a little bit.

4310

Ok, I'll try to prepare the NFC patch for this member

test/Transforms/SLPVectorizer/X86/horizontal-list.ll
572

Ok, will do

ABataev updated this revision to Diff 85357.Jan 23 2017, 4:56 AM
ABataev marked 2 inline comments as done.

Address Michael's comments

mkuper edited edge metadata.Jan 23 2017, 11:00 AM

One more nit.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4276–4277

Could you turn this into a while loop? A for with only the middle term looks odd.

ABataev updated this revision to Diff 85562.Jan 24 2017, 4:24 AM

Convert for loop to while

mkuper accepted this revision.Jan 24 2017, 10:59 AM

LGTM

This revision is now accepted and ready to land.Jan 24 2017, 10:59 AM
This revision was automatically updated to reflect the committed changes.