This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148.
ClosedPublic

Authored by asbirlea on Aug 30 2016, 2:51 PM.

Details

Summary

LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize.
A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions
in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head.

e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.

This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.

Diff Detail

Repository
rL LLVM

Event Timeline

asbirlea updated this revision to Diff 69760.Aug 30 2016, 2:51 PM
asbirlea retitled this revision from to [LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148..
asbirlea updated this object.
asbirlea added reviewers: arsenm, jlebar.
asbirlea added a subscriber: llvm-commits.
arsenm edited edge metadata.Aug 30 2016, 2:56 PM

This is about what I was guessing. What about having a side SmallSet instead of the linear is_contained?

I thought of that...not sure if it's worth it, if the vectors we're dealing with here are always very small (I'm thinking they're actually under 16 elements each). I can be convinced either way right now..

arsenm accepted this revision.Aug 30 2016, 3:26 PM
arsenm edited edge metadata.

LGTM. I guess the worst possible case is 16 x 16 which probably isn't so bad

This revision is now accepted and ready to land.Aug 30 2016, 3:26 PM
jlebar accepted this revision.Aug 30 2016, 3:29 PM
jlebar edited edge metadata.
jlebar added inline comments.
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
677 ↗(On Diff #69760)

Please clang-format. :)

asbirlea updated this revision to Diff 69770.Aug 30 2016, 3:39 PM
asbirlea edited edge metadata.

Format.

Here's something that bothers me. While it is possible to have a head with multiple tails, (and the Heads and Tails vectors reflect this), there's the "ConsecutiveChain[i] = j;" which forces a single path chain.
Obviously this will miss vectorization opportunities.
e.g. load a[1], load a[1], load a[2], load a[2] will only vectorize one pair, because the second a[1] will point to the same (already vectorized) a[2].
But the alternative of going though all options can end up being prohibitive.
Say you have load a[1] N times, followed by load a[2] N times etc, you'd end up with N^2 comparisons, extending to N^K for a[K].
Is it worth extending this?

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
677 ↗(On Diff #69770)

ACK.

Is it worth extending this?

Yeah, I think I had the same concern in the original review.

I don't know if it's a worthwhile optimization or not. But if you can hack something together, it should be easy to run an experiment -- compile TensorFlow (or Eigen or Thrust) and use the statistics we already have to count how many extra vectorization opportunities we find.

Sounds good. I'll land this as is to fix the PR and test the extension separately. Thanks!

escha added a subscriber: escha.Aug 30 2016, 4:53 PM

Sadly, the original version of this pass was *very* much not designed for the case of loading the same location multiple times in a single basic block... as you can probably tell.

Well, yes :) ... but that may be fine if the costs outweigh the benefits, we just don't know right now...
As Justin pointed out, I need to get some test data showing one way or the other.

This revision was automatically updated to reflect the committed changes.