This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Split the chain if the prefix is empty
AbandonedPublic

Authored by volkan on Nov 22 2016, 4:03 AM.

Details

Summary

Currently, the vectorizer discards all of the instructions in the chain if
the vectorizable prefix is empty. Instead, we can try to vectorize smaller
chunks in order to increase the number of vector instructions generated.

Diff Detail

Event Timeline

volkan updated this revision to Diff 78847.Nov 22 2016, 4:03 AM
volkan retitled this revision from to [LoadStoreVectorizer] Split the chain if the prefix is empty.
volkan updated this object.
volkan added reviewers: jlebar, asbirlea, arsenm.
volkan added a subscriber: llvm-commits.
anna added a subscriber: anna.Nov 22 2016, 5:16 AM

Could you please add some test cases?

volkan updated this revision to Diff 78864.Nov 22 2016, 6:10 AM

Added a test case.

volkan updated this revision to Diff 78877.Nov 22 2016, 8:12 AM
volkan edited edge metadata.

Updated the test case.

jlebar added inline comments.Nov 22 2016, 9:55 AM
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
741

"less than or equal to VF" is redundant with <= VF in the code. Can we say "is too small" or "is smaller than the vector width"?

749

Don't we know a priori that Left will not vectorize, because its vectorizable prefix equals Chain's vectorizable prefix?

Which leads to another question: Why do we chop off VF elements rather than just one?

750

These return bools, so can we use ||?

test/Transforms/LoadStoreVectorizer/AMDGPU/empty-prefix.ll
4 ↗(On Diff #78877)

Can we write this comment in the form of

  1. "Check that we X"
  2. Something that does not rely on implementation details of the LSV? That is, can we avoid the term "vectorizable prefix", or, if we must use it, can we define it here?
asbirlea edited edge metadata.Nov 22 2016, 11:58 AM

The problem you found is real, but this is not the right solution for it.
The issue is in "getVectorizablePrefix". When a load that aliases the store is found, it gives up trying to find a prefix. In reality, all stores that precede the aliasing load (you can view this as a memory barrier) could be safely vectorized.
I'll work on a patch to fix this.

Side note: The testcase you added is fairly large. The problem can be showcased with a much smaller example, along the lines: 4 stores, 4 loads, 4 stores, with the 8 stores forming a chain. "getVectorizablePrefix" should not give up after the first store, but allow all 4 stores that precede the aliasing load to be vectorized.

volkan abandoned this revision.Nov 23 2016, 3:31 AM