Currently, the vectorizer discards all of the instructions in the chain if
the vectorizable prefix is empty. Instead, we can try to vectorize smaller
chunks in order to increase the number of vector instructions generated.
Details
- Reviewers
asbirlea • tstellarAMD jlebar arsenm
Diff Detail
Event Timeline
lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | ||
---|---|---|
741 | "less than or equal to VF" is redundant with <= VF in the code. Can we say "is too small" or "is smaller than the vector width"? | |
749 | Don't we know a priori that Left will not vectorize, because its vectorizable prefix equals Chain's vectorizable prefix? Which leads to another question: Why do we chop off VF elements rather than just one? | |
750 | These return bools, so can we use ||? | |
test/Transforms/LoadStoreVectorizer/AMDGPU/empty-prefix.ll | ||
5 | Can we write this comment in the form of
|
The problem you found is real, but this is not the right solution for it.
The issue is in "getVectorizablePrefix". When a load that aliases the store is found, it gives up trying to find a prefix. In reality, all stores that precede the aliasing load (you can view this as a memory barrier) could be safely vectorized.
I'll work on a patch to fix this.
Side note: The testcase you added is fairly large. The problem can be showcased with a much smaller example, along the lines: 4 stores, 4 loads, 4 stores, with the 8 stores forming a chain. "getVectorizablePrefix" should not give up after the first store, but allow all 4 stores that precede the aliasing load to be vectorized.
"less than or equal to VF" is redundant with <= VF in the code. Can we say "is too small" or "is smaller than the vector width"?