This is an archive of the discontinued LLVM Phabricator instance.

[LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers.
ClosedPublic

Authored by FarhanaAleen on Jul 18 2018, 4:06 PM.

Details

Summary

Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or both of them are not factorized. But isConsecutiveAccess() fails if one of the pointers is factorized but the other one is not.

Here is an example:
PtrA = 4 * (A + B)
PtrB = 4 + 4A + 4B

This patch uses getMinusSCEV() to compute the distance between two pointers. getMinusSCEV() allows combining the expressions and computing the simplified distance.

Diff Detail

Repository
rL LLVM

Event Timeline

FarhanaAleen created this revision.Jul 18 2018, 4:06 PM
This revision is now accepted and ready to land.Jul 18 2018, 5:17 PM
This revision was automatically updated to reflect the committed changes.
reames added a subscriber: reames.Jul 19 2018, 11:38 AM

Adding optional items noticed in post commit review.

llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
339

It really looks like the new check completely subsumes this check. Have you tried simplify the code to remove this case?

351

This block may also be subsumed by the subtract. If not, this would make a good bug against SCEV because it should be able to handle most any reasonable case here.

llvm/trunk/test/Transforms/LoadStoreVectorizer/AMDGPU/complex-index.ll
7

You really should be able to construct a target independent test for this. You don't need to exercise the actual vectorization, just the analysis phase. I'm not sure the code is structured to make this easy, but if it isn't, it really should be. (To be clear, this is not a must have, just a strong nice to have.)

FarhanaAleen added inline comments.Jul 19 2018, 1:56 PM
llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
339

Thanks for your comments.

Yes, this check is completely subsumed by the new check. The reason I did not remove this check because I thought performing this(const + an existing SCEV) is much cheaper than the new check (which requires to create minus scev, might have to work with more expressions) and only do this new expensive check if the result can't be achieved in a cheaper way. Also, most program execution might highly likely hit the first check.

Now, I am also thinking that it's probably just one time cost since we cache the result which should be pretty negligible. I will remove this check.

351

No, this block is not subsumed by the subtract.

Currently, SCEV does not handle distribution around zext/sext. I am working on a patch that extends SCEV to handle this and remove this routine entirely from LoadStoreVectorizer.

llvm/trunk/test/Transforms/LoadStoreVectorizer/AMDGPU/complex-index.ll
7

Sounds good.

FarhanaAleen added inline comments.Jul 21 2018, 4:19 PM
llvm/trunk/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
351

In order to get rid of this block, a fundamental problem of SCEV needs to be solved which is proving a value "V" is not a poison value. Currently, this proving is supported only when V is in the header of a loop L where L repeatedly contains V. When V is not inside a loop, SCEV conservatively considers it as a poison value. Since SCEV can guarantee that V will not overflow, it conservatively drops the nsw/nuw flag.

func(%base) {
B1:

%add1 = add i32 nsw 1, %base
..

}

In the above example, there no way to guarantee that the %add1 will not overflow other than relying on the IR flag. But SCEV does not allow mapping back to the original instruction (), therefore corresponding scev expr will not contain nsw/nuw flag. This block of code is taking advantage of the IR to extract the flag information.