This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Function 'isStridedPtr' returns additional result “Loop *Lp” via function argument and add appropriate checks out of the 'isStridedPtr'.
ClosedPublic

Authored by roman.shirokiy on Feb 15 2016, 6:08 AM.

Details

Summary

The patch fixes PR26314. Before rL257134 there was in fact no restriction for pointer access function to be strided over innermost loop. This fix mostly keeps rL257134 logic, but relaxes "stride-over-innermost loop" condition in "AccessAnalysis::canCheckPtrAtRT" function. It allows vectorization in the case described in PR26314.

Diff Detail

Event Timeline

roman.shirokiy retitled this revision from to [LAA] Function 'isStridedPtr' returns additional result “Loop *Lp” via function argument and add appropriate checks out of the 'isStridedPtr'..
roman.shirokiy updated this object.
roman.shirokiy added reviewers: hfinkel, Ayal, anemet.
roman.shirokiy added a subscriber: llvm-commits.
Ayal edited edge metadata.Apr 5 2016, 5:57 PM

Looks ok to me; leaving for Hal or Adam to formally LGTMize.

include/llvm/Analysis/LoopAccessAnalysis.h
655–661

Preds? (Admittedly not part of your patch)

Suggest to start by describing that the function "Returns the stride of \p Ptr or zero if \p Ptr is not strided". Then add that "\p Lp is an output parameter, expected to be null on entry; if \Ptr is strided across loop L, \Lp is set to L."

If \Ptr is strided across multiple loops (as in PR26314), \Lp is set to the innermost of them, right?

Alternatively isStridedPtr() can return Lp and set the Stride as an output parameter. That however would require two checks for consecutivity.

It might have been clearer to call this getStrideOfPtr().

lib/Analysis/LoopAccessAnalysis.cpp
586

"&& Lp != nullptr &&" >> "&& Lp &&"

861–871

func. >> function

test/Analysis/LoopAccessAnalysis/stride-vectorization.ll
3 ↗(On Diff #47987)

better check for a vector type?

anemet requested changes to this revision.Apr 7 2016, 6:30 PM
anemet edited edge metadata.
anemet added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
5003–5004

Rather than complicating the interface of isStridedPtr, we should just check ScalarEvoluiton::isLoopInvariant on the SCEV of Ptr here.

test/Analysis/LoopAccessAnalysis/stride-vectorization.ll
1 ↗(On Diff #47987)

Please don't put vectorization-only tests under LAA. Those should go under Transforms/LoopVectorize.

You *should* have a test under LAA but it should then check the result of the analysis. See some examples in the directory.

There are ways to combine a single test to get tested both for vectorization and LAA. There should be examples in the LAA directory.

3 ↗(On Diff #47987)

+1

This revision now requires changes to proceed.Apr 7 2016, 6:30 PM
Ayal added inline comments.Apr 8 2016, 1:01 AM
lib/Analysis/LoopAccessAnalysis.cpp
586

Can you indeed check if ScalarEvolution::isLoopInvariant on the SCEV of Ptr here wrt TheLoop, as Adam suggested below, instead of checking if Lp->contains(TheLoop)?

anemet added inline comments.Apr 8 2016, 5:06 PM
lib/Analysis/LoopAccessAnalysis.cpp
586

I don't think this is needed after rL264243.

roman.shirokiy updated this object.
roman.shirokiy edited edge metadata.

Ayal, Adam, thanks for the review!

Please, excuse me for messing up with the function signature, but for this time I've again considered it to be the good idea, since the restriction on loop being the innermost one seem to be redundant for "isStridedPtr".

If I understood it right, "SCEVAddRecExpr::getLoop", which is called for the SCEV of the pointer with access striding over some loop, should return this particular loop, and for the "isStridedPtr" function it is correct to compute pointer stride over this loop.

I've removed "const Loop *Lp" argument from "isStridedPtr", so the "loop-strided-over" is obtained inside the function. Now there is "ScalarEvolution::isLoopInvariant" check, where it is necessary to get stride value regarding specific (innermost) loop: stride is considered to be zero for invariant pointer access.

Updated tests, now there is one for the analysis results and one for the vectorization.

Also, change of "isStridedPtr" signature made me stumble upon rL263058. With this patch it is unnecessary to pass "Loop* L" through functions affected by rL263058, but there is also suspicious condition "isStridedPtr(PSE, LoadPtr, L) != 1 || isStridedPtr(PSE, LoadPtr, L) != 1", which looks like typo (or relying on side-effects?).

LLVM build with this patch passes LIT-testing.

hfinkel added inline comments.Apr 26 2016, 10:39 AM
include/llvm/Analysis/LoopAccessAnalysis.h
655–661

It might have been clearer to call this getStrideOfPtr().

I agree, and given that this change touches all calls anyway, let's rename it to make it more clear. How about, getPtrStride()?

lib/Analysis/LoopAccessAnalysis.cpp
1623

This use of isStridedPtr also needs a loop check. If the pointer is strided w.r.t. an outer loop only, then it is not consecutive.

lib/Transforms/Scalar/LoopLoadElimination.cpp
80–81

Loop check here. If a pointer is strided only w.r.t. an outer loop, it is not unit stride here.

lib/Transforms/Vectorize/LoopVectorize.cpp
5003–5004

Maybe we should have an extra overload of the function that takes a loop to check? I think that's cleaner than forcing the caller to repeat PSE.getSE()->isLoopInvariant in each case.

roman.shirokiy edited edge metadata.

Thanks Hal for the review!

Updated diff according to Hal's corrections. However, extra overload for this function turns up as a too much of repetitive code for the sake of few different lines, but approach with Lp being nullptr seems to be clean and allows the needed change for PR26314.

roman.shirokiy marked 4 inline comments as done.Apr 30 2016, 2:12 AM

Marked comments as done.

anemet added a comment.May 2 2016, 9:59 AM

Also, change of "isStridedPtr" signature made me stumble upon rL263058. With this patch it is unnecessary to pass "Loop* L" through functions affected by rL263058, but there is also suspicious condition "isStridedPtr(PSE, LoadPtr, L) != 1 || isStridedPtr(PSE, LoadPtr, L) != 1", which looks like typo (or relying on side-effects?).

Thanks for noticing. One of them was supposed to check StorePtr. Fixed in r268251.

hfinkel accepted this revision.May 3 2016, 1:46 PM
hfinkel edited edge metadata.

LGTM

anemet requested changes to this revision.May 3 2016, 3:00 PM
anemet edited edge metadata.

Sorry for not commenting on this earlier but I didn't find the time.

I am not sure what the patch does anymore. Looks like there is a single call site were we pass in nullptr for the loop and that is the place where we should already accept loop-invariant pointers after http://reviews.llvm.org/rL264243.

Can you please summarize?

I also think that if you want to rename the function you should probably do that in a separate patch. BTW, I agree with the renaming.

Thanks,
Adam

This revision now requires changes to proceed.May 3 2016, 3:00 PM

Thanks for the review!

I'll try to describe proposed change using the following example (PR26314). Up-to-date Clang (cfe/trunk 268486) is unable to vectorize this case.

#define Z 32
typedef struct s {
	int v1[Z];
	int v2[Z];
	int v3[Z][Z];
} s;

void slow_function (s* const obj) {
    for (int j=0; j<Z; j++) {
        for (int k=0; k<Z; k++) {
            int x = obj->v1[k] + obj->v2[j];
            obj->v3[j][k] += x;
        }
    }
}

Vectorization in this case is denied due to false result "AccessAnalysis::canCheckPtrAtRT" function.

Two calls of "AccessAnalysis::canCheckPtrAtRT" are possible during loop analysis, and check of pointers strides is performed only with the second call, which occurs only in particular case of failed memory dependency check of pointers with non-constant distance between them.

"AccessAnalysis::canCheckPtrAtRT" returns false on the example because if-condition below is false due to "isStridedPtr" zero result on inner-loop-invariant pointer check w.r.t. inner loop "TheLoop".

if (hasComputableBounds(PSE, StridesMap, Ptr, TheLoop) &&
    // When we run after a failing dependency check we have to make sure
    // we don't have wrapping pointers.
    (!ShouldCheckStride ||
     isStridedPtr(PSE, Ptr, TheLoop, StridesMap) == 1)) {...}
else {
  DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
  CanDoRT = false;
}

After rL264243 "hasComputableBounds" returns true in case of loop-invariant pointer (like "v2[j]" in the example), but, if I understood it correctly, current "isStridedPtr" will always return zero in this case because there is a check for equality of "TheLoop" (inner) and the loop which is obtained with SCEV of "Ptr" (outer loop for "v2[j]").

When it comes to stride check in "AccessAnalysis::canCheckPtrAtRT" we have as given that there was successful previous run (without the stride check), so either no RT check is needed at all or the pointers are already found suitable for RT check. If we only have to make sure that none of the pointers are wrapping then unit stride of the pointer w.r.t. any loop might be enough?

I also think that if you want to rename the function you should probably do that in a separate patch. BTW, I agree with the renaming.

Should I create separate diff for review?

anemet added a comment.May 4 2016, 3:05 PM

Thanks for the review!

I'll try to describe proposed change using the following example (PR26314). Up-to-date Clang (cfe/trunk 268486) is unable to vectorize this case.

#define Z 32
typedef struct s {
	int v1[Z];
	int v2[Z];
	int v3[Z][Z];
} s;

void slow_function (s* const obj) {
    for (int j=0; j<Z; j++) {
        for (int k=0; k<Z; k++) {
            int x = obj->v1[k] + obj->v2[j];
            obj->v3[j][k] += x;
        }
    }
}

Vectorization in this case is denied due to false result "AccessAnalysis::canCheckPtrAtRT" function.

Two calls of "AccessAnalysis::canCheckPtrAtRT" are possible during loop analysis, and check of pointers strides is performed only with the second call, which occurs only in particular case of failed memory dependency check of pointers with non-constant distance between them.

"AccessAnalysis::canCheckPtrAtRT" returns false on the example because if-condition below is false due to "isStridedPtr" zero result on inner-loop-invariant pointer check w.r.t. inner loop "TheLoop".

if (hasComputableBounds(PSE, StridesMap, Ptr, TheLoop) &&
    // When we run after a failing dependency check we have to make sure
    // we don't have wrapping pointers.
    (!ShouldCheckStride ||
     isStridedPtr(PSE, Ptr, TheLoop, StridesMap) == 1)) {...}
else {
  DEBUG(dbgs() << "LAA: Can't find bounds for ptr:" << *Ptr << '\n');
  CanDoRT = false;
}

After rL264243 "hasComputableBounds" returns true in case of loop-invariant pointer (like "v2[j]" in the example), but, if I understood it correctly, current "isStridedPtr" will always return zero in this case because there is a check for equality of "TheLoop" (inner) and the loop which is obtained with SCEV of "Ptr" (outer loop for "v2[j]").

When it comes to stride check in "AccessAnalysis::canCheckPtrAtRT" we have as given that there was successful previous run (without the stride check), so either no RT check is needed at all or the pointers are already found suitable for RT check. If we only have to make sure that none of the pointers are wrapping then unit stride of the pointer w.r.t. any loop might be enough?

Ah, thanks for writing this up. For some reason, I thought that we were failing somewhere else during the second phase (when we prove independence with RT checks only).

I still don't think that tweaking isStridedPtr to prove non-wrapping for more cases is the right approach. If the address is not a recurrence (e.g. just a global address) that is also trivially non-wrapping but would fail early in getPtrStride.

How about leaving isStridedPtr/getPtrStride unchanged and creating a new helper isNoWrap (similar to isNoWrapAddRec). isNoWrap would essentially be isLoopInvariant(...) || getPtrStride(...) == 1.

That would also satisfy Hal's request to hide this behind a call.

What do you think?

I also think that if you want to rename the function you should probably do that in a separate patch. BTW, I agree with the renaming.

Should I create separate diff for review?

No need to. You can just commit that part right away and then rebase this patch on top of that.

Ah, thanks for writing this up. For some reason, I thought that we were failing somewhere else during the second phase (when we prove independence with RT checks only).

I still don't think that tweaking isStridedPtr to prove non-wrapping for more cases is the right approach. If the address is not a recurrence (e.g. just a global address) that is also trivially non-wrapping but would fail early in getPtrStride.

How about leaving isStridedPtr/getPtrStride unchanged and creating a new helper isNoWrap (similar to isNoWrapAddRec). isNoWrap would essentially be isLoopInvariant(...) || getPtrStride(...) == 1.

That would also satisfy Hal's request to hide this behind a call.

What do you think?

Hi Adam!
Sorry for late response, I've been distracted due to Russian holidays.
Thanks for the suggestion! I really try grasping LAA-stuff, but my understanding of it is quite superficial. I've almost prepared updated diff, but I wonder why originally there was no check for negative unit stride?

No need to. You can just commit that part right away and then rebase this patch on top of that.

This was commited in r269020.

roman.shirokiy updated this revision to Diff 56773.EditedMay 10 2016, 11:58 AM
roman.shirokiy edited edge metadata.

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

for (int j=0; j<Z; j++) {

for (int k=Z-1; k>=0; k--) {
    int x = obj->v1[k] + obj->v2[j];
    obj->v3[j][k] += x;
}

}

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

Can you please not change this for now. I have been thinking about this code and I am not sure I understand the original version either.

Particularly unclear is why we only check non-wrapping when we retry proving vectorization safety with memchecks only. It seems to me that we'd have to do that for any pointer participating in memchecks.

@Hal, do you remember this? I asked Arnold and he didn't remember anymore.

This check feels parallel to the SCEV tests in isDependent because the dep distance is only correct if the pointers don't wrap. However in this case I think we're only interested if due to wrapping we need to consider the "inverse" interval (end->begin) rather than the normal (begin->end).

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

Can you please not change this for now. I have been thinking about this code and I am not sure I understand the original version either.

Particularly unclear is why we only check non-wrapping when we retry proving vectorization safety with memchecks only. It seems to me that we'd have to do that for any pointer participating in memchecks.

@Hal, do you remember this? I asked Arnold and he didn't remember anymore.

I don't in detail, but would the SCEV simplify in the first place if the expression might wrap? When we have memchecks we force the non-wrapping, and so we need to check.

This check feels parallel to the SCEV tests in isDependent because the dep distance is only correct if the pointers don't wrap. However in this case I think we're only interested if due to wrapping we need to consider the "inverse" interval (end->begin) rather than the normal (begin->end).

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

Can you please not change this for now. I have been thinking about this code and I am not sure I understand the original version either.

Particularly unclear is why we only check non-wrapping when we retry proving vectorization safety with memchecks only. It seems to me that we'd have to do that for any pointer participating in memchecks.

@Hal, do you remember this? I asked Arnold and he didn't remember anymore.

I don't in detail, but would the SCEV simplify in the first place if the expression might wrap? When we have memchecks we force the non-wrapping, and so we need to check.

Right but non-wrapping should be true for any memcheck not just those that we use instead of the failed SCEV-based testing. E.g. in

for (i) {

A[i] = B[i];

}

we will insert memchecks right away (i.e. ShouldCheckStride=false) but we should first make sure that the pointers don't wrap.

This check feels parallel to the SCEV tests in isDependent because the dep distance is only correct if the pointers don't wrap. However in this case I think we're only interested if due to wrapping we need to consider the "inverse" interval (end->begin) rather than the normal (begin->end).

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

Can you please not change this for now. I have been thinking about this code and I am not sure I understand the original version either.

Particularly unclear is why we only check non-wrapping when we retry proving vectorization safety with memchecks only. It seems to me that we'd have to do that for any pointer participating in memchecks.

@Hal, do you remember this? I asked Arnold and he didn't remember anymore.

I don't in detail, but would the SCEV simplify in the first place if the expression might wrap? When we have memchecks we force the non-wrapping, and so we need to check.

Right but non-wrapping should be true for any memcheck not just those that we use instead of the failed SCEV-based testing. E.g. in

for (i) {

A[i] = B[i];

}

we will insert memchecks right away (i.e. ShouldCheckStride=false) but we should first make sure that the pointers don't wrap.

Hi Adam!

Can we check for non-wrapping only in case if we already have memchecks, but Accesses.isDependencyCheckNeeded() is somehow false? If it is true then non-wrapping is checked in isDependent, as far as I understand. If it is false then we can traverse through participating pointers and check them for non-wrapping (with isNoWrap maybe), and if any one of them wraps then report that there is unsafe dependent memory operations i.e. CanVecMem = false.

Updated diff according to Adam suggestions. Also added negative unit stride check since it seems to not break anything and allows to vectorize

Can you please not change this for now. I have been thinking about this code and I am not sure I understand the original version either.

Particularly unclear is why we only check non-wrapping when we retry proving vectorization safety with memchecks only. It seems to me that we'd have to do that for any pointer participating in memchecks.

@Hal, do you remember this? I asked Arnold and he didn't remember anymore.

I don't in detail, but would the SCEV simplify in the first place if the expression might wrap? When we have memchecks we force the non-wrapping, and so we need to check.

Right but non-wrapping should be true for any memcheck not just those that we use instead of the failed SCEV-based testing. E.g. in

for (i) {

A[i] = B[i];

}

we will insert memchecks right away (i.e. ShouldCheckStride=false) but we should first make sure that the pointers don't wrap.

Hi Adam!

Can we check for non-wrapping only in case if we already have memchecks, but Accesses.isDependencyCheckNeeded() is somehow false? If it is true then non-wrapping is checked in isDependent, as far as I understand. If it is false then we can traverse through participating pointers and check them for non-wrapping (with isNoWrap maybe), and if any one of them wraps then report that there is unsafe dependent memory operations i.e. CanVecMem = false.

Hopefully, in almost all such cases in practice, SCEV itself will be able to prove nowrap on the pointer-expression AddRec. Also, FWIW, if we need to do this, we might want to do it as part of support for alignment-based multiversioning (because in the case where the pointers are aligned, even if we wrap, the vectorized loads/stores don't cross the final page boundary, and so it is not actually a problem).

roman.shirokiy edited edge metadata.

Removed negative unit stride condition.

anemet accepted this revision.Jun 7 2016, 4:03 AM
anemet edited edge metadata.

LGTM with the improved comments in the testcases. Thanks!

test/Analysis/LoopAccessAnalysis/multiple-strides-rt-memory-checks.ll
2–3

Can you please add a bit more explanation here and in the other testcase.

Please include the C version and a few words like:

When we were retrying dependence checking with memchecks only, the loop-invariant access in the inner loop was incorrectly determined to be wrapping because it was not strided in the inner loop.
This revision is now accepted and ready to land.Jun 7 2016, 4:03 AM
This revision was automatically updated to reflect the committed changes.