This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
../
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
-
LoopAccessAnalysis.h
-
lib/
-
Analysis/
12
LoopAccessAnalysis.cpp
-
Transforms/Vectorize/
-
Vectorize/
16
LoopVectorize.cpp
-
test/Transforms/LoopVectorize/
-
Transforms/
-
LoopVectorize/
7
consec_no_gep.ll

Differential D20789

Consecutive memory access in Loop Vectorizer - fixed and simplified
ClosedPublic

Authored by delena on May 30 2016, 5:28 AM.

Download Raw Diff

Details

Reviewers

anemet
mzolotukhin
Ayal
mkuper
hfinkel

Commits

rG5f8cc0c3469b: [Loop Vectorizer] Consecutive memory access - fixed and simplified
rG9823c995bc0e: Fixed consecutive memory access detection in Loop Vectorizer. It did not handle…
rL281853: [Loop Vectorizer] Consecutive memory access - fixed and simplified
rL273257: Fixed consecutive memory access detection in Loop Vectorizer.

Summary

I fixed consecutive memory access detection in Loop Vectorizer.
It did not handle correctly cases without GEP.

I started from this loop that wasn't vectorized.

for (int i=0; i<len; i++)

*to++ = *from++;

Probably this loop should be resolved with memcpy, but it is another pass. Current loop vectorizer does not recognize consecutive memory access, because the generated code performs load/store without GEP. I use SCEV to detect consecutive access.

Diff Detail

Repository: rL LLVM

Event Timeline

delena updated this revision to Diff 58947.May 30 2016, 5:28 AM

delena retitled this revision from to Consecutive memory access in Loop Vectorizer - fixed and simplified.

delena updated this object.

delena added reviewers: hfinkel, anemet, mzolotukhin, Ayal.

delena set the repository for this revision to rL LLVM.

delena added a subscriber: llvm-commits.

Herald added a subscriber: mzolotukhin. · View Herald TranscriptMay 30 2016, 5:28 AM

junbuml added a subscriber: junbuml.May 31 2016, 6:48 AM

mssimpso added a subscriber: mssimpso.May 31 2016, 9:58 AM

Ping.

delena added a reviewer: mkuper.Jun 5 2016, 5:39 AM

(1) Indeed *p++ should be treated just as consecutively as p[i++] by all means.

(2) Could we simply use getPtrStride(), like so:

int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides);
if (Stride == 1 || Stride == -1)
  return Stride;
return 0;

making it work with specialization for unit-stride (vectorizing the load in version-mem-access.ll)?

(3) If you test for

*to++ == *from--

you'll also catch reverse consecutive, and won't compete with -loop-idiom's folding to memcpy; albeit cost will be higher.

In D20789#449385, @Ayal wrote:
(1) Indeed *p++ should be treated just as consecutively as p[i++] by all means.

(2) Could we simply use getPtrStride(), like so:
int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides);
if (Stride == 1 || Stride == -1)
  return Stride;
return 0;
making it work with specialization for unit-stride (vectorizing the load in version-mem-access.ll)?

(3) If you test for
*to++ == *from--
you'll also catch reverse consecutive, and won't compete with -loop-idiom's folding to memcpy; albeit cost will be higher.

You can also do *to++ = 1 + *from++; which will avoid the memcpy idiom issue.

I used getPtrStride() as Ayal suggested. I also put extracting stride from GEP to a separate function.

Herald added a subscriber: sanjoy. · View Herald TranscriptJun 6 2016, 6:54 AM

mkuper added inline comments.Jun 6 2016, 11:26 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2299–2300	Are you planning to call this with more complex GEPs? If not, I'm not sure I understand why we need to call getGEPInductionOperand(Gep) here. Isn't it guaranteed to return 1?
2314	The formatting here looks off.

Please find some remarks from me inline.

Tanks,
Michael

../lib/Transforms/Vectorize/LoopVectorize.cpp
2314–2315	Formatting looks weird here.
../test/Transforms/LoopVectorize/consec_no_gep.ll
7	Could you please elaborate on what do you mean by "without preceding GEP instruction"? The test below has two GEP-instructions before load and store. Is the problem in the bitcast between GEP and load/store?

mzolotukhin added inline comments.Jun 6 2016, 11:38 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2300–2301	Can we relax this constraint? Looks like we didn't have such limitation before.

mkuper mentioned this in D20474: when calculating RegUsages, ignore instructions which are uniformed after vectorization.Jun 6 2016, 12:50 PM

Ayal added inline comments.Jun 6 2016, 11:57 PM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2298–2299	It's admittedly not your original code, but while you're refactoring this "gep with all indices being invariant except last that's inductive with a step that's not originally a compile time constant but gets replaced by one due to specialization, skipping over z/sext/bitcast" stride 1 special case, should it belong to LAA's getPtrStride() rather than LVL's isConsecutivePtr()? In any case, please add documentation explaining what this is about.
2299–2300	Again, admittedly not your original code, just commenting that if Step is any constant, not only +-1, we could use it to return the desired stride instead of zero. Or perhaps we'd only be specializing for the final stride(?). Otherwise you may call this isConsecutiveGEPIndex() rather than getStrideFromGEPIndex().

delena added inline comments.Jun 7 2016, 12:17 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2299–2300	It was original code. I assume that I have to check two things - (1) The last index in "induction". (2) All other are loop-invariant.
2299–2300	we could use it to return the desired stride instead of zero I don't want to complicate the code and make it more general since I don't see other usage for it. I can put this function back to isConsecutivePtr() if you prefer this solution.
2300–2301	You are right, but this index should be the last. Otherwise step 1 will not mean consecutive GEP. I'll fix.
../test/Transforms/LoopVectorize/consec_no_gep.ll
7	The bitcast is between "load" and "Phi". GEP does not feed load in this case.

I moved analysis of GEP with runtime index to llvm::getPtrStride().

This looks good to me (especially isConsecutivePrt() ;-)

../lib/Analysis/LoopAccessAnalysis.cpp
902–903	getGEPInductionOperand() returns the last relevant index to check; all indices following it do not affect the address; so InductionOperand should work even when it's not NumOperands - 1.
../lib/Transforms/Vectorize/LoopVectorize.cpp
2298–2301	somewhat redundant, as it's the first check in getPtrStride() as well

Fixed according to Ayal's comment.

anemet added inline comments.Jun 7 2016, 6:27 AM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	Sorry for jumping in late but I was on vacation. Elena, does your testcase actually exercise this part of the code or it's handled as a regular AddRec above? I am assuming not since the Ptr is not a GEP. Do we actually have coverage exercising this part?
../test/Transforms/LoopVectorize/consec_no_gep.ll
8	The C code and the IR do not match here. You have some casts here. Can this testcase be simplified to remove the casts?

delena added inline comments.Jun 7 2016, 6:44 AM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	The test that exercises this code is in version-mem-access.ll. This is the existing test that fails without this extension. My test case does not have GEP (but it has AddRec ) and this case wan't handled properly.
../test/Transforms/LoopVectorize/consec_no_gep.ll
8	I ran clang and printed IR before loop vectorizer. So, the code is exact here.

anemet added inline comments.Jun 7 2016, 9:04 AM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	OK and what is the SCEV that fails to become an AddRec after replaceSymbolicStrideSCEV? I am wondering if there is a fundamental reason we need getConsecutiveAccessFromGEPIndex. My guess is that it's this in getConsecutiveAccessFromGEPIndex: // Because of the multiplication by a stride we can have a s/zext cast. // We are going to replace this stride by 1 so the cast is safe to ignore. which does not seem fundamental.

sbaranga added a subscriber: sbaranga.Jun 7 2016, 10:11 AM

delena added inline comments.Jun 7 2016, 12:20 PM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	About this specific case with GEP where we need to call getConsecutiveAccessFromGEPIndex(): GEP itself is AddExpr, not AddRecExpr as expected. AddExpr is (base + MulExpr). The MulExpr is (AddRec_of_induction * sext(Stride)). We have Stride is equal to 1, but it is not compile time constant, but runtime constant. something like this: if (Stride ==1) then execute vector-loop else execute scalar-loop After call to replaceSymbolicStrideSCEV it is converted to sext(AddRecExpr_with_step_1).

anemet added inline comments.Jun 7 2016, 2:33 PM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	The sext is actually around the AddRec: (lldb) p PtrScev->dump() ((8 * (sext i32 {0,+,1}<nuw><%for.body> to i64)) + %x) @sbaranga, why can't predicated SCEV turn this into i64 {%x,+,8}?

sbaranga added inline comments.Jun 7 2016, 2:58 PM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	@anemet: It should. If it doesn't I would consider it a bug. What is the input where this is triggered? Is it the added test case? @delena: the expression for that GEP should get folded into an AddRecExpr by SCEV. I'll have a look at this tomorrow. We should avoid versioning on the stride if possible, since that removes a lot of the input space.

anemet added inline comments.Jun 7 2016, 3:07 PM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	@sbaranga, no. I applied the patch and then ran the old testcase of Transforms/LoopVectorize/version-mem-access.ll.

delena added inline comments.Jun 7 2016, 11:41 PM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	@sbaranga, this GEP may be folded to AddRecExpr by PSE. I can call to getPtrStride() with Assume=true and PSE will be invoked, but in this case 2 ARM tests fail: LLVM :: Transforms/LoopVectorize/AArch64/gather-cost.ll LLVM :: Transforms/LoopVectorize/ARM/gather-cost.ll

anemet added inline comments.Jun 8 2016, 3:27 AM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	@delena, great. Can you please look why those are failing?

sbaranga added inline comments.Jun 8 2016, 3:35 AM

../lib/Analysis/LoopAccessAnalysis.cpp
908–912	Thanks! If those tests are a problem we can fix them. We probably need to use Assume=true here anyway, otherwise this might not be correct: since the expression for the pointer is ((8 * (sext i32 {0,+,1}<nuw><%for.body> to i64)) + %x), this means that this pointer is not consecutive for trip counts greater than MAX_INT (the index would wrap). Either that or SCEV is dropping some flags.

I changed interface of getPtrStride(). Instead of one "Assume" parameter which means "additional run-time assumptions", I put 2 parameters - "Assume memory access versioning in run-time" and "Assume no-wrap in address calculation".
isConsecutivePtr() sets to "true" only the first assumption.
Now all tests work correctly.

In D20789#455936, @delena wrote:

I changed interface of getPtrStride(). Instead of one "Assume" parameter which means "additional run-time assumptions", I put 2 parameters - "Assume memory access versioning in run-time" and "Assume no-wrap in address calculation".
isConsecutivePtr() sets to "true" only the first assumption.

Why? What's wrong with calling getPtrStirde with Assume=true?

In D20789#456016, @anemet wrote:

In D20789#455936, @delena wrote:

I changed interface of getPtrStride(). Instead of one "Assume" parameter which means "additional run-time assumptions", I put 2 parameters - "Assume memory access versioning in run-time" and "Assume no-wrap in address calculation".
isConsecutivePtr() sets to "true" only the first assumption.

Why? What's wrong with calling getPtrStirde with Assume=true?

2 tests were failing. "Assume-versioning"=true is ok for isConsecutivePtr(), but Assume-no-wrap should be "false".

In D20789#456052, @delena wrote:

In D20789#456016, @anemet wrote:

In D20789#455936, @delena wrote:

I changed interface of getPtrStride(). Instead of one "Assume" parameter which means "additional run-time assumptions", I put 2 parameters - "Assume memory access versioning in run-time" and "Assume no-wrap in address calculation".
isConsecutivePtr() sets to "true" only the first assumption.

Why? What's wrong with calling getPtrStirde with Assume=true?

2 tests were failing. "Assume-versioning"=true is ok for isConsecutivePtr(), but Assume-no-wrap should be "false".

Why were the tests failing? You need to provide some explanation rather than just communicate the conclusion of your investigation.

As far as I can see both of these just add run-time checks from things that cannot be proven at compile time.

Hi Elena,

It looks like we don't care if the pointer is wrapping in isConsecutivePtr, but getPtrStride is testing it since this is required for LAA?

In that case it might be better to either factor out the wrapping checks from getPtrStride or change the interface so that getPtrStride takes an Assume parameter and a ShouldCheckWrap parameter?

-Silviu

Why were the tests failing?

When I turn on "Assume" parameter in the original version, I actually do two different things together:

Force check the predicated access
Disable wrap check in address calculation

When I do (1), I fix the original bug. But the two tests (we talked earlier) fail due to (2).
I separated (1) from (2) in the current patch and everything works fine.

In D20789#457257, @delena wrote:

Why were the tests failing?

When I turn on "Assume" parameter in the original version, I actually do two different things together:

Force check the predicated access

Disable wrap check in address calculation

When I do (1), I fix the original bug. But the two tests (we talked earlier) fail due to (2).
I separated (1) from (2) in the current patch and everything works fine.

This is still insufficient information. You're just restating the conclusion again (I already understood this part) without explaining why the test fails when run-time non-wrapping checks are allowed.

Also, in your list, #2 is not to disable wrap checks but to enable them.

Looking into one of the tests, it should fail if we end up vectorizing even though gathers for indirect pointer accesses are required. These are not available on ARM so the cost should be prohibitively high.

One reason could be that we now consider the indirect pointer access as consecutive (i.e. not requiring a gather) which is obviously wrong. I am wondering if there is a bug lurking somewhere in getPtrStride or in the test. Can you please look into it more and provide details.

In general, I think that @sbaranga's approach is likely to be the way forward (i.e. your patch is pretty much good) but the failures are still unexpected so let's first understand this part.

Thanks,
Adam

delena added inline comments.Jun 16 2016, 6:31 AM

../lib/Analysis/LoopAccessAnalysis.cpp
987	If I comment-out this line the two tests work without splitting the "Assume" parameter. Looks like the flag is written here for the pointer and is used in another place. But commenting out this line causes failure of another test.

I ran ARM/gather-cost.ll with debugger. It receives stride 3 for one of the memory accesses. I assume (I don't know ARM arch), that vectorization is possible with stride=3 under no-wrap flag for GEP's AddRec.
When we run isConsecutivePtr(.., Assume=true), we receive the same result (0), but the no-wrap flag for this pointer is being changed.
This no-wrap flag is checked later and the vectorization decision is different.

In D20789#460069, @delena wrote:

I ran ARM/gather-cost.ll with debugger. It receives stride 3 for one of the memory accesses. I assume (I don't know ARM arch), that vectorization is possible with stride=3 under no-wrap flag for GEP's AddRec.

Yes, we do have instructions on ARM that can handle strides {2,3,4}. The fix should be changing the stride to something that would require a gather operation (let's say 5).

When we run isConsecutivePtr(.., Assume=true), we receive the same result (0), but the no-wrap flag for this pointer is being changed.

Interesting, if Assume=true it should return 3? Do you why this is happening? Just from looking at the original code, is Assume=true it should return Stride not 0.

When we run isConsecutivePtr(.., Assume=true), we receive the same result (0), but the no-wrap flag for this pointer is being changed.

Interesting, if Assume=true it should return 3? Do you why this is happening? Just from looking at the original code, is Assume=true it should return Stride not 0.

Sorry, I was looking at isStridedPtr instead of isConsecutivePtr. This makes sense now.

What solution do you suggest? I see 2 options
(1) Change stride in the failing ARM tests and pass one parameter Assume=true to getPtrStride()
(2) Separate assumptions of getPtrStride(), the current version.

This is a bug-fix and I'd like to complete it.
Thank you.

In D20789#461679, @delena wrote:

What solution do you suggest? I see 2 options

I was suggesting doing the following:

In D20789#456115, @sbaranga wrote:

In that case it might be better to either factor out the wrapping checks from getPtrStride or change the interface so that getPtrStride takes an Assume parameter and a ShouldCheckWrap parameter?

To expand on this:

My preference would be to have just one Assume=true argument. However, we don't need to prove non-wrapping for isConsecutivePtr, so we should avoid the logic in getPtrStride that tries to prove non-wrapping:

only have one Assume argument to getPtrStride, and pass in a ShouldCheckWrap argument
In getPtrStride, change the logic for IsNoWrapAddRec to be:

bool IsNoWrapAddRec = !ShouldCheckWrap || PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW) || isNoWrapAddRec(Ptr, AR, PSE, Lp);

and invoke getPtrStride with Assume=true and ShouldCheckWrap=false.

FWIW this doesn't look like a bug fix.

In D20789#460069, @delena wrote:

I ran ARM/gather-cost.ll with debugger. It receives stride 3 for one of the memory accesses. I assume (I don't know ARM arch), that vectorization is possible with stride=3 under no-wrap flag for GEP's AddRec.
When we run isConsecutivePtr(.., Assume=true), we receive the same result (0), but the no-wrap flag for this pointer is being changed.
This no-wrap flag is checked later and the vectorization decision is different.

OK, thanks for the explanation but something is still not clear. If the stride is 3 how does that convert into isConsecutivePtr to return true? We are checking for Stride == 1 || Stride == -1 in the new version of isConsecutivePtr.

OK, thanks for the explanation but something is still not clear. If the stride is 3 how does that convert into isConsecutivePtr to return true? We are checking for Stride == 1 || Stride == -1 in the new version of isConsecutivePtr.

isConsecutivePtr() still returns 0. But it has a side effect. This line: PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
This NoOverlflow flag is set for the pointer (Ptr) and checked later, in another function.

In D20789#462008, @delena wrote:

OK, thanks for the explanation but something is still not clear. If the stride is 3 how does that convert into isConsecutivePtr to return true? We are checking for Stride == 1 || Stride == -1 in the new version of isConsecutivePtr.

isConsecutivePtr() still returns 0. But it has a side effect. This line: PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
This NoOverlflow flag is set for the pointer (Ptr) and checked later, in another function.

Ah, makes sense now. Thanks!

I've build the new patch according to Silviu's comments.
While my patch was under review, the isConsecutivePtr() function was changed, but the bug, I'm trying to fix wasn't resolved.
Please let me know if the call to getStridePtr() is still ok within the new code.

anemet added inline comments.Jun 20 2016, 9:55 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2298–2299	Also you didn't resolve the merge conflict properly. isConsecutivePtr was just forwarding to getPtrStride in the earlier version as it should but now the entire original body has snuck back in.
2300–2302	No, this should be something like: const ValueToValueMap &Strides = getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap(); int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, true, false);
../test/Transforms/LoopVectorize/consec_no_gep.ll
28–36	As I said earlier, the float cast is non-essiential here. Please simplify this just to use integer pointers.

delena added inline comments.Jun 20 2016, 10:20 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2298–2299	You are right. I thought that getPtrStrides() will not work with empty strides set. But it works perfect.. Thanks.

delena updated this revision to Diff 61264.Jun 20 2016, 10:22 AM

anemet added inline comments.Jun 20 2016, 10:57 AM

../test/Transforms/LoopVectorize/consec_no_gep.ll
29–37	Elena, are you planning to address this comment as well?

Simplified the test case.

LGTM with the formatting change below.

../test/Transforms/LoopVectorize/consec_no_gep.ll
2	Space between ; and RUN

This revision is now accepted and ready to land.Jun 20 2016, 2:08 PM

Thanks! LGTM as well.

Closed by commit rL273257: Fixed consecutive memory access detection in Loop Vectorizer. (authored by delena). · Explain WhyJun 21 2016, 4:39 AM

This revision was automatically updated to reflect the committed changes.

anemet mentioned this in D21861: Extend LoopVectorizationLegality::isConsecutivePtr to handle multiple level GEPs.Jun 29 2016, 3:46 PM

This revision was reverted. I'm re-opening and continue working on on it.

This revision is now accepted and ready to land.Jul 31 2016, 4:05 AM

Hi, I have a question about LoopAccessAnalysis.

The current patch enables vectorization of this loop:

void IncrementalCopyFastPath(const char* src, char* op, int len) {
  while (len > 0) {
    *(reinterpret_cast<long long*>(op)) = *(reinterpret_cast<const long long*>(src));
    src += 8;
    op += 8;
    len -= 8;
  }
}

And the result is incorrect because the distance between "src" and "op" is not big enough.
And there is no run-time check between load and store pointers.
This is the loop before vectorization:

while.body:                                       ; preds = %while.body.preheader, %while.body
  %len.addr.07 = phi i32 [ %sub, %while.body ], [ %len, %while.body.preheader ]
  %op.addr.06 = phi i8* [ %add.ptr1, %while.body ], [ %op, %while.body.preheader ]
  %src.addr.05 = phi i8* [ %add.ptr, %while.body ], [ %src, %while.body.preheader ]
  %0 = bitcast i8* %src.addr.05 to i64*
  %1 = load i64, i64* %0, align 8
  %2 = bitcast i8* %op.addr.06 to i64*
  store i64 %1, i64* %2, align 8
  %add.ptr = getelementptr inbounds i8, i8* %src.addr.05, i64 8
  %add.ptr1 = getelementptr inbounds i8, i8* %op.addr.06, i64 8
  %sub = add nsw i32 %len.addr.07, -8
  %cmp = icmp sgt i32 %len.addr.07, 8
  br i1 %cmp, label %while.body, label %while.end.loopexit

  while.end.loopexit:                               ; preds = %while.body

I looked at MemoryDepChecker::areDepsSafe(). It checks pointers with same "Leader", like A[i] and A[i+2].
What function should check distance between A and B in order to say that A[i] = B[i] are safe for vectorization?
Thank you.

Hi Elena,

In D20789#501605, @delena wrote:

I looked at MemoryDepChecker::areDepsSafe(). It checks pointers with same "Leader", like A[i] and A[i+2].
What function should check distance between A and B in order to say that A[i] = B[i] are safe for vectorization?
Thank you.

Is that function inlined? Otherwise, the two pointers have different underlying objects. For accesses with different underlying objects, we guarantee no dependence using either alias analysis or a runtime no alias check.

This is done in AccessAnalysis::canCheckPtrAtRT (and implicitly in AccessAnalysis::processMemAccesses).

-Silviu

This is done in AccessAnalysis::canCheckPtrAtRT (and implicitly in AccessAnalysis::processMemAccesses).

AccessAnalysis::canCheckPtrAtRT() returns true and  DependentAccesses has these 2 accesses.

CanVecMem = DepChecker->areDepsSafe() returns true without checking anything

This is the code:

bool CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(*PtrRtChecking, PSE->getSE(),
                                                TheLoop, SymbolicStrides);
if (!CanDoRTIfNeeded) {
  emitAnalysis(LoopAccessReport() << "cannot identify array bounds");
  DEBUG(dbgs() << "LAA: We can't vectorize because we can't find "
               << "the array bounds.\n");
  CanVecMem = false;
  return;
}

DEBUG(dbgs() << "LAA: We can perform a memory runtime check if needed.\n");

CanVecMem = true;
if (Accesses.isDependencyCheckNeeded()) {
  DEBUG(dbgs() << "LAA: Checking memory dependencies\n");
  CanVecMem = DepChecker->areDepsSafe(
      DependentAccesses, Accesses.getDependenciesToCheck(), SymbolicStrides);

wmi added a subscriber: wmi.Aug 1 2016, 8:32 PM

This is the output of LoopAccessAnalysis:

W:\ver2>build\Debug\bin\opt.exe -O2 test.ll -S -o test_opt.ll --debug-only=loop-accesses
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...

AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f30a0, 2] may alias, No access Pointers: (i64* %2, 18446744073709551615), (i64* %0, 18446744073709551615)

LAA: Accesses(2):

%2 = bitcast i8* %op.addr.06 to i64* (write)
%0 = bitcast i8* %src.addr.05 to i64* (read-only)

Underlying objects for pointer %2 = bitcast i8* %op.addr.06 to i64*

i8* %op

Underlying objects for pointer %0 = bitcast i8* %src.addr.05 to i64*

i8* %src

LAA: Found a runtime check ptr: %2 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %0 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.
LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0
LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Adding RT check for range:
Start: %op End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %op)
LAA: Adding RT check for range:
Start: %src End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %src)
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...

AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f3eb0, 2] may alias, No access Pointers: (i64* %231, 18446744073709551615), (i64* %229, 18446744073709551615)

LAA: Accesses(2):

%231 = bitcast i8* %op.addr.06 to i64* (write)
%229 = bitcast i8* %src.addr.05 to i64* (read-only)

Underlying objects for pointer %231 = bitcast i8* %op.addr.06 to i64*

i8* %op

Underlying objects for pointer %229 = bitcast i8* %src.addr.05 to i64*

i8* %src

LAA: Found a runtime check ptr: %231 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %229 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.
LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0
LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: vector.body
LAA: SCEV could not compute the loop exit count.

In D20789#503264, @delena wrote:
This is the output of LoopAccessAnalysis:

W:\ver2>build\Debug\bin\opt.exe -O2 test.ll -S -o test_opt.ll --debug-only=loop-accesses
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f30a0, 2] may alias, No access Pointers: (i64* %2, 18446744073709551615), (i64* %0, 18446744073709551615)
LAA: Accesses(2):
%2 = bitcast i8* %op.addr.06 to i64* (write)
%0 = bitcast i8* %src.addr.05 to i64* (read-only)
Underlying objects for pointer %2 = bitcast i8* %op.addr.06 to i64*
i8* %op
Underlying objects for pointer %0 = bitcast i8* %src.addr.05 to i64*
i8* %src
LAA: Found a runtime check ptr: %2 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %0 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.

LAA says we need a run-time alias check. Didn't we generate one?

LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0
LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Adding RT check for range:
Start: %op End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %op)
LAA: Adding RT check for range:
Start: %src End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %src)
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f3eb0, 2] may alias, No access Pointers: (i64* %231, 18446744073709551615), (i64* %229, 18446744073709551615)
LAA: Accesses(2):
%231 = bitcast i8* %op.addr.06 to i64* (write)
%229 = bitcast i8* %src.addr.05 to i64* (read-only)
Underlying objects for pointer %231 = bitcast i8* %op.addr.06 to i64*
i8* %op
Underlying objects for pointer %229 = bitcast i8* %src.addr.05 to i64*
i8* %src
LAA: Found a runtime check ptr: %231 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %229 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.
LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0
LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: vector.body
LAA: SCEV could not compute the loop exit count.

In D20789#503518, @anemet wrote:
In D20789#503264, @delena wrote:
This is the output of LoopAccessAnalysis:

W:\ver2>build\Debug\bin\opt.exe -O2 test.ll -S -o test_opt.ll --debug-only=loop-accesses
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f30a0, 2] may alias, No access Pointers: (i64* %2, 18446744073709551615), (i64* %0, 18446744073709551615)
LAA: Accesses(2):
%2 = bitcast i8* %op.addr.06 to i64* (write)
%0 = bitcast i8* %src.addr.05 to i64* (read-only)
Underlying objects for pointer %2 = bitcast i8* %op.addr.06 to i64*
i8* %op
Underlying objects for pointer %0 = bitcast i8* %src.addr.05 to i64*
i8* %src
LAA: Found a runtime check ptr: %2 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %0 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.
LAA says we need a run-time alias check. Didn't we generate one?

LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0

No. It says " Total Dependences: 0" because it checks only pointers with the same leader, like A[i] and A[i+5].
IMO, this is a bug that should be fixed. In a separate patch, of course.

LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Adding RT check for range:
Start: %op End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %op)
LAA: Adding RT check for range:
Start: %src End: ((8 * (zext i32 ((8 + (-9 smax (-1 + (-1 * %len))) + %len) /u 8) to i64)) + %src)
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: while.body
LAA: Processing memory accesses...
AST: Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x24f3eb0, 2] may alias, No access Pointers: (i64* %231, 18446744073709551615), (i64* %229, 18446744073709551615)
LAA: Accesses(2):
%231 = bitcast i8* %op.addr.06 to i64* (write)
%229 = bitcast i8* %src.addr.05 to i64* (read-only)
Underlying objects for pointer %231 = bitcast i8* %op.addr.06 to i64*
i8* %op
Underlying objects for pointer %229 = bitcast i8* %src.addr.05 to i64*
i8* %src
LAA: Found a runtime check ptr: %231 = bitcast i8* %op.addr.06 to i64*
LAA: Found a runtime check ptr: %229 = bitcast i8* %src.addr.05 to i64*
LAA: We need to do 1 pointer comparisons.
LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
Total Dependences: 0
LAA: No unsafe dependent memory operations in loop. We need runtime memory checks.
LAA: Found a loop in _Z23IncrementalCopyFastPathPKcPci: vector.body
LAA: SCEV could not compute the loop exit count.

In D20789#503549, @delena wrote:

No. It says " Total Dependences: 0" because it checks only pointers with the same leader, like A[i] and A[i+5].
IMO, this is a bug that should be fixed. In a separate patch, of course.

You may want to reread Silviu's comment. 'Total dependences' covers dependences that can be determined *statically* by the compiler (i.e. same underlying object). For anything else that may alias we need to add run-time alias checks to ensure no-dependence. This latter is what should have happened in this case.

I tried building with r274114 (before this was reverted), and we did generate a runtime alias check:

vector.memcheck:                                  ; preds = %min.iters.checked
  %7 = sub i32 -1, %len
  %8 = icmp sgt i32 %7, -9
  %smax9 = select i1 %8, i32 %7, i32 -9
  %9 = add i32 %len, %smax9
  %10 = add i32 %9, 8
  %11 = lshr i32 %10, 3
  %12 = zext i32 %11 to i64
  %13 = shl i64 %12, 3
  %scevgep = getelementptr i8, i8* %op, i64 %13
  %scevgep10 = getelementptr i8, i8* %src, i64 %13
  %bound0 = icmp ule i8* %op, %scevgep10
  %bound1 = icmp ule i8* %src, %scevgep
  %found.conflict = and i1 %bound0, %bound1
  %memcheck.conflict = and i1 %found.conflict, true
  %cast.crd = trunc i64 %n.vec to i32
  %14 = shl i32 %cast.crd, 3
  %ind.end = sub i32 %len, %14
  %15 = shl i64 %n.vec, 3
  %ind.end12 = getelementptr i8, i8* %op, i64 %15
  %ind.end14 = getelementptr i8, i8* %src, i64 %15
  br i1 %memcheck.conflict, label %scalar.ph, label %vector.ph

I'm not sure this runtime check is correct, though.

Ayal mentioned this in D23176: [LoopVectorizer] Fixed a bug in memory conflict run-time check.Aug 4 2016, 4:04 PM

I plan to re-commit this patch, but I'm asking to review two more lines.
(I'll explain in the next mail)

delena requested a review of this revision.Aug 30 2016, 6:44 AM

delena edited edge metadata.

delena added inline comments.

../lib/Transforms/Vectorize/LoopVectorize.cpp
2711–2712	This change I made after buildbot failed at assertion. The main problem that I wasn't able to reproduce this failure. (llvm self build under sanitizer ..). I fixed the failure by changing the first "if" from "if-induction" to "if-not-loop-invariant" and the buildbot passed all tests. Do you think that this fix is correct? May be you can tell me how to reproduce the failure? Thanks.

mssimpso added inline comments.Aug 30 2016, 8:11 AM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2711–2712	Hi Elena, I'm looking at this for the first time, but it does look a little strange to me. Before your change, it looks like this case was specifically for pointer induction variables, but now it's not? If the GEP's pointer operand is not an induction variable, then for the GEP to be consecutive, the pointer operand should have to be loop invariant. We check this in isConsecutivePtr. Could this be a case of llvm::getGEPInductionOperand not agreeing with Legal->isInductionVariable? Matt.

delena added inline comments.Aug 30 2016, 1:11 PM

../lib/Transforms/Vectorize/LoopVectorize.cpp
2711–2712	getelementptr %Ptr, %Ind1 Let me consider 2 cases (A) and (B): (A) Let assume that %Ptr is not induction and not loop-invariant. It, probably, means that Ptr is not changed from iteration to iteration, but defined inside the loop. LICM can't hoist it due to some reason. %Ind1 should be induction in this case. (B) Another case, Ptr is not induction, but it changes from iteration to iteration providing stride=1 for the memory access. Ptr = induction * X, for example, and memory versioning takes the case when X == 1. %Ind1 is loop invariant. Case B seems more relevant than A.

In D20789#502014, @sbaranga wrote:

...

Is that function inlined? Otherwise, the two pointers have different underlying objects. For accesses with different underlying objects, we guarantee no dependence using either alias analysis or a runtime no alias check.

This is done in AccessAnalysis::canCheckPtrAtRT (and implicitly in AccessAnalysis::processMemAccesses).

I have a question about this (unrelated to Elena's patch; sorry to digress a little):

Looks like for the pointers that we can't anti-alias we call canCheckPtrAtRT with ShouldCheckWrap=false :

// Find pointers with computable bounds. We are going to use this information
// to place a runtime bound check.
bool CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(*PtrRtChecking, PSE->getSE(),
                                                TheLoop, SymbolicStrides);

And if there is any unknown dependence distance we call canCheckPtrAtRT with ShouldCheckWrap=true:

if (!CanVecMem && DepChecker->shouldRetryWithRuntimeCheck()) {
    DEBUG(dbgs() << "LAA: Retrying with memory checks\n");
    …
    CanDoRTIfNeeded = Accesses.canCheckPtrAtRT(*PtrRtChecking, SE, TheLoop,
                                               SymbolicStrides, true);

Both cases will end up generating similar runtime pointer-range comparison… why are we not concerned with wrapping in the first?

And when ShouldCheckWrap is true: I assume the unit stride restriction in isNoWrap comes from the concern not to refer to an address beyond the location of the first element past the array boundary? Is that because we compute the pointer range as "ptr+Stride*TripCount", which might refer to "Stride" locations past the array boundary (and therefore may wrap around the address space)?

thanks!
Dorit

I simplified GEP cloning in vectorizeMemoryInstruction(). I assume that all necessary analysis is already done in isConsecutivePtr().

mkuper mentioned this in D24557: [Loop Vectorizer] Simplified GEP cloning. NFC..Sep 14 2016, 4:04 PM

Updated diff before commit.

Closed by commit rL281853: [Loop Vectorizer] Consecutive memory access - fixed and simplified (authored by delena). · Explain WhySep 18 2016, 7:04 AM

This revision was automatically updated to reflect the committed changes.

dorit mentioned this in D25276: [LoopVectorizer] Interleaved-mem-accesses analysis and getPtrStride .Oct 5 2016, 6:05 AM

Revision Contents

Path

Size

../

include/

llvm/

Analysis/

LoopAccessAnalysis.h

2 lines

lib/

Analysis/

LoopAccessAnalysis.cpp

8 lines

Transforms/

Vectorize/

LoopVectorize.cpp

86 lines

test/

Transforms/

LoopVectorize/

consec_no_gep.ll

43 lines

Diff 71748

../include/llvm/Analysis/LoopAccessAnalysis.h

	Show First 20 Lines • Show All 707 Lines • ▼ Show 20 Lines
	/// associated with \p PSE is true.			/// associated with \p PSE is true.
	///			///
	/// If necessary this method will version the stride of the pointer according			/// If necessary this method will version the stride of the pointer according
	/// to \p PtrToStride and therefore add further predicates to \p PSE.			/// to \p PtrToStride and therefore add further predicates to \p PSE.
	/// The \p Assume parameter indicates if we are allowed to make additional			/// The \p Assume parameter indicates if we are allowed to make additional
	/// run-time assumptions.			/// run-time assumptions.
	int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value Ptr, const Loop Lp,			int64_t getPtrStride(PredicatedScalarEvolution &PSE, Value Ptr, const Loop Lp,
	const ValueToValueMap &StridesMap = ValueToValueMap(),			const ValueToValueMap &StridesMap = ValueToValueMap(),
	bool Assume = false);			bool Assume = false, bool ShouldCheckWrap = true);

	/// \brief Returns true if the memory operations \p A and \p B are consecutive.			/// \brief Returns true if the memory operations \p A and \p B are consecutive.
	/// This is a simple API that does not depend on the analysis pass.			/// This is a simple API that does not depend on the analysis pass.
	bool isConsecutiveAccess(Value A, Value B, const DataLayout &DL,			bool isConsecutiveAccess(Value A, Value B, const DataLayout &DL,
	ScalarEvolution &SE, bool CheckType = true);			ScalarEvolution &SE, bool CheckType = true);

	/// \brief This analysis provides dependence information for the memory accesses			/// \brief This analysis provides dependence information for the memory accesses
	/// of a loop.			/// of a loop.
	▲ Show 20 Lines • Show All 83 Lines • Show Last 20 Lines

../lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 881 Lines • ▼ Show 20 Lines	if (auto *OBO = dyn_cast<OverflowingBinaryOperator>(NonConstIndex))
}		}

return false;		return false;
}		}

/// \brief Check whether the access through \p Ptr has a constant stride.		/// \brief Check whether the access through \p Ptr has a constant stride.
int64_t llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,		int64_t llvm::getPtrStride(PredicatedScalarEvolution &PSE, Value *Ptr,
const Loop *Lp, const ValueToValueMap &StridesMap,		const Loop *Lp, const ValueToValueMap &StridesMap,
bool Assume) {		bool Assume, bool ShouldCheckWrap) {
Type *Ty = Ptr->getType();		Type *Ty = Ptr->getType();
assert(Ty->isPointerTy() && "Unexpected non-ptr");		assert(Ty->isPointerTy() && "Unexpected non-ptr");

// Make sure that the pointer does not point to aggregate types.		// Make sure that the pointer does not point to aggregate types.
auto *PtrTy = cast<PointerType>(Ty);		auto *PtrTy = cast<PointerType>(Ty);
if (PtrTy->getElementType()->isAggregateType()) {		if (PtrTy->getElementType()->isAggregateType()) {
DEBUG(dbgs() << "LAA: Bad stride - Not a pointer to a scalar type" << *Ptr		DEBUG(dbgs() << "LAA: Bad stride - Not a pointer to a scalar type" << *Ptr
<< "\n");		<< "\n");
return 0;		return 0;
}		}

const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr);		const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr);

		AyalUnsubmitted Not Done Reply Inline Actions getGEPInductionOperand() returns the last relevant index to check; all indices following it do not affect the address; so InductionOperand should work even when it's not NumOperands - 1. Ayal: getGEPInductionOperand() returns the last relevant index to check; all indices following it do…
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
if (Assume && !AR)		if (Assume && !AR)
AR = PSE.getAsAddRec(Ptr);		AR = PSE.getAsAddRec(Ptr);

if (!AR) {		if (!AR) {
DEBUG(dbgs() << "LAA: Bad stride - Not an AddRecExpr pointer " << *Ptr		DEBUG(dbgs() << "LAA: Bad stride - Not an AddRecExpr pointer " << *Ptr
<< " SCEV: " << *PtrScev << "\n");		<< " SCEV: " << *PtrScev << "\n");
return 0;		return 0;
}		}
		anemetUnsubmitted Not Done Reply Inline Actions Sorry for jumping in late but I was on vacation. Elena, does your testcase actually exercise this part of the code or it's handled as a regular AddRec above? I am assuming not since the Ptr is not a GEP. Do we actually have coverage exercising this part? anemet: Sorry for jumping in late but I was on vacation. Elena, does your testcase actually exercise…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions The test that exercises this code is in version-mem-access.ll. This is the existing test that fails without this extension. My test case does not have GEP (but it has AddRec ) and this case wan't handled properly. delena: The test that exercises this code is in version-mem-access.ll. This is the existing test that…
		anemetUnsubmitted Not Done Reply Inline Actions OK and what is the SCEV that fails to become an AddRec after replaceSymbolicStrideSCEV? I am wondering if there is a fundamental reason we need getConsecutiveAccessFromGEPIndex. My guess is that it's this in getConsecutiveAccessFromGEPIndex: // Because of the multiplication by a stride we can have a s/zext cast. // We are going to replace this stride by 1 so the cast is safe to ignore. which does not seem fundamental. anemet: OK and what is the SCEV that fails to become an AddRec after replaceSymbolicStrideSCEV? I am…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions About this specific case with GEP where we need to call getConsecutiveAccessFromGEPIndex(): GEP itself is AddExpr, not AddRecExpr as expected. AddExpr is (base + MulExpr). The MulExpr is (AddRec_of_induction * sext(Stride)). We have Stride is equal to 1, but it is not compile time constant, but runtime constant. something like this: if (Stride ==1) then execute vector-loop else execute scalar-loop After call to replaceSymbolicStrideSCEV it is converted to sext(AddRecExpr_with_step_1). delena: About this specific case with GEP where we need to call getConsecutiveAccessFromGEPIndex()…
		anemetUnsubmitted Not Done Reply Inline Actions The sext is actually around the AddRec: (lldb) p PtrScev->dump() ((8 * (sext i32 {0,+,1}<nuw><%for.body> to i64)) + %x) @sbaranga, why can't predicated SCEV turn this into i64 {%x,+,8}? anemet: The sext is actually around the AddRec: (lldb) p PtrScev->dump() ((8 * (sext i32 {0,+…
		sbarangaUnsubmitted Not Done Reply Inline Actions @anemet: It should. If it doesn't I would consider it a bug. What is the input where this is triggered? Is it the added test case? @delena: the expression for that GEP should get folded into an AddRecExpr by SCEV. I'll have a look at this tomorrow. We should avoid versioning on the stride if possible, since that removes a lot of the input space. sbaranga: @anemet: It should. If it doesn't I would consider it a bug. What is the input where this is…
		anemetUnsubmitted Not Done Reply Inline Actions @sbaranga, no. I applied the patch and then ran the old testcase of Transforms/LoopVectorize/version-mem-access.ll. anemet: @sbaranga, no. I applied the patch and then ran the old testcase of…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions @sbaranga, this GEP may be folded to AddRecExpr by PSE. I can call to getPtrStride() with Assume=true and PSE will be invoked, but in this case 2 ARM tests fail: LLVM :: Transforms/LoopVectorize/AArch64/gather-cost.ll LLVM :: Transforms/LoopVectorize/ARM/gather-cost.ll delena: @sbaranga, this GEP may be folded to AddRecExpr by PSE. I can call to getPtrStride() with…
		anemetUnsubmitted Not Done Reply Inline Actions @delena, great. Can you please look why those are failing? anemet: @delena, great. Can you please look why those are failing?
		sbarangaUnsubmitted Not Done Reply Inline Actions Thanks! If those tests are a problem we can fix them. We probably need to use Assume=true here anyway, otherwise this might not be correct: since the expression for the pointer is ((8 * (sext i32 {0,+,1}<nuw><%for.body> to i64)) + %x), this means that this pointer is not consecutive for trip counts greater than MAX_INT (the index would wrap). Either that or SCEV is dropping some flags. sbaranga: Thanks! If those tests are a problem we can fix them. We probably need to use Assume=true here…

// The accesss function must stride over the innermost loop.		// The accesss function must stride over the innermost loop.
if (Lp != AR->getLoop()) {		if (Lp != AR->getLoop()) {
DEBUG(dbgs() << "LAA: Bad stride - Not striding over innermost loop " <<		DEBUG(dbgs() << "LAA: Bad stride - Not striding over innermost loop " <<
Ptr << " SCEV: " << AR << "\n");		Ptr << " SCEV: " << AR << "\n");
return 0;		return 0;
}		}

// The address calculation must not wrap. Otherwise, a dependence could be		// The address calculation must not wrap. Otherwise, a dependence could be
// inverted.		// inverted.
// An inbounds getelementptr that is a AddRec with a unit stride		// An inbounds getelementptr that is a AddRec with a unit stride
// cannot wrap per definition. The unit stride requirement is checked later.		// cannot wrap per definition. The unit stride requirement is checked later.
// An getelementptr without an inbounds attribute and unit stride would have		// An getelementptr without an inbounds attribute and unit stride would have
// to access the pointer value "0" which is undefined behavior in address		// to access the pointer value "0" which is undefined behavior in address
// space 0, therefore we can also vectorize this case.		// space 0, therefore we can also vectorize this case.
bool IsInBoundsGEP = isInBoundsGep(Ptr);		bool IsInBoundsGEP = isInBoundsGep(Ptr);
bool IsNoWrapAddRec =		bool IsNoWrapAddRec = !ShouldCheckWrap \|\|
PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW) \|\|		PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW) \|\|
isNoWrapAddRec(Ptr, AR, PSE, Lp);		isNoWrapAddRec(Ptr, AR, PSE, Lp);
bool IsInAddressSpaceZero = PtrTy->getAddressSpace() == 0;		bool IsInAddressSpaceZero = PtrTy->getAddressSpace() == 0;
if (!IsNoWrapAddRec && !IsInBoundsGEP && !IsInAddressSpaceZero) {		if (!IsNoWrapAddRec && !IsInBoundsGEP && !IsInAddressSpaceZero) {
if (Assume) {		if (Assume) {
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);		PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
IsNoWrapAddRec = true;		IsNoWrapAddRec = true;
DEBUG(dbgs() << "LAA: Pointer may wrap in the address space:\n"		DEBUG(dbgs() << "LAA: Pointer may wrap in the address space:\n"
<< "LAA: Pointer: " << *Ptr << "\n"		<< "LAA: Pointer: " << *Ptr << "\n"
<< "LAA: SCEV: " << *AR << "\n"		<< "LAA: SCEV: " << *AR << "\n"
Show All 39 Lines	if (!IsNoWrapAddRec && (IsInBoundsGEP \|\| IsInAddressSpaceZero) &&
Stride != 1 && Stride != -1) {		Stride != 1 && Stride != -1) {
if (Assume) {		if (Assume) {
// We can avoid this case by adding a run-time check.		// We can avoid this case by adding a run-time check.
DEBUG(dbgs() << "LAA: Non unit strided pointer which is not either "		DEBUG(dbgs() << "LAA: Non unit strided pointer which is not either "
<< "inbouds or in address space 0 may wrap:\n"		<< "inbouds or in address space 0 may wrap:\n"
<< "LAA: Pointer: " << *Ptr << "\n"		<< "LAA: Pointer: " << *Ptr << "\n"
<< "LAA: SCEV: " << *AR << "\n"		<< "LAA: SCEV: " << *AR << "\n"
<< "LAA: Added an overflow assumption\n");		<< "LAA: Added an overflow assumption\n");
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);		PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
		delenaAuthorUnsubmitted Not Done Reply Inline Actions If I comment-out this line the two tests work without splitting the "Assume" parameter. Looks like the flag is written here for the pointer and is used in another place. But commenting out this line causes failure of another test. delena: If I comment-out this line the two tests work without splitting the "Assume" parameter. Looks…
} else		} else
return 0;		return 0;
}		}

return Stride;		return Stride;
}		}

/// Take the pointer operand from the Load/Store instruction.		/// Take the pointer operand from the Load/Store instruction.
▲ Show 20 Lines • Show All 1,098 Lines • Show Last 20 Lines

../lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,289 Lines • ▼ Show 20 Lines	for (unsigned Lane = 0; Lane < VF; ++Lane) {
auto *Mul = Builder.CreateMul(StartIdx, Step);		auto *Mul = Builder.CreateMul(StartIdx, Step);
auto *Add = Builder.CreateAdd(ScalarIV, Mul);		auto *Add = Builder.CreateAdd(ScalarIV, Mul);
Entry[Part][Lane] = Add;		Entry[Part][Lane] = Add;
}		}
}		}
VectorLoopValueMap.initScalar(EntryVal, Entry);		VectorLoopValueMap.initScalar(EntryVal, Entry);
}		}

int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {		int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
assert(Ptr->getType()->isPointerTy() && "Unexpected non-ptr");
auto *SE = PSE.getSE();
// Make sure that the pointer does not point to structs.
if (Ptr->getType()->getPointerElementType()->isAggregateType())
return 0;

// If this value is a pointer induction variable, we know it is consecutive.
PHINode *Phi = dyn_cast_or_null<PHINode>(Ptr);
if (Phi && Inductions.count(Phi)) {
InductionDescriptor II = Inductions[Phi];
return II.getConsecutiveDirection();
}

GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (!Gep)
return 0;

unsigned NumOperands = Gep->getNumOperands();
Value *GpPtr = Gep->getPointerOperand();
// If this GEP value is a consecutive pointer induction variable and all of
// the indices are constant, then we know it is consecutive.
Phi = dyn_cast<PHINode>(GpPtr);
if (Phi && Inductions.count(Phi)) {

// Make sure that the pointer does not point to structs.
PointerType *GepPtrType = cast<PointerType>(GpPtr->getType());
if (GepPtrType->getElementType()->isAggregateType())
return 0;

// Make sure that all of the index operands are loop invariant.
for (unsigned i = 1; i < NumOperands; ++i)
if (!SE->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)), TheLoop))
return 0;

InductionDescriptor II = Inductions[Phi];
return II.getConsecutiveDirection();
}

unsigned InductionOperand = getGEPInductionOperand(Gep);

// Check that all of the gep indices are uniform except for our induction
// operand.
for (unsigned i = 0; i != NumOperands; ++i)
if (i != InductionOperand &&
!SE->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)), TheLoop))
return 0;

		AyalUnsubmitted Not Done Reply Inline Actions It's admittedly not your original code, but while you're refactoring this "gep with all indices being invariant except last that's inductive with a step that's not originally a compile time constant but gets replaced by one due to specialization, skipping over z/sext/bitcast" stride 1 special case, should it belong to LAA's getPtrStride() rather than LVL's isConsecutivePtr()? In any case, please add documentation explaining what this is about. Ayal: It's admittedly not your original code, but while you're refactoring this "gep with all indices…
		anemetUnsubmitted Not Done Reply Inline Actions Also you didn't resolve the merge conflict properly. isConsecutivePtr was just forwarding to getPtrStride in the earlier version as it should but now the entire original body has snuck back in. anemet: Also you didn't resolve the merge conflict properly. isConsecutivePtr was just forwarding to…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions You are right. I thought that getPtrStrides() will not work with empty strides set. But it works perfect.. Thanks. delena: You are right. I thought that getPtrStrides() will not work with empty strides set. But it…
// We can emit wide load/stores only if the last non-zero index is the		const ValueToValueMap &Strides = getSymbolicStrides() ? *getSymbolicStrides() :
		mkuperUnsubmitted Not Done Reply Inline Actions Are you planning to call this with more complex GEPs? If not, I'm not sure I understand why we need to call getGEPInductionOperand(Gep) here. Isn't it guaranteed to return 1? mkuper: Are you planning to call this with more complex GEPs? If not, I'm not sure I understand why we…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions It was original code. I assume that I have to check two things - (1) The last index in "induction". (2) All other are loop-invariant. delena: It was original code. I assume that I have to check two things - (1) The last index in…
		AyalUnsubmitted Not Done Reply Inline Actions Again, admittedly not your original code, just commenting that if Step is any constant, not only +-1, we could use it to return the desired stride instead of zero. Or perhaps we'd only be specializing for the final stride(?). Otherwise you may call this isConsecutiveGEPIndex() rather than getStrideFromGEPIndex(). Ayal: Again, admittedly not your original code, just commenting that if Step is any constant, not…
		delenaAuthorUnsubmitted Not Done Reply Inline Actions we could use it to return the desired stride instead of zero I don't want to complicate the code and make it more general since I don't see other usage for it. I can put this function back to isConsecutivePtr() if you prefer this solution. delena: > we could use it to return the desired stride instead of zero I don't want to complicate the…
// induction variable.		ValueToValueMap();
		mzolotukhinUnsubmitted Not Done Reply Inline Actions Can we relax this constraint? Looks like we didn't have such limitation before. mzolotukhin: Can we relax this constraint? Looks like we didn't have such limitation before.
		delenaAuthorUnsubmitted Not Done Reply Inline Actions You are right, but this index should be the last. Otherwise step 1 will not mean consecutive GEP. I'll fix. delena: You are right, but this index should be the last. Otherwise step 1 will not mean consecutive…
		AyalUnsubmitted Not Done Reply Inline Actions somewhat redundant, as it's the first check in getPtrStride() as well Ayal: somewhat redundant, as it's the first check in getPtrStride() as well
const SCEV *Last = nullptr;
if (!getSymbolicStrides() \|\| !getSymbolicStrides()->count(Gep))
Last = PSE.getSCEV(Gep->getOperand(InductionOperand));
else {
// Because of the multiplication by a stride we can have a s/zext cast.
// We are going to replace this stride by 1 so the cast is safe to ignore.
//
// %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
// %0 = trunc i64 %indvars.iv to i32
// %mul = mul i32 %0, %Stride1
// %idxprom = zext i32 %mul to i64 << Safe cast.
// %arrayidx = getelementptr inbounds i32* %B, i64 %idxprom
//
Last = replaceSymbolicStrideSCEV(PSE, *getSymbolicStrides(),
Gep->getOperand(InductionOperand), Gep);
if (const SCEVCastExpr *C = dyn_cast<SCEVCastExpr>(Last))
Last =
(C->getSCEVType() == scSignExtend \|\| C->getSCEVType() == scZeroExtend)
? C->getOperand()
: Last;
}
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Last)) {
const SCEV Step = AR->getStepRecurrence(SE);

// The memory is consecutive because the last index is consecutive
// and all other indices are loop invariant.
if (Step->isOne())
return 1;
if (Step->isAllOnesValue())
return -1;
}

		anemetUnsubmitted Not Done Reply Inline Actions No, this should be something like: const ValueToValueMap &Strides = getSymbolicStrides() ? getSymbolicStrides() : ValueToValueMap(); int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, true, false); anemet:* No, this should be something like: const ValueToValueMap &Strides = getSymbolicStrides() ?
		int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, true, false);
		if (Stride == 1 \|\| Stride == -1)
		return Stride;
return 0;		return 0;
}		}

bool LoopVectorizationLegality::isUniform(Value *V) {		bool LoopVectorizationLegality::isUniform(Value *V) {
return LAI->isUniform(V);		return LAI->isUniform(V);
}		}

const InnerLoopVectorizer::VectorParts &		const InnerLoopVectorizer::VectorParts &
InnerLoopVectorizer::getVectorValue(Value *V) {		InnerLoopVectorizer::getVectorValue(Value *V) {
		mkuperUnsubmitted Not Done Reply Inline Actions The formatting here looks off. mkuper: The formatting here looks off.
assert(V != Induction && "The new induction variable should not be used.");		assert(V != Induction && "The new induction variable should not be used.");
		mzolotukhinUnsubmitted Not Done Reply Inline Actions Formatting looks weird here. mzolotukhin: Formatting looks weird here.
assert(!V->getType()->isVectorTy() && "Can't widen a vector");		assert(!V->getType()->isVectorTy() && "Can't widen a vector");
assert(!V->getType()->isVoidTy() && "Type does not produce a value");		assert(!V->getType()->isVoidTy() && "Type does not produce a value");

// If we have a stride that is replaced by one, do it here.		// If we have a stride that is replaced by one, do it here.
if (Legal->hasStride(V))		if (Legal->hasStride(V))
V = ConstantInt::get(V->getType(), 1);		V = ConstantInt::get(V->getType(), 1);

// If we have this scalar in the map, return it.		// If we have this scalar in the map, return it.
▲ Show 20 Lines • Show All 379 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
// Determine if either a gather or scatter operation is legal.		// Determine if either a gather or scatter operation is legal.
bool CreateGatherScatter =		bool CreateGatherScatter =
!ConsecutiveStride && Legal->isLegalGatherOrScatter(Instr);		!ConsecutiveStride && Legal->isLegalGatherOrScatter(Instr);

VectorParts VectorGep;		VectorParts VectorGep;

// Handle consecutive loads/stores.		// Handle consecutive loads/stores.
GetElementPtrInst *Gep = getGEPInstruction(Ptr);		GetElementPtrInst *Gep = getGEPInstruction(Ptr);
if (ConsecutiveStride) {		if (ConsecutiveStride) {
if (Gep) {		if (Gep) {
		delenaAuthorUnsubmitted Not Done Reply Inline Actions This change I made after buildbot failed at assertion. The main problem that I wasn't able to reproduce this failure. (llvm self build under sanitizer ..). I fixed the failure by changing the first "if" from "if-induction" to "if-not-loop-invariant" and the buildbot passed all tests. Do you think that this fix is correct? May be you can tell me how to reproduce the failure? Thanks. delena: This change I made after buildbot failed at assertion. The main problem that I wasn't able to…
		mssimpsoUnsubmitted Not Done Reply Inline Actions Hi Elena, I'm looking at this for the first time, but it does look a little strange to me. Before your change, it looks like this case was specifically for pointer induction variables, but now it's not? If the GEP's pointer operand is not an induction variable, then for the GEP to be consecutive, the pointer operand should have to be loop invariant. We check this in isConsecutivePtr. Could this be a case of llvm::getGEPInductionOperand not agreeing with Legal->isInductionVariable? Matt. mssimpso: Hi Elena, I'm looking at this for the first time, but it does look a little strange to me.
		delenaAuthorUnsubmitted Not Done Reply Inline Actions getelementptr %Ptr, %Ind1 Let me consider 2 cases (A) and (B): (A) Let assume that %Ptr is not induction and not loop-invariant. It, probably, means that Ptr is not changed from iteration to iteration, but defined inside the loop. LICM can't hoist it due to some reason. %Ind1 should be induction in this case. (B) Another case, Ptr is not induction, but it changes from iteration to iteration providing stride=1 for the memory access. Ptr = induction * X, for example, and memory versioning takes the case when X == 1. %Ind1 is loop invariant. Case B seems more relevant than A. delena: getelementptr %Ptr, %Ind1 Let me consider 2 cases (A) and (B): (A) Let assume that %Ptr is…
unsigned NumOperands = Gep->getNumOperands();		unsigned NumOperands = Gep->getNumOperands();
#ifndef NDEBUG		#ifndef NDEBUG
// The original GEP that identified as a consecutive memory access		// The original GEP that identified as a consecutive memory access
// should have only one loop-variant operand.		// should have only one loop-variant operand.
unsigned NumOfLoopVariantOps = 0;		unsigned NumOfLoopVariantOps = 0;
for (unsigned i = 0; i < NumOperands; ++i)		for (unsigned i = 0; i < NumOperands; ++i)
if (!PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)),		if (!PSE.getSE()->isLoopInvariant(PSE.getSCEV(Gep->getOperand(i)),
OrigLoop))		OrigLoop))
Show All 13 Lines	#endif
// operand.		// operand.
Gep2->setOperand(i, getScalarValue(Gep->getOperand(i),		Gep2->setOperand(i, getScalarValue(Gep->getOperand(i),
0, /* First unroll iteration */		0, /* First unroll iteration */
0 /* 0-lane of the vector */ ));		0 /* 0-lane of the vector */ ));
setDebugLocFromInst(Builder, Gep);		setDebugLocFromInst(Builder, Gep);
Ptr = Builder.Insert(Gep2);		Ptr = Builder.Insert(Gep2);

} else { // No GEP		} else { // No GEP
// Use the induction element ptr.
assert(isa<PHINode>(Ptr) && "Invalid induction ptr");
setDebugLocFromInst(Builder, Ptr);		setDebugLocFromInst(Builder, Ptr);
Ptr = getScalarValue(Ptr, 0, 0);		Ptr = getScalarValue(Ptr, 0, 0);
}		}
} else {		} else {
// At this point we should vector version of GEP for Gather or Scatter		// At this point we should vector version of GEP for Gather or Scatter
assert(CreateGatherScatter && "The instruction should be scalarized");		assert(CreateGatherScatter && "The instruction should be scalarized");
if (Gep) {		if (Gep) {
// Vectorizing GEP, across UF parts. We want to get a vector value for base		// Vectorizing GEP, across UF parts. We want to get a vector value for base
▲ Show 20 Lines • Show All 4,410 Lines • Show Last 20 Lines

../test/Transforms/LoopVectorize/consec_no_gep.ll

				;RUN: opt < %s -loop-vectorize -force-vector-width=4 -force-vector-interleave=1 -instcombine -S \| FileCheck %s

				anemetUnsubmitted Not Done Reply Inline Actions Space between ; and RUN anemet: Space between ; and RUN
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				;; Check consecutive memory access without preceding GEP instruction

				mzolotukhinUnsubmitted Not Done Reply Inline Actions Could you please elaborate on what do you mean by "without preceding GEP instruction"? The test below has two GEP-instructions before load and store. Is the problem in the bitcast between GEP and load/store? mzolotukhin: Could you please elaborate on what do you mean by "without preceding GEP instruction"? The test…
				delenaAuthorUnsubmitted Not Done Reply Inline Actions The bitcast is between "load" and "Phi". GEP does not feed load in this case. delena: The bitcast is between "load" and "Phi". GEP does not feed load in this case.
				; for (int i=0; i<len; i++) {
				anemetUnsubmitted Not Done Reply Inline Actions The C code and the IR do not match here. You have some casts here. Can this testcase be simplified to remove the casts? anemet: The C code and the IR do not match here. You have some casts here. Can this testcase be…
				delenaAuthorUnsubmitted Not Done Reply Inline Actions I ran clang and printed IR before loop vectorizer. So, the code is exact here. delena: I ran clang and printed IR before loop vectorizer. So, the code is exact here.
				; to++ = from++;
				; }

				; CHECK-LABEL: @consecutive_no_gep(
				; CHECK: vector.body
				; CHECK: %[[index:.*]] = phi i64 [ 0, %vector.ph ]
				; CHECK: getelementptr float, float* %{{.*}}, i64 %[[index]]
				; CHECK: load <4 x float>

				define void @consecutive_no_gep(float* noalias nocapture readonly %from, float* noalias nocapture %to, i32 %len) #0 {
				entry:
				%cmp2 = icmp sgt i32 %len, 0
				br i1 %cmp2, label %for.body.preheader, label %for.end

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%i.05 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%from.addr.04 = phi float* [ %incdec.ptr, %for.body ], [ %from, %for.body.preheader ]
				%to.addr.03 = phi float* [ %incdec.ptr1, %for.body ], [ %to, %for.body.preheader ]
				%incdec.ptr = getelementptr inbounds float, float* %from.addr.04, i64 1
				%val = load float, float* %from.addr.04, align 4
				%incdec.ptr1 = getelementptr inbounds float, float* %to.addr.03, i64 1
				store float %val, float* %to.addr.03, align 4
				%inc = add nsw i32 %i.05, 1
				%cmp = icmp slt i32 %inc, %len
				br i1 %cmp, label %for.body, label %for.end.loopexit
				anemetUnsubmitted Not Done Reply Inline Actions As I said earlier, the float cast is non-essiential here. Please simplify this just to use integer pointers. anemet: As I said earlier, the float cast is non-essiential here. Please simplify this just to use…

				anemetUnsubmitted Not Done Reply Inline Actions Elena, are you planning to address this comment as well? anemet: Elena, are you planning to address this comment as well?
				for.end.loopexit: ; preds = %for.body
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				ret void
				}