Page MenuHomePhabricator

[LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs
ClosedPublic

Authored by sbaranga on Feb 10 2016, 9:41 AM.

Details

Summary

LAA can only emit run-time alias checks for pointers with affine AddRec
SCEV expressions. However, non-AddRecExprs can be now be converted to
affine AddRecExprs using SCEV predicates.

This change tries to add the minimal set of SCEV predicates in order
to enable run-time alias checking.

Event Timeline

sbaranga updated this revision to Diff 47472.Feb 10 2016, 9:41 AM
sbaranga retitled this revision from to [LAA] Allow more run-time alias checks by coercing pointer expressions to AddRecExprs.
sbaranga updated this object.
sbaranga added reviewers: anemet, mzolotukhin.
sbaranga added a subscriber: llvm-commits.

We have a dependency on http://reviews.llvm.org/D17078 (without it the compiler crashes while trying to build 464.h264ref in spec2006).

sbaranga updated this revision to Diff 56906.May 11 2016, 6:58 AM

This rebases the patch and moves the lambda to a separate method.

sbaranga updated this revision to Diff 59901.Jun 7 2016, 9:31 AM

Rebase the patch since it wasn't applying to ToT anymore.

dorit added a subscriber: dorit.Feb 13 2017, 5:46 AM
dorit added a comment.Feb 13 2017, 5:54 AM

This patch still applies almost entirely as is (only needs a tiny update in the testcase), and seems to have positive performance effect on a couple benchmarks.
It is also a prerequisite for PR30654.
Lets revive this review!

dorit added a subscriber: Ayal.
dorit added inline comments.Feb 21 2017, 12:41 AM
lib/Analysis/LoopAccessAnalysis.cpp
537

I just have a small suggestion, to maybe change "Force" to "Assume", just because "Force" here has the same effect as "Assume" in the getPtrStride API, namely to allow adding new runtime tests. (Right?). But if you prefer Force that's fine with me too.

Ayal added inline comments.Mar 1 2017, 3:09 AM
test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6

"i can ove[r]flow": better clarify which i can overflow - in the test below the induction variable of the loop i is a signed 64bit (as is the bound 'len') whose bump has an nsw, so it is free of overflow concerns. The indices of A[i] and A[i+1] are (signed or unsigned?) 32bit idx and idx.inc, whose zext-add-trunc bump has no nuw nor nsw and are therefore subject to overflow concerns. Is it clear why the Added "SCEV assumptions" Flags for these should be <nssw>, rather than <nusw>? Starting from 0 and 1, nssw is more conservative.

"When len has a type": these assumptions are needed regardless of the type of len.

10

Suggest to clarify the corresponding C code using both i and idx.

12

"emmit" >> "emit", multiple occurrences.

20

Why sext's and not zext's?

22

"We" >> "The".

Hopefully the transformed expressions are

i32 {0,+,1}
i32 {1,+,1}

based on the added flags, right?

56

It doesn't really matter if we zext or sext here, right?
Or should zext indicate the type of idx was originally unsigned?

140

See comments above.

sbaranga added inline comments.Mar 7 2017, 3:05 AM
test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6

Thanks for the observations. There is a discrepancy here between the C code and the IR being tested (and the text is misleading). We are actually looking at the expressions for %inc.ptr0 and %inc.ptr1.

The sign extension comes from the getelementptr instruction. Because the SCEV expression that we are trying to linearize is something like (sext i32 {0,+,1}<%for.body> to i64), we can only add the nssw flag.

I'll update the text to clarify things.

22

i64 should be correct. We're essentially sinking the sext to get a i64 linear expression.

56

Correct (for both).

sbaranga updated this revision to Diff 91031.Mar 8 2017, 9:16 AM

Renaming Force to Assume.

The tests have been re-written, with the test IR coming directly from C so it should now match the C code.

I've removed one of tests which was only testing that this also works for NSW.
This functionality is already covered by other tests and doesn't show up often from C.

Ayal added a comment.Apr 12 2017, 4:10 PM

This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...

IIUC the current scheme originally computes CanDoRT and NeedRTCheck "independently", or rather intertwinedly, checking first if CanDoRT without Assume in parallel to determining if NeedRTCheck, and if the latter is true revisit if CanDoRT with Assume as needed by Retries. A simpler albeit possibly slower approach is to first determine separately if NeedRTCheck, returning true if not; else check if CanDoRT with Assume. But Need[RT]Check seems to depend on CanDo[AliasSet]RT, and we may return false if !NeedRTCheck when there are incomparable pointers of distinct address spaces. The use of "isDependencyCheckNeeded()" also raises an eyebrow, or two.

lib/Analysis/LoopAccessAnalysis.cpp
645–648

The tests added below check this other added condition, namely PSE.hasNoOverflow(), right?

658

IsDepCheckNeeded was originally taken out of the AS : AST loop.
Here you can simply ask "if(isDependencyCheckNeeded())" later.

660

IsWrite is needed only later by RtCheck.insert(), can move it there.

749

This goes back to the original code:
it may be slightly easier to always bump ++NumTotalPtrChecks instead of ++NumReadPtrChecks and set

NeedsCheck = (NumTotalPtrChecks >= 2 && NumWritePtrChecks >= 1);

But not sure I follow the original logic here; it's equivalent to doing:

bool NeedsCheck = (NumTotalPtrChecks >= 2 && NumWritePtrChecks >= 1);
if (IsDepCheckNeeded && CanDoAliasSetRT && RunningDepId == 2)
  NeedsCheck = false;

Having CanDoAliasSetRT && RunningDepId == 2 implies that NumTotalPtrChecks == 2, but why set NeedsCheck to false when this happens and IsDepCheckNeeded? What if NumWritePtrChecks >= 1?

May want to rename NeedsCheck >> NeedsAliasSetRTCheck, analogous to the CanDo's.

795

ok, but what if !NeedRTCheck?

test/Analysis/LoopAccessAnalysis/memcheck-wrapping-pointers.ll
6

Thanks! This looks fine to me.

25

the second one should be:
; i64 {(%b),+,4}<%for.body>

76

ove[r]flow

Ayal added a comment.Apr 18 2017, 3:33 AM
In D17080#725468, @Ayal wrote:

This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...

@mkuper, @hfinkel - can you help out here?

dorit added a comment.May 18 2017, 1:56 AM

I think @anemet has reviewed patches in this part of the vectorizer… @anemet, would you be able to please take a look?

(it's not a big patch: the heart of the patch is just one line: adding a call to
PSE.getAsAddRec(Ptr) under "if (Assume)" in hasComputeableBounds()
(which would make us more aggressive in adding runtime checks).
All the rest is an attempt to minimize the cases in which we pass Assume=true…)

Ayal added a subscriber: mssimpso.May 29 2017, 12:31 AM
In D17080#729013, @Ayal wrote:
In D17080#725468, @Ayal wrote:

This looks ok to me, added only minor comments, but it should be approved by someone who understands all this better...

@mkuper, @hfinkel - can you help out here?

@anemet , @mssimpso - can you help out here?

Ayal added a comment.Aug 20 2017, 4:10 PM

@sbaranga, can you clarify my comments and help me understand this better? Hopefully this could move forward.

Sorry for not replying earlier. It looks like there are only minor changes left? I plan to push an update after this gets unblocked.

It's also a little bit odd that I haven't been able to get a definitive review for this. It would be good to know if there's something fundamentally wrong with the approach (at least we could start thinking about workarounds).

lib/Analysis/LoopAccessAnalysis.cpp
645–648

Correct.

749

I've tried to preserve the original logic, so if there are any issues with the original logic I've also pulled them in. The only difference here being that we're doing the same logic per alias set.

It is possible that the logic can be simplified.

RunningDepId == 2 means that we only have one dependence set. There a comment above " But there is no need to checks if there is only one dependence set for this alias set.". If I remember correctly this covers cases such as:

for (int i = 0; i < n; ++i)

a[i] = a[i] + 1;

where we don't need any checks.

For renaming NeedsCheck >> NeedsAliasSetRTCheck: I have no objections.

795

If !NeedRTCheck I don't think we need to do this check.

However, the old code was doing it, so I left it in.

hfinkel added inline comments.Aug 21 2017, 1:42 PM
lib/Analysis/LoopAccessAnalysis.cpp
614–616

Please add a comment on what Assume means in practice for this function.

754

Can you extend this comment to note why, even if they failed previously, they might succeed now.

sbaranga updated this revision to Diff 112377.Aug 23 2017, 8:54 AM

Address comments received so far.

hfinkel accepted this revision.Sep 2 2017, 5:23 PM

LGTM

lib/Analysis/LoopAccessAnalysis.cpp
740

Please note here that this is the RunningDepId == 2 check below.

This revision is now accepted and ready to land.Sep 2 2017, 5:23 PM
sbaranga closed this revision.Sep 12 2017, 12:49 AM

Thanks! Committed in r313012.

anna added a subscriber: anna.Sep 13 2017, 7:02 AM

Hi Silviu,
I came across the similar gap, but I aggressively tried converting the PtrSCEV to AddRec in hasComputableBounds, which worked on our internal benchmark. Could you please clarify why you have the Assume to be false in the first call, i.e. how do we know that it's not useful to try and convert PtrSCEV to AddRec?

This was my patch over the original code.

diff --git a/lib/Analysis/LoopAccessAnalysis.cpp b/lib/Analysis/LoopAccessAnalysis.cpp
index d2dbecd..a1c7584 100644
--- a/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/lib/Analysis/LoopAccessAnalysis.cpp
@@ -608,6 +608,11 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
     return true;
 
   const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
+  // Sometimes the PtrSCEV can be converted into the AddRec, try to retrieve it
+  // if possible.
+  if (!AR)
+    AR = PSE.getAsAddRec(Ptr);
+    if (!AR)
         return false;

Hi Anna,

In D17080#869566, @anna wrote:

Hi Silviu,
I came across the similar gap, but I aggressively tried converting the PtrSCEV to AddRec in hasComputableBounds, which worked on our internal benchmark. Could you please clarify why you have the Assume to be false in the first call, i.e. how do we know that it's not useful to try and convert PtrSCEV to AddRec?

This was my patch over the original code.

diff --git a/lib/Analysis/LoopAccessAnalysis.cpp b/lib/Analysis/LoopAccessAnalysis.cpp
index d2dbecd..a1c7584 100644
--- a/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/lib/Analysis/LoopAccessAnalysis.cpp
@@ -608,6 +608,11 @@ static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
     return true;
 
   const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
+  // Sometimes the PtrSCEV can be converted into the AddRec, try to retrieve it
+  // if possible.
+  if (!AR)
+    AR = PSE.getAsAddRec(Ptr);
+    if (!AR)
         return false;

We only need to convert to an AddRec if we need to emit the runtime checks (see the comment above the original logic for computing NeedRTCheck).

One example would be if we only have reads in the loop. If we convert in hasComputableBounds we end up having unnecessary versioning (at least with regards to the dependence analysis).

Would the current solution also work in your internal benchmark? If not it might be because something else would need the expression to be an AddRec?

Thanks,
Silviu

anna added a comment.Sep 13 2017, 8:03 AM

Hi Silviu,

We only need to convert to an AddRec if we need to emit the runtime checks (see the comment above the original logic for computing NeedRTCheck).

One example would be if we only have reads in the loop. If we convert in hasComputableBounds we end up having unnecessary versioning (at least with regards to the dependence analysis).

Thanks for the clarification. makes sense.

Would the current solution also work in your internal benchmark? If not it might be because something else would need the expression to be an AddRec?

Yes, I tried your patch and it works. I was just curious why we needed all the extra code rather than always checking if we could convert to an AddRec.