This is an archive of the discontinued LLVM Phabricator instance.

[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
ClosedPublic

Authored by sbaranga on Dec 10 2015, 3:56 AM.

Details

Summary

This change adds no wrap SCEV predicates with:

  • support for runtime checking
  • support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)}

Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.

We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.

Diff Detail

Event Timeline

sbaranga updated this revision to Diff 42410.Dec 10 2015, 3:56 AM
sbaranga retitled this revision from to [SCEV][LAA] Add no overflow SCEV predicates and use use them to improve strided pointer detection.
sbaranga updated this object.
sbaranga added reviewers: anemet, mzolotukhin, sanjoy.
sanjoy requested changes to this revision.Dec 10 2015, 4:50 PM
sanjoy edited edge metadata.

(Reviewed the SCEV bits)

include/llvm/Analysis/ScalarEvolution.h
273

Nit: notation is {a,+,b}.

286

Nit: I'd return an SCEVAddRecExpr *.

lib/Analysis/ScalarEvolution.cpp
9596

Minor nit: I'd use auto *OF here.

9663

Please either directly pass in a SCEVAddRecExpr or use a cast<> instead of a static_cast<>.

9726

Minor: I'd just put this definition in the header.

9729

I don't think you need to dyn_cast<> to const T. If you're casting from a const pointer, you'll automatically get a const pointer.

9876

Please use cast<SCEVAddRecExpr>

lib/Analysis/ScalarEvolutionExpander.cpp
1964

There is an Instruction::getModule

1986

I think you need to always zero extend the exit count. For instance, if you have a loop with an exit count of i8 255 then i16 {0x7fff,+,1} will sign overflow, even though 0x7fff + (1 * (-1)) will not sign overflow in either the * or the +.

1999

Please assert that AR is affine. Also, it would be easier to read if you extracted out const SCEV *Step = AR->getOperand(1) and used Step here instead.

2021

I don't think this is needed -- the trip count is interpreted as an unsigned value, so you should just need to check that it is ule iDstBits -1.

2044

Minor nit: please call this "sadd" or "uadd" depending on AddF (or just call it "add").

This revision now requires changes to proceed.Dec 10 2015, 4:50 PM
sanjoy added inline comments.Dec 10 2015, 11:37 PM
lib/Analysis/ScalarEvolutionExpander.cpp
2046

Actually, now that I think of it, I don't think this scheme will work for nsw. E.g. if AR is i8 {0,+,1} and TripCount is i8 255 == i8 -1 then neither (i8 1) * (i8 255) == i8 -1 nor 0 + (-1) will sign overflow, while the add recurrence clearly does sign overflow.

I think the most obvious way to check for no nsw is to check sext(Start + TripCount * Step) == sext(Start) + zext(TripCount) * sext(Step), where you extend to 2 * Bitwidth(Start). There could also potentially be a more efficient scheme, but I don't have one with me right now.

sbaranga added inline comments.Dec 11 2015, 6:16 AM
lib/Analysis/ScalarEvolutionExpander.cpp
2046

Thanks for the catch! This is indeed a problem. I'll think about some more alternatives to this.

sbaranga added inline comments.Dec 14 2015, 4:05 AM
include/llvm/Analysis/ScalarEvolution.h
286

This is part of the SCEVPredicate interface (and is missing an override), so it has to return a a const SCEV *.

lib/Analysis/ScalarEvolutionExpander.cpp
2046

We might be able to check this without extending by writing TripCount as MAX_INT + x if it larger than MAX_INT,
and doing the check twice? I don't know if this will be more efficient.

This also got me thinking about negative increments as well, and I think that there is an issue with the nuw flag and negative increments: they will always wrap.

We actually want a property that will allow us to sign extend the increment for both the nuw and nsw cases:

zext({a,+,b}) -> {zext(a),+,sext(b)}
sext({a,+,b}) -> {sext(a),+,sext(b)}

We should define separate flags for this property, as the unsigned case doesn't have the same semantics as the nuw flag. I think we should be able to emit checks for this.

sbaranga updated this revision to Diff 42730.Dec 14 2015, 9:28 AM
sbaranga edited edge metadata.

Renamed SCEVAddRecOverflowPredicate to SCEVWrapPredicate.

SCEVWrapPredicate now has its own flags for NUW/NSW, with
a slightly different meaning of the NUW flag from SCEV.
These flags allow us to do the transformation:

zext({a,+,b}) -> {zext(a),+,sext(b)}
sext({a,+,b}) -> {sext(a),+,sext(b)}

Updated the predicate checking code to use the Sanjoy's
method of checking:

  • we extend to a large enough type to remove all possible overflows
  • we check that the the final value (without any overflow) is what we would get if we would compute the expression and then do the extend.

This should also address the remaining review comments.

lib/Analysis/ScalarEvolution.cpp
9726

This is a bit difficult. AR is a SCEVAddRecExpr and not a SCEV, and therefore moving this to the header would require casting.

sbaranga updated this revision to Diff 42848.Dec 15 2015, 6:53 AM
sbaranga edited edge metadata.

Further cleanups: this removes all the remaining occurrences of "AddRecOverflow" and replace them with "Wrap".

Add NoWrapFlags manipulation functions to SCEVWrapPredicate (similar to ScalarEvolution).

Add a getImpliedFlags static method to SCEVWrapPredicate, which deduces the statically
implied flags: the NSW flag can be taken directly from AddRecExprs. The NUW flag can be
transfered only if the the increment of the AddRecExpr is constant and positive).

sanjoy requested changes to this revision.Dec 16 2015, 11:11 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
include/llvm/Analysis/ScalarEvolution.h
277

I'm still reading through the patch; but if you must introduce a new concept to replace nuw, it should be called something else to avoid confusion.

279

Do you mean zext(a + b) != ...?

281

I'd be a lot more explicit here. What (I think) you're saying is that

0 <= F(a) + F(b) < 2^n

where the inequalities and arithmetic above are normal inequalities and arithmetic over integers, and F maps an n bit tuple to the integer it represents in twos complement.

Is that correct?

288

Very minor, but why not NoWrapMask = FlagNUW | FlagNSW?

294

Minor, but what about doing some sanity checking on Mask here?

300

Same here and below on clearFlags: some minor sanity checking here would be nice.

319

explicit?

1478

Will clients ever try to RAUW / delete Value s from under this map. If not, this should be changed to use an AssertingVH; otherwise it needs to use a CallbackVH like SCEV.

1483

Minor: can you please check if it possible to make this a const reference?

lib/Analysis/ScalarEvolution.cpp
9608

Please add a comment on what Assume means, and use a more descriptive name if possible.

9668

Why not just return P.implies(A);?

9733

Why not check if Op->Flags is a superset of Flags?

9736

Might also want to return true if Flags == NSW && AR->hasNSW().

9761

This doesn't look correct? You probably want something like Step->getValue()->getValue().isNonNegative().

9892

Please use cast<>

lib/Analysis/ScalarEvolutionExpander.cpp
2002

Please use cast<>

2004

I think you can use ConstantInt::getFalse(IP->getContext()) here.

This revision now requires changes to proceed.Dec 16 2015, 11:11 PM

Hi Sanjoy,

Sorry for the delayed reply (holidays..). I should have a new version of this up for review soon.

Thanks,
Silviu

include/llvm/Analysis/ScalarEvolution.h
281

I think almost correct. You would technically need a different function for a, since we regard it as unsigned. So that means

0 <= G(a) + F(b) < 2^n, where F maps to twos complement and G is simple base 2 conversion.

Some examples:

So for a 16-bit example of a = 0xffff and b = 0xfffe we would get F(0xffff) = 2^16 - 1 and G(0xfffe) = -2 and
0 <= 2^16 - 3 < 2^16 which is satisfied.

For a = 0x0, b = 0xffff we get

0 <= -1 < 2^16 which is false.

For a = 0xffff, b = 0x0010 we get

0 <= 2^16 -1 + 2 < 2^16 which is false.

Does this make sense to you?

1478

I think AssertingVH would make the most sense here.

sbaranga updated this revision to Diff 44738.Jan 13 2016, 4:56 AM
sbaranga edited edge metadata.

Rebase the change and address the current issues raised by Sanjoy in the last review round.

sbaranga marked 16 inline comments as done.Jan 13 2016, 5:03 AM
sbaranga added inline comments.
include/llvm/Analysis/ScalarEvolution.h
288

There's no reason why we couldn't do that (apart from consistency with SCEV::NoWrapFlags). I have no preference for this.

sanjoy requested changes to this revision.Jan 17 2016, 12:47 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
include/llvm/Analysis/ScalarEvolution.h
286

Do you mean G(a) + F(b) here? Might be helpful to name these as Signed and Unsigned instead of F and G.

Might be useful to explicitly note that SCEVWrapPredicate:: IncrementNUW is not commutative, and you're interpreting {a,+,b} as (((a + b) + b) + b) ... for the purposes of SCEVWrapPredicate.

291

Please don't call this nuw (here, and elsewhere in comments) -- I don't think having two definitions of nuw float around it the codebase is a good idea. Pretty much any name other than nuw is fine.

300

What you have here is fine, but I'd have done assert(Flags & IncrementNoWrapMask == Flags) instead.

lib/Analysis/ScalarEvolution.cpp
9635

As mentioned earlier, please don't call this "nuw".

9733

Very minor, but why not return Op && Op->AR == AR && setFlags(Flags, Op->Flags) == Flags; instead of the early exit above?

9741

Minor and optional, but why not have an early return here as return IFlags == IncrementNSW;?

9749

As mentioned earlier, please don't call this <nuw>.

This revision now requires changes to proceed.Jan 17 2016, 12:47 PM
sbaranga added inline comments.Jan 18 2016, 7:33 AM
include/llvm/Analysis/ScalarEvolution.h
286

Good catch, it should have been G(a) + F(b). It is indeed not commutative. I'll update the text to reflect this.

300

That seems a better way to do it, thanks.

lib/Analysis/ScalarEvolution.cpp
9741

This is a corner case, but that wouldn't work if Flags == IncrementAnyWrap (in which case the predicate would always be true).

sbaranga updated this revision to Diff 45188.Jan 18 2016, 7:47 AM
sbaranga edited edge metadata.

Rebased and addressed the comments from the last review.

Renamed the new NUW/NSW flags to NUSW and NSSW (No Unsigned/Signed With
Signed increment Wrap) and update the documentation for these flags
according to the last review round.

sbaranga updated this revision to Diff 45189.Jan 18 2016, 8:09 AM
sbaranga edited edge metadata.

Update some missed comments to reflect the previous renaming (NUW -> NUSW and NSW -> NSSW).

sbaranga updated this revision to Diff 45696.Jan 22 2016, 9:52 AM

After further thinking, we should support RAUW/delete for values in FlagsMap.

This change replaces the DenseMap with a ValueMap (with the default configuration)
which will move the flags to the new value in case of RAUW or remove the entry
in case of delete.

We now also need to explicitly specify the copy contructor for
PredicatedScalarEvolution since ValueMap deletes the default copy constructor.

SCEV bits lgtm with minor nits inline.

include/llvm/Analysis/ScalarEvolution.h
296

Do you mean "is already non-commutative"?

302

This is much better, thanks! Might be helpful to clarify htat NSSW is basically nsw, but NUSW is a hybrid specific to SCEVWrapPredicate.

lib/Analysis/ScalarEvolutionExpander.cpp
2011

I'd suggest not generating the redundant or instruction, since it is easy to avoid here.

anemet edited edge metadata.Jan 26 2016, 9:33 AM

Hi Silviu,

Just started to look at this (sorry about the delay) and on first impression, I am surprised by the lack of tests. Can't we cover this with LAA tests?

Adam

Hi Silviu,

Just started to look at this (sorry about the delay) and on first impression, I am surprised by the lack of tests. Can't we cover this with LAA tests?

Adam

Hi Adam,

I've been having some trouble testing this with just LAA (we were bailing out when checking if we can do runtime checks).
However, I can now trigger this by adding noalias on the input pointers, so that problem has been solved and I can trigger all code paths with LAA.

I'll also need your change for loop versioning testing (http://reviews.llvm.org/D16612) in order to test the runtime checks, so I'll wait for that to go in before updating this.

Thanks,
Silviu

sbaranga updated this revision to Diff 46635.Feb 2 2016, 4:06 AM
sbaranga edited edge metadata.

Add test cases where LAA can use the SCEV predicates to both transform non-AddRecExprs to AddRecExprs and directly add no overflow flags for pointers.
The test cases check both the result of LAA and that LoopVectorize adds run-time checks for the SCEV predicates.

anemet added inline comments.Feb 2 2016, 9:51 PM
lib/Analysis/LoopAccessAnalysis.cpp
812

Do you want to try to coerce the expression into affine/non-wrapping here too? This may be a more common case for C/C++ where we are always inbounds for GEP.

Just asking, it does not have to and probably shouldn't be part of this patch.

913–916

No {}

Also, any particular reason you have a dbg message above but not here?

test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll
7 ↗(On Diff #46635)

Isnt't this initialized to 0 in the IR below?

26–27 ↗(On Diff #46635)

We should include the SCEV for %mul_ext here in a comment between the two. Probably both with and without the added flag on %mul. This is more critical later when more things are going on. This way we'd have the whole reasoning in front of us.

75–78 ↗(On Diff #46635)

Don't you mean 2 * index as the expression for this part? The GEP is covered later.

sbaranga updated this revision to Diff 46770.Feb 3 2016, 4:55 AM

Address comments from Adam.

sbaranga added inline comments.Feb 3 2016, 5:18 AM
lib/Analysis/LoopAccessAnalysis.cpp
812

I think that's already covered by the current change, because isStridedPtr will add the predicates if needed (and I think this is only called from isStridedPtr).

This should be better because isStridedPtr might return true even if isNoWrapAddRec returns false (for example in the case of unit strided pointers), and we try to only add predicates when needed.

If we ever end up calling isNoWrapAddRec from somewhere else than isStridedPtr, it would be a good idea to allow coercing here as well.

913–916

Thanks, I intended to add the debug message here as well.

test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll
27–28 ↗(On Diff #46770)

Added text with the SCEV expressions for %mul_ext before and after the predicates for all tests.

anemet added inline comments.Feb 4 2016, 6:17 PM
lib/Analysis/LoopAccessAnalysis.cpp
812

OK, that's possible, quite a few things are happening here implicitly. But then please add a test for it. I.e. gep with inbounds but the index not marked with nowrap.

test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll
99 ↗(On Diff #46770)

Long line, a few more further down.

sbaranga updated this revision to Diff 47009.Feb 5 2016, 4:12 AM

Added a test case with an inbounds GEPs, with the GEP index
having a non AddRec SCEV expression. This covers more
accurately the case where the input comes from C/C++.

Fixed the long lines in the test.

I've also noticed that the debug messages in isStridedPtr
would get confusing because they would use the SCEV expression
of the pointer before being coerced into an AddRecExpr. This
also changes the debug messages to use the updated expressions.

sbaranga marked an inline comment as done.Feb 5 2016, 4:28 AM
sbaranga added inline comments.
lib/Analysis/LoopAccessAnalysis.cpp
812

I've added a test for this.

FWIW I've looked in detail at what's going on here and I believe there is a chance to improve things: in the added test, OBO->hasNoSignedWrap() returns false, and we therefore bail out.

I think we can deduce the NSW flag from the added SCEV predicates. This would reduce the number of required SCEV predicates by 1.

anemet accepted this revision.Feb 5 2016, 10:23 AM
anemet edited edge metadata.

LGTM too.

test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll
2 ↗(On Diff #47009)

Are you planning to change these to use the new loop-versioning pass in a follow-on patch?

249 ↗(On Diff #47009)

Not AddRecExpr or rather not nsw/nuw?

sbaranga marked an inline comment as done.Feb 8 2016, 2:06 AM
sbaranga added inline comments.
test/Analysis/LoopAccessAnalysis/wrapping-pointer-versioning.ll
2 ↗(On Diff #47009)

Sure, I'll do that in a follow-up change.

sbaranga retitled this revision from [SCEV][LAA] Add no overflow SCEV predicates and use use them to improve strided pointer detection to [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection.Feb 8 2016, 2:29 AM
sbaranga updated this object.
sbaranga edited edge metadata.
sbaranga updated this revision to Diff 47170.Feb 8 2016, 2:47 AM

Fix a comment in on of the tests.

This revision was automatically updated to reflect the committed changes.

Committed in r260085. Thanks!

-Silviu