This is an archive of the discontinued LLVM Phabricator instance.

[SCEVAffinator] Make precise modular math more correct.
ClosedPublic

Authored by efriedma on Oct 5 2016, 11:15 AM.

Details

Summary

Integer math in LLVM IR is modular. Integer math in isl is
arbitrary-precision. Modeling LLVM IR math correctly in isl requires
either adding assumptions that math doesn't actually overflow, or
explicitly wrapping the math. However, expressions with the "nsw" flag
are special; we can pretend they're arbitrary-precision because it's
undefined behavior if the result wraps. SCEV expressions based on IR
instructions with an nsw flag also carry an nsw flag (roughly; actually,
the real rule is a bit more complicated, but the details don't matter
here).

Before this patch, SCEV flags were also overloaded with an additional
function: the ZExt code was mutating SCEV expressions as a hack to
indicate to checkForWrapping that we don't need to add assumptions to
the operand of a ZExt; it'll add explicit wrapping itself. This kind of
works... the problem is that if anything else ever touches that SCEV
expression, it'll get confused by the incorrect flags.

Instead, with this patch, we make the decision about whether to
explicitly wrap the math a bit earlier, basing the decision purely on
the SCEV expression itself, and not its users.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma updated this revision to Diff 73674.Oct 5 2016, 11:15 AM
efriedma retitled this revision from to [SCEVAffinator] Make precise modular math more correct..
efriedma updated this object.
efriedma added reviewers: jdoerfert, Meinersbur.
efriedma set the repository for this revision to rL LLVM.
efriedma added subscribers: llvm-commits, pollydev.
jdoerfert edited edge metadata.Oct 6 2016, 3:17 AM

Hi Eli,

thanks for this patch! I can see the problem now and think your solution is the right way to fix it. Nevertheless I added quite a few (mostly minor) comments you might want to address or comment yourself.

Cheers,

Johannes

PS. Can we expect you to work on Polly more regularly now?

include/polly/Support/SCEVAffinator.h
117 ↗(On Diff #73674)

I would rephrase this in a more direct way, e.g. True, if a possible wrap in expr is modeled, false otherwise.

lib/Support/SCEVAffinator.cpp
120 ↗(On Diff #73674)

I think the commit message should state explicitly that this caused ScalarEvoltuion to use flags for SCEVs that we introduced only as a means to an end. Also the solution should be mentioned more clearly: We avoid that by applying modulo semantics in the recursion now instead of afterwards.

250 ↗(On Diff #73674)

Did you omit the else here on purpose? If we add modulo semantics there is no wrapping (by construction)

253 ↗(On Diff #73674)
  1. This is unrelated.
  2. Does isOne() evaluate on true on the value: "i1 -1" ?
262 ↗(On Diff #73674)

Again this can be guarded by !isPrecise(..) can't it?

360 ↗(On Diff #73674)

Comments are not adjusted.

372 ↗(On Diff #73674)

This two lines looks like left over code but I am not 100% sure.

test/ScopInfo/bool-addrec.ll
12 ↗(On Diff #73674)

No triple pls.

test/ScopInfo/infeasible-rtc.ll
15 ↗(On Diff #73674)

Why did you modify the originial? Wasn't it refused anymore? Could we have both tests? The new one and the old one with a different comment + check line so people reading the patch later have more information.

test/ScopInfo/integers.ll
115 ↗(On Diff #73674)

It his hard to verify this domain without the {assumed_,invalid_,}context of the scop. Could you add those to the test or at least the review?

Currently all I know that the domain shown above doesn't make sense as it contains the following points (note the nsw for %indvars and the i0 > 3):

[n] -> { Stmt_bb[i0] : n = -1 and i0 = 3 }
[n] -> { Stmt_bb[i0] : n = -1 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = -1 and i0 = 4 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 5 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 6 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 7 }
[n] -> { Stmt_bb[i0] : n = 3 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = -1 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = -2 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = -3 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = -4 and i0 = 0 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 6 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 5 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 5 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 4 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 4 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 4 }
[n] -> { Stmt_bb[i0] : n = -2 and i0 = 3 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 3 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 3 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 3 }
[n] -> { Stmt_bb[i0] : n = -1 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = -3 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = -2 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 2 }
[n] -> { Stmt_bb[i0] : n = -4 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = -3 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = -2 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = 1 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = 0 and i0 = 1 }
[n] -> { Stmt_bb[i0] : n = 2 and i0 = 1 }
test/ScopInfo/zero_ext_of_truncate.ll
21 ↗(On Diff #73674)

This looks like we could run into some regressions with this more agressive representation of wrapping behaviour. Did you observe problems?

Meinersbur edited edge metadata.Oct 6 2016, 4:36 AM

I am not familiar with this part of Polly. Could you summarize how SCEV flags are handled now and how it was before?

Thanks Johannes for the extensive review.

Yes, you can expect more polly patches from me in the near future. :)

Quick summary of what's going on with SCEV flags: integer math in LLVM IR is modular. Integer math in isl is arbitrary-precision. Modeling LLVM IR math correctly in isl requires either adding assumptions that math doesn't actually overflow, or explicitly wrapping the math. However, expressions with the "nsw" flag are special; we can pretend they're arbitrary-precision because it's undefined behavior if the result wraps. SCEV expressions based on IR instructions with an nsw flag also carry an nsw flag (roughly; actually, the real rule is a bit more complicated, but the details don't matter here).

Before this patch, SCEV flags were also overloaded with an additional function: the ZExt code was mutating SCEV expressions as a hack to indicate to checkForWrapping that we don't need to add assumptions to the operand of a ZExt; it'll add explicit wrapping itself. This kind of works... the problem is that if anything else ever touches that SCEV expression, it'll get confused by the incorrect flags.

Instead, with this patch, we make the decision about whether to explicitly wrap the math a bit earlier, basing the decision purely on the SCEV expression itself, and not its users.

(I'll adapt this description for the commit message.)

lib/Support/SCEVAffinator.cpp
250 ↗(On Diff #73674)

I can. checkForWrapping is essentially a no-op on the result of addModuloSemantic, since the value can't actually be out of range.

253 ↗(On Diff #73674)

Okay, I can separate it out.

Yes, isOne() is true for i1 -1.

372 ↗(On Diff #73674)

Yes, I think so too; I'll remove them.

test/ScopInfo/infeasible-rtc.ll
15 ↗(On Diff #73674)

I modified it because the old test wasn't getting rejected: with this patch, we use modular math and correctly compute the trip count.

Sure, I can keep around the old test, too, if you think it's appropriate.

test/ScopInfo/integers.ll
115 ↗(On Diff #73674)

Full output follows. I think the domain is right if you ignore the nsw... but you're right, my patch isn't correctly honoring the nsw flag for very narrow addition operations. I'll fix that.

Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'bb => return' in function 'f6':
    Function: f6
    Region: %bb---%return
    Max Loop Depth:  1
    Invariant Accesses: {
    }
    Context:
    [n] -> {  : -4 <= n <= 3 }
    Assumed Context:
    [n] -> {  :  }
    Invalid Context:
    [n] -> {  : 1 = 0 }
    p0: %n
    Arrays {
        i3 MemRef_a[*]; // Element size 1
    }
    Arrays (Bounds as pw_affs) {
        i3 MemRef_a[*]; // Element size 1
    }
    Alias Groups (0):
        n/a
    Statements {
        Stmt_bb
            Domain :=
                [n] -> { Stmt_bb[i0] : i0 >= 0 and 8*floor((5 + n)/8) <= 5 + n - i0 };
            Schedule :=
                [n] -> { Stmt_bb[i0] -> [i0] };
            MustWriteAccess :=  [Reduction Type: NONE] [Scalar: 0]
                [n] -> { Stmt_bb[i0] -> MemRef_a[i0] };
    }
test/ScopInfo/zero_ext_of_truncate.ll
21 ↗(On Diff #73674)

We have a related patch to avoid modeling truncates for loop-invariant expressions, which makes the complexity of truncate expressions less problematic (with or without this patch). Hopefully we'll get something posted soon.

Beyond that, I probably should have tested a little more carefully before I posted this patch; this version is more aggressive than what we've been using, and it causes a problem on one of our internal tests. I might tweak the heuristic.

Thank you for the explanation. I'll will look at it again after the update.

lib/Support/SCEVAffinator.cpp
41 ↗(On Diff #73674)

How is this magic number determined? Why is it constant and does not depend on the the type's width?

250 ↗(On Diff #73674)

I don't understand why when the type's bitwidth is below a threshold, we always add modulo ie. ignore the nsw flag?

jdoerfert added inline comments.Oct 7 2016, 4:34 PM
lib/Support/SCEVAffinator.cpp
41 ↗(On Diff #73674)

This is by definition a magic number we use to compare against the types bit-width. It was there before and it stems from some observations about the way ScalarEvolution handles low bit-width modulo expressions but that is basically it.

250 ↗(On Diff #73674)

Right, we should not. addModuloSemantics should now probably the only place we should check for the nsw flag and exit if it is present.

efriedma updated this revision to Diff 74454.Oct 12 2016, 4:13 PM
efriedma edited edge metadata.

Updated to address comments.

I'm not particularly happy with killing off the special case for truncates, but we really need to be doing something better with them; handling them precisely is too complicated (causes performance problems).

Otherwise, I think I addressed all the review comments.

Looks almost good. I am not sure about the i1 thing and this one test case though. Does lnt pass?

lib/Support/SCEVAffinator.cpp
215 ↗(On Diff #74454)

We have a helper function here

getNoWrapFlags(Expr)

but I do not care too much.

253 ↗(On Diff #74454)

The fact that you can remove the guard and the test still work is merely a coincidence and should not be commited this way. As soon as (for whatever reason) computeModuloForExpr(..) does not return true for an i1 expression we will have a hard to find bug here.

test/ScopInfo/bool-addrec.ll
12 ↗(On Diff #73674)

No triple pls.

test/ScopInfo/integers.ll
115 ↗(On Diff #73674)

The domain in the test is different from the one you posted, isn't it? I'm confused, sry.

test/ScopInfo/truncate-1.ll
15 ↗(On Diff #74454)

For now OK I guess.

test/ScopInfo/zero_ext_of_truncate.ll
12 ↗(On Diff #74454)

True but that is not that easy to decide. If there was a second use of "%tmp" the situation might look different. Anyway, if you want to work on this we could see if we can come up with a scheme.

efriedma updated this revision to Diff 74539.Oct 13 2016, 10:28 AM

Updated for review comments.

efriedma updated this object.Oct 13 2016, 10:29 AM
jdoerfert accepted this revision.Oct 21 2016, 8:14 AM
jdoerfert edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Oct 21 2016, 8:14 AM
This revision was automatically updated to reflect the committed changes.