Page MenuHomePhabricator

[AA] Teach BasicAA to recognize basic GEP range information.

Authored by courbet on Sep 14 2021, 1:53 AM.



The information can be implicit (from ValueTracking) or explicit.

This implements the backend part of the following RFC

We still need to settle on how to best represent the information in the
IR, but this is a separate discussion.

Diff Detail

Event Timeline

courbet created this revision.Sep 14 2021, 1:53 AM
courbet requested review of this revision.Sep 14 2021, 1:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 14 2021, 1:53 AM
nikic edited reviewers, added: nikic, fhahn, reames, jdoerfert, asbirlea; removed: lebedev.ri.Sep 14 2021, 2:41 AM
nikic added inline comments.Sep 15 2021, 2:47 PM

Ranges can be wrapping, you're probably looking for getSignedMin() and getSignedMax() here.


Can't you use the ConstantRange API to perform the multiplication by the scale?

Also, we need to consider overflow here. If the NSW flag is set we can exclude overflow and perform a saturating multiplication, but otherwise we need to take it into account, and this is not just a matter of multiplying the bounds. ConstantRange should do that correctly, though conservatively. We should have tests for this as well.

courbet updated this revision to Diff 373173.Sep 17 2021, 2:32 AM
courbet marked an inline comment as done.

Use the ConstantRange API for mutiplication and addition. Add more tests.


Thanks for the suggestion. In addition to correctness, that actually makes the code much simpler too.

I've added a bunch of tests that exercise various edge cases.

nikic added inline comments.Sun, Sep 26, 1:57 PM

The Var represents a zext(sext(V)) * Scale, so this sextOrTrunc isn't right if ZExtBits is non-zero. We first need an (optional) sext and then an (optional) zext.

nikic added inline comments.Sun, Sep 26, 2:42 PM

This doesn't really preserve the spirit of the test. Would it be possible to adjust the range metadata instead, to something like !{ i32 1, i32 0 }, to get a range that only excludes zero?

courbet updated this revision to Diff 375536.Tue, Sep 28, 5:23 AM
courbet marked an inline comment as done.
  • Conditionally zext/sext if needed.
  • Add test with zero extension.



Done. I've also added a unit test with a zero extension that fails without this change.

courbet updated this revision to Diff 375537.Tue, Sep 28, 5:25 AM

revert naming changes in tests

nikic accepted this revision.Tue, Sep 28, 11:02 AM



The code added here doesn't really rely on the fact that there is only a single index -- it could be integrated in the GCD/AllNonNegative/AllNonPositive loop above instead and add up the ranges of all indices. I'm okay with landing it in this limited form first though.


It's possible for both SExtBits and ZExtBits to be non-zero, so this should be extending to R.getBitWidth() + Var.SExtBits + Var.ZExtBits.


We probably don't need this sextOrTrunc() anymore?


nit: Braces are unnecessary here.


Seems to be unused?

This revision is now accepted and ready to land.Tue, Sep 28, 11:02 AM
courbet marked 2 inline comments as done.Wed, Sep 29, 7:53 AM

Thanks a lot for the review.


If that's the case, R has been sign-extended above, so Var.SExtBits is already counted in R.getBitWidth() (which is not the same as on line 1232)


We still need to extend from 32 to 64 bits when --basic-aa-force-at-least-64b=0. This is apparently not taken into account in SExtBits/ZExtBits.

courbet updated this revision to Diff 375893.Wed, Sep 29, 7:54 AM

fix last comments

nikic added inline comments.Wed, Sep 29, 8:51 AM

Oh right, that makes sense!


The case where an extend is needed should be represented by SExtBits -- but the reverse case (where a truncate is needed) is currently not represented (which is an open bug). So yeah, you're right and we can't drop this entirely.

This optimization seems to have uncovered a latent bug in LoopIdiom:

xbolva00 added inline comments.

Comments needs to be adjusted.

courbet added inline comments.Mon, Oct 11, 1:44 AM

Fixed in 342d7b654c63cadaa6135913c9582a7272fced58: "Same as @test1, this time the assume just guarantees %skip > -3, which is
enough to derive NoAlias for %ptr and %col.ptr.2 (distance is more than 3 doubles, and we load 1 double), but not %col.ptr.1 and %col.ptr.2 (distance is more than 3 doubles, and we load 6 doubles)."