This is an archive of the discontinued LLVM Phabricator instance.

Exploit a zero LoopExit count to eliminate loop exits
ClosedPublic

Authored by reames on Jun 20 2019, 12:29 PM.

Details

Summary

This turned out to be surprisingly effective. I was originally doing this just for completeness sake, but it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

Once this is in, I'm thinking about trying to build on the same infrastructure to eliminate provably untaken checks. There may be something generally interesting here.

Diff Detail

Event Timeline

reames created this revision.Jun 20 2019, 12:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2019, 12:29 PM

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus figure resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus figure resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically, but in general, I've been noticing a *lot* of issues with caching and computation of overflow bits. I've now seen several cases where we form a SCEV, then later figure out one component of it is nsw/nuw, but don't simplify the previously formed SCEV. Since we rely on equality in many cases for equivalence checks, that means we fail to prove things which appear obvious. Usually, a second run on indvars (after flags are set in IR) gets these cases.

Also, there is an argument that exit count reasoning "should" be more powerful than the predicate based form. When we're analysing a compare *knowing it controls an exit* we do have slightly more information than in the general predicate reasoning. (Consider a loop with one unanalysable exit and thus no exact BE count for the loop as a whole..)

sanjoy accepted this revision.Jun 21 2019, 11:09 AM

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus figure resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically,

SCEV seems to compute the correct range for that specific case:

%narrow.iv = trunc i64 %iv to i32
-->  {2147483647,+,1}<%loop> U: [2147483647,-2147483648) S: [2147483647,-2147483648)          Exits: 2147483647               LoopDispositions: { %loop: Computable }

so the comparison should be folded based on constant range analysis alone. Do you mind spot checking if indvars is even _trying_ to fold the condition correctly?

but in general, I've been noticing a *lot* of issues with caching and computation of overflow bits. I've now seen several cases where we form a SCEV, then later figure out one component of it is nsw/nuw, but don't simplify the previously formed SCEV.

Having proper use lists for SCEV might help. That would let us simplify users of a SCEV in a simplified manner once a SCEV has been proven not to wrap.

Since we rely on equality in many cases for equivalence checks, that means we fail to prove things which appear obvious. Usually, a second run on indvars (after flags are set in IR) gets these cases.

Also, there is an argument that exit count reasoning "should" be more powerful than the predicate based form. When we're analysing a compare *knowing it controls an exit* we do have slightly more information than in the general predicate reasoning. (Consider a loop with one unanalysable exit and thus no exact BE count for the loop as a whole..)

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

This revision is now accepted and ready to land.Jun 21 2019, 11:09 AM

it seems like there are a lot of cases where SCEV's exit count reasoning is stronger than it's isKnownPredicate reasoning.

This is somewhat surprising. For instance in @test_06 I would have expected SCEV to compute the range of %narrow.iv to be [INT32_MAX, INT32_MAX+1) (using the fact that the backedge taken count is 0), and thus figure resolve the icmp slt to false by just looking at the ranges. Any idea why that's not happening?

I haven't looked into that one specifically,

SCEV seems to compute the correct range for that specific case:

%narrow.iv = trunc i64 %iv to i32
-->  {2147483647,+,1}<%loop> U: [2147483647,-2147483648) S: [2147483647,-2147483648)          Exits: 2147483647               LoopDispositions: { %loop: Computable }

so the comparison should be folded based on constant range analysis alone. Do you mind spot checking if indvars is even _trying_ to fold the condition correctly?

SimplifyIndVar generally only works on direct users of the IV, including for icmp simplification. In this case there's a trunc between the IV and the icmp, so we don't try to simplify.

This revision was automatically updated to reflect the committed changes.

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

I dug into this a bit, and it turns out the Nikita is partly right. What's going on is that when we visit a cast (for the widening analysis), we always skip it's users (for the simplifying transforms). This doesn't seem to really make any sense, and removing it doesn't appear to break any test. Anyone have any idea why this might be the case? I'm really tempted to just remove the continue.

Agreed. I'm mainly curious about the specific test changes in this patch, not with the general idea.

I dug into this a bit, and it turns out the Nikita is partly right. What's going on is that when we visit a cast (for the widening analysis), we always skip it's users (for the simplifying transforms). This doesn't seem to really make any sense, and removing it doesn't appear to break any test. Anyone have any idea why this might be the case? I'm really tempted to just remove the continue.

That sounds reasonable, but here are a couple of things you might want to check before making that change (I assume you're talking about removing this like: https://github.com/llvm-mirror/llvm/blob/5fae5e00b26a33fcdc2c350ddbe5b0bc36b67ea6/lib/Transforms/Utils/SimplifyIndVar.cpp#L919)):

  • Maybe we're skipping sext/zext since we're going to create a wider IV and revisit the widened users anyway (in IndVarSimplify::simplifyAndExtend).
  • Maybe we're skipping trunc because most likely the trunc was inserted by a previous round of Widener.createWideIV and we do not want to re-do the work of simplifying the narrow IV.
  • Maybe SimplifyIndvar::simplifyUsers expects all of the IV users it is asked to simplify to have the same bitwidth.