Page MenuHomePhabricator

[SCEV] Compute trip counts w/frozen conditions

Authored by reames on Nov 22 2019, 3:09 PM.



I'd really appreciate a sceptical eye on this. I'm not 100% sure this is correct.

The motivation is that unswitching a loop nest will produce conditions in the outer loop which are frozen. If that exit would otherwise be computable, we'd really like it to remain computable w/freeze.

My hesitation is that I'm not sure it's sound to propagate an analysis result (potentially based on UB) through freeze. What do folks think? Is this legal? Or not?

If not, suggestions for alternate approaches to recover knowledge in SCEV?

Diff Detail

Event Timeline

reames created this revision.Nov 22 2019, 3:09 PM


Currently ScalarEvolution::getBackedgeTakenCount() is commented as follows, which is not clear about the case when instruction returning nondeterministic value is involved (such as freeze).

/// If the specified loop has a predictable backedge-taken count, return it,
/// otherwise return a SCEVCouldNotCompute object. The backedge-taken count is
/// the number of times the loop header will be branched to from within the
/// loop, assuming there are no abnormal exists like exception throws. This is
/// one less than the trip count of the loop, since it doesn't count the first
/// iteration, when the header is branched to from outside the loop.
/// Note that it is not valid to call this method on a loop without a
/// loop-invariant backedge-taken count (see
/// hasLoopInvariantBackedgeTakenCount).

Updating its definition as follows helps the tests frozen_condition, frozen_iv, frozen_inc to have %n backedge-count.
We can simulate a case where the number of loop iteration is %n by properly choosing the freezed values at different iterations.

  /// loop, assuming there are no abnormal exists like exception throws.
+  /// If there are more than one possible backedge-taken counts because its branch
+  /// condition is evaluated from nondeterministic operations like freeze,
+  /// any of possible backedge-taken counts is valid. This assumes that variables
+  /// defined outside the loop are fixed.

In case of the frozen_limit example, the number of iterations is fixed to %freeze (%freeze is defined outside the loop), so it cannot be %n. I think the current CHECK is okay.

This update will support sustaining SCEV's analysis after insertion of freezes, but not very sure whether it is correct.
A possible problematic case is creating a code snippet and inserting it into a loop, based on SCEV's analysis result. Loop unrolling might be the case if it adds increment of the induction variable, because it wouldn't create freezed increment right now.

Interesting question :)

Let's focus on the first example, but with a ule instead: = add nsw i32 %iv, 1
%becond = icmp ule i32 %iv, %n
%freeze = freeze i1 %becond
br i1 %freeze, label %loop, label %leave

If %n == UINT_MAX, then icmp is always true. If freeze is not there, eventually %iv.inv becomes poison and then we have UB at the branch. So without freeze we can safely bound the number of iterations by %n. With freeze, the loop potentially never terminates (once the IV becomes poison, frozen cond can be forever true).

With ult we don't have this particular problem of the IV becoming poison. But what if the IV initializer is non-constant and poison in the first place? Then the freeze gives a non-terminating loop again (non-deterministically).
Same goes if %n is poison: we can't bound the number of iterations. You would need to push the freeze to %n. Then it's ok.

Essentially we need to push freezes out of loops. A freeze evaluated in the loop body can give a non-deterministic value in each iteration so we can't rely on it for most (any?) analysis.

reames abandoned this revision.Dec 2 2019, 4:41 PM

I agree with the analysis from Nuno, and thus the patch as written is wrong.

aqjune added a subscriber: spatel.Wed, Apr 1, 6:58 AM

Hello all,
I'm seeing this issue after is merged. LoopStrengthReduce doesn't fire on 49f7513.ll because the induction variable is frozen (LSR successfully optimizes 0019c2f.ll on the other hand).
To resolve this, I'd like to write a transformation suggested by @nlopes; pushing freeze out of a loop, effectively making it as 49f7513.candidate.ll.
Is there an appropriate pass for having this transformation? Or should it be a separate pass?

aqjune added a subscriber: sanwou01.Wed, Apr 1, 6:58 AM