This is an archive of the discontinued LLVM Phabricator instance.

[DA] [SCEV] Provide facility to check for total ordering based on dominance
Needs ReviewPublic

Authored by bmahjour on Mar 4 2020, 10:53 AM.

Details

Summary

Currently SCEV assumes that all recurrences used by a given SCEV need to dominate each other. In other words there must be a total ordering on the set of loops used in a SCEV expression. When doing MIV tests on accesses in two sibling loops that are triangular, we can run into a problem because the loops in SCEV expressions for the bounds of one access do not necessarily dominate the SCEV expressions for the bounds of the other access. For example, with an LLVM_ASSERT="on" build, we get a crash in ScalarEvolution when we running DependenceAnalysis on the following code (see the LIT test bellow for the IR):

void foo(int *restrict A, int n1, int n2, int n3) {
  for (int i1 = 0; i1 < n1; i1++) {
    for (int i2 = 2; i2 < n2; i2++) {
      for (int i3 = i2 + 1; i3 < n3; i3++) {
        A[i2 + i3*n2] = 11;
      }
    }
    for (int i4 = 2; i4 < n3; i4++) {
      for (int i5 = 1; i5 < i4 - 1; i5++) {
        A[i5] = 22;
      }
    }
  }
}
ScalarEvolution.cpp:736: int CompareSCEVComplexity(EquivalenceClasses<const llvm::SCEV *> &, EquivalenceClasses<const llvm::Value *> &, const llvm::LoopInfo *const, const llvm::SCEV *, const llvm::SCEV *, llvm::DominatorTree &, unsigned int): Assertion `DT.dominates(RHead, LHead) && "No dominance between recurrences used by one SCEV?"' failed.
Stack dump:
...
 #9 0x00007c9e3a1845d8 CompareSCEVComplexity(llvm::EquivalenceClasses<llvm::SCEV const*>&, llvm::EquivalenceClasses<llvm::Value const*>&, llvm::LoopInfo const*, llvm::SCEV const*, llvm::SCEV const*, llvm::DominatorTree&, unsigned int) 
#10 0x00007c9e3a145854 GroupByComplexity(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::LoopInfo*, llvm::DominatorTree&) 
#11 0x00007c9e3a138b08 llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int)
#12 0x00007c9e39f7d300 llvm::DependenceInfo::testBounds(unsigned char, unsigned int, llvm::DependenceInfo::BoundInfo*, llvm::SCEV const*) const 
#13 0x00007c9e39f7c32c llvm::DependenceInfo::banerjeeMIVtest(llvm::SCEV const*, llvm::SCEV const*, llvm::SmallBitVector const&, llvm::FullDependence&) const 
#14 0x00007c9e39f7ba80 llvm::DependenceInfo::testMIV(llvm::SCEV const*, llvm::SCEV const*, llvm::SmallBitVector const&, llvm::FullDependence&) const 
#15 0x00007c9e39f857b0 llvm::DependenceInfo::depends(llvm::Instruction*, llvm::Instruction*, bool)

This patch adds a query to ScalarEvolution to be able to examine a list of SCEV expressions and determine if the total ordering constraint holds for the set of associated loops. The MIV test uses this facility to check if the bounds can even be represented by SCEV, and return if the check fails.

Diff Detail

Event Timeline

bmahjour created this revision.Mar 4 2020, 10:53 AM
Meinersbur added inline comments.Mar 4 2020, 1:34 PM
llvm/lib/Analysis/DependenceAnalysis.cpp
2566

Is it even correct to call testBounds with SCEVExprs bounds with disjoint loop nests? Maybe the bug is in to consider MaxLevels instead of CommonLevels.

2725

Isn't return false the bail-out?

llvm/lib/Analysis/ScalarEvolution.cpp
12364

This is basically a 'contains' relationship. Correct?

Implementations of std::sort may assume that the relation is irreflexible (and assert on that). Use properlyDominates instead?

12369–12373

[suggestion]

for (int i = 1, e = SortedLoops.size(); i < e; ++i) {
  if (!DT.dominates(SortedLoops[i-1]->getHeader(), SortedLoops[i]->getHeader()))
    return false;
}
bmahjour updated this revision to Diff 248573.Mar 5 2020, 12:35 PM
bmahjour marked 7 inline comments as done.

Address Michael's comments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566

The Banerjee Inequality requires that delta between the coefficients of the first terms of the Diophantine function h be in between LB and UB, where LB and UB are defined to be the sum of min/max quantities over all loop levels (not just the common levels), so the code is correct in looping over all loops including the siblings. This is more of a limitation in SCEV not being able to represent the bounds when they are non-invariant recurrences that straddle non-dominating loops.

I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?

2725

No, this function is trying to disprove dependence by returning false. If the delta is known not to be in the range of [LB, UB] then it returns false meaning that there is no real (or integer) solution to the diophantine equation which indicates that there is no dependence. When we cannot prove that the delta is in the range, then the function conservatively returns true (meaning that the dependence cannot be disproved).

llvm/lib/Analysis/ScalarEvolution.cpp
12364

Right, it's a good idea.

Meinersbur added inline comments.Mar 5 2020, 1:58 PM
llvm/include/llvm/Analysis/ScalarEvolution.h
1106

Since satisfiesTotalOrder does not add Exprs, (Mutable)ArrayRef might be sufficient as arguments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566

I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?

AFAIU it is not about dominance, but that SCEVAddRecExpr only being valid within the loop (in the loop applies dominance by the loop header). If outside the loop, getSCEVAtScope 'removes' the SCEVAddRecExprs for loops it is not in. Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.

I don't know how DependenceInfo is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up

// Use Banerjee's Inequalities to test an MIV subscript pair.
// (Wolfe, in the race-car book, calls this the Extreme Value Test.)
// Generally follows the discussion in Section 2.5.2 of
//
//    Optimizing Supercompilers for Supercomputers
//    Michael Wolfe
//

could help.

bmahjour marked 3 inline comments as done.Mar 6 2020, 10:56 AM
bmahjour added inline comments.
llvm/lib/Analysis/DependenceAnalysis.cpp
2566

Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.

Suppose you have two non-nested sibling loops like this:

int i, j;
for (i = 0; i < n; i++)
  ...
for (j = 0; j < n; j++)
  ...

 ... = i + j;

I can see that if we add {0,+,1}<j_loop> and {0,+,1}<i_loop> together to form a new SCEV expression it cannot be evaluated in either of the loops. However, if we evaluate i+j after both loops and passing nullptr to getSCEVAtScope we expect to get an expression that evaluates to 2*n. So theoretically we should be able to represent the expression symbolically, and only fail if trying to evaluate it in either the i-loop or the j-loop. Currently SCEV does not even support creating such an expression symbolically (as per asserts in CompareSCEVComplexity and isKnownViaInduction), instead of asserting in getSCEVAtScope.

I don't know how DependenceInfo is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up...

I don't have the "race-car" book, but I did lookup the equations in Optimizing Compilers for Modern Architectures (by Randy Allen & Ken Kennedy) and what the DependenceInfo is trying to do makes sense to me. There is a special form of Banerjee Inequality for trapezoidal and triangular loops which is much more complicated, but is expected to produce more accurate results, however applying the original test to trapezoidal/triangular loops should still produce correct, but possibly over-conservative, results.

bmahjour updated this revision to Diff 248793.Mar 6 2020, 10:58 AM
bmahjour marked an inline comment as done.

Use ArrayRef instead of SmallVector as the argument to satisfiesTotalOrder.

Meinersbur added inline comments.
llvm/lib/Analysis/DependenceAnalysis.cpp
2566

When using i or j, getSCEVAtScope should be applied before combining with other SCEVs to ensure that we get the right representation for where it is used. That is, getSCEVAtScope(i) + getSCEVAtScope(j) instead of getSCEVAtScope(i+j). It might be that nothing interesting happens during ScalarEvolution::getAddExpr. However, we'd get into problems if the normalization of SCEVAdd makes use of properties that are only valid within the loop.

Note that this is not an issue when in LCSAA, since the uses of %i and %j after the loops would be $l.lcssa respectively %j.lcssa representing the exit values already. As would if we'd put one of the loops inside an if (e.g. loop guard), forcing an exit node phi:

if (c) {
  for (i = 0; i < n; i++) 
    ...
}
for (j = 0; j < n; j++)
  ...

Using the SCEVAddRec after the if would be illegal as it ignores what the value of %i would be if c was false. It is also wrong according to the dominator assumption for SCEV, probably because of this very reason. For dependence check between the loops, IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA. I think in your test case it is because the AddRecExpr is itself inside another loop and hence not dominating the other loop body.
What does Optimizing Compilers for Modern Architectures about this case? Maybe we could ask the original author Preston Briggs <preston.briggs@gmail.com>.

I do have the race-car book and I looked up the Extreme Value Test. The book ignores cases where statements are not in the same loop.

bmahjour marked an inline comment as done.Mar 26 2020, 8:43 AM
bmahjour added inline comments.
llvm/lib/Analysis/DependenceAnalysis.cpp
2566

I agree LCSSA can avoid this issue, but we don't (shouldn't) require LCSSA form for dependence analysis to work.

From what I can see, getSCEVAtScope does not consider control flow structures at all when querying at top-level scope (ie we cannot distinguish between guarded vs non-guarded top-level scopes), so even if we did getSCEVAtScope(i) + getSCEVAtScope(j) we would still have the problem that i-loop may not have been executed.

I suppose the control flow issue may be solvable with predication, but even then we should still be able to symbolically represent the expressions and perform simplification before trying to evaluate with getSCEVAtScope. For example getSCEVAtScope(i<c>+j-i<c>) gets simplified to getSCEVAtScope(j) which can be evaluated without violating the dominance relationship. Maybe the assertions should move to getSCEVAtScope? @reames Phillip could you please share your thoughts on this?

IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA

In order to do that first we need to detect if we are in a situation where one loop does not dominate another (similar to the proposed solution in this patch) and if so use the exit values. The exit values would have to be computed using getSCEVAtScope I assume, but getSCEVAtScope may return the input SCEV if it cannot compute the exit value which puts us back at square one.

For the case where we have a conditionally guarded loop, we can conservatively assume that the condition is always true and add the loop bounds to the set of constraints being considered. Since the dependence is disproved only if the Diophantine function is outside of the sum of min/max values for all the bounds, by considering the guarded loop bound we are only tightening the constraint and being more conservative. I'll ask Preston to confirm this.

bmahjour added a project: Restricted Project.May 5 2022, 12:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 12:21 PM