This is an archive of the discontinued LLVM Phabricator instance.

Differential D75628

[DA] [SCEV] Provide facility to check for total ordering based on dominance
Needs Review · Public

Authored by bmahjour on Mar 4 2020, 10:53 AM.

Details

Reviewers

Meinersbur
fhahn
dmgreen
Whitney
etiotto
reames
kbarton
spop

Summary

Currently SCEV assumes that all recurrences used by a given SCEV need to dominate each other. In other words, there must be a total ordering on the set of loops used in a SCEV expression. When doing MIV tests on accesses in two sibling loops that are triangular, we can run into a problem because the loops in SCEV expressions for the bounds of one access do not necessarily dominate the SCEV expressions for the bounds of the other access. For example, with an LLVM_ASSERT="on" build, we get a crash in ScalarEvolution when running DependenceAnalysis on the following code (see the LIT test below for the IR):

void foo(int *restrict A, int n1, int n2, int n3) {
  for (int i1 = 0; i1 < n1; i1++) {
    for (int i2 = 2; i2 < n2; i2++) {
      for (int i3 = i2 + 1; i3 < n3; i3++) {
        A[i2 + i3*n2] = 11;
      }
    }
    for (int i4 = 2; i4 < n3; i4++) {
      for (int i5 = 1; i5 < i4 - 1; i5++) {
        A[i5] = 22;
      }
    }
  }
}

ScalarEvolution.cpp:736: int CompareSCEVComplexity(EquivalenceClasses<const llvm::SCEV *> &, EquivalenceClasses<const llvm::Value *> &, const llvm::LoopInfo *const, const llvm::SCEV *, const llvm::SCEV *, llvm::DominatorTree &, unsigned int): Assertion `DT.dominates(RHead, LHead) && "No dominance between recurrences used by one SCEV?"' failed.
Stack dump:
...
 #9 0x00007c9e3a1845d8 CompareSCEVComplexity(llvm::EquivalenceClasses<llvm::SCEV const*>&, llvm::EquivalenceClasses<llvm::Value const*>&, llvm::LoopInfo const*, llvm::SCEV const*, llvm::SCEV const*, llvm::DominatorTree&, unsigned int) 
#10 0x00007c9e3a145854 GroupByComplexity(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::LoopInfo*, llvm::DominatorTree&) 
#11 0x00007c9e3a138b08 llvm::ScalarEvolution::getAddExpr(llvm::SmallVectorImpl<llvm::SCEV const*>&, llvm::SCEV::NoWrapFlags, unsigned int)
#12 0x00007c9e39f7d300 llvm::DependenceInfo::testBounds(unsigned char, unsigned int, llvm::DependenceInfo::BoundInfo*, llvm::SCEV const*) const 
#13 0x00007c9e39f7c32c llvm::DependenceInfo::banerjeeMIVtest(llvm::SCEV const*, llvm::SCEV const*, llvm::SmallBitVector const&, llvm::FullDependence&) const 
#14 0x00007c9e39f7ba80 llvm::DependenceInfo::testMIV(llvm::SCEV const*, llvm::SCEV const*, llvm::SmallBitVector const&, llvm::FullDependence&) const 
#15 0x00007c9e39f857b0 llvm::DependenceInfo::depends(llvm::Instruction*, llvm::Instruction*, bool)

This patch adds a query to ScalarEvolution to be able to examine a list of SCEV expressions and determine if the total ordering constraint holds for the set of associated loops. The MIV test uses this facility to check if the bounds can even be represented by SCEV, and return if the check fails.
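
As a rough illustration of the total-ordering constraint described in this summary, the following self-contained C++ sketch models a dominator tree as a plain parent array and checks whether a set of loop headers forms a single dominance chain. `ToyDomTree` and `headersFormChain` are hypothetical names invented for this sketch, not LLVM API, and the pairwise check is a simplification rather than the sort-based approach taken in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::DominatorTree (not LLVM API).
// Idom[N] is the immediate dominator of node N; the root is its own
// immediate dominator.
struct ToyDomTree {
  std::vector<int> Idom;

  // A dominates B if A lies on the path from B up to the root (reflexive).
  bool dominates(int A, int B) const {
    for (int N = B;; N = Idom[N]) {
      if (N == A)
        return true;
      if (Idom[N] == N)
        return false; // reached the root without meeting A
    }
  }
};

// Returns true if every pair of loop headers is ordered by dominance,
// i.e. the headers form a single chain in the dominator tree.
bool headersFormChain(const ToyDomTree &DT, const std::vector<int> &Headers) {
  for (std::size_t I = 0; I < Headers.size(); ++I)
    for (std::size_t J = I + 1; J < Headers.size(); ++J)
      if (!DT.dominates(Headers[I], Headers[J]) &&
          !DT.dominates(Headers[J], Headers[I]))
        return false;
  return true;
}
```

Assuming immediate dominators {0, 0, 1, 0, 3}, with node 0 standing in for the i1 header, nodes 1 and 2 for i2 and i3, and nodes 3 and 4 for i4 and i5 (a simplified model of the example's CFG), the set {1, 2, 3, 4} is rejected because the i2 and i4 headers do not dominate each other, while {0, 1, 2} is accepted.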

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bmahjour created this revision. Mar 4 2020, 10:53 AM

Herald added subscribers: mgrang, javed.absar, hiraditya. Mar 4 2020, 10:53 AM

Harbormaster completed remote builds in B48079: Diff 248238. Mar 4 2020, 1:31 PM

Meinersbur added inline comments. Mar 4 2020, 1:34 PM

llvm/lib/Analysis/DependenceAnalysis.cpp
2566	Is it even correct to call `testBounds` with SCEVExpr bounds from disjoint loop nests? Maybe the bug is in considering `MaxLevels` instead of `CommonLevels`.
2725	Isn't `return false` the bail-out?
llvm/lib/Analysis/ScalarEvolution.cpp
12364	This is basically a 'contains' relationship. Correct? Implementations of `std::sort` may assume that the relation is irreflexive (and `assert` on that). Use `properlyDominates` instead?
12369–12373	[suggestion]
	  for (int i = 1, e = SortedLoops.size(); i < e; ++i) {
	    if (!DT.dominates(SortedLoops[i-1]->getHeader(), SortedLoops[i]->getHeader()))
	      return false;
	  }
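
On the `std::sort` remark: besides being irreflexive, a `std::sort` comparator has to be a strict weak ordering, and proper dominance is only a partial order when some loops are unrelated to each other. One possible alternative, sketched here under assumptions, is to sort by depth in the dominator tree and then apply the adjacent-pair check from the suggestion. `DepthDomTree` and `isDominanceChain` are hypothetical names for this sketch, not LLVM API, and this is not what the patch does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy dominator tree (not LLVM API): Idom[N] is the immediate
// dominator of node N; the root is its own immediate dominator.
struct DepthDomTree {
  std::vector<int> Idom;

  // Number of edges between N and the root.
  int depth(int N) const {
    int D = 0;
    while (Idom[N] != N) {
      N = Idom[N];
      ++D;
    }
    return D;
  }

  // Reflexive dominance: walk from B towards the root looking for A.
  bool dominates(int A, int B) const {
    while (B != A && Idom[B] != B)
      B = Idom[B];
    return B == A;
  }
};

bool isDominanceChain(const DepthDomTree &DT, std::vector<int> Headers) {
  // Comparing depths is a strict weak ordering even when some headers are
  // unrelated by dominance, so it is a valid std::sort comparator.
  std::sort(Headers.begin(), Headers.end(),
            [&DT](int L, int R) { return DT.depth(L) < DT.depth(R); });
  // Two distinct headers at the same depth never dominate each other, so the
  // adjacent-pair check rejects every set that is not a single chain.
  for (std::size_t I = 1; I < Headers.size(); ++I)
    if (!DT.dominates(Headers[I - 1], Headers[I]))
      return false;
  return true;
}
```

The depth-based key costs an extra walk per comparison in this toy model; a real implementation would presumably use whatever level or DFS numbering the dominator tree already caches.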

Address Michael's comments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566	The Banerjee Inequality requires that delta between the coefficients of the first terms of the Diophantine function `h` be in between LB and UB, where LB and UB are defined to be the sum of min/max quantities over all loop levels (not just the common levels), so the code is correct in looping over all loops including the siblings. This is more of a limitation in SCEV not being able to represent the bounds when they are non-invariant recurrences that straddle non-dominating loops. I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
2725	No, this function is trying to disprove dependence by returning false. If the delta is known not to be in the range of [LB, UB] then it returns false meaning that there is no real (or integer) solution to the diophantine equation which indicates that there is no dependence. When we cannot prove that the delta is in the range, then the function conservatively returns true (meaning that the dependence cannot be disproved).
llvm/lib/Analysis/ScalarEvolution.cpp
12364	Right, it's a good idea.
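
The return convention explained in the reply on line 2725 can be sketched with plain integers. This is a minimal model under assumptions: `boundsArePlausible` is a hypothetical name, a null pointer stands for a bound that could not be computed, and the real `testBounds` compares SCEV expressions through `isKnownPredicate` rather than integers.

```cpp
#include <cassert>

// Hypothetical integer model of the check (not the LLVM implementation).
// A null bound means "unknown", mirroring a null SCEV pointer.
// Returns false only when Delta is provably outside [LowerBound, UpperBound],
// which disproves the dependence; returns true otherwise.
bool boundsArePlausible(const long *LowerBound, const long *UpperBound,
                        long Delta) {
  if (LowerBound && *LowerBound > Delta)
    return false; // Delta is below the range: dependence disproved
  if (UpperBound && Delta > *UpperBound)
    return false; // Delta is above the range: dependence disproved
  return true;    // cannot disprove, so conservatively assume a dependence
}
```

Under this convention, bailing out with `true` when the bounds cannot be represented is the conservative choice, because it never claims independence that was not proven.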

Meinersbur added inline comments. Mar 5 2020, 1:58 PM

llvm/include/llvm/Analysis/ScalarEvolution.h
1106	Since `satisfiesTotalOrder` does not add Exprs, `(Mutable)ArrayRef` might be sufficient as arguments.
llvm/lib/Analysis/DependenceAnalysis.cpp
2566	> I'm actually not quite sure why there is this assumption in SCEV about dominance. Do you know?
	AFAIU it is not about dominance, but about a SCEVAddRecExpr only being valid within its loop (being in the loop implies dominance by the loop header). If outside the loop, `getSCEVAtScope` 'removes' the SCEVAddRecExprs for loops it is not in. Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
	I don't know how `DependenceInfo` is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up
	  // Use Banerjee's Inequalities to test an MIV subscript pair.
	  // (Wolfe, in the race-car book, calls this the Extreme Value Test.)
	  // Generally follows the discussion in Section 2.5.2 of
	  //
	  //    Optimizing Supercompilers for Supercomputers
	  //    Michael Wolfe
	  //
	could help.

Harbormaster completed remote builds in B48252: Diff 248573. Mar 5 2020, 2:17 PM

bmahjour marked 3 inline comments as done. Mar 6 2020, 10:56 AM

bmahjour added inline comments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566	> Using SCEVAddRecExprs from sibling loops in the same expression would be a contradiction as it cannot be valid in both loops at the same time.
	Suppose you have two non-nested sibling loops like this:
	  int i, j;
	  for (i = 0; i < n; i++)
	    ...
	  for (j = 0; j < n; j++)
	    ...
	  ... = i + j;
	I can see that if we add `{0,+,1}<j_loop>` and `{0,+,1}<i_loop>` together to form a new SCEV expression, it cannot be evaluated in either of the loops. However, if we evaluate `i+j` after both loops and pass `nullptr` to `getSCEVAtScope`, we expect to get an expression that evaluates to `2n`. So theoretically we should be able to represent the expression symbolically, and only fail if trying to evaluate it in either the i-loop or the j-loop. Currently SCEV does not even support creating such an expression symbolically (as per asserts in `CompareSCEVComplexity` and `isKnownViaInduction`), instead of asserting in `getSCEVAtScope`.
	> I don't know how DependenceInfo is supposed to work comparing instructions in sibling loops, but I'd suspect the problem being there, not in SCEV. Maybe looking up...
	I don't have the "race-car" book, but I did look up the equations in Optimizing Compilers for Modern Architectures (by Randy Allen & Ken Kennedy) and what the `DependenceInfo` is trying to do makes sense to me. There is a special form of the Banerjee Inequality for trapezoidal and triangular loops which is much more complicated but is expected to produce more accurate results; however, applying the original test to trapezoidal/triangular loops should still produce correct, but possibly over-conservative, results.
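
The `2n` claim for the sibling-loop example can be checked with a small standalone function. This is only a source-level sketch with a hypothetical name (`sumAfterSiblingLoops`) and an assumed non-negative `n`; it says nothing about how SCEV would represent the expression.

```cpp
#include <cassert>

// Runs two empty sibling loops and returns i + j as observed after both.
// For n >= 0, each induction variable leaves its loop with the value n.
int sumAfterSiblingLoops(int n) {
  assert(n >= 0 && "sketch assumes a non-negative trip count");
  int i, j;
  for (i = 0; i < n; i++) {
  }
  for (j = 0; j < n; j++) {
  }
  return i + j;
}
```

After both loops the sum is well defined even though no single point inside either loop sees both recurrences at once, which is the distinction the comment draws between representing the expression and evaluating it at a scope.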

Use ArrayRef instead of SmallVector as the argument to satisfiesTotalOrder.

Harbormaster completed remote builds in B48374: Diff 248793. Mar 6 2020, 12:09 PM

Meinersbur added a reviewer: spop. Mar 9 2020, 12:00 PM

Meinersbur added inline comments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566	When using `i` or `j`, `getSCEVAtScope` should be applied before combining with other SCEVs to ensure that we get the right representation for where it is used. That is, `getSCEVAtScope(i) + getSCEVAtScope(j)` instead of `getSCEVAtScope(i+j)`. It might be that nothing interesting happens during `ScalarEvolution::getAddExpr`. However, we'd get into problems if the normalization of SCEVAdd makes use of properties that are only valid within the loop.
	Note that this is not an issue when in LCSSA, since the uses of %i and %j after the loops would be %i.lcssa and %j.lcssa respectively, representing the exit values already. The same would apply if we put one of the loops inside an if (e.g. a loop guard), forcing an exit node phi:
	  if (c) {
	    for (i = 0; i < n; i++)
	      ...
	  }
	  for (j = 0; j < n; j++)
	    ...
	Using the SCEVAddRec after the if would be illegal as it ignores what the value of %i would be if `c` was false. It is also wrong according to the dominator assumption for SCEV, probably because of this very reason.
	For the dependence check between the loops, IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA. I think in your test case it is because the AddRecExpr is itself inside another loop and hence not dominating the other loop body.
	What does Optimizing Compilers for Modern Architectures say about this case? Maybe we could ask the original author Preston Briggs <preston.briggs@gmail.com>.
	I do have the race-car book and I looked up the Extreme Value Test. The book ignores cases where statements are not in the same loop.
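
The guarded-loop concern can be made concrete at the source level. In this sketch, `valueAfterGuardedLoop` and its `valueIfSkipped` parameter are hypothetical: the parameter stands in for whatever value reaches the exit phi when the guard is false, which the add recurrence alone does not describe.

```cpp
#include <cassert>

// Returns the value of i observed after an optionally executed loop.
// valueIfSkipped models the incoming phi value on the path where c is false.
int valueAfterGuardedLoop(bool c, int n, int valueIfSkipped) {
  int i = valueIfSkipped;
  if (c) {
    for (i = 0; i < n; i++) {
    }
  }
  return i;
}
```

With a non-negative `n`, the result is `n` when the guard holds and `valueIfSkipped` otherwise, so an expression built only from the loop's recurrence would be wrong on the skipped path.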

bmahjour marked an inline comment as done. Mar 26 2020, 8:43 AM

bmahjour added inline comments.

llvm/lib/Analysis/DependenceAnalysis.cpp
2566	I agree LCSSA can avoid this issue, but we don't (shouldn't) require LCSSA form for dependence analysis to work. From what I can see, getSCEVAtScope does not consider control flow structures at all when querying at top-level scope (i.e. we cannot distinguish between guarded vs non-guarded top-level scopes), so even if we did `getSCEVAtScope(i) + getSCEVAtScope(j)` we would still have the problem that `i-loop` may not have been executed. I suppose the control flow issue may be solvable with predication, but even then we should still be able to symbolically represent the expressions and perform simplification before trying to evaluate with getSCEVAtScope. For example `getSCEVAtScope(i<c>+j-i<c>)` gets simplified to `getSCEVAtScope(j)`, which can be evaluated without violating the dominance relationship. Maybe the assertions should move to getSCEVAtScope? @reames Phillip could you please share your thoughts on this?
	> IMHO for the value of %i inside the %j loop, it should use the exit value of %i, not the AddRecExpr at all, even without the conditional. That is, handle it as if it was LCSSA
	In order to do that, first we need to detect if we are in a situation where one loop does not dominate another (similar to the proposed solution in this patch) and if so use the exit values. The exit values would have to be computed using `getSCEVAtScope` I assume, but `getSCEVAtScope` may return the input SCEV if it cannot compute the exit value, which puts us back at square one.
	For the case where we have a conditionally guarded loop, we can conservatively assume that the condition is always true and add the loop bounds to the set of constraints being considered. Since the dependence is disproved only if the Diophantine function is outside of the sum of min/max values for all the bounds, by considering the guarded loop bound we are only tightening the constraint and being more conservative. I'll ask Preston to confirm this.

bmahjour added a project: Restricted Project. May 5 2022, 12:21 PM

Herald added a project: Restricted Project. May 5 2022, 12:21 PM

Revision Contents

Path                                                             Size
llvm/include/llvm/Analysis/ScalarEvolution.h                     8 lines
llvm/lib/Analysis/DependenceAnalysis.cpp                         24 lines
llvm/lib/Analysis/ScalarEvolution.cpp                            29 lines
llvm/test/Analysis/DependenceAnalysis/SiblingLoopLimitation.ll   121 lines

Diff 248793

llvm/include/llvm/Analysis/ScalarEvolution.h

... 1,094 lines hidden (context: public:) ...
   const SCEV *rewriteUsingPredicate(const SCEV *S, const Loop *L,
                                     SCEVUnionPredicate &A);
   /// Tries to convert the \p S expression to an AddRec expression,
   /// adding additional predicates to \p Preds as required.
   const SCEVAddRecExpr *convertSCEVToAddRecWithPredicates(
       const SCEV *S, const Loop *L,
       SmallPtrSetImpl<const SCEVPredicate *> &Preds);

+  /// Examines the list of SCEV expressions to find all their used loops and
+  /// returns true if a total ordering relationship based on dominance can be
+  /// applied to that set of loops found. Returns false otherwise.
+  bool satisfiesTotalOrder(ArrayRef<const SCEV *> Exprs) const;
+
 private:
   /// A CallbackVH to arrange for ScalarEvolution to be notified whenever a
   /// Value is deleted.
   class SCEVCallbackVH final : public CallbackVH {
     ScalarEvolution *SE;

     void deleted() override;
     void allUsesReplacedWith(Value *New) override;
... 766 lines hidden (context: const SCEV *getOrCreateAddRecExpr(ArrayRef<const SCEV *> Ops,) ...
                                     const Loop *L, SCEV::NoWrapFlags Flags);

   /// Return x if \p Val is f(x) where f is a 1-1 function.
   const SCEV *stripInjectiveFunctions(const SCEV *Val) const;

   /// Find all of the loops transitively used in \p S, and fill \p LoopsUsed.
   /// A loop is considered "used" by an expression if it contains
   /// an add rec on said loop.
-  void getUsedLoops(const SCEV *S, SmallPtrSetImpl<const Loop *> &LoopsUsed);
+  void getUsedLoops(const SCEV *S,
+                    SmallPtrSetImpl<const Loop *> &LoopsUsed) const;

   /// Find all of the loops transitively used in \p S, and update \c LoopUsers
   /// accordingly.
   void addToLoopUseLists(const SCEV *S);

   /// Try to match the pattern generated by getURemExpr(A, B). If successful,
   /// Assign A and B to LHS and RHS, respectively.
   bool matchURem(const SCEV *Expr, const SCEV *&LHS, const SCEV *&RHS);
... 181 lines hidden ...

llvm/lib/Analysis/DependenceAnalysis.cpp

... 2,557 lines hidden (context: if (Bound[K].Upper[Dependence::DVEntry::ALL])) ...
         LLVM_DEBUG(dbgs() << *Bound[K].Upper[Dependence::DVEntry::ALL] << '\n');
       else
         LLVM_DEBUG(dbgs() << "+inf\n");
 #endif
   }

   // Test the *, *, *, ... case.
   bool Disproved = false;
   if (testBounds(Dependence::DVEntry::ALL, 0, Bound, Delta)) {
     // Explore the direction vector hierarchy.
     unsigned DepthExpanded = 0;
     unsigned NewDeps = exploreDirections(1, A, B, Bound,
                                          Loops, DepthExpanded, Delta);
     if (NewDeps > 0) {
       bool Improved = false;
       for (unsigned K = 1; K <= CommonLevels; ++K) {
         if (Loops[K]) {
... 129 lines hidden (context: #endif) ...

     Bound[Level].Direction = Dependence::DVEntry::ALL;
     return NewDeps;
   }
   else
     return exploreDirections(Level + 1, A, B, Bound, Loops, DepthExpanded, Delta);
 }


 // Returns true iff the current bounds are plausible.
 bool DependenceInfo::testBounds(unsigned char DirKind, unsigned Level,
                                 BoundInfo *Bound, const SCEV *Delta) const {
+  // Check to see if the set of loops referenced by the bounds satisfy a total
+  // order. If not, SCEV cannot compute the bounds.
+  SmallVector<const SCEV *, 8> BoundExprs;
+  for (unsigned K = 1; K <= MaxLevels; ++K) {
+    const SCEV *LB = Bound[K].Lower[Bound[K].Direction];
+    if (LB)
+      BoundExprs.push_back(LB);
+  }
+  BoundExprs.push_back(Delta);
+  if (!SE->satisfiesTotalOrder(BoundExprs))
+    return true;
+  BoundExprs.clear();
+  for (unsigned K = 1; K <= MaxLevels; ++K) {
+    const SCEV *UB = Bound[K].Upper[Bound[K].Direction];
+    if (UB)
+      BoundExprs.push_back(UB);
+  }
+  BoundExprs.push_back(Delta);
+  if (!SE->satisfiesTotalOrder(BoundExprs))
+    return true;
+  BoundExprs.clear();
+
   Bound[Level].Direction = DirKind;
   if (const SCEV *LowerBound = getLowerBound(Bound))
     if (isKnownPredicate(CmpInst::ICMP_SGT, LowerBound, Delta))
       return false;
   if (const SCEV *UpperBound = getUpperBound(Bound))
     if (isKnownPredicate(CmpInst::ICMP_SGT, Delta, UpperBound))
       return false;
   return true;
 }


 // Computes the upper and lower bounds for level K
 // using the * direction. Records them in Bound.
 // Wolfe gives the equations
 //
 //    LB^*_k = (A^-_k - B^+_k)(U_k - L_k) + (A_k - B_k)L_k
 //    UB^*_k = (A^+_k - B^-_k)(U_k - L_k) + (A_k - B_k)L_k
 //
 // Since we normalize loops, we can simplify these equations to
... 1,359 lines hidden ...

llvm/lib/Analysis/ScalarEvolution.cpp

... 11,989 lines hidden (context: auto RemoveSCEVFromBackedgeMap =) ...
         ++I;
       }
     };

   RemoveSCEVFromBackedgeMap(BackedgeTakenCounts);
   RemoveSCEVFromBackedgeMap(PredicatedBackedgeTakenCounts);
 }

-void
-ScalarEvolution::getUsedLoops(const SCEV *S,
-                              SmallPtrSetImpl<const Loop *> &LoopsUsed) {
+void ScalarEvolution::getUsedLoops(
+    const SCEV *S, SmallPtrSetImpl<const Loop *> &LoopsUsed) const {
   struct FindUsedLoops {
     FindUsedLoops(SmallPtrSetImpl<const Loop *> &LoopsUsed)
         : LoopsUsed(LoopsUsed) {}
     SmallPtrSetImpl<const Loop *> &LoopsUsed;
     bool follow(const SCEV *S) {
       if (auto *AR = dyn_cast<SCEVAddRecExpr>(S))
         LoopsUsed.insert(AR->getLoop());
       return true;
... 335 lines hidden (context: const SCEVAddRecExpr *ScalarEvolution::convertSCEVToAddRecWithPredicates() ...
   // Since the transformation was successful, we can now transfer the SCEV
   // predicates.
   for (auto *P : TransformPreds)
     Preds.insert(P);

   return AddRec;
 }

+bool ScalarEvolution::satisfiesTotalOrder(ArrayRef<const SCEV *> Exprs) const {
+  SmallPtrSet<const Loop *, 16> Loops;
+  for (const SCEV *E : Exprs)
+    getUsedLoops(E, Loops);
+  if (Loops.size() < 2)
+    return true;
+  // Copy to a container that can be sorted.
+  SmallVector<const Loop *, 16> SortedLoops;
+  SortedLoops.assign(Loops.begin(), Loops.end());
+  Loops.clear();
+  std::sort(SortedLoops.begin(), SortedLoops.end(),
+            [this](const Loop *LHS, const Loop *RHS) {
+              return this->DT.properlyDominates(LHS->getHeader(),
+                                                RHS->getHeader());
+            });
+  // Now that the set is sorted, we can detect in O(n) that total order is
+  // satisfied iff every element is larger than the next element.
+  for (int I = 1, E = SortedLoops.size(); I < E; ++I)
+    if (!DT.dominates(SortedLoops[I - 1]->getHeader(),
+                      SortedLoops[I]->getHeader()))
+      return false;
+  return true;
+}
+
 /// SCEV predicates
 SCEVPredicate::SCEVPredicate(const FoldingSetNodeIDRef ID,
                              SCEVPredicateKind Kind)
     : FastID(ID), Kind(Kind) {}

 SCEVEqualPredicate::SCEVEqualPredicate(const FoldingSetNodeIDRef ID,
                                        const SCEV *LHS, const SCEV *RHS)
     : SCEVPredicate(ID, P_Equal), LHS(LHS), RHS(RHS) {
... 301 lines hidden ...

llvm/test/Analysis/DependenceAnalysis/SiblingLoopLimitation.ll

This file was added.

; RUN: opt < %s -disable-output -passes="print<da>" 2>&1 | FileCheck %s

; CHECK-LABEL: foo
; CHECK-LABEL: Src: store i32 11, i32* %arrayidx, align 4 --> Dst: store i32 22, i32* %arrayidx22, align 4
; CHECK-NEXT: da analyze - output [S|<]!

;;void foo(int *restrict A, int n1, int n2, int n3) {
;;  for (int i1 = 0; i1 < n1; i1++) {
;;    for (int i2 = 2; i2 < n2; i2++) {
;;      for (int i3 = i2 + 1; i3 < n3; i3++) {
;;        A[i2 + i3*n2] = 11;
;;      }
;;    }
;;    for (int i4 = 2; i4 < n3; i4++) {
;;      for (int i5 = 1; i5 < i4 - 1; i5++) {
;;        A[i5] = 22;
;;      }
;;    }
;;  }
;;}

				define void @foo(i32* noalias %A, i32 signext %n1, i32 signext %n2, i32 signext %n3) {
				entry:
				%cmp9 = icmp sgt i32 %n1, 0
				br i1 %cmp9, label %for.body.preheader, label %for.end31

				for.body.preheader: ; preds = %entry
				%0 = sext i32 %n2 to i64
				%1 = sext i32 %n3 to i64
				%2 = add i32 %n3, -1
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.inc29
				%i1.010 = phi i32 [ %inc30, %for.inc29 ], [ 0, %for.body.preheader ]
				%cmp23 = icmp sgt i32 %n2, 2
				br i1 %cmp23, label %for.body4.preheader, label %for.end12

				for.body4.preheader: ; preds = %for.body
				%wide.trip.count17 = zext i32 %n2 to i64
				br label %for.body4

				for.body4: ; preds = %for.body4.preheader, %for.inc10
				%indvars.iv15 = phi i64 [ 2, %for.body4.preheader ], [ %indvars.iv.next16, %for.inc10 ]
				%indvars.iv = phi i64 [ 3, %for.body4.preheader ], [ %indvars.iv.next, %for.inc10 ]
				%indvars.iv.next16 = add nuw nsw i64 %indvars.iv15, 1
				%cmp61 = icmp slt i64 %indvars.iv.next16, %1
				br i1 %cmp61, label %for.body8.preheader, label %for.inc10

				for.body8.preheader: ; preds = %for.body4
				%wide.trip.count = zext i32 %n3 to i64
				br label %for.body8

				for.body8: ; preds = %for.body8.preheader, %for.body8
				%indvars.iv11 = phi i64 [ %indvars.iv, %for.body8.preheader ], [ %indvars.iv.next12, %for.body8 ]
				%3 = mul nsw i64 %indvars.iv11, %0
				%4 = add nsw i64 %indvars.iv15, %3
				%arrayidx = getelementptr inbounds i32, i32* %A, i64 %4
				store i32 11, i32* %arrayidx, align 4
				%indvars.iv.next12 = add nuw nsw i64 %indvars.iv11, 1
				%exitcond = icmp ne i64 %indvars.iv.next12, %wide.trip.count
				br i1 %exitcond, label %for.body8, label %for.inc10.loopexit

				for.inc10.loopexit: ; preds = %for.body8
				br label %for.inc10

				for.inc10: ; preds = %for.inc10.loopexit, %for.body4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond18 = icmp ne i64 %indvars.iv.next16, %wide.trip.count17
				br i1 %exitcond18, label %for.body4, label %for.end12.loopexit

				for.end12.loopexit: ; preds = %for.inc10
				br label %for.end12

				for.end12: ; preds = %for.end12.loopexit, %for.body
				%cmp147 = icmp sgt i32 %n3, 2
				br i1 %cmp147, label %for.body16.preheader, label %for.inc29

				for.body16.preheader: ; preds = %for.end12
				%wide.trip.count27 = zext i32 %2 to i64
				br label %for.body16

				for.body16: ; preds = %for.body16.preheader, %for.inc26
				%indvars.iv25 = phi i64 [ 1, %for.body16.preheader ], [ %indvars.iv.next26, %for.inc26 ]
				%i4.08 = phi i32 [ %inc27, %for.inc26 ], [ 2, %for.body16.preheader ]
				%cmp185 = icmp ugt i32 %i4.08, 2
				br i1 %cmp185, label %for.body20.preheader, label %for.inc26

				for.body20.preheader: ; preds = %for.body16
				br label %for.body20

				for.body20: ; preds = %for.body20.preheader, %for.body20
				%indvars.iv19 = phi i64 [ 1, %for.body20.preheader ], [ %indvars.iv.next20, %for.body20 ]
				%arrayidx22 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv19
				store i32 22, i32* %arrayidx22, align 4
				%indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1
				%exitcond24 = icmp ne i64 %indvars.iv.next20, %indvars.iv25
				br i1 %exitcond24, label %for.body20, label %for.inc26.loopexit

				for.inc26.loopexit: ; preds = %for.body20
				br label %for.inc26

				for.inc26: ; preds = %for.inc26.loopexit, %for.body16
				%inc27 = add nuw nsw i32 %i4.08, 1
				%indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1
				%exitcond28 = icmp ne i64 %indvars.iv.next26, %wide.trip.count27
				br i1 %exitcond28, label %for.body16, label %for.inc29.loopexit

				for.inc29.loopexit: ; preds = %for.inc26
				br label %for.inc29

				for.inc29: ; preds = %for.inc29.loopexit, %for.end12
				%inc30 = add nuw nsw i32 %i1.010, 1
				%exitcond29 = icmp ne i32 %inc30, %n1
				br i1 %exitcond29, label %for.body, label %for.end31.loopexit

				for.end31.loopexit: ; preds = %for.inc29
				br label %for.end31

				for.end31: ; preds = %for.end31.loopexit, %entry
				ret void
				}

Revision Contents

Diff 248793
