This is an archive of the discontinued LLVM Phabricator instance.

[mlir][scf] Simplify affine.min ops after loop peeling
ClosedPublic

Authored by springerm on Jul 31 2021, 6:47 AM.

Details

Summary

Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.

affine.min ops such as:

map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]

are rewritten them into (in the case the peeled loop):

%r = %step

To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.

Depends On D107730

Diff Detail

Event Timeline

springerm created this revision.Jul 31 2021, 6:47 AM
springerm requested review of this revision.Jul 31 2021, 6:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2021, 6:47 AM

Generally looks good!

mlir/include/mlir/Dialect/SCF/Transforms.h
76

Does the description need mention of loop peeling anymore?
It seems to me this is a general feature that can apply to any value iv that is known to be bound by ub and to increment by step.

I would recommend the following:

  1. split this into: a. a helper that constructs the set and returns whether it is empty b. an IR rewriter that calls the helper
  2. rename "insideLoop" into a boolean that convey lesser-equal / greater than.
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
174

I would rephrase this in a more mathematical form; I'd love to "see" the constraints system you are building rather than have it described in text.

My experience is that a formatting constraints sets by block matrices where the 0 and subidentities appear clearly makes it quite easy to follow; see e.e. page 4 of http://icps.u-strasbg.fr/~bastoul/research/papers/VBGC06-ICS.pdf which should give an intuitive understanding of the layout of the constraints (in a different use case).

194

braces for multiline if please.

195

I am not sure I understand this note, could you explain?
It seems you want to ignore failures here but I am not sure how this works.

223

I do not see the "for each" behavior here?

233

Can we make the creation of those dims ahead of time?
The addInequality would be a bit more annoying to write but it is usually more readable to just fix the dimension of the constraint set before introducing inequalities (when possible).

255

I'd restructure this block and the previous a bit if possible; it wasn't immediately clear to me why you needed the extra insert here? Then after digging a bit more I saw that it is probably getSliceBounds that drops the particular dimension by projecting.

260

This would benefit from seeing the whole constraint set in math form.

springerm updated this revision to Diff 363636.Aug 2 2021, 11:31 PM
springerm marked 2 inline comments as done.

rebase + update

springerm edited the summary of this revision. (Show Details)Aug 2 2021, 11:31 PM
springerm updated this revision to Diff 363637.Aug 2 2021, 11:45 PM

minor updates

springerm added inline comments.Aug 3 2021, 5:51 PM
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
195

Update: I no longer use addLowerOrUpperBound here and add an equality directly.

223

The "for each" is in addLowerOrUpperBound (it is not written as a for-each and if you're not familiar with this function, it may be hard to see). addLowerOrUpperBound adds an inequality for each result in mapOpMap.

ftynse added inline comments.Aug 4 2021, 5:32 AM
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
267–268

I don't see anything in getSliceBounds that can clear the minOpValueUb so the first condition is always false. The second condition is currently also always false, but there is a TODO in getSliceBound that would trigger it, consider adding some debug spew here for when this becomes the case.

278

Nit: also assert that eq[kDimMinOpUb] == 0? Looking at the current system of constraints, it should be the case, but better to future-proof this.

281
291

I'd consider just removing the last inequality at the end of the loop, this will spare a copy on each iteration.

308

Are we sure that the UB AffineExpr has exactly three inputs? There doesn't seem to be any filtering of the affine.min that gets processed, so the original map can seemingly be arbitrarily complex as long as the emptiness check is satisfied.

springerm updated this revision to Diff 365086.Aug 8 2021, 11:07 PM
springerm marked 6 inline comments as done.

address comments

springerm edited the summary of this revision. (Show Details)Aug 8 2021, 11:07 PM
springerm added inline comments.
mlir/include/mlir/Dialect/SCF/Transforms.h
76

Refactored as follows: The logic for solving the constraint system etc. is in canonicalizeAffineMinOp.

This function could even be moved out of the SCF dialect into a different file.

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
233

I rewrote large parts of the commit. Dimensions are still added "in the middle", though. This is more convenient, because I can pass the FlatAffineConstraints to a builder function, that adds pattern-specific constraints. At that point, I do not want to bother the user of canonicalizeAffineMinOp with unexpected extra dims. In the constraint builder, only those dims exist in the constraint set, that were specified by the caller (dims parameter).

278

I no longer use addEquality here.

308

Correct, this is wrong here. Had the fix in a dependent commit that's not out for review yet, but moved it into this one.

aartbik added inline comments.Aug 9 2021, 2:01 PM
mlir/include/mlir/Dialect/SCF/Transforms.h
47

can you elaborate on the "is beneficial" (i.e. we generally distinguish between enabling transformations and transformations that "cleanup" after the fact, and this is clearly the latter)

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
107

document the result value (ie. returns failure when not rewritten)

203

typo: replace with that bound?

springerm marked 7 inline comments as done.Aug 12 2021, 1:58 AM
springerm added inline comments.
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
107

Extended the comment. Also added detailed description of what loops are rewritten in the comment of peelAndCanonicalizeForLoop.

260

Added to the beginning of the function.

springerm updated this revision to Diff 365939.Aug 12 2021, 2:16 AM
springerm marked an inline comment as done.

address comments

nicolasvasilache requested changes to this revision.Aug 18 2021, 4:34 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
164

Bound -> Bind

168

assoicated - > associated

179

So I am looking at the implementation of alignAffineMapWithValues as it now also gets used from here and I find that it could be significantly improved IMO.

I would just make everything explicit and use:

/// Sparse replace method. Apply AffineExpr::replace(`map`) to each of the
/// results and return a new AffineMap with the new results and with inferred
/// number of dims and symbols.
AffineMap replace(const DenseMap<AffineExpr, AffineExpr> &map) const;

To get there, there would be a helper function that would take an operand and return the AffineExpr to which the operand is aligned in the constraint system.
If not there yet it would add it and return a new AffineSymbolExpr (I think from looking at your code).

Then you can easily for_each those into he DenseMap and call the replace func.
This should significantly improve composability.

Does this make sense ?

(Not for this revision but would be nice as a followup).

188

"replaced with the bound" -> "folded away" ?

226

dims -> operands ?

250

It seems quite convoluted to pass a lambda with implicit invariant that after "addDim in this function is called then the number of columns must match".

I would get rid of constraintsBuilder entirely and just lift l 239 - 252 into the caller.

261

Same comment as above, I think constructing an explicit DenseMap<AffineExpr, AffineExpr> based on Values and just using the appropriate sparse replacement function would be significantly more readable.

323

In the grander order of things where we call FM, I find this micro-optimization to be more confusing than anything else (especially in the context of new symValues that you also have to assert against).

I'd prefer to see a fresh copy of the FlatAffineValueConstraint, add a constraint, test for emptiness and let if RAII away.

326

(I think) the block of code below somewhat reproduces AffineMap::compressUnusedDims and Symbols?
But this is mostly hidden by the fact that there are empt values that may be introduced for Dim ?

In light of this last point, I think the convolution has reached a point where the separation between alignment and Value / Attribute worlds would already be beneficial in this revision.

380

This seems unnecessarily convoluted, at the very least I would:

// Add loop peeling invariant. This is the main piece of knowledge that
// enables AffineMinOp simplification.
if (insideLoop) {
  // ub - iv >= step (equiv.: -iv + ub - step + 0 >= 0)
  // Intuitively: Inside the peeled loop, every iteration is a "full"
  // iteration, i.e., step divides the iteration space `ub - lb` evenly.
  auto lambda = ...;
  return canonicalizeAffineMinOp(rewriter, minOp,
                             /*dims=*/ValueRange{iv, ub, step}, lambda);
}
// ub - iv < step (equiv.: iv + -ub + step - 1 >= 0)
// Intuitively: `iv` is the split bound here, i.e., the iteration variable
// value of the very last iteration (in the unpeeled loop). At that point,
// there are less than `step` elements remaining. (Otherwise, the peeled
// loop would run for at least one more iteration.)
auto lambda = ...;
return canonicalizeAffineMinOp(rewriter, minOp,
                           /*dims=*/ValueRange{iv, ub, step}, lambda);

However, in light of the other comments I would refactor much more deeply and just pass a FlatAffineConstraints to the canonicalize function.

mlir/test/Dialect/SCF/for-loop-peeling.mlir
4

Hmm seems like this should simplify to (s1 - s0) mod s2.
Likely some missing affineexpr simplification pattern, however it may just be adding more canonicalization patterns that will continue to fail in slightly more complex cases.
Maybe a lot of this should be using FlatAffineConstraints directly.

168

#map3 -> #[[MAP3]]

201

I find a little hard to keep track of things at such a distance, could you please reorder in interleaved form?

// Most common case: Rewrite min(%ub - %iv, %step) to %step.
//      CHECK:     memref.store %[[STEP]]
%m0 = affine.min #map0(%ub, %iv)[%step]
memref.store %m0, %d[%c0] : memref<?xindex>

 // Increase %ub - %iv a little bit, pattern should still apply.
...

// At the end, the checks within the scf.if

Maybe even embed the affine_map into the affine.min since it is not reused ?

This revision now requires changes to proceed.Aug 18 2021, 4:34 AM
springerm marked 10 inline comments as done.

address comments

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
164

I think this should actually be "bound" (to bound). As in upper/lower bound.

226

These are the SSA values associated to the dimensions in the constraints set. These dims can be referred to in the constraintsBuilder. Note: dims are not the operands of the AffineMinOp. Those would be minOp.operands().

E.g.:
The caller of canonicalizeAffineMinOp adds two inequalities via the constraintBuilder. Those inequalities use iv, ub, step dimensions. dims contains the SSA values of those dimensions. Note: It is important that dims and the (in)equalities creates by the constraintBuilder stay in sync.

Looking at this code again, I think it would be better to use FlatAffineValueConstraints here instead of FlatAffineConstraints. Then I don't have to maintain extra helper variables for dims/syms SSA values. With the recent refactorings, I can just call the base class functions which do not have any "affine dialect" assumptions. I can try splitting the FlatAffineValueConstraints class in a subsequent commit, so that the "affine dialect" functionality is properly separated (as we talked last time) and cannot be called by accident in here.

I'll refactor this as follows:

  • canonicalizeAffineMinOp takes a FlatAffineValueConstraints as an argument.
  • dims parameter is gone.
250

Replaced by FlatAffineValueConstraints. There is need to keep track of dim/sym SSA values anymore.

323

That's how I had it in an earlier revision. This was a suggestion by another reviewer to avoid the overhead of copying the FlatAffineConstraints. Either one is fine with me.

326

I think I should use canonicalizeMapAndOperands here. It requires only a small change to support "empty" Values.

380

Replaced with FlatAffineValueConstraints, which is created by the caller and passed to canonicalizeAffineMinOp.

This revision is now accepted and ready to land.Aug 19 2021, 12:21 AM
This revision was automatically updated to reflect the committed changes.