This is an archive of the discontinued LLVM Phabricator instance.

Differential D107222

[mlir][scf] Simplify affine.min ops after loop peeling
ClosedPublic

Authored by springerm on Jul 31 2021, 6:47 AM.

Details

Reviewers

nicolasvasilache
ftynse
bondhugula
aartbik
rriddle

Commits

rG8e8b70aa8479: [mlir][scf] Simplify affine.min ops after loop peeling

Summary

Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.

affine.min ops such as:

#map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #map(%iv)[%step, %ub]

are rewritten into (in the case of the peeled loop):

%r = %step

To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
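
As a sanity check of the rewrite described in this summary (not part of the patch), the following minimal C++ sketch enumerates the iterations of a peeled loop and confirms that min(step, ub - iv) always equals step there. It uses plain integers instead of FlatAffineConstraints, assumes lb <= ub and step > 0, and the function name `minFoldsToStepInPeeledLoop` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of MLIR. Returns true if
// min(step, ub - iv) == step holds in every iteration of the peeled loop.
bool minFoldsToStepInPeeledLoop(int64_t lb, int64_t ub, int64_t step) {
  // Assumption of this sketch: a forward loop with a positive step.
  assert(lb <= ub && step > 0);
  // New upper bound after peeling: ub - (ub - lb) mod step.
  int64_t newUb = ub - ((ub - lb) % step);
  for (int64_t iv = lb; iv < newUb; iv += step) {
    // Mirrors affine.min over the results (s0, -d0 + s1)
    // with d0 = iv, s0 = step, s1 = ub.
    if (std::min(step, ub - iv) != step)
      return false;
  }
  return true;
}
```

The patch itself proves this symbolically for all values; the brute-force loop only illustrates the claim for concrete bounds.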

Depends On D107730

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

springerm created this revision. Jul 31 2021, 6:47 AM

Herald added subscribers: Chia-hungDuan, dcaballe, cota and 17 others. Jul 31 2021, 6:47 AM

springerm requested review of this revision. Jul 31 2021, 6:47 AM

Herald added a project: Restricted Project. Jul 31 2021, 6:47 AM

Herald added a subscriber: stephenneuendorffer.

Harbormaster completed remote builds in B117323: Diff 363295. Jul 31 2021, 7:27 AM

rebase

Harbormaster completed remote builds in B117383: Diff 363379. Aug 1 2021, 11:45 PM

Generally looks good!

mlir/include/mlir/Dialect/SCF/Transforms.h
76	Does the description need to mention loop peeling anymore? It seems to me this is a general feature that can apply to any value `iv` that is known to be bounded by `ub` and to increment by `step`. I would recommend the following: split this into (a) a helper that constructs the set and returns whether it is empty and (b) an IR rewriter that calls the helper; rename "insideLoop" into a boolean that conveys lesser-equal / greater-than.
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
174	I would rephrase this in a more mathematical form; I'd love to "see" the constraint system you are building rather than have it described in text. My experience is that formatting constraint sets as block matrices, where the 0 and sub-identity blocks appear clearly, makes them quite easy to follow; see e.g. page 4 of http://icps.u-strasbg.fr/~bastoul/research/papers/VBGC06-ICS.pdf, which should give an intuitive understanding of the layout of the constraints (in a different use case).
194	braces for multiline if please.
195	I am not sure I understand this note, could you explain? It seems you want to ignore failures here but I am not sure how this works.
223	I do not see the "for each" behavior here?
233	Can we make the creation of those dims ahead of time? The addInequality would be a bit more annoying to write but it is usually more readable to just fix the dimension of the constraint set before introducing inequalities (when possible).
255	I'd restructure this block and the previous a bit if possible; it wasn't immediately clear to me why you needed the extra insert here? Then after digging a bit more I saw that it is probably getSliceBounds that drops the particular dimension by projecting.
260	This would benefit from seeing the whole constraint set in math form.

rebase

Herald added a subscriber: wrengr. Aug 2 2021, 7:14 PM

Harbormaster completed remote builds in B117554: Diff 363616. Aug 2 2021, 8:14 PM

rebase + update

springerm edited the summary of this revision. Aug 2 2021, 11:31 PM

springerm added a parent revision: D107326: [mlir][scf] Fix bug in peelForLoop.

minor updates

Harbormaster completed remote builds in B117568: Diff 363637. Aug 3 2021, 12:17 AM

springerm added a reviewer: aartbik. Aug 3 2021, 4:07 PM

update

springerm added inline comments. Aug 3 2021, 5:51 PM

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
195	Update: I no longer use `addLowerOrUpperBound` here and add an equality directly.
223	The "for each" is in `addLowerOrUpperBound` (it is not written as a for-each and if you're not familiar with this function, it may be hard to see). `addLowerOrUpperBound` adds an inequality for each result in `minOpMap`.

Harbormaster completed remote builds in B117786: Diff 363924. Aug 3 2021, 6:22 PM

ftynse added inline comments. Aug 4 2021, 5:32 AM

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
267–268	I don't see anything in getSliceBounds that can clear the `minOpValueUb`, so the first condition is always false. The second condition is currently also always false, but there is a TODO in getSliceBounds that would trigger it; consider adding some debug spew here for when this becomes the case.
278	Nit: also assert that eq[kDimMinOpUb] == 0? Looking at the current system of constraints, it should be the case, but better to future-proof this.
281
291	I'd consider just removing the last inequality at the end of the loop, this will spare a copy on each iteration.
308	Are we sure that the UB AffineExpr has exactly three inputs? There doesn't seem to be any filtering of the affine.min that gets processed, so the original map can seemingly be arbitrarily complex as long as the emptiness check is satisfied.

address comments

springerm edited the summary of this revision. Aug 8 2021, 11:07 PM

springerm added a parent revision: D107730: [mlir][IR] Add optional offset `offset` parameter to shiftDims/shiftSymbols.

springerm added inline comments.

mlir/include/mlir/Dialect/SCF/Transforms.h
76	Refactored as follows: The logic for solving the constraint system etc. is in `canonicalizeAffineMinOp`. This function could even be moved out of the SCF dialect into a different file.
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
233	I rewrote large parts of the commit. Dimensions are still added "in the middle", though. This is more convenient, because I can pass the FlatAffineConstraints to a builder function, that adds pattern-specific constraints. At that point, I do not want to bother the user of `canonicalizeAffineMinOp` with unexpected extra dims. In the constraint builder, only those dims exist in the constraint set, that were specified by the caller (`dims` parameter).
278	I no longer use `addEquality` here.
308	Correct, this is wrong here. Had the fix in a dependent commit that's not out for review yet, but moved it into this one.

springerm added a child revision: D107731: [mlir][scf] Add general affine.min canonicalization pattern. Aug 8 2021, 11:07 PM

Harbormaster completed remote builds in B118607: Diff 365086. Aug 8 2021, 11:51 PM

springerm added a child revision: D107733: [mlir][SparseTensor] Split scf.for loop into masked/unmasked parts. Aug 9 2021, 12:26 AM

aartbik added inline comments. Aug 9 2021, 2:01 PM

mlir/include/mlir/Dialect/SCF/Transforms.h
47	can you elaborate on the "is beneficial" (i.e. we generally distinguish between enabling transformations and transformations that "cleanup" after the fact, and this is clearly the latter)
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
107	document the result value (ie. returns failure when not rewritten)
203	typo: replace with that bound?

rebase

Harbormaster completed remote builds in B119028: Diff 365689. Aug 11 2021, 1:14 AM

rebase

Harbormaster completed remote builds in B119194: Diff 365914. Aug 12 2021, 12:40 AM

springerm marked 7 inline comments as done. Aug 12 2021, 1:58 AM

springerm added inline comments.

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
107	Extended the comment. Also added detailed description of what loops are rewritten in the comment of `peelAndCanonicalizeForLoop`.
260	Added to the beginning of the function.

address comments

Harbormaster completed remote builds in B119215: Diff 365939. Aug 12 2021, 2:17 AM

rebase

Harbormaster completed remote builds in B119830: Diff 366802. Aug 16 2021, 10:16 PM

nicolasvasilache requested changes to this revision. Aug 18 2021, 4:34 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
164	Bound -> Bind
168	assoicated -> associated
179	So I am looking at the implementation of `alignAffineMapWithValues` as it now also gets used from here and I find that it could be significantly improved IMO. I would just make everything explicit and use:
	    /// Sparse replace method. Apply AffineExpr::replace(`map`) to each of the
	    /// results and return a new AffineMap with the new results and with inferred
	    /// number of dims and symbols.
	    AffineMap replace(const DenseMap<AffineExpr, AffineExpr> &map) const;
	To get there, there would be a helper function that would take an operand and return the AffineExpr to which the operand is aligned in the constraint system. If not there yet it would add it and return a new AffineSymbolExpr (I think from looking at your code). Then you can easily for_each those into the DenseMap and call the replace func. This should significantly improve composability. Does this make sense? (Not for this revision but would be nice as a followup).
188	"replaced with the bound" -> "folded away" ?
226	dims -> operands ?
250	It seems quite convoluted to pass a lambda with implicit invariant that after "addDim in this function is called then the number of columns must match". I would get rid of constraintsBuilder entirely and just lift l 239 - 252 into the caller.
261	Same comment as above, I think constructing an explicit DenseMap<AffineExpr, AffineExpr> based on Values and just using the appropriate sparse replacement function would be significantly more readable.
323	In the grander order of things where we call FM, I find this micro-optimization to be more confusing than anything else (especially in the context of new symValues that you also have to assert against). I'd prefer to see a fresh copy of the FlatAffineValueConstraint, add a constraint, test for emptiness and let it RAII away.
326	(I think) the block of code below somewhat reproduces AffineMap::compressUnusedDims and Symbols? But this is mostly hidden by the fact that there are empty values that may be introduced for Dim? In light of this last point, I think the convolution has reached a point where the separation between alignment and Value / Attribute worlds would already be beneficial in this revision.
380	This seems unnecessarily convoluted, at the very least I would:
	    // Add loop peeling invariant. This is the main piece of knowledge that
	    // enables AffineMinOp simplification.
	    if (insideLoop) {
	      // ub - iv >= step (equiv.: -iv + ub - step + 0 >= 0)
	      // Intuitively: Inside the peeled loop, every iteration is a "full"
	      // iteration, i.e., step divides the iteration space `ub - lb` evenly.
	      auto lambda = ...;
	      return canonicalizeAffineMinOp(rewriter, minOp, /*dims=*/ValueRange{iv, ub, step}, lambda);
	    }
	    // ub - iv < step (equiv.: iv + -ub + step - 1 >= 0)
	    // Intuitively: `iv` is the split bound here, i.e., the iteration variable
	    // value of the very last iteration (in the unpeeled loop). At that point,
	    // there are less than `step` elements remaining. (Otherwise, the peeled
	    // loop would run for at least one more iteration.)
	    auto lambda = ...;
	    return canonicalizeAffineMinOp(rewriter, minOp, /*dims=*/ValueRange{iv, ub, step}, lambda);
	However, in light of the other comments I would refactor much more deeply and just pass a FlatAffineConstraints to the canonicalize function.
mlir/test/Dialect/SCF/for-loop-peeling.mlir
4	Hmm seems like this should simplify to `(s1 - s0) mod s2`. Likely some missing affineexpr simplification pattern, however it may just be adding more canonicalization patterns that will continue to fail in slightly more complex cases. Maybe a lot of this should be using FlatAffineConstraints directly.
168	`#map3 -> #[[MAP3]]`
201	I find it a little hard to keep track of things at such a distance, could you please reorder in interleaved form?
	    // Most common case: Rewrite min(%ub - %iv, %step) to %step.
	    // CHECK: memref.store %[[STEP]]
	    %m0 = affine.min #map0(%ub, %iv)[%step]
	    memref.store %m0, %d[%c0] : memref<?xindex>
	    // Increase %ub - %iv a little bit, pattern should still apply.
	    ...
	    // At the end, the checks within the scf.if
	Maybe even embed the affine_map into the affine.min since it is not reused?

This revision now requires changes to proceed. Aug 18 2021, 4:34 AM
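
The second invariant quoted in the review comment on line 380 (ub - iv < step once `iv` is the split bound) can be illustrated with concrete numbers. The sketch below is only an illustration under the assumptions lb <= ub and step > 0; `partialIterationSize` is a hypothetical name and nothing here is MLIR API. It returns the value that min(step, ub - iv) takes in the partial iteration, or 0 when the scf.if would not execute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of MLIR. Returns the number of elements
// handled by the partial (peeled-off) iteration, or 0 if there is none.
int64_t partialIterationSize(int64_t lb, int64_t ub, int64_t step) {
  // Assumption of this sketch: a forward loop with a positive step.
  assert(lb <= ub && step > 0);
  // In the partial iteration, iv is the split bound.
  int64_t iv = ub - ((ub - lb) % step);
  if (!(iv < ub))
    return 0; // scf.if condition (split bound < ub) is false.
  // Invariant from the review: ub - iv < step, so the min selects ub - iv.
  assert(ub - iv < step);
  return std::min(step, ub - iv);
}
```

For example, with lb = 0, ub = 10 and step = 4 the split bound is 8, and the partial iteration covers the remaining 2 elements.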

address comments

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
164	I think this should actually be "bound" (to bound). As in upper/lower bound.
226	These are the SSA values associated with the dimensions in the constraint set. These dims can be referred to in the `constraintsBuilder`. Note: `dims` are not the operands of the AffineMinOp. Those would be `minOp.operands()`. E.g.: The caller of `canonicalizeAffineMinOp` adds two inequalities via the constraintBuilder. Those inequalities use the `iv`, `ub`, `step` dimensions. `dims` contains the SSA values of those dimensions. Note: It is important that `dims` and the (in)equalities created by the constraintBuilder stay in sync. Looking at this code again, I think it would be better to use `FlatAffineValueConstraints` here instead of `FlatAffineConstraints`. Then I don't have to maintain extra helper variables for dims/syms SSA values. With the recent refactorings, I can just call the base class functions, which do not have any "affine dialect" assumptions. I can try splitting the `FlatAffineValueConstraints` class in a subsequent commit, so that the "affine dialect" functionality is properly separated (as we discussed last time) and cannot be called by accident in here. I'll refactor this as follows: canonicalizeAffineMinOp takes a `FlatAffineValueConstraints` as an argument. The `dims` parameter is gone.
250	Replaced by FlatAffineValueConstraints. There is no need to keep track of dim/sym SSA values anymore.
323	That's how I had it in an earlier revision. This was a suggestion by another reviewer to avoid the overhead of copying the FlatAffineConstraints. Either one is fine with me.
326	I think I should use `canonicalizeMapAndOperands` here. It requires only a small change to support "empty" Values.
380	Replaced with FlatAffineValueConstraints, which is created by the caller and passed to `canonicalizeAffineMinOp`.

Harbormaster completed remote builds in B120262: Diff 367399. Aug 18 2021, 11:23 PM

nicolasvasilache accepted this revision. Aug 19 2021, 12:21 AM

This revision is now accepted and ready to land. Aug 19 2021, 12:21 AM

Closed by commit rG8e8b70aa8479: [mlir][scf] Simplify affine.min ops after loop peeling (authored by springerm). Aug 19 2021, 1:25 AM

This revision was automatically updated to reflect the committed changes.

springerm added a commit: rG8e8b70aa8479: [mlir][scf] Simplify affine.min ops after loop peeling.

Herald added a reviewer: rriddle. Aug 19 2021, 1:25 AM

springerm mentioned this in D107730: [mlir][IR] Add optional offset `offset` parameter to shiftDims/shiftSymbols. Aug 19 2021, 1:37 AM

Revision Contents

Path                                                     Size
mlir/include/mlir/Dialect/SCF/Transforms.h               16 lines
mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp   253 lines
mlir/test/Dialect/SCF/for-loop-peeling.mlir              106 lines

Diff 365689

mlir/include/mlir/Dialect/SCF/Transforms.h

	Show All 38 Lines
	/// analysis.			/// analysis.
	void naivelyFuseParallelOps(Region &region);			void naivelyFuseParallelOps(Region &region);

	/// Rewrite a for loop with bounds/step that potentially do not divide evenly			/// Rewrite a for loop with bounds/step that potentially do not divide evenly
	/// into a for loop where the step divides the iteration space evenly, followed			/// into a for loop where the step divides the iteration space evenly, followed
	/// by an scf.if for the last (partial) iteration (if any). This transformation			/// by an scf.if for the last (partial) iteration (if any). This transformation
	/// is called "loop peeling".			/// is called "loop peeling".
	///			///
	/// Other patterns can simplify/canonicalize operations in the body of the loop			/// This transformation is beneficial for a wide range of transformations such
				aartbik (Done): can you elaborate on the "is beneficial" (i.e. we generally distinguish between enabling transformations and transformations that "cleanup" after the fact, and this is clearly the latter)
	/// and the scf.if. This is beneficial for a wide range of transformations such
	/// as vectorization or loop tiling.			/// as vectorization or loop tiling.
	///			///
	/// E.g., assuming a lower bound of 0 (for illustration purposes):			/// E.g., assuming a lower bound of 0 (for illustration purposes):
	/// ```			/// ```
	/// scf.for %iv = %c0 to %ub step %c4 {			/// scf.for %iv = %c0 to %ub step %c4 {
	/// (loop body)			/// (loop body)
	/// }			/// }
	/// ```			/// ```
	/// is rewritten into the following pseudo IR:			/// is rewritten into the following pseudo IR:
	/// ```			/// ```
	/// %newUb = %ub - (%ub mod %c4)			/// %newUb = %ub - (%ub mod %c4)
	/// scf.for %iv = %c0 to %newUb step %c4 {			/// scf.for %iv = %c0 to %newUb step %c4 {
	/// (loop body)			/// (loop body)
	/// }			/// }
	/// scf.if %newUb < %ub {			/// scf.if %newUb < %ub {
	/// (loop body)			/// (loop body)
	/// }			/// }
	/// ```			/// ```
	///			///
	/// This function rewrites the given scf.for loop in-place and creates a new			/// After loop peeling, this function tries to simplify/canonicalize affine.min
	/// scf.if operation (returned via `ifOp`) for the last iteration.			/// operations in the body of the loop and the scf.if, taking advantage of the
	///			/// fact that every iteration of the peeled loop is a "full" iteration. This
	/// TODO: Simplify affine.min ops inside the new loop/if statement.			/// canonicalization is expected to enable further canonicalization
	LogicalResult peelForLoop(RewriterBase &b, ForOp forOp, scf::IfOp &ifOp);			/// opportunities through other patterns.
				///
				/// Note: This function rewrites the given scf.for loop in-place and creates a
				/// new scf.if operation for the last iteration. It replaces all uses of the
				/// unpeeled loop with the results of the newly generated scf.if.
				LogicalResult peelAndCanonicalizeForLoop(RewriterBase &rewriter, ForOp forOp);
				nicolasvasilache (Done): Does the description need to mention loop peeling anymore? It seems to me this is a general feature that can apply to any value `iv` that is known to be bounded by `ub` and to increment by `step`. I would recommend the following: split this into (a) a helper that constructs the set and returns whether it is empty and (b) an IR rewriter that calls the helper; rename "insideLoop" into a boolean that conveys lesser-equal / greater-than.
				springerm (Author, Done): Refactored as follows: The logic for solving the constraint system etc. is in `canonicalizeAffineMinOp`. This function could even be moved out of the SCF dialect into a different file.

	/// Tile a parallel loop of the form			/// Tile a parallel loop of the form
	/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)			/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
	/// step (%arg4, %arg5)			/// step (%arg4, %arg5)
	///			///
	/// into			/// into
	/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)			/// scf.parallel (%i0, %i1) = (%arg0, %arg1) to (%arg2, %arg3)
	/// step (%arg4*tileSize[0],			/// step (%arg4*tileSize[0],
	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines
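
The peeling rewrite described in the doc comment of this header can be checked with plain integers. The following minimal C++ sketch is an illustration only, assuming lb <= ub and step > 0 and assuming the scf.if runs the loop body once with the induction variable equal to the new upper bound; the function names are hypothetical. It lists the induction-variable values visited before and after peeling so the two sequences can be compared.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Induction-variable values visited by `scf.for %iv = lb to ub step step`.
std::vector<int64_t> originalIterations(int64_t lb, int64_t ub, int64_t step) {
  assert(lb <= ub && step > 0); // assumption of this sketch
  std::vector<int64_t> ivs;
  for (int64_t iv = lb; iv < ub; iv += step)
    ivs.push_back(iv);
  return ivs;
}

// Values visited after peeling: the main loop runs up to newUb, and the
// scf.if executes the body once more at newUb if newUb < ub.
std::vector<int64_t> peeledIterations(int64_t lb, int64_t ub, int64_t step) {
  assert(lb <= ub && step > 0); // assumption of this sketch
  // New upper bound: ub - (ub - lb) mod step.
  int64_t newUb = ub - ((ub - lb) % step);
  std::vector<int64_t> ivs;
  for (int64_t iv = lb; iv < newUb; iv += step)
    ivs.push_back(iv); // full iterations only
  if (newUb < ub)
    ivs.push_back(newUb); // last (partial) iteration
  return ivs;
}
```

With a lower bound of 0 the new upper bound reduces to `%ub - (%ub mod %c4)`, matching the pseudo IR in the doc comment.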

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp

//===- LoopSpecialization.cpp - scf.parallel/SCR.for specialization -------===// //===- LoopSpecialization.cpp - scf.parallel/SCR.for specialization -------===//

// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// //

// Specializes parallel loops and for loops for easier unrolling and // Specializes parallel loops and for loops for easier unrolling and

// vectorization. // vectorization.

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

#include "PassDetail.h" #include "PassDetail.h"

#include "mlir/Analysis/AffineStructures.h"

#include "mlir/Dialect/Affine/IR/AffineOps.h" #include "mlir/Dialect/Affine/IR/AffineOps.h"

#include "mlir/Dialect/SCF/Passes.h" #include "mlir/Dialect/SCF/Passes.h"

#include "mlir/Dialect/SCF/SCF.h" #include "mlir/Dialect/SCF/SCF.h"

#include "mlir/Dialect/SCF/Transforms.h" #include "mlir/Dialect/SCF/Transforms.h"

#include "mlir/Dialect/StandardOps/IR/Ops.h" #include "mlir/Dialect/StandardOps/IR/Ops.h"

#include "mlir/Dialect/Utils/StaticValueUtils.h" #include "mlir/Dialect/Utils/StaticValueUtils.h"

#include "mlir/IR/AffineExpr.h" #include "mlir/IR/AffineExpr.h"

#include "mlir/IR/BlockAndValueMapping.h" #include "mlir/IR/BlockAndValueMapping.h"

▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines static void specializeForLoopForUnrolling(ForOp op) {

ifOp.getThenBodyBuilder().clone(*op.getOperation(), map); ifOp.getThenBodyBuilder().clone(*op.getOperation(), map);

ifOp.getElseBodyBuilder().clone(*op.getOperation()); ifOp.getElseBodyBuilder().clone(*op.getOperation());

op.erase(); op.erase();

} }

/// Rewrite a for loop with bounds/step that potentially do not divide evenly /// Rewrite a for loop with bounds/step that potentially do not divide evenly

/// into a for loop where the step divides the iteration space evenly, followed /// into a for loop where the step divides the iteration space evenly, followed

/// by an scf.if for the last (partial) iteration (if any). /// by an scf.if for the last (partial) iteration (if any).

LogicalResult mlir::scf::peelForLoop(RewriterBase &b, ForOp forOp, ///

scf::IfOp &ifOp) { /// This function rewrites the given scf.for loop in-place and creates a new

/// scf.if operation for the last iteration. It replaces all uses of the

/// unpeeled loop with the results of the newly generated scf.if.

///

/// The newly generated scf.if operation is returned via `ifOp`. The boundary

/// at which the loop is split (new upper bound) is returned via `splitBound`.

aartbik (Done): document the result value (ie. returns failure when not rewritten)

springerm (Author, Done): Extended the comment. Also added detailed description of what loops are rewritten in the comment of peelAndCanonicalizeForLoop.

static LogicalResult peelForLoop(RewriterBase &b, ForOp forOp, scf::IfOp &ifOp,

Value &splitBound) {

RewriterBase::InsertionGuard guard(b); RewriterBase::InsertionGuard guard(b);

auto lbInt = getConstantIntValue(forOp.lowerBound()); auto lbInt = getConstantIntValue(forOp.lowerBound());

auto ubInt = getConstantIntValue(forOp.upperBound()); auto ubInt = getConstantIntValue(forOp.upperBound());

auto stepInt = getConstantIntValue(forOp.step()); auto stepInt = getConstantIntValue(forOp.step());

// No specialization necessary if step already divides upper bound evenly. // No specialization necessary if step already divides upper bound evenly.

if (lbInt && ubInt && stepInt && (*ubInt - *lbInt) % *stepInt == 0) if (lbInt && ubInt && stepInt && (*ubInt - *lbInt) % *stepInt == 0)

return failure(); return failure();

// No specialization necessary if step size is 1. // No specialization necessary if step size is 1.

if (stepInt == static_cast<int64_t>(1)) if (stepInt == static_cast<int64_t>(1))

return failure(); return failure();

auto loc = forOp.getLoc(); auto loc = forOp.getLoc();

AffineExpr dim0, dim1, dim2; AffineExpr dim0, dim1, dim2;

bindDims(b.getContext(), dim0, dim1, dim2); bindDims(b.getContext(), dim0, dim1, dim2);

// New upper bound: %ub - (%ub - %lb) mod %step // New upper bound: %ub - (%ub - %lb) mod %step

auto modMap = AffineMap::get(3, 0, {dim1 - ((dim1 - dim0) % dim2)}); auto modMap = AffineMap::get(3, 0, {dim1 - ((dim1 - dim0) % dim2)});

b.setInsertionPoint(forOp); b.setInsertionPoint(forOp);

Value splitBound = b.createOrFold<AffineApplyOp>( splitBound = b.createOrFold<AffineApplyOp>(

loc, modMap, loc, modMap,

ValueRange{forOp.lowerBound(), forOp.upperBound(), forOp.step()}); ValueRange{forOp.lowerBound(), forOp.upperBound(), forOp.step()});

// Set new upper loop bound. // Set new upper loop bound.

Value previousUb = forOp.upperBound(); Value previousUb = forOp.upperBound();

b.updateRootInPlace(forOp, b.updateRootInPlace(forOp,

[&]() { forOp.upperBoundMutable().assign(splitBound); }); [&]() { forOp.upperBoundMutable().assign(splitBound); });

b.setInsertionPointAfter(forOp); b.setInsertionPointAfter(forOp);

Show All 19 Lines static LogicalResult peelForLoop(RewriterBase &b, ForOp forOp, scf::IfOp &ifOp,

// Build else case. // Build else case.

if (!resultTypes.empty()) if (!resultTypes.empty())

ifOp.getElseBodyBuilder(b.getListener()) ifOp.getElseBodyBuilder(b.getListener())

.create<scf::YieldOp>(loc, forOp->getResults()); .create<scf::YieldOp>(loc, forOp->getResults());

return success(); return success();

} }

/// Bound an identifier `pos` in a given FlatAffineConstraints with constraints

nicolasvasilache (Done): Bound -> Bind

springerm (Author, Done): I think this should actually be "bound" (to bound). As in upper/lower bound.

/// drawn from an affine map. Before adding the constraint, the dimensions/

/// symbols of the affine map are aligned with the constraint set. `operands`

/// are the SSA Value operands used with the affine map. `dims`/`syms` are the

/// SSA Values assoicated with the FlatAffineConstraint's dimension/symbol

nicolasvasilache (Done): assoicated -> associated

/// columns. Note that this function adds a new symbol column to the constraint

/// set for each dimension/symbol that exists in the affine map but not in the

/// constraint set. The new symbols are returned via `syms`.

static LogicalResult alignAndAddBound(FlatAffineConstraints &constraints,

unsigned pos, AffineMap map,

ValueRange operands, ValueRange dims,

nicolasvasilache (Done): I would rephrase this in a more mathematical form; I'd love to "see" the constraint system you are building rather than have it described in text.

My experience is that formatting constraint sets as block matrices, where the 0 and sub-identity blocks appear clearly, makes them quite easy to follow; see e.g. page 4 of http://icps.u-strasbg.fr/~bastoul/research/papers/VBGC06-ICS.pdf, which should give an intuitive understanding of the layout of the constraints (in a different use case).

SmallVector<Value> &syms, bool eq,

bool lower) {

SmallVector<Value> newSyms;

AffineMap alignedMap =

alignAffineMapWithValues(map, operands, dims, syms, &newSyms);

nicolasvasilache (Not Done): So I am looking at the implementation of alignAffineMapWithValues as it now also gets used from here and I find that it could be significantly improved IMO.

I would just make everything explicit and use:

/// Sparse replace method. Apply AffineExpr::replace(`map`) to each of the
/// results and return a new AffineMap with the new results and with inferred
/// number of dims and symbols.
AffineMap replace(const DenseMap<AffineExpr, AffineExpr> &map) const;

To get there, there would be a helper function that would take an operand and return the AffineExpr to which the operand is aligned in the constraint system.
If not there yet it would add it and return a new AffineSymbolExpr (I think from looking at your code).

Then you can easily for_each those into the DenseMap and call the replace func.
This should significantly improve composability.

Does this make sense?

(Not for this revision but would be nice as a followup).

for (unsigned i = 0; i < newSyms.size() - syms.size(); ++i)

constraints.addSymbolId(constraints.getNumSymbolIds());

std::swap(syms, newSyms);

return constraints.addLowerOrUpperBound(pos, alignedMap, eq, lower);

}

/// This function tries to canonicalize affine.min operations by proving that

/// its value is bounded by the same lower and upper bound. In that case, the

/// operation can be replaced with the bound.

nicolasvasilache (Done): "replaced with the bound" -> "folded away" ?

///

/// Bounds are computed by FlatAffineConstraints. Invariants required for

/// finding/proving bounds should be supplied via `constraintBuilder` by adding

/// constraints to the provided FlatAffineConstraints. Only the dimensions in

/// `dims` and constants can be used when adding constraints. Adding new

/// dimensions/symbols is not allowed. (However, local columns may be added.)

nicolasvasilache (Done): braces for multiline if please.

///

nicolasvasilache (Done): I am not sure I understand this note, could you explain?
It seems you want to ignore failures here but I am not sure how this works.

springerm (Author, Done): Update: I no longer use addLowerOrUpperBound here and add an equality directly.

/// 1. Set up a constraint system with the dimensions passed as `dims`.

/// 2. Call the builder, which may add new constraints.

/// 3. Add dimensions for `minOp` and `minOpUb` (upper bound of `minOp`).

/// 4. Add each result of `minOp` as a dimension `r_i`.

/// 5. Compute an upper bound of `minOp` and bind it to `minOpUb`.

/// 6. For each result of `minOp`: Prove that r_i >= minOpUb. If this is the

/// case, upper_bound(minOp) == lower_bound(minOp) and `minOp` can be

/// replaced with the that bound.

aartbik (Done): typo: replace with that bound?

static LogicalResult canonicalizeAffineMinOp(

RewriterBase &rewriter, AffineMinOp minOp, ValueRange dims,

function_ref<LogicalResult(FlatAffineConstraints &)> constraintsBuilder) {

RewriterBase::InsertionGuard guard(rewriter);

AffineMap minOpMap = minOp.getAffineMap();

unsigned numResults = minOpMap.getNumResults();

FlatAffineConstraints constraints;

// Keep track of SSA values of dims (if any), so that affine maps can be

// aligned with the dims in `constraints`.

SmallVector<Value> dimValues, symValues;

/// Add an SSA value as a dimension to the constraint system. If the SSA value

/// is a constant, set the dimension to the constant value.

auto addDim = [&](Value value = {}) {

unsigned dimId = constraints.getNumDimIds();

constraints.addDimId(dimId);

if (auto constInt = getConstantIntValue(value))

constraints.setIdToConstant(dimId, *constInt);

dimValues.push_back(value);

return dimId;

nicolasvasilacheUnsubmitted

Done

I do not see the "for each" behavior here?

springermAuthorUnsubmitted

Done

The "for each" is in addLowerOrUpperBound (it is not written as a for-each, and if you're not familiar with this function, it may be hard to see). addLowerOrUpperBound adds an inequality for each result in minOpMap.

};

// Set up constraint system and call builder.

nicolasvasilacheUnsubmitted

Done

dims -> operands ?

springermAuthorUnsubmitted

Done

These are the SSA values associated with the dimensions in the constraint set. These dims can be referred to in the constraintsBuilder. Note: dims are not the operands of the AffineMinOp. Those would be minOp.operands().

E.g.:
The caller of canonicalizeAffineMinOp adds two inequalities via the constraintBuilder. Those inequalities use the iv, ub, and step dimensions. dims contains the SSA values of those dimensions. Note: It is important that dims and the (in)equalities created by the constraintBuilder stay in sync.

Looking at this code again, I think it would be better to use FlatAffineValueConstraints here instead of FlatAffineConstraints. Then I don't have to maintain extra helper variables for dims/syms SSA values. With the recent refactorings, I can just call the base class functions, which do not have any "affine dialect" assumptions. I can try splitting the FlatAffineValueConstraints class in a subsequent commit, so that the "affine dialect" functionality is properly separated (as we discussed last time) and cannot be called by accident in here.

I'll refactor this as follows:

canonicalizeAffineMinOp takes a FlatAffineValueConstraints as an argument.
dims parameter is gone.

for (Value value : dims)

addDim(value);

if (failed(constraintsBuilder(constraints)))

return failure();

// Add extra dimensions.

unsigned dimMinOp = addDim(); // `minOp`

nicolasvasilacheUnsubmitted

Done

Can we make the creation of those dims ahead of time?
The addInequality would be a bit more annoying to write but it is usually more readable to just fix the dimension of the constraint set before introducing inequalities (when possible).

springermAuthorUnsubmitted

Done

I rewrote large parts of the commit. Dimensions are still added "in the middle", though. This is more convenient because I can pass the FlatAffineConstraints to a builder function that adds pattern-specific constraints. At that point, I do not want to bother the user of canonicalizeAffineMinOp with unexpected extra dims. In the constraint builder, only the dims that were specified by the caller (dims parameter) exist in the constraint set.

unsigned dimMinOpUb = addDim(); // `minOp` upper bound

unsigned resultDimStart = constraints.getNumDimIds();

for (unsigned i = 0; i < numResults; ++i)

addDim();

// Add an inequality for each result expr_i of minOpMap: minOp <= expr_i

if (failed(alignAndAddBound(constraints, dimMinOp, minOpMap, minOp.operands(),

dimValues, symValues,

/*eq=*/false, /*lower=*/false)))

return failure();

// Set helper dimension r_i for each result expr_i of minOpMap.

for (unsigned i = 0; i < numResults; ++i) {

// Add an equality: r_i = expr_i

if (failed(alignAndAddBound(constraints, resultDimStart + i,

minOpMap.getSubMap({i}), minOp.operands(),

dimValues, symValues, /*eq=*/true,

nicolasvasilacheUnsubmitted

Done

It seems quite convoluted to pass a lambda with implicit invariant that after "addDim in this function is called then the number of columns must match".

I would get rid of constraintsBuilder entirely and just lift l 239 - 252 into the caller.

springermAuthorUnsubmitted

Done

Replaced by FlatAffineValueConstraints. There is no need to keep track of dim/sym SSA values anymore.

/*lower=*/true)))

return failure();

}

// Try to compute an upper bound for minOp, expressed in terms of the other

nicolasvasilacheUnsubmitted

Done

I'd restructure this block and the previous a bit if possible; it wasn't immediately clear to me why you needed the extra insert here? Then after digging a bit more I saw that it is probably getSliceBounds that drops the particular dimension by projecting.

// dimensions.

SmallVector<AffineMap> minOpValLb(1), minOpValUb(1);

constraints.getSliceBounds(dimMinOp, 1, minOp.getContext(), &minOpValLb,

&minOpValUb);

// TODO: `getSliceBounds` may return multiple bounds at the moment. This is

nicolasvasilacheUnsubmitted

Not Done

This would benefit from seeing the whole constraint set in math form.

springermAuthorUnsubmitted

Done

Added to the beginning of the function.

// a TODO of `getSliceBounds` and not handled here.

nicolasvasilacheUnsubmitted

Not Done

Same comment as above, I think constructing an explicit DenseMap<AffineExpr, AffineExpr> based on Values and just using the appropriate sparse replacement function would be significantly more readable.

if (!minOpValUb[0] || minOpValUb[0].getNumResults() != 1)

return failure(); // No or multiple upper bounds found.

// Add an equality: dimMinOpUb = minOpValUb[0]

// Add back dimension for minOp. (Was removed by `getSliceBounds`.)

AffineMap alignedUbMap = minOpValUb[0].shiftDims(/*shift=*/1,

/*offset=*/dimMinOp);

ftynseUnsubmitted

Done

I don't see anything in getSliceBounds that can clear `minOpValUb`, so the first condition is always false. The second condition is currently also always false, but there is a TODO in getSliceBounds that would trigger it; consider adding some debug spew here for when this becomes the case.

if (failed(constraints.addLowerOrUpperBound(dimMinOpUb, alignedUbMap,

/*eq=*/true)))

return failure();

// If the constraint system is empty, there is an inconsistency. (E.g., this

// can happen if loop lb > ub.)

if (constraints.isEmpty())

return failure();

// Prove that each result of minOpMap has a lower bound that is equal to (or

ftynseUnsubmitted

Done

Nit: also assert that eq[kDimMinOpUb] == 0? Looking at the current system of constraints, it should be the case, but better to future-proof this.

springermAuthorUnsubmitted

Done

I no longer use addEquality here.

// greater than) the upper bound of minOp (`kDimMinOpUb`). In that case,

// minOp can be replaced with the bound. I.e., prove that for each result

// expr_i (represented by dimension r_i):

ftynseUnsubmitted

Done

constraints.addEquality(eq);

- // Prove that each result of minOpMap has a lower bound that is equal (or

+ // Prove that each result of minOpMap has a lower bound that is equal to (or

// greater than) the upper bound of minOp (`kDimMinOpUb`). In that case,

// r_i >= minOpUb

// To prove this inequality, add its negation to the constraint set and prove

// that the constraint set is empty.

for (unsigned i = resultDimStart; i < resultDimStart + numResults; ++i) {

FlatAffineConstraints newConstr(constraints);

// Add inequality: r_i < minOpUb (equiv.: minOpUb - r_i - 1 >= 0)

SmallVector<int64_t> ineq(newConstr.getNumCols(), 0);

ineq[dimMinOpUb] = 1;

ftynseUnsubmitted

Done

I'd consider just removing the last inequality at the end of the loop, this will spare a copy on each iteration.

ineq[i] = -1;

ineq[newConstr.getNumCols() - 1] = -1;

newConstr.addInequality(ineq);

if (!newConstr.isEmpty())

return failure();

}

// Lower and upper bound of minOp are equal. Replace minOp with its upper

// bound. `dimValues` and `symValues` may have "empty" Values. These must be

// filtered out from the list of Values and from the affine map.

SmallVector<Value> newOperands;

SmallVector<AffineExpr> dimReplacements, symReplacements;

unsigned numDims = 0, numSyms = 0;

for (Value val : dimValues) {

if (val) {

newOperands.push_back(val);

dimReplacements.push_back(rewriter.getAffineDimExpr(numDims++));

ftynseUnsubmitted

Done

Are we sure that the UB AffineExpr has exactly three inputs? There doesn't seem to be any filtering of the affine.min that gets processed, so the original map can seemingly be arbitrarily complex as long as the emptiness check is satisfied.

springermAuthorUnsubmitted

Done

Correct, this is wrong here. Had the fix in a dependent commit that's not out for review yet, but moved it into this one.

} else {

dimReplacements.push_back(rewriter.getAffineDimExpr(numDims));

}

for (Value val : symValues) {

if (val) {

newOperands.push_back(val);

symReplacements.push_back(rewriter.getAffineSymbolExpr(numSyms++));

} else {

symReplacements.push_back(rewriter.getAffineSymbolExpr(numSyms));

}

AffineMap newMap = alignedUbMap.replaceDimsAndSymbols(

dimReplacements, symReplacements, numDims, numSyms);

rewriter.setInsertionPoint(minOp);

nicolasvasilacheUnsubmitted

Done

In the grander order of things where we call FM, I find this micro-optimization to be more confusing than anything else (especially in the context of new symValues that you also have to assert against).

I'd prefer to see a fresh copy of the FlatAffineValueConstraints, add a constraint, test for emptiness and let it RAII away.

springermAuthorUnsubmitted

Done

That's how I had it in an earlier revision. This was a suggestion by another reviewer to avoid the overhead of copying the FlatAffineConstraints. Either one is fine with me.

rewriter.replaceOpWithNewOp<AffineApplyOp>(minOp, newMap, newOperands);

return success();

}
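
The negation-based proof used in canonicalizeAffineMinOp can be illustrated outside of MLIR. The following is a minimal standalone sketch, not the patch's implementation: it replaces the real emptiness check on FlatAffineConstraints with a brute-force search over a small bounded domain, which is only a sanity check and not a proof. The names `Row`, `holds`, `isEmptyWithin`, `kInsideLoop`, and `kNegatedClaim` are hypothetical. The coefficient rows use the (iv, ub, step, constant) column order of the inequalities that rewritePeeledAffineOp passes to addInequality.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// A row {c_iv, c_ub, c_step, c} encodes c_iv*iv + c_ub*ub + c_step*step + c >= 0.
using Row = std::array<int64_t, 4>;

// True if the point (iv, ub, step) satisfies the inequality encoded by `r`.
bool holds(const Row &r, int64_t iv, int64_t ub, int64_t step) {
  return r[0] * iv + r[1] * ub + r[2] * step + r[3] >= 0;
}

// Bounded stand-in for an emptiness check (an assumption of this sketch):
// returns true if no point with 0 <= iv, ub <= limit and 1 <= step <= limit
// satisfies every row.
bool isEmptyWithin(const std::vector<Row> &rows, int64_t limit) {
  assert(limit >= 1);
  for (int64_t iv = 0; iv <= limit; ++iv)
    for (int64_t ub = 0; ub <= limit; ++ub)
      for (int64_t step = 1; step <= limit; ++step) {
        bool allHold = true;
        for (const Row &r : rows)
          allHold = allHold && holds(r, iv, ub, step);
        if (allHold)
          return false; // Found a satisfying point, so the set is not empty.
      }
  return true;
}

// Known fact inside the peeled loop: ub - iv >= step.
const Row kInsideLoop = {-1, 1, -1, 0};
// Negation of the claim "ub - iv >= step": ub - iv < step,
// i.e. iv - ub + step - 1 >= 0.
const Row kNegatedClaim = {1, -1, 1, -1};
```

With the invariant present, `isEmptyWithin({kInsideLoop, kNegatedClaim}, 12)` finds no point, which matches the observation that the two rows sum to `-1 >= 0`. Without the invariant, `isEmptyWithin({kNegatedClaim}, 12)` does find a point (for example iv = 0, ub = 0, step = 1), so the rewrite would not be justified.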

nicolasvasilacheUnsubmitted

Done

(I think) the block of code below somewhat reproduces AffineMap::compressUnusedDims and Symbols?
But this is mostly hidden by the fact that there are empty values that may be introduced for Dim?

In light of this last point, I think the convolution has reached a point where the separation between alignment and Value / Attribute worlds would already be beneficial in this revision.

springermAuthorUnsubmitted

Done

I think I should use canonicalizeMapAndOperands here. It requires only a small change to support "empty" Values.

/// Try to simplify an affine.min operation `minOp` after loop peeling. This

/// function detects affine.min operations such as (ub is the previous upper

/// bound of the unpeeled loop):

/// ```

/// #map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>

/// %r = affine.min #map(%iv)[%step, %ub]

/// ```

/// and rewrites them into (in the case of the peeled loop):

/// ```

/// %r = %step

/// ```

/// affine.min operations inside the generated scf.if operation are rewritten in

/// a similar way.

///

/// This function builds up a set of constraints, capable of proving that:

/// * Inside the peeled loop: min(step, ub - iv) == step

/// * Inside the scf.if operation: min(step, ub - iv) == ub - iv

///

/// Note: `ub` is the previous upper bound of the loop (before peeling).

/// `insideLoop` must be true for affine.min ops inside the loop and false for

/// affine.min ops inside the scf.if op.

static LogicalResult rewritePeeledAffineOp(RewriterBase &rewriter,

AffineMinOp minOp, Value iv,

Value ub, Value step,

bool insideLoop) {

auto constraintsBuilder = [&](FlatAffineConstraints &constraints) {

// Add loop peeling invariant. This is the main piece of knowledge that

// enables AffineMinOp simplification.

if (insideLoop) {

// ub - iv >= step (equiv.: -iv + ub - step + 0 >= 0)

// Intuitively: Inside the peeled loop, every iteration is a "full"

// iteration, i.e., step divides the iteration space `ub - lb` evenly.

constraints.addInequality({-1, 1, -1, 0});

} else {

// ub - iv < step (equiv.: iv + -ub + step - 1 >= 0)

// Intuitively: `iv` is the split bound here, i.e., the iteration variable

// value of the very last iteration (in the unpeeled loop). At that point,

// there are less than `step` elements remaining. (Otherwise, the peeled

// loop would run for at least one more iteration.)

constraints.addInequality({1, -1, 1, -1});

}

return success();

};

return canonicalizeAffineMinOp(rewriter, minOp,

/*dims=*/ValueRange{iv, ub, step},

constraintsBuilder);

}
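
The two facts that rewritePeeledAffineOp relies on can be checked numerically. The sketch below is a standalone illustration under stated assumptions, not code from the patch: `splitBound` mirrors the map `s1 - (s1 - s0) mod s2` from the test file (with s0 = lb, s1 = ub, s2 = step), and both helper names are hypothetical. It assumes `lb <= ub` and `step >= 1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Upper bound of the peeled loop: ub - (ub - lb) mod step.
// Assumes lb <= ub and step >= 1 (preconditions of this sketch).
int64_t splitBound(int64_t lb, int64_t ub, int64_t step) {
  assert(lb <= ub && step >= 1);
  return ub - (ub - lb) % step;
}

// Checks both simplifications for one (lb, ub, step) triple.
bool peelingInvariantsHold(int64_t lb, int64_t ub, int64_t step) {
  int64_t split = splitBound(lb, ub, step);
  // Inside the peeled loop: ub - iv >= step, so min(step, ub - iv) == step.
  for (int64_t iv = lb; iv < split; iv += step)
    if (std::min(step, ub - iv) != step)
      return false;
  // In the remainder (only if split < ub): ub - split < step, so
  // min(step, ub - split) == ub - split.
  if (split < ub && std::min(step, ub - split) != ub - split)
    return false;
  return true;
}
```

For the static test case (lb = 0, ub = 17, step = 4), `splitBound` yields 16, which matches the `%[[C16]]` upper bound expected by the test.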

LogicalResult mlir::scf::peelAndCanonicalizeForLoop(RewriterBase &rewriter,

ForOp forOp) {

Value ub = forOp.upperBound();

scf::IfOp ifOp;

nicolasvasilacheUnsubmitted

Done

This seems unnecessarily convoluted, at the very least I would:

// Add loop peeling invariant. This is the main piece of knowledge that
// enables AffineMinOp simplification.
if (insideLoop) {
  // ub - iv >= step (equiv.: -iv + ub - step + 0 >= 0)
  // Intuitively: Inside the peeled loop, every iteration is a "full"
  // iteration, i.e., step divides the iteration space `ub - lb` evenly.
  auto lambda = ...;
  return canonicalizeAffineMinOp(rewriter, minOp,
                             /*dims=*/ValueRange{iv, ub, step}, lambda);
}
// ub - iv < step (equiv.: iv + -ub + step - 1 >= 0)
// Intuitively: `iv` is the split bound here, i.e., the iteration variable
// value of the very last iteration (in the unpeeled loop). At that point,
// there are less than `step` elements remaining. (Otherwise, the peeled
// loop would run for at least one more iteration.)
auto lambda = ...;
return canonicalizeAffineMinOp(rewriter, minOp,
                           /*dims=*/ValueRange{iv, ub, step}, lambda);

However, in light of the other comments I would refactor much more deeply and just pass a FlatAffineConstraints to the canonicalize function.

springermAuthorUnsubmitted

Done

Replaced with FlatAffineValueConstraints, which is created by the caller and passed to canonicalizeAffineMinOp.

Value splitBound;

if (failed(peelForLoop(rewriter, forOp, ifOp, splitBound)))

return failure();

// Rewrite affine.min ops.

forOp.walk([&](AffineMinOp minOp) {

(void)rewritePeeledAffineOp(rewriter, minOp, forOp.getInductionVar(), ub,

forOp.step(), /*insideLoop=*/true);

});

ifOp.walk([&](AffineMinOp minOp) {

(void)rewritePeeledAffineOp(rewriter, minOp, splitBound, ub, forOp.step(),

/*insideLoop=*/false);

});

return success();

}

static constexpr char kPeeledLoopLabel[] = "__peeled_loop__";

namespace {
struct ForLoopPeelingPattern : public OpRewritePattern<ForOp> {
using OpRewritePattern<ForOp>::OpRewritePattern;

LogicalResult matchAndRewrite(ForOp forOp,
PatternRewriter &rewriter) const override {
if (forOp->hasAttr(kPeeledLoopLabel))
return failure();
- scf::IfOp ifOp;
- if (failed(peelForLoop(rewriter, forOp, ifOp)))
+ if (failed(peelAndCanonicalizeForLoop(rewriter, forOp)))
return failure();
// Apply label, so that the same loop is not rewritten a second time.
rewriter.updateRootInPlace(forOp, [&]() {
forOp->setAttr(kPeeledLoopLabel, rewriter.getUnitAttr());
});
return success();
}
};
} // namespace

namespace {
struct ParallelLoopSpecialization
: public SCFParallelLoopSpecializationBase<ParallelLoopSpecialization> {

Show All 38 Lines

mlir/test/Dialect/SCF/for-loop-peeling.mlir

// RUN: mlir-opt %s -for-loop-peeling -canonicalize -split-input-file | FileCheck %s		// RUN: mlir-opt %s -for-loop-peeling -canonicalize -split-input-file | FileCheck %s

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - s0) mod s2)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - (s1 - s0) mod s2))>
nicolasvasilache (Not Done): Hmm, seems like this should simplify to `(s1 - s0) mod s2`. Likely some missing affineexpr simplification pattern; however, it may just be adding more canonicalization patterns that will continue to fail in slightly more complex cases. Maybe a lot of this should be using FlatAffineConstraints directly.
// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0, s1, s2] -> (s0, s2 - (s2 - (s2 - s1) mod s0))>
// CHECK: func @fully_dynamic_bounds(		// CHECK: func @fully_dynamic_bounds(
// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index		// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index
// CHECK: %[[C0_I32:.*]] = constant 0 : i32		// CHECK: %[[C0_I32:.*]] = constant 0 : i32
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[LB]], %[[UB]], %[[STEP]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[LB]], %[[UB]], %[[STEP]]]
// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[LB]] to %[[NEW_UB]]		// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[LB]] to %[[NEW_UB]]
// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {		// CHECK-SAME: step %[[STEP]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {
// CHECK: %[[MINOP:.*]] = affine.min #[[MAP1]](%[[IV]])[%[[STEP]], %[[UB]]]		// CHECK: %[[CAST:.*]] = index_cast %[[STEP]] : index to i32
// CHECK: %[[CAST:.*]] = index_cast %[[MINOP]] : index to i32
// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[CAST]] : i32		// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[CAST]] : i32
// CHECK: scf.yield %[[ADD]]		// CHECK: scf.yield %[[ADD]]
// CHECK: }		// CHECK: }
// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]		// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]
// CHECK: %[[RESULT:.*]] = scf.if %[[HAS_MORE]] -> (i32) {		// CHECK: %[[RESULT:.*]] = scf.if %[[HAS_MORE]] -> (i32) {
// CHECK: %[[REM:.*]] = affine.min #[[MAP2]]()[%[[STEP]], %[[LB]], %[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[LB]], %[[UB]], %[[STEP]]]
// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = addi %[[LOOP]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = addi %[[LOOP]], %[[CAST2]]
// CHECK: scf.yield %[[ADD2]]		// CHECK: scf.yield %[[ADD2]]
// CHECK: } else {		// CHECK: } else {
// CHECK: scf.yield %[[LOOP]]		// CHECK: scf.yield %[[LOOP]]
// CHECK: }		// CHECK: }
// CHECK: return %[[RESULT]]		// CHECK: return %[[RESULT]]
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @fully_dynamic_bounds(%lb : index, %ub: index, %step: index) -> i32 {		func @fully_dynamic_bounds(%lb : index, %ub: index, %step: index) -> i32 {
%c0 = constant 0 : i32		%c0 = constant 0 : i32
%r = scf.for %iv = %lb to %ub step %step iter_args(%arg = %c0) -> i32 {		%r = scf.for %iv = %lb to %ub step %step iter_args(%arg = %c0) -> i32 {
%s = affine.min #map(%ub, %iv)[%step]		%s = affine.min #map(%ub, %iv)[%step]
%casted = index_cast %s : index to i32		%casted = index_cast %s : index to i32
%0 = addi %arg, %casted : i32		%0 = addi %arg, %casted : i32
scf.yield %0 : i32		scf.yield %0 : i32
}		}
return %r : i32		return %r : i32
}		}

// -----		// -----

// CHECK-DAG: #[[MAP:.*]] = affine_map<(d0) -> (4, -d0 + 17)>
// CHECK: func @fully_static_bounds(		// CHECK: func @fully_static_bounds(
// CHECK-DAG: %[[C0_I32:.*]] = constant 0 : i32		// CHECK-DAG: %[[C0_I32:.*]] = constant 0 : i32
// CHECK-DAG: %[[C1_I32:.*]] = constant 1 : i32		// CHECK-DAG: %[[C1_I32:.*]] = constant 1 : i32
		// CHECK-DAG: %[[C4_I32:.*]] = constant 4 : i32
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[C4:.*]] = constant 4 : index		// CHECK-DAG: %[[C4:.*]] = constant 4 : index
// CHECK-DAG: %[[C16:.*]] = constant 16 : index		// CHECK-DAG: %[[C16:.*]] = constant 16 : index
// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[C16]]		// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[C16]]
// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {		// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {
// CHECK: %[[MINOP:.*]] = affine.min #[[MAP]](%[[IV]])		// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[C4_I32]] : i32
// CHECK: %[[CAST:.*]] = index_cast %[[MINOP]] : index to i32
// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[CAST]] : i32
// CHECK: scf.yield %[[ADD]]		// CHECK: scf.yield %[[ADD]]
// CHECK: }		// CHECK: }
// CHECK: %[[RESULT:.*]] = addi %[[LOOP]], %[[C1_I32]] : i32		// CHECK: %[[RESULT:.*]] = addi %[[LOOP]], %[[C1_I32]] : i32
// CHECK: return %[[RESULT]]		// CHECK: return %[[RESULT]]
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @fully_static_bounds() -> i32 {		func @fully_static_bounds() -> i32 {
%c0_i32 = constant 0 : i32		%c0_i32 = constant 0 : i32
%lb = constant 0 : index		%lb = constant 0 : index
%step = constant 4 : index		%step = constant 4 : index
%ub = constant 17 : index		%ub = constant 17 : index
%r = scf.for %iv = %lb to %ub step %step		%r = scf.for %iv = %lb to %ub step %step
iter_args(%arg = %c0_i32) -> i32 {		iter_args(%arg = %c0_i32) -> i32 {
%s = affine.min #map(%ub, %iv)[%step]		%s = affine.min #map(%ub, %iv)[%step]
%casted = index_cast %s : index to i32		%casted = index_cast %s : index to i32
%0 = addi %arg, %casted : i32		%0 = addi %arg, %casted : i32
scf.yield %0 : i32		scf.yield %0 : i32
}		}
return %r : i32		return %r : i32
}		}
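
The expectations of the static-bounds test above (the peeled loop adds the constant 4 per iteration, and the remainder adds 1) can be reproduced with plain integers. This is a hedged standalone sketch with hypothetical function names, assuming `lb <= ub` and `step >= 1`; it models the loop arithmetic only, not the MLIR rewrite itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Original loop: accumulates min(step, ub - iv) in every iteration.
int32_t sumOriginal(int64_t lb, int64_t ub, int64_t step) {
  assert(lb <= ub && step >= 1);
  int32_t acc = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    acc += static_cast<int32_t>(std::min(step, ub - iv));
  return acc;
}

// Peeled form: the main loop adds `step` directly (the affine.min is gone),
// and the guarded remainder adds ub - split once.
int32_t sumPeeled(int64_t lb, int64_t ub, int64_t step) {
  assert(lb <= ub && step >= 1);
  int64_t split = ub - (ub - lb) % step;
  int32_t acc = 0;
  for (int64_t iv = lb; iv < split; iv += step)
    acc += static_cast<int32_t>(step);
  if (split < ub)
    acc += static_cast<int32_t>(ub - split);
  return acc;
}
```

For lb = 0, ub = 17, step = 4, both functions accumulate four full iterations of 4 plus a remainder of 1.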

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 mod 4)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0] -> (4, s0 mod 4)>
// CHECK: func @dynamic_upper_bound(		// CHECK: func @dynamic_upper_bound(
// CHECK-SAME: %[[UB:.*]]: index		// CHECK-SAME: %[[UB:.*]]: index
// CHECK-DAG: %[[C0_I32:.*]] = constant 0 : i32		// CHECK-DAG: %[[C0_I32:.*]] = constant 0 : i32
		// CHECK-DAG: %[[C4_I32:.*]] = constant 4 : i32
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[C4:.*]] = constant 4 : index		// CHECK-DAG: %[[C4:.*]] = constant 4 : index
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]
// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]]		// CHECK: %[[LOOP:.*]] = scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]]
// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {		// CHECK-SAME: step %[[C4]] iter_args(%[[ACC:.*]] = %[[C0_I32]]) -> (i32) {
// CHECK: %[[MINOP:.*]] = affine.min #[[MAP1]](%[[IV]])[%[[UB]]]		// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[C4_I32]] : i32
// CHECK: %[[CAST:.*]] = index_cast %[[MINOP]] : index to i32
// CHECK: %[[ADD:.*]] = addi %[[ACC]], %[[CAST]] : i32
// CHECK: scf.yield %[[ADD]]		// CHECK: scf.yield %[[ADD]]
// CHECK: }		// CHECK: }
// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]		// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]
// CHECK: %[[RESULT:.*]] = scf.if %[[HAS_MORE]] -> (i32) {		// CHECK: %[[RESULT:.*]] = scf.if %[[HAS_MORE]] -> (i32) {
// CHECK: %[[REM:.*]] = affine.min #[[MAP2]]()[%[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[UB]]]
// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = addi %[[LOOP]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = addi %[[LOOP]], %[[CAST2]]
// CHECK: scf.yield %[[ADD2]]		// CHECK: scf.yield %[[ADD2]]
// CHECK: } else {		// CHECK: } else {
// CHECK: scf.yield %[[LOOP]]		// CHECK: scf.yield %[[LOOP]]
// CHECK: }		// CHECK: }
// CHECK: return %[[RESULT]]		// CHECK: return %[[RESULT]]
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
Show All 9 Lines	%r = scf.for %iv = %lb to %ub step %step
scf.yield %0 : i32		scf.yield %0 : i32
}		}
return %r : i32		return %r : i32
}		}

// -----		// -----

// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> ((s0 floordiv 4) * 4)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 mod 4)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<()[s0] -> (4, s0 mod 4)>
// CHECK: func @no_loop_results(		// CHECK: func @no_loop_results(
// CHECK-SAME: %[[UB:.*]]: index, %[[MEMREF:.*]]: memref<i32>		// CHECK-SAME: %[[UB:.*]]: index, %[[MEMREF:.*]]: memref<i32>
		// CHECK-DAG: %[[C4_I32:.*]] = constant 4 : i32
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[C4:.*]] = constant 4 : index		// CHECK-DAG: %[[C4:.*]] = constant 4 : index
// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]		// CHECK: %[[NEW_UB:.*]] = affine.apply #[[MAP0]]()[%[[UB]]]
// CHECK: scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]] step %[[C4]] {		// CHECK: scf.for %[[IV:.*]] = %[[C0]] to %[[NEW_UB]] step %[[C4]] {
// CHECK: %[[MINOP:.*]] = affine.min #[[MAP1]](%[[IV]])[%[[UB]]]
// CHECK: %[[LOAD:.*]] = memref.load %[[MEMREF]][]		// CHECK: %[[LOAD:.*]] = memref.load %[[MEMREF]][]
// CHECK: %[[CAST:.*]] = index_cast %[[MINOP]] : index to i32		// CHECK: %[[ADD:.*]] = addi %[[LOAD]], %[[C4_I32]] : i32
// CHECK: %[[ADD:.*]] = addi %[[LOAD]], %[[CAST]] : i32
// CHECK: memref.store %[[ADD]], %[[MEMREF]]		// CHECK: memref.store %[[ADD]], %[[MEMREF]]
// CHECK: }		// CHECK: }
// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]		// CHECK: %[[HAS_MORE:.*]] = cmpi slt, %[[NEW_UB]], %[[UB]]
// CHECK: scf.if %[[HAS_MORE]] {		// CHECK: scf.if %[[HAS_MORE]] {
// CHECK: %[[REM:.*]] = affine.min #[[MAP2]]()[%[[UB]]]		// CHECK: %[[REM:.*]] = affine.apply #[[MAP1]]()[%[[UB]]]
// CHECK: %[[LOAD2:.*]] = memref.load %[[MEMREF]][]		// CHECK: %[[LOAD2:.*]] = memref.load %[[MEMREF]][]
// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]		// CHECK: %[[CAST2:.*]] = index_cast %[[REM]]
// CHECK: %[[ADD2:.*]] = addi %[[LOAD2]], %[[CAST2]]		// CHECK: %[[ADD2:.*]] = addi %[[LOAD2]], %[[CAST2]]
// CHECK: memref.store %[[ADD2]], %[[MEMREF]]		// CHECK: memref.store %[[ADD2]], %[[MEMREF]]
// CHECK: }		// CHECK: }
// CHECK: return		// CHECK: return
#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>		#map = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
func @no_loop_results(%ub : index, %d : memref<i32>) {		func @no_loop_results(%ub : index, %d : memref<i32>) {
%c0_i32 = constant 0 : i32		%c0_i32 = constant 0 : i32
%lb = constant 0 : index		%lb = constant 0 : index
%step = constant 4 : index		%step = constant 4 : index
scf.for %iv = %lb to %ub step %step {		scf.for %iv = %lb to %ub step %step {
%s = affine.min #map(%ub, %iv)[%step]		%s = affine.min #map(%ub, %iv)[%step]
%r = memref.load %d[] : memref<i32>		%r = memref.load %d[] : memref<i32>
%casted = index_cast %s : index to i32		%casted = index_cast %s : index to i32
%0 = addi %r, %casted : i32		%0 = addi %r, %casted : i32
memref.store %0, %d[] : memref<i32>		memref.store %0, %d[] : memref<i32>
}		}
return		return
}		}

		// -----

		// Test rewriting of affine.min ops. Make sure that more general cases than
		// the ones above are successfully rewritten. Also make sure that the pattern
		// does not rewrite affine.min ops that should not be rewritten.

		// CHECK-DAG: #[[MAP1:.*]] = affine_map<()[s0] -> (s0 + 1)>
		// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1 - 1)>
		// CHECK-DAG: #[[MAP3:.*]] = affine_map<(d0)[s0, s1, s2] -> (s0, -d0 + s1, s2)>
		// CHECK-DAG: #[[MAP4:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - (s1 - s0) mod s2))>
		// CHECK-DAG: #[[MAP5:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - (s1 - s0) mod s2) + 1)>
		// CHECK-DAG: #[[MAP6:.*]] = affine_map<()[s0, s1, s2] -> (s1 - (s1 - (s1 - s0) mod s2) - 1)>
		// CHECK-DAG: #[[MAP7:.*]] = affine_map<()[s0, s1, s2, s3] -> (s0, s2 - (s2 - (s2 - s1) mod s0), s3)>
		// CHECK: func @test_affine_min_rewrite(
		// CHECK-SAME: %[[LB:.*]]: index, %[[UB:.*]]: index, %[[STEP:.*]]: index,
		// CHECK-SAME: %[[MEMREF:.*]]: memref<?xindex>, %[[SOME_VAL:.*]]: index
		// CHECK: scf.for %[[IV:.*]] = %[[LB]] to %{{.*}} step %[[STEP]] {
		// CHECK: %[[RES2:.*]] = affine.apply #[[MAP1]]()[%[[STEP]]]
		// CHECK: %[[RES3:.*]] = affine.min #[[MAP2]](%[[IV]])[%[[STEP]], %[[UB]]]
		// CHECK: %[[RES4:.*]] = affine.min #map3(%[[IV]])[%[[STEP]], %[[UB]], %[[SOME_VAL]]]
nicolasvasilache (Done): `#map3 -> #[[MAP3]]`
		// CHECK: memref.store %[[STEP]]
		// CHECK: memref.store %[[STEP]]
		// CHECK: memref.store %[[RES2]]
		// CHECK: memref.store %[[RES3]]
		// CHECK: memref.store %[[RES4]]
		// CHECK: }
		// CHECK: scf.if {{.*}} {
		// CHECK: %[[RES_IF_0:.*]] = affine.apply #[[MAP4]]()[%[[LB]], %[[UB]], %[[STEP]]]
		// CHECK: %[[RES_IF_1:.*]] = affine.apply #[[MAP5]]()[%[[LB]], %[[UB]], %[[STEP]]]
		// CHECK: %[[RES_IF_2:.*]] = affine.apply #[[MAP5]]()[%[[LB]], %[[UB]], %[[STEP]]]
		// CHECK: %[[RES_IF_3:.*]] = affine.apply #[[MAP6]]()[%[[LB]], %[[UB]], %[[STEP]]]
		// CHECK: %[[RES_IF_4:.*]] = affine.min #[[MAP7]]()[%[[STEP]], %[[LB]], %[[UB]], %[[SOME_VAL]]]
		// CHECK: memref.store %[[RES_IF_0]]
		// CHECK: memref.store %[[RES_IF_1]]
		// CHECK: memref.store %[[RES_IF_2]]
		// CHECK: memref.store %[[RES_IF_3]]
		// CHECK: memref.store %[[RES_IF_4]]
		#map0 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1)>
		#map1 = affine_map<(d0, d1)[s0] -> (d0 - d1 + 1, s0)>
		#map2 = affine_map<(d0, d1)[s0] -> (s0 + 1, d0 - d1 + 1)>
		#map3 = affine_map<(d0, d1)[s0] -> (s0, d0 - d1 - 1)>
		#map4 = affine_map<(d0, d1, d2)[s0] -> (s0, d0 - d1, d2)>
		func @test_affine_min_rewrite(%lb : index, %ub: index,
		%step: index, %d : memref<?xindex>,
		%some_val: index) {
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%c2 = constant 2 : index
		%c3 = constant 3 : index
		%c4 = constant 4 : index
		scf.for %iv = %lb to %ub step %step {
		// Most common case: Rewrite min(%ub - %iv, %step) to %step.
		%m0 = affine.min #map0(%ub, %iv)[%step]
		nicolasvasilache (inline comment, done): I find it a little hard to keep track of things at such a distance. Could you please reorder in interleaved form?
		    // Most common case: Rewrite min(%ub - %iv, %step) to %step.
		    // CHECK: memref.store %[[STEP]]
		    %m0 = affine.min #map0(%ub, %iv)[%step]
		    memref.store %m0, %d[%c0] : memref<?xindex>
		    // Increase %ub - %iv a little bit, pattern should still apply.
		    ...
		    // At the end, the checks within the scf.if
		Maybe even embed the affine_map into the affine.min, since it is not reused?
		// Increase %ub - %iv a little bit, pattern should still apply.
		%m1 = affine.min #map1(%ub, %iv)[%step]
		// Rewrite min(%ub - %iv + 1, %step + 1) to %step + 1.
		%m2 = affine.min #map2(%ub, %iv)[%step]
		// min(%ub - %iv - 1, %step) cannot be simplified because %ub - %iv - 1
		// can be smaller than %step. (Can be simplified in if-statement.)
		%m3 = affine.min #map3(%ub, %iv)[%step]
		// min(%ub - %iv, %step, %some_val) cannot be simplified because the range
		// of %some_val is unknown.
		%m4 = affine.min #map4(%ub, %iv, %some_val)[%step]
		memref.store %m0, %d[%c0] : memref<?xindex>
		memref.store %m1, %d[%c1] : memref<?xindex>
		memref.store %m2, %d[%c2] : memref<?xindex>
		memref.store %m3, %d[%c3] : memref<?xindex>
		memref.store %m4, %d[%c4] : memref<?xindex>
		}
		return
		}
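
The rewrites this test expects can be sanity-checked by brute force. The following is a minimal Python sketch, not part of the patch: it assumes a positive step and lb <= ub, and it models peeling with the split bound ub - (ub - lb) mod step, which is what MAP4 to MAP6 above appear to encode. All function names are hypothetical. The patch itself proves these facts with FlatAffineConstraints rather than by enumeration.

```python
def split_bound(lb, ub, step):
    """Upper bound of the main loop after peeling (assumes step > 0, lb <= ub)."""
    return ub - (ub - lb) % step


def check_peeled_loop(lb, ub, step):
    """Assert the affine.min simplifications for one (lb, ub, step) triple."""
    split = split_bound(lb, ub, step)

    # Main loop: every iteration is a full step, so ub - iv >= step.
    for iv in range(lb, split, step):
        assert min(step, ub - iv) == step              # #map0 -> %step
        assert min(ub - iv + 1, step) == step          # #map1 -> %step
        assert min(step + 1, ub - iv + 1) == step + 1  # #map2 -> %step + 1

    # Remainder (the scf.if): one partial iteration with 0 < ub - iv < step.
    if split < ub:
        iv = split
        rem = (ub - lb) % step
        assert ub - iv == rem                          # MAP4
        assert min(step, ub - iv) == rem               # #map0 -> MAP4
        assert min(ub - iv + 1, step) == rem + 1       # #map1 -> MAP5
        assert min(step + 1, ub - iv + 1) == rem + 1   # #map2 -> MAP5
        assert min(step, ub - iv - 1) == rem - 1       # #map3 -> MAP6


def map3_counterexamples(lb, ub, step):
    """Main-loop ivs where min(%step, %ub - %iv - 1) is smaller than %step."""
    split = split_bound(lb, ub, step)
    return [iv for iv in range(lb, split, step) if min(step, ub - iv - 1) < step]


def verify(max_bound=12):
    """Exhaustively check all small loops with 0 <= lb <= ub <= max_bound."""
    for lb in range(max_bound + 1):
        for ub in range(lb, max_bound + 1):
            for step in range(1, max_bound + 1):
                check_peeled_loop(lb, ub, step)
    return True
```

Calling `verify()` exercises every small loop, and `map3_counterexamples(0, 4, 2)` shows why #map3 has to stay an affine.min inside the main loop: in the last full iteration, %ub - %iv - 1 drops below %step.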
