This is an archive of the discontinued LLVM Phabricator instance.

Differential D83191

[mlir] Loop bounds inference in linalg.generic op improved to support bounds for convolution
Closed · Public

Authored by limo1996 on Jul 6 2020, 12:59 AM.

Details

Reviewers

ftynse
nicolasvasilache
rriddle

Commits

rGe4dd964df016: [mlir] Loop bounds inference in linalg.generic op improved to support bounds…

Summary

Loop bound inference is currently very limited: it supports only permutation maps, so it is
impossible to implement convolution with linalg.generic, which requires more advanced loop
bound inference. This commit solves the problem for the convolution case.

Depends On D83158

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

limo1996 created this revision. Jul 6 2020, 12:59 AM

Herald added a project: Restricted Project. Jul 6 2020, 1:00 AM

Herald added subscribers: msifontes, jurahul, Kayjukh and 13 others.

Harbormaster completed remote builds in B62958: Diff 275596. Jul 6 2020, 1:16 AM

ftynse requested changes to this revision. Jul 6 2020, 5:59 AM

ftynse added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
80	Nit: Top-level functions should use `///` for comments.
84	Please explain the semantics of the argument in the top-level comment instead.
85	Prefer `ValueRange` unless it is missing some functionality
86	We tend to use `assert(condition && "textual description")`
96	Prefer early-return to decrease horizontal alignment. E.g., `for (...) { if (auto x = dyn_cast<..>) { // code } }` can be transformed into `for (...) { auto x = dyn_cast<..>; if (!x) continue; // code }`.
98	`cast` assumes the operand has the expected type and fails an assertion if it does not. If there is no other check, e.g. in the op verifier, this may trigger an assertion on user-entered IR, which we must avoid.
100	MLIR uses camelBack for identifiers
504	Please use a more descriptive variable name. Also avoid C-style casts.
507	No need for a cast here, but I'd check or assert somewhere (depending on whether it can happen with user-entered IR or not) that `numIn > numDims`, since one can have a map of the shape `(i) -> (i,i,i)`.

This revision now requires changes to proceed. Jul 6 2020, 5:59 AM

All comments of ftynse incorporated

Deferring final approval to @nicolasvasilache

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
109	This is an anti-pattern in LLVM: if you intend to cast, you should use `dyn_cast` and check that the result is non-null instead: `auto llhs = lhs.getLHS().dyn_cast<AffineDimExpr>(); if (!llhs) continue; int dimPosition = llhs.getPosition();` (This will not work for "add" and "mul" above because they don't have a dedicated type.) Also, avoid single-character variable names.
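
The checked-cast advice in this comment can be illustrated outside of MLIR. The following is a minimal sketch in plain C++ that uses `dynamic_cast` as a stand-in for LLVM's `dyn_cast`; the `Expr`, `DimExpr` and `ConstantExpr` types and the `collectDimPositions` helper are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for an expression hierarchy; not MLIR classes.
struct Expr {
  virtual ~Expr() = default;
};
struct DimExpr : Expr {
  explicit DimExpr(unsigned position) : position(position) {}
  unsigned position;
};
struct ConstantExpr : Expr {
  explicit ConstantExpr(long value) : value(value) {}
  long value;
};

// Collects the positions of all dimension expressions and skips the rest.
std::vector<unsigned>
collectDimPositions(const std::vector<std::unique_ptr<Expr>> &exprs) {
  std::vector<unsigned> positions;
  for (const auto &expr : exprs) {
    assert(expr && "expected a non-null expression");
    // Checked cast: yields nullptr on a type mismatch instead of asserting.
    const auto *dim = dynamic_cast<const DimExpr *>(expr.get());
    if (!dim)
      continue; // Early continue keeps the main logic at one nesting level.
    positions.push_back(dim->position);
  }
  return positions;
}
```

An unchecked cast would have to assume that every element is a `DimExpr`; the null check lets unexpected input be skipped rather than crash.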

This revision is now accepted and ready to land. Jul 7 2020, 1:38 AM

limo1996 marked 9 inline comments as done. Jul 7 2020, 1:57 AM

limo1996 added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
507	numInputs = numDims + numSymbols so it should never happen

limo1996 marked 2 inline comments as done. Jul 7 2020, 3:03 AM

limo1996 added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
517	@ftynse inversePermutation needs to "support symbols by ignoring them" because verification uses the function to check whether the concatenation of indexing maps is invertible. Another option is to skip the check if the map has symbols, which can cause problems.

Proposed change to circumvent call to inversePermutation during verification of linalg.generic

Herald added a reviewer: rriddle. Jul 7 2020, 3:11 AM

limo1996 marked an inline comment as done. Jul 7 2020, 3:13 AM

limo1996 added inline comments.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
298 ↗	(On Diff #275972)	@ftynse Here is the proposed change which requires the least amount of code..

ftynse requested changes to this revision. Jul 7 2020, 3:43 AM

ftynse added inline comments.

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
298 ↗	(On Diff #275972)	I am interested in a change that keeps the verifier correct and useful (that is, not allowing operations we cannot handle elsewhere), not in the change with the minimal number of LoC. This change should be included in the previous commit instead, I suppose.
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
504	mlir uses camelBack

This revision now requires changes to proceed. Jul 7 2020, 3:43 AM

2 of ftynse's comments addressed

limo1996 marked 2 inline comments as done. Jul 7 2020, 5:21 AM

limo1996 added a child revision: D83378: [mlir][WIP] symbol_source attr in linalg.generic proposal. Jul 8 2020, 3:19 AM

getSymbolSource: llvm::Optional instead of -1

Harbormaster completed remote builds in B63400: Diff 276402. Jul 8 2020, 6:39 AM

evolve

Harbormaster completed remote builds in B63583: Diff 276731. Jul 9 2020, 7:35 AM

ftynse added inline comments. Jul 16 2020, 6:56 AM

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
95	This does not look correct. Assume the concatenated map looks like `(d0,d1,d2)[s0] -> (d2,d0,d1)` (the symbol is unused for simplicity). The expected ranges should be the same as produced by the inversion method, i.e. `(0..allViewSizes[1], 0..allViewSizes[2], 0..allViewSizes[0])`, but this code will produce `(0..allViewSizes[0], 0..allViewSizes[1], 0..allViewSizes[2])`. I think you need `allViewSizes[ <index-of-result-in-map> ]` instead of `allViewSizes[d.getPosition()]`, but please double check.
100	Could you please add a comment that explains the math behind this? Affine expressions are not super-easy to follow.
103	Shouldn't this also check that LHS and RHS are binary expressions of a specific kind, like "add" or "floordiv"? Like: "if the expression has the shape di + dj - sk floordiv CST, the bounds are ... because ... ". Also explain that it always overwrites the existing range because the range is always smaller in this case. Imagine having a map like `(d0,d1)[s0] -> (d0,d1,d0+d1-s0 floordiv 3)`, you will have computed the bounds for both loops when you hit this case. <disregard for now> Normally, we should have computed the maximum across all found lower bounds and the minimum across all found upper bounds. </disregard for now>
500	Since the parent commit introduced `symbol_source` as an ODS attribute, it should be possible to just call `linalgOp.symbol_source()` here and avoid using a hardcoded name.
506–507	`unsigned diff = map.getNumSymbols();` ? Also, `numSymb` sounds like a better name for this?
513	I'm not convinced `diff * symbolSource + idx` is the correct expression. You actually want the sizes that correspond to `symbolSource`'s operand, which should be computed as the sum of ranks of all operands before `symbolSource`. (`diff` is the current number of symbols).
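
The indexing issue raised for line 95 can be checked with a small standalone sketch. It assumes a pure permutation map described by the dimension position of each result, plus one size per result; `loopUpperBounds` and its parameters are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// resultDims[i] is the position of the dimension that appears as result i of
// a permutation map, e.g. {2, 0, 1} for (d0, d1, d2) -> (d2, d0, d1).
// viewSizes[i] is the size associated with result i.
// Returns the exclusive upper bound of the loop over each dimension.
std::vector<long> loopUpperBounds(const std::vector<unsigned> &resultDims,
                                  const std::vector<long> &viewSizes) {
  assert(resultDims.size() == viewSizes.size() &&
         "expected one view size per map result");
  std::vector<long> bounds(resultDims.size(), 0);
  for (std::size_t resultIndex = 0; resultIndex < resultDims.size();
       ++resultIndex) {
    assert(resultDims[resultIndex] < bounds.size() &&
           "dimension position out of range");
    // Index the sizes by the position of the result, not by the dimension.
    bounds[resultDims[resultIndex]] = viewSizes[resultIndex];
  }
  return bounds;
}
```

For the map `(d0,d1,d2) -> (d2,d0,d1)` with sizes `{10, 20, 30}`, this yields the bounds `{20, 30, 10}`, that is `(allViewSizes[1], allViewSizes[2], allViewSizes[0])` as the comment expects.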

ftynse requested changes to this revision. Jul 16 2020, 6:56 AM

This revision now requires changes to proceed. Jul 16 2020, 6:56 AM

All comments resolved

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
95	Yes. I moved the logic from the next commit here..
500	This code was removed and simplified so no need for that
506–507	This code was also removed
513	Yes this was also fixed in the previous commit and evolved here..

All comments resolved

Harbormaster failed remote builds in B64693: Diff 278790! Jul 17 2020, 8:50 AM

comment moved

Harbormaster failed remote builds in B64696: Diff 278796! Jul 17 2020, 9:01 AM

ftynse added inline comments. Jul 20 2020, 1:57 AM

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
100	The comment I see (`m + n - s floordiv 2`) is overly concise to qualify as explanation. I was expecting something like: "if the access pattern is (m,n)[s] -> (m + n - s floordiv C)", then the bounds are `s floordiv C <= n <= max m - s floordiv C` because reason1, reason2. Actually, I seem to have said the exact same thing in the comment below, which is marked "done". Did you forget to reupload the diff after changes?
103	I don't see checks for binary expression kinds on lhs and rhs. In the current state, this can match something like `d0-d1*s0 floordiv 42`, in which case the computed bounds will be wrong. Did you forget to upload a new version?
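
To illustrate why the kind checks matter, here is a minimal sketch on a hypothetical miniature expression tree (not the MLIR `AffineExpr` API). It assumes that subtraction is modelled as an addition of a product with -1, and it accepts only the shape `di + dj - sk floordiv C`; all names are made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical miniature expression tree; not the MLIR AffineExpr API.
enum class Kind { Dim, Symbol, Constant, Add, Mul, FloorDiv };

struct Expr {
  Kind kind;
  long value;                     // Position for Dim/Symbol, value for Constant.
  std::shared_ptr<Expr> lhs, rhs; // Set only for Add, Mul and FloorDiv.
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(Kind kind, long value) {
  return std::make_shared<Expr>(Expr{kind, value, nullptr, nullptr});
}

ExprPtr binary(Kind kind, ExprPtr lhs, ExprPtr rhs) {
  assert(lhs && rhs && "expected two operands for a binary expression");
  return std::make_shared<Expr>(Expr{kind, 0, std::move(lhs), std::move(rhs)});
}

// Returns true only for (di + dj) + (sk floordiv C) * -1. Knowing that both
// operands are binary expressions is not enough; their kinds are checked too.
bool isDimSumMinusSymbolFloorDiv(const ExprPtr &expr) {
  if (!expr || expr->kind != Kind::Add)
    return false;
  const ExprPtr &sum = expr->lhs;
  if (sum->kind != Kind::Add)
    return false;
  if (sum->lhs->kind != Kind::Dim || sum->rhs->kind != Kind::Dim)
    return false;
  const ExprPtr &negated = expr->rhs;
  if (negated->kind != Kind::Mul)
    return false;
  if (negated->rhs->kind != Kind::Constant || negated->rhs->value != -1)
    return false;
  const ExprPtr &quotient = negated->lhs;
  if (quotient->kind != Kind::FloorDiv)
    return false;
  return quotient->lhs->kind == Kind::Symbol &&
         quotient->rhs->kind == Kind::Constant;
}
```

An expression whose operands are both binary but of the wrong kind, such as a product in place of the sum of dimensions, is rejected instead of silently producing wrong bounds.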

Evolved changes from parent commit which included changes to emitting loop ranges with symbols

Harbormaster failed remote builds in B64941: Diff 279275! Jul 20 2020, 9:24 AM

last comment of nicolas resolved

Harbormaster failed remote builds in B64944: Diff 279283! Jul 20 2020, 9:36 AM

Harbormaster completed remote builds in B64944: Diff 279283. Jul 21 2020, 2:45 AM

ftynse requested changes to this revision. Jul 21 2020, 3:04 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
123 ↗	(On Diff #279283)	Let's be more explicit: "Reserve is mandatory to avoid a potential undefined behavior with pushing back to smallvector from itself." Also, please terminate sentences with a dot in comments. Always.
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
100	Thanks for providing more details! Unfortunately, this confirmed my understanding that the code went too far with eager simplification and produces off-by-one results. Given
	`0 <= n < A`
	`0 <= n + m - s floordiv 2 < B`
	the correct upper bound on m is
	`m < B - max n + s floordiv 2`
	We can substitute max n with (A-1) from the first inequality, thus obtaining
	`m < B - A + 1 + s floordiv 2`
	Now, the code makes the assumption that s = A, which can be reasonable for the use case. This gives
	`m < B - A + 1 + A floordiv 2`
	and then it tries to simplify to
	`m < B - A floordiv 2`
	which only holds when `A = 1 mod 2`, because in this case `A = 2 * (A floordiv 2) + 1` and cancels out the +/-1. Otherwise, it should have been `m < B - A floordiv 2 + 1`. This does not produce out-of-bounds accesses, but ignores the last iteration for even values of A. Let's not be too eager with simplification and just compute the upper bound on m as `m < B - A + 1 + s floordiv 2` without any assumption on the divisibility of A by 2.
103	I still don't see the checks I expected. To be precise, `auto lhs = binOp.getLHS().dyn_cast<AffineBinaryOpExpr>();` does _not_ guarantee that lhs is a sum. I expect to see `if (lhs.getKind() != AffineExprKind::Add) continue;` and similarly for rhs.
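
The off-by-one argument for line 100 can be checked numerically. The sketch below is plain C++ with hypothetical helper names. It assumes non-negative values, so that integer division matches `floordiv`, and compares the unsimplified bound, the eagerly simplified bound and a brute-force search.

```cpp
#include <cassert>

// Exclusive upper bound on m derived from 0 <= n < A and
// 0 <= n + m - s floordiv 2 < B, with no assumption on the parity of A.
long exactUpperBound(long A, long B, long s) {
  assert(A >= 1 && s >= 0 && "expected a non-empty range for n and s >= 0");
  return B - A + 1 + s / 2; // s / 2 equals s floordiv 2 for non-negative s.
}

// The eagerly simplified bound, which additionally assumes s = A.
long simplifiedUpperBound(long A, long B) {
  assert(A >= 1 && "expected a non-empty range for n");
  return B - A / 2;
}

// Smallest m >= 0 for which some n in [0, A) violates
// n + m - s floordiv 2 < B. It agrees with exactUpperBound whenever that
// bound is non-negative.
long bruteForceUpperBound(long A, long B, long s) {
  assert(A >= 1 && s >= 0 && "expected a non-empty range for n and s >= 0");
  for (long m = 0;; ++m)
    for (long n = 0; n < A; ++n)
      if (n + m - s / 2 >= B)
        return m;
}
```

With `A = s = 5` and `B = 10` both formulas give 8, while with `A = s = 4` the unsimplified bound is 9 and the simplified one is 8, so the last iteration is dropped for even `A`.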

This revision now requires changes to proceed. Jul 21 2020, 3:04 AM

All comments of Alex resolved

Harbormaster failed remote builds in B65070: Diff 279499! Jul 21 2020, 6:23 AM

synced

Harbormaster completed remote builds in B65196: Diff 279741. Jul 22 2020, 3:01 AM

ftynse accepted this revision. Jul 22 2020, 4:35 AM

ftynse added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
124	Actually, it shouldn't matter anymore whether you do `floordiv 2` or `floordiv any-other-value`.
126	This could use a better name. Please try really hard to avoid short semantically meaningless names. I recall making this same comment. The purpose of the name may seem obvious today for you, but it won't be when you'll have to debug this code two months from now. Does `c2` stand for `c * 2`, `c squared`? Given that there is only one use of this variable, just fold it to its use `if (minusOne.getValue() != 1) continue;`.
143	Can't this be done as part of affine apply? `viewSizes` is at least partially contained by `values`. Also, nobody prevents you from adding more inputs to one of affine apply's.

This revision is now accepted and ready to land. Jul 22 2020, 4:35 AM

comments of Alex incorporated

All comments of Alex done

Harbormaster completed remote builds in B65232: Diff 279823. Jul 22 2020, 7:49 AM

Closed by commit rGe4dd964df016: [mlir] Loop bounds inference in linalg.generic op improved to support bounds… (authored by limo1996, committed by ftynse). Jul 23 2020, 2:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp	70 lines
mlir/lib/Dialect/Linalg/Utils/Utils.cpp	4 lines
mlir/test/Dialect/Linalg/loops.mlir	174 lines

Diff 275596

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp

[71 lines not shown; context: emitLoopRanges(OpBuilder &b, Location loc, AffineMap map,]
   SmallVector<SubViewOp::Range, 4> res;
   for (unsigned idx = 0, e = map.getNumResults(); idx < e; ++idx) {
     res.push_back(SubViewOp::Range{std_constant_index(0), sizes[idx],
                                    std_constant_index(1)});
   }
   return res;
 }

+// Does the same as emitLoopRanges but can handle symbols in the map as well.
+// Expects a non-inverted, concatenated map.
+static SmallVector<SubViewOp::Range, 4>
+emitLoopRangesWithSymbols(OpBuilder &b, Location loc,
+                          AffineMap map, // concat maps
+                          ArrayRef<Value> allViewSizes) {
+  assert(allViewSizes.size() == map.getNumInputs());
+  SmallVector<SubViewOp::Range, 4> res(map.getNumDims());
+  for (auto result : map.getResults()) {
+    if (auto d = result.dyn_cast<AffineDimExpr>()) {
+      if (!res[d.getPosition()].offset)
+        res[d.getPosition()] = SubViewOp::Range{std_constant_index(0),
+                                                allViewSizes[d.getPosition()],
+                                                std_constant_index(1)};
+    }
+
+    if (auto binOp = result.dyn_cast<AffineBinaryOpExpr>()) {
+      auto lhs = binOp.getLHS().cast<AffineBinaryOpExpr>();
+      int m = lhs.getLHS().cast<AffineDimExpr>().getPosition();
+      auto floorDivExpr = binOp.getRHS().cast<AffineBinaryOpExpr>().getLHS();
+      AffineMap from_map =
+          AffineMap::get(map.getNumDims(), map.getNumSymbols(), floorDivExpr);
+      auto from = applyMapToValues(b, loc, from_map, allViewSizes).front();
+      auto to = b.create<SubIOp>(loc, allViewSizes[m], from);
+      res[m] = SubViewOp::Range{from, to, std_constant_index(1)};
+    }
+  }
+  return res;
+}
+
 template <typename IndexedValueType, typename OpType>
 static void inlineRegionAndEmitStore(OpType op, ArrayRef<Value> indexedValues,
                                      ArrayRef<SmallVector<Value, 8>> indexing,
                                      ArrayRef<Value> outputBuffers) {
   assert(op.getOperation()->getNumRegions() == 1 &&
          "Expected single region op");
   auto &b = ScopedContext::getBuilderRef();
   auto &block = op.region().front();
   BlockAndValueMapping map;
   map.map(block.getArguments(), indexedValues);
   for (auto &op : block.without_terminator()) {
     assert(op.getNumRegions() == 0 && "expected a non-nested region");
     auto *newOp = b.clone(op, map);
     map.map(op.getResults(), newOp->getResults());
   }

   Operation &terminator = block.back();
   assert(isa<YieldOp>(terminator) &&
          "expected a yield op in the end of the region");
   for (unsigned i = 0, e = terminator.getNumOperands(); i < e; ++i) {
     IndexedValueType O(outputBuffers[i]);
     O(indexing[i]) = map.lookupOrDefault(terminator.getOperand(i));
   }
 }

 // Returns a pair that contains input indices and output indices of a
 // SingleInputPoolingOp `op`.
 struct InputAndOutputIndices {
   SmallVector<Value, 8> inputs;
   SmallVector<Value, 8> outputs;
 };
 template <typename SingleInputPoolingOp>
 static InputAndOutputIndices getInputAndOutputIndices(ArrayRef<Value> allIvs,
                                                       SingleInputPoolingOp op) {
   auto &b = ScopedContext::getBuilderRef();
   auto loc = ScopedContext::getLocation();
   auto mapsRange = op.indexing_maps().template getAsRange<AffineMapAttr>();
   auto maps = llvm::to_vector<8>(
       llvm::map_range(mapsRange, [](AffineMapAttr a) { return a.getValue(); }));
   return InputAndOutputIndices{
       makeCanonicalAffineApplies(b, loc, maps[0], allIvs),
       makeCanonicalAffineApplies(b, loc, maps[2], allIvs)};
[337 lines not shown; context: Optional<LinalgLoops> linalgOpToLoopsImpl(Operation *op, OpBuilder &builder) {]
   // permutation map (which is asserted in the inverse calculation).
   auto linalgOp = cast<ConcreteOpTy>(op);
   assert(linalgOp.hasBufferSemantics() &&
          "expected linalg op with buffer semantics");
   auto mapsRange =
       linalgOp.indexing_maps().template getAsRange<AffineMapAttr>();
   auto maps = llvm::to_vector<8>(
       llvm::map_range(mapsRange, [](AffineMapAttr a) { return a.getValue(); }));
-  AffineMap invertedMap = inversePermutation(concatAffineMaps(maps));
+  auto map = concatAffineMaps(maps);
+  SmallVector<SubViewOp::Range, 4> loopRanges;
+
+  auto attr = linalgOp.template getAttrOfType<IntegerAttr>("symbol_source");
+  if (attr) {
+    // This map has symbols and thus is not a permutation. Therefore we
+    // cannot invert it.
+    unsigned ss = (unsigned)attr.getInt();
+    auto sizes = getViewSizes(builder, linalgOp);
+    unsigned numIn = map.getNumInputs(), numDims = map.getNumDims();
+    unsigned diff = (unsigned)(numIn - numDims);
+
+    // Append or rewrite the end of the value list that corresponds to the
+    // symbols. They are in this case dims of the "symbol_source" operand.
+    sizes.resize(numIn);
+    for (unsigned idx = 0; idx < diff; idx++)
+      sizes[numDims + idx] = sizes[diff * ss + idx];
+    loopRanges = emitLoopRangesWithSymbols(scope.getBuilderRef(),
+                                           scope.getLocation(), map, sizes);
+  } else {
+    AffineMap invertedMap = inversePermutation(map);
     if (!invertedMap)
       return {};
     if (invertedMap.isEmpty()) {
       emitScalarImplementation<IndexedValueTy>({}, linalgOp);
       return LinalgLoops();
     }

+    loopRanges = emitLoopRanges(scope.getBuilderRef(), scope.getLocation(),
+                                invertedMap, getViewSizes(builder, linalgOp));
+  }
   SmallVector<Value, 4> allIvs;
-  auto loopRanges =
-      emitLoopRanges(scope.getBuilderRef(), scope.getLocation(), invertedMap,
-                     getViewSizes(builder, linalgOp));
   GenerateLoopNest<LoopTy>::doit(
       loopRanges, linalgOp.iterator_types().getValue(), [&](ValueRange ivs) {
         allIvs.append(ivs.begin(), ivs.end());
         emitScalarImplementation<IndexedValueTy>(allIvs, linalgOp);
       });
   // Number of loop ops might be different from the number of ivs since some
   // loops like affine.parallel and scf.parallel have multiple ivs.
   llvm::SetVector<Operation *> loopSet;
[201 lines not shown]

mlir/lib/Dialect/Linalg/Utils/Utils.cpp

	Show First 20 Lines • Show All 66 Lines • ▼ Show 20 Lines
	}			}

	SmallVector<Value, 4> mlir::linalg::applyMapToValues(OpBuilder &b, Location loc,			SmallVector<Value, 4> mlir::linalg::applyMapToValues(OpBuilder &b, Location loc,
	AffineMap map,			AffineMap map,
	ArrayRef<Value> values,			ArrayRef<Value> values,
	OperationFolder *folder) {			OperationFolder *folder) {
	SmallVector<Value, 4> res;			SmallVector<Value, 4> res;
	res.reserve(map.getNumResults());			res.reserve(map.getNumResults());
	unsigned numDims = map.getNumDims();			unsigned numDims = map.getNumDims(), numSym = map.getNumSymbols();
	// For each `expr` in `map`, applies the `expr` to the values extracted from			// For each `expr` in `map`, applies the `expr` to the values extracted from
	// ranges. If the resulting application can be folded into a Value, the			// ranges. If the resulting application can be folded into a Value, the
	// folding occurs eagerly. Otherwise, an affine.apply operation is emitted.			// folding occurs eagerly. Otherwise, an affine.apply operation is emitted.
	for (auto expr : map.getResults()) {			for (auto expr : map.getResults()) {
	AffineMap map = AffineMap::get(numDims, 0, expr);			AffineMap map = AffineMap::get(numDims, numSym, expr);
	res.push_back(emitOrFoldComposedAffineApply(b, loc, map, values, folder));			res.push_back(emitOrFoldComposedAffineApply(b, loc, map, values, folder));
	}			}
	return res;			return res;
	}			}
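
The change above rebuilds each single-result map with `numSym` symbols instead of 0, so an expression that mentions a symbol, such as `(d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2)` from the test file, still has a slot for `s0`. A minimal sketch of that operand layout follows, assuming dimension values come first and symbol values after them. The names `floorDiv` and `applyStridedConv` are hypothetical stand-ins for illustration and are not MLIR APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Floor division that rounds toward negative infinity; the divisor is
// assumed to be positive in this sketch.
int64_t floorDiv(int64_t lhs, int64_t rhs) {
  assert(rhs > 0 && "divisor must be positive");
  int64_t quotient = lhs / rhs;
  if (lhs % rhs != 0 && lhs < 0)
    --quotient;
  return quotient;
}

// Hypothetical stand-in for one result of the map
//   (d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2).
// `operands` holds the dimension values first, then the symbol values.
int64_t applyStridedConv(const std::vector<int64_t> &operands,
                         unsigned numDims, unsigned numSyms) {
  assert(numDims == 2 && numSyms == 1 && "expression uses d0, d1 and s0");
  assert(operands.size() == numDims + numSyms && "operand count mismatch");
  int64_t d0 = operands[0];
  int64_t d1 = operands[1];
  int64_t s0 = operands[numDims]; // first symbol follows the last dimension
  return d0 + d1 - floorDiv(s0, 2);
}
```

With a symbol count of 0 there would be no operand slot for `s0`, which is why this sketch asserts on the dimension and symbol counts before reading the operands.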

	/// Returns all the operands of `linalgOp` that are not views.			/// Returns all the operands of `linalgOp` that are not views.
	/// Asserts that these operands are value types to allow transformations like			/// Asserts that these operands are value types to allow transformations like
	/// tiling to just use the values when cloning `linalgOp`.			/// tiling to just use the values when cloning `linalgOp`.

mlir/test/Dialect/Linalg/loops.mlir

// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck --check-prefix=CHECKLOOP %s		// RUN: mlir-opt %s -convert-linalg-to-loops | FileCheck --check-prefix=CHECKLOOP %s
// RUN: mlir-opt %s -convert-linalg-to-parallel-loops | FileCheck --check-prefix=CHECKPARALLEL %s		// RUN: mlir-opt %s -convert-linalg-to-parallel-loops | FileCheck --check-prefix=CHECKPARALLEL %s

// Test that we can lower all the way to LLVM without crashing, don't check results here.		// Test that we can lower all the way to LLVM without crashing, don't check results here.
// RUN: mlir-opt %s -convert-linalg-to-loops -convert-linalg-to-llvm -o=/dev/null 2>&1		// RUN: mlir-opt %s -convert-linalg-to-loops -convert-linalg-to-llvm -o=/dev/null 2>&1

// CHECKLOOP-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>		// CHECKLOOP-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
// CHECKLOOP-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECKLOOP-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECKLOOP-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>		// CHECKLOOP-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
// CHECKLOOP-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>		// CHECKLOOP-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
// CHECKLOOP-DAG: #[[$clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>		// CHECKLOOP-DAG: #[[$clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>

// CHECKLOOP-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECKLOOP-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
// CHECKLOOP-DAG: #[[$stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>		// CHECKLOOP-DAG: #[[$stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>
// CHECKLOOP-DAG: #[[$stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>		// CHECKLOOP-DAG: #[[$stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>
// CHECKLOOP-DAG: #[[$stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>		// CHECKLOOP-DAG: #[[$stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>
		// CHECKLOOP-DAG: #[[$convHalf:.*]] = affine_map<()[s0] -> (s0 floordiv 2)>
// CHECKLOOP-DAG: #[[$stridedConv:.*]] = affine_map<(d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2)>		// CHECKLOOP-DAG: #[[$stridedConv:.*]] = affine_map<(d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2)>

// CHECKPARALLEL-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>		// CHECKPARALLEL-DAG: #[[$strided1D:.*]] = affine_map<(d0)[s0] -> (d0 + s0)>
// CHECKPARALLEL-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>		// CHECKPARALLEL-DAG: #[[$strided2D:.*]] = affine_map<(d0, d1)[s0, s1] -> (d0 * s1 + s0 + d1)>
// CHECKPARALLEL-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>		// CHECKPARALLEL-DAG: #[[$strided3D:.*]] = affine_map<(d0, d1, d2)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2 + d2)>
// CHECKPARALLEL-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>		// CHECKPARALLEL-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
// CHECKPARALLEL-DAG: #[[$clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>		// CHECKPARALLEL-DAG: #[[$clampMinMap:.*]] = affine_map<(d0) -> (d0, 0)>

// CHECKPARALLEL-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>		// CHECKPARALLEL-DAG: #[[$stride1Dilation1:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
// CHECKPARALLEL-DAG: #[[$stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>		// CHECKPARALLEL-DAG: #[[$stride2Dilation1:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1)>
// CHECKPARALLEL-DAG: #[[$stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>		// CHECKPARALLEL-DAG: #[[$stride2Dilation4:.*]] = affine_map<(d0, d1) -> (d0 * 2 + d1 * 4)>
// CHECKPARALLEL-DAG: #[[$stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>		// CHECKPARALLEL-DAG: #[[$stride3Dilation5:.*]] = affine_map<(d0, d1) -> (d0 * 3 + d1 * 5)>
		// CHECKPARALLEL-DAG: #[[$convHalf:.*]] = affine_map<()[s0] -> (s0 floordiv 2)>
// CHECKPARALLEL-DAG: #[[$stridedConv:.*]] = affine_map<(d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2)>		// CHECKPARALLEL-DAG: #[[$stridedConv:.*]] = affine_map<(d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2)>


func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {		func @matmul(%arg0: memref<?xi8>, %M: index, %N: index, %K: index) {
%c0 = constant 0 : index		%c0 = constant 0 : index
%c1 = constant 1 : index		%c1 = constant 1 : index
%A = view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32>		%A = view %arg0[%c0][%M, %K] : memref<?xi8> to memref<?x?xf32>
%B = view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32>		%B = view %arg0[%c0][%K, %N] : memref<?xi8> to memref<?x?xf32>
[lines omitted]
^bb0(%a: f32, %b: f32, %c: f32) :
memref<?xf32>		memref<?xf32>
return		return
}		}

// CHECKLOOP-LABEL: @conv1d		// CHECKLOOP-LABEL: @conv1d
// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKLOOP: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?xf32>		// CHECKLOOP: %c1 = constant 1 : index
// CHECKLOOP: %[[dim1:.*]] = dim %[[arg2]], %c0 : memref<?xf32>		// CHECKLOOP: %c0 = constant 0 : index
// CHECKLOOP: scf.for %[[b:.*]] = %{{.*}} to %[[dim1]] step %{{.*}} {		// CHECKLOOP: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?xf32>
// CHECKLOOP: scf.for %[[m:.*]] = %{{.*}} to %[[dim0]] step %{{.*}} {		// CHECKLOOP: %[[dim1:.*]] = dim %[[arg0]], %c0 : memref<?xf32>
		// CHECKLOOP: %[[half:.*]] = affine.apply #[[$convHalf]]()[%[[dim1]]]
		// CHECKLOOP: %[[sizeMinusHalf:.*]] = subi %[[dim0]], %[[half]] : index
		// CHECKLOOP: scf.for %[[b:.*]] = %[[half]] to %[[sizeMinusHalf]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[m:.*]] = %{{.*}} to %[[dim1]] step %{{.*}} {
// CHECKLOOP: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?xf32>		// CHECKLOOP: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?xf32>
// CHECKLOOP: %[[aff:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim2]]]		// CHECKLOOP: %[[aff:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim2]]]
// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff]]] : memref<?xf32>		// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff]]] : memref<?xf32>
// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[m]]] : memref<?xf32>		// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[m]]] : memref<?xf32>
// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[b]]] : memref<?xf32>		// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[b]]] : memref<?xf32>
// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKLOOP: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>		// CHECKLOOP: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>
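
For orientation, the updated conv1d checks above describe a loop nest whose outer loop runs from `half` to the input size minus `half`, where `half` is the kernel size floor-divided by 2, and whose inner loop runs over the kernel. A minimal scalar sketch of that nest follows. It assumes `%arg0` is the kernel, `%arg1` the input and `%arg2` the output; the function name `conv1dReference` is hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for the loop nest in the conv1d checks.
// kernel ~ %arg0, input ~ %arg1, output ~ %arg2 (accumulated in place).
void conv1dReference(const std::vector<float> &kernel,
                     const std::vector<float> &input,
                     std::vector<float> &output) {
  const int64_t inputSize = static_cast<int64_t>(input.size());
  const int64_t kernelSize = static_cast<int64_t>(kernel.size());
  const int64_t half = kernelSize / 2; // s0 floordiv 2 for non-negative s0
  assert(static_cast<int64_t>(output.size()) >= inputSize - half &&
         "output too small for the inferred loop range");
  // Outer loop: [half, inputSize - half), inner loop: [0, kernelSize).
  for (int64_t b = half; b < inputSize - half; ++b) {
    for (int64_t m = 0; m < kernelSize; ++m) {
      // Mirrors (d0, d1)[s0] -> (d0 + d1 - s0 floordiv 2).
      const int64_t idx = b + m - half;
      output[b] += input[idx] * kernel[m];
    }
  }
}
```

With these bounds the index `b + m - half` stays between 0 and `inputSize - 1`, so the load from the input never leaves the buffer.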

// CHECKPARALLEL-LABEL: @conv1d		// CHECKPARALLEL-LABEL: @conv1d
// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>		// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?xf32>
// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?xf32>		// CHECKPARALLEL: %c1 = constant 1 : index
// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg2]], %c0 : memref<?xf32>		// CHECKPARALLEL: %c0 = constant 0 : index
// CHECKPARALLEL: scf.parallel (%[[b:.*]], %[[m:.*]]) = (%c0, %c0) to (%[[dim1]], %[[dim0]]) step ({{.*}}) {		// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?xf32>
		// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg0]], %c0 : memref<?xf32>
		// CHECKPARALLEL: %[[half:.*]] = affine.apply #[[$convHalf]]()[%[[dim1]]]
		// CHECKPARALLEL: %[[sizeMinusHalf:.*]] = subi %[[dim0]], %[[half]] : index
		// CHECKPARALLEL: scf.parallel (%[[b:.*]], %[[m:.*]]) = (%[[half]], %c0) to (%[[sizeMinusHalf]], %[[dim1]]) step ({{.*}}) {
// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?xf32>		// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?xf32>
// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim2]]]		// CHECKPARALLEL: %[[aff:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim2]]]
// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff]]] : memref<?xf32>		// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff]]] : memref<?xf32>
// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[m]]] : memref<?xf32>		// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[m]]] : memref<?xf32>
// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[b]]] : memref<?xf32>		// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[b]]] : memref<?xf32>
// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKPARALLEL: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKPARALLEL: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>		// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[b]]] : memref<?xf32>
[lines omitted]
^bb0(%a: f32, %b: f32, %c: f32) :
memref<?x?xf32>		memref<?x?xf32>
return		return
}		}

// CHECKLOOP-LABEL: @conv2d		// CHECKLOOP-LABEL: @conv2d
// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKLOOP: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>		// CHECKLOOP: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?xf32>
// CHECKLOOP: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>		// CHECKLOOP: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?xf32>
// CHECKLOOP: %[[dim2:.*]] = dim %[[arg2]], %c0 : memref<?x?xf32>		// CHECKLOOP: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>
// CHECKLOOP: %[[dim3:.*]] = dim %[[arg2]], %c1 : memref<?x?xf32>		// CHECKLOOP: %[[dim3:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>
// CHECKLOOP: scf.for %[[i0:.*]] = %{{.*}} to %[[dim2]] step %{{.*}} {		// CHECKLOOP: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim2]]]
// CHECKLOOP: scf.for %[[i1:.*]] = %{{.*}} to %[[dim3]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
// CHECKLOOP: scf.for %[[i2:.*]] = %{{.*}} to %[[dim0]] step %{{.*}} {		// CHECKLOOP: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim3]]]
// CHECKLOOP: scf.for %[[i3:.*]] = %{{.*}} to %[[dim1]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
		// CHECKLOOP: scf.for %[[i0:.*]] = %[[half1]] to %[[sizeMinusHalf1]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i1:.*]] = %[[half2]] to %[[sizeMinusHalf2]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i2:.*]] = %{{.*}} to %[[dim2]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i3:.*]] = %{{.*}} to %[[dim3]] step %{{.*}} {
// CHECKLOOP: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>		// CHECKLOOP: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>
// CHECKLOOP: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>		// CHECKLOOP: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>
// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim4]]]		// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim4]]]
// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim5]]]		// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim5]]]
// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]]] : memref<?x?xf32>		// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]]] : memref<?x?xf32>
// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i2]], %[[i3]]] : memref<?x?xf32>		// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i2]], %[[i3]]] : memref<?x?xf32>
// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>		// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>
// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>		// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>

// CHECKPARALLEL-LABEL: @conv2d		// CHECKPARALLEL-LABEL: @conv2d
// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>		// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?xf32>
// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?xf32>
// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?xf32>
// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg2]], %c0 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>
// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg2]], %c1 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>
// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]]) = (%c0, %c0, %c0, %c0) to (%[[dim2]], %[[dim3]], %[[dim0]], %[[dim1]]) step ({{.*}}) {		// CHECKPARALLEL: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim2]]]
		// CHECKPARALLEL: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
		// CHECKPARALLEL: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim3]]]
		// CHECKPARALLEL: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
		// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]]) = (%[[half1]], %[[half2]], %c0, %c0) to (%[[sizeMinusHalf1]], %[[sizeMinusHalf2]], %[[dim2]], %[[dim3]]) step ({{.*}}) {
// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?xf32>
// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>		// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?xf32>
// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim4]]]		// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim4]]]
// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim5]]]		// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim5]]]
// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]]] : memref<?x?xf32>		// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]]] : memref<?x?xf32>
// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i2]], %[[i3]]] : memref<?x?xf32>		// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i2]], %[[i3]]] : memref<?x?xf32>
// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>		// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]]] : memref<?x?xf32>
// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
[lines omitted]
^bb0(%a: f32, %b: f32, %c: f32) :
memref<?x?x?xf32>		memref<?x?x?xf32>
return		return
}		}

// CHECKLOOP-LABEL: @conv3d		// CHECKLOOP-LABEL: @conv3d
// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKLOOP: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim2:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim2:.*]] = dim %[[arg1]], %c2 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim3:.*]] = dim %[[arg2]], %c0 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim3:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim4:.*]] = dim %[[arg2]], %c1 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim4:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim5:.*]] = dim %[[arg2]], %c2 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim5:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>
// CHECKLOOP: scf.for %[[i0:.*]] = %{{.*}} to %[[dim3]] step %{{.*}} {		// CHECKLOOP: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim3]]]
// CHECKLOOP: scf.for %[[i1:.*]] = %{{.*}} to %[[dim4]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
// CHECKLOOP: scf.for %[[i2:.*]] = %{{.*}} to %[[dim5]] step %{{.*}} {		// CHECKLOOP: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim4]]]
// CHECKLOOP: scf.for %[[i3:.*]] = %{{.*}} to %[[dim0]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
// CHECKLOOP: scf.for %[[i4:.*]] = %{{.*}} to %[[dim1]] step %{{.*}} {		// CHECKLOOP: %[[half3:.*]] = affine.apply #[[$convHalf]]()[%[[dim5]]]
// CHECKLOOP: scf.for %[[i5:.*]] = %{{.*}} to %[[dim2]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf3:.*]] = subi %[[dim2]], %[[half3]] : index
		// CHECKLOOP: scf.for %[[i0:.*]] = %[[half1]] to %[[sizeMinusHalf1]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i1:.*]] = %[[half2]] to %[[sizeMinusHalf2]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i2:.*]] = %[[half3]] to %[[sizeMinusHalf3]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i3:.*]] = %{{.*}} to %[[dim3]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i4:.*]] = %{{.*}} to %[[dim4]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i5:.*]] = %{{.*}} to %[[dim5]] step %{{.*}} {
// CHECKLOOP: %[[dim6:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim6:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim7:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim7:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>
// CHECKLOOP: %[[dim8:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>		// CHECKLOOP: %[[dim8:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>
// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim6]]]		// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim6]]]
// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim7]]]		// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim7]]]
// CHECKLOOP: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]		// CHECKLOOP: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]
// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>		// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>
// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i3]], %[[i4]], %[[i5]]] : memref<?x?x?xf32>		// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i3]], %[[i4]], %[[i5]]] : memref<?x?x?xf32>
// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]]] : memref<?x?x?xf32>		// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]]] : memref<?x?x?xf32>
// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]]] : memref<?x?x?xf32>		// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]]] : memref<?x?x?xf32>

// CHECKPARALLEL-LABEL: @conv3d		// CHECKPARALLEL-LABEL: @conv3d
// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg1]], %c2 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg2]], %c0 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg2]], %c1 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg2]], %c2 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>
// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]], %[[i4:.*]], %[[i5:.*]]) = (%c0, %c0, %c0, %c0, %c0, %c0) to (%[[dim3]], %[[dim4]], %[[dim5]], %[[dim0]], %[[dim1]], %[[dim2]]) step ({{.*}}) {		// CHECKPARALLEL: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim3]]]
		// CHECKPARALLEL: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
		// CHECKPARALLEL: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim4]]]
		// CHECKPARALLEL: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
		// CHECKPARALLEL: %[[half3:.*]] = affine.apply #[[$convHalf]]()[%[[dim5]]]
		// CHECKPARALLEL: %[[sizeMinusHalf3:.*]] = subi %[[dim2]], %[[half3]] : index
		// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]], %[[i4:.*]], %[[i5:.*]]) = (%[[half1]], %[[half2]], %[[half3]], %c0, %c0, %c0) to (%[[sizeMinusHalf1]], %[[sizeMinusHalf2]], %[[sizeMinusHalf3]], %[[dim3]], %[[dim4]], %[[dim5]]) step ({{.*}}) {
// CHECKPARALLEL: %[[dim6:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim6:.*]] = dim %[[arg0]], %c0 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim7:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim7:.*]] = dim %[[arg0]], %c1 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[dim8:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>		// CHECKPARALLEL: %[[dim8:.*]] = dim %[[arg0]], %c2 : memref<?x?x?xf32>
// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim6]]]		// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim6]]]
// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim7]]]		// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim7]]]
// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]		// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]
// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>		// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]]] : memref<?x?x?xf32>
// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i3]], %[[i4]], %[[i5]]] : memref<?x?x?xf32>		// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i3]], %[[i4]], %[[i5]]] : memref<?x?x?xf32>
[lines omitted]
^bb0(%a: f32, %b: f32, %c: f32) :
memref<?x?x?x?xf32>		memref<?x?x?x?xf32>
return		return
}		}

// CHECKLOOP-LABEL: @conv4d		// CHECKLOOP-LABEL: @conv4d
// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKLOOP-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKLOOP-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKLOOP-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim2:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim2:.*]] = dim %[[arg1]], %c2 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim3:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim3:.*]] = dim %[[arg1]], %c3 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim4:.*]] = dim %[[arg2]], %c0 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim5:.*]] = dim %[[arg2]], %c1 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim6:.*]] = dim %[[arg2]], %c2 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim6:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim7:.*]] = dim %[[arg2]], %c3 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim7:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>
// CHECKLOOP: scf.for %[[i0:.*]] = %{{.*}} to %[[dim4]] step %{{.*}} {		// CHECKLOOP: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim4]]]
// CHECKLOOP: scf.for %[[i1:.*]] = %{{.*}} to %[[dim5]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
// CHECKLOOP: scf.for %[[i2:.*]] = %{{.*}} to %[[dim6]] step %{{.*}} {		// CHECKLOOP: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim5]]]
// CHECKLOOP: scf.for %[[i3:.*]] = %{{.*}} to %[[dim7]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
// CHECKLOOP: scf.for %[[i4:.*]] = %{{.*}} to %[[dim0]] step %{{.*}} {		// CHECKLOOP: %[[half3:.*]] = affine.apply #[[$convHalf]]()[%[[dim6]]]
// CHECKLOOP: scf.for %[[i5:.*]] = %{{.*}} to %[[dim1]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf3:.*]] = subi %[[dim2]], %[[half3]] : index
// CHECKLOOP: scf.for %[[i6:.*]] = %{{.*}} to %[[dim2]] step %{{.*}} {		// CHECKLOOP: %[[half4:.*]] = affine.apply #[[$convHalf]]()[%[[dim7]]]
// CHECKLOOP: scf.for %[[i7:.*]] = %{{.*}} to %[[dim3]] step %{{.*}} {		// CHECKLOOP: %[[sizeMinusHalf4:.*]] = subi %[[dim3]], %[[half4]] : index
		// CHECKLOOP: scf.for %[[i0:.*]] = %[[half1]] to %[[sizeMinusHalf1]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i1:.*]] = %[[half2]] to %[[sizeMinusHalf2]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i2:.*]] = %[[half3]] to %[[sizeMinusHalf3]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i3:.*]] = %[[half4]] to %[[sizeMinusHalf4]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i4:.*]] = %{{.*}} to %[[dim4]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i5:.*]] = %{{.*}} to %[[dim5]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i6:.*]] = %{{.*}} to %[[dim6]] step %{{.*}} {
		// CHECKLOOP: scf.for %[[i7:.*]] = %{{.*}} to %[[dim7]] step %{{.*}} {
// CHECKLOOP: %[[dim8:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim8:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim9:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim9:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim10:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim10:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[dim11:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>		// CHECKLOOP: %[[dim11:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>
// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]		// CHECKLOOP: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]
// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim9]]]		// CHECKLOOP: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim9]]]
// CHECKLOOP: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim10]]]		// CHECKLOOP: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim10]]]
// CHECKLOOP: %[[aff4:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim11]]]		// CHECKLOOP: %[[aff4:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim11]]]
// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]], %[[aff4]]] : memref<?x?x?x?xf32>		// CHECKLOOP: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]], %[[aff4]]] : memref<?x?x?x?xf32>
// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i4]], %[[i5]], %[[i6]], %[[i7]]] : memref<?x?x?x?xf32>		// CHECKLOOP: %[[vb:.*]] = load %[[arg0]][%[[i4]], %[[i5]], %[[i6]], %[[i7]]] : memref<?x?x?x?xf32>
// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>		// CHECKLOOP: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>
// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKLOOP: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKLOOP: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>		// CHECKLOOP: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>

// CHECKPARALLEL-LABEL: @conv4d		// CHECKPARALLEL-LABEL: @conv4d
// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg0:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg1:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>		// CHECKPARALLEL-SAME: %[[arg2:[a-zA-Z0-9]+]]: memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim0:.*]] = dim %[[arg1]], %c0 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim1:.*]] = dim %[[arg1]], %c1 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim2:.*]] = dim %[[arg1]], %c2 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim3:.*]] = dim %[[arg1]], %c3 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg2]], %c0 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim4:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg2]], %c1 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim5:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim6:.*]] = dim %[[arg2]], %c2 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim6:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim7:.*]] = dim %[[arg2]], %c3 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim7:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>
// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]], %[[i4:.*]], %[[i5:.*]], %[[i6:.*]], %[[i7:.*]]) = (%c0, %c0, %c0, %c0, %c0, %c0, %c0, %c0) to (%[[dim4]], %[[dim5]], %[[dim6]], %[[dim7]], %[[dim0]], %[[dim1]], %[[dim2]], %[[dim3]]) step ({{.*}}) {		// CHECKPARALLEL: %[[half1:.*]] = affine.apply #[[$convHalf]]()[%[[dim4]]]
		// CHECKPARALLEL: %[[sizeMinusHalf1:.*]] = subi %[[dim0]], %[[half1]] : index
		// CHECKPARALLEL: %[[half2:.*]] = affine.apply #[[$convHalf]]()[%[[dim5]]]
		// CHECKPARALLEL: %[[sizeMinusHalf2:.*]] = subi %[[dim1]], %[[half2]] : index
		// CHECKPARALLEL: %[[half3:.*]] = affine.apply #[[$convHalf]]()[%[[dim6]]]
		// CHECKPARALLEL: %[[sizeMinusHalf3:.*]] = subi %[[dim2]], %[[half3]] : index
		// CHECKPARALLEL: %[[half4:.*]] = affine.apply #[[$convHalf]]()[%[[dim7]]]
		// CHECKPARALLEL: %[[sizeMinusHalf4:.*]] = subi %[[dim3]], %[[half4]] : index
		// CHECKPARALLEL: scf.parallel (%[[i0:.*]], %[[i1:.*]], %[[i2:.*]], %[[i3:.*]], %[[i4:.*]], %[[i5:.*]], %[[i6:.*]], %[[i7:.*]]) = (%[[half1]], %[[half2]], %[[half3]], %[[half4]], %c0, %c0, %c0, %c0) to (%[[sizeMinusHalf1]], %[[sizeMinusHalf2]], %[[sizeMinusHalf3]], %[[sizeMinusHalf4]], %[[dim4]], %[[dim5]], %[[dim6]], %[[dim7]]) step ({{.*}}) {
// CHECKPARALLEL: %[[dim8:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim8:.*]] = dim %[[arg0]], %c0 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim9:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim9:.*]] = dim %[[arg0]], %c1 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim10:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim10:.*]] = dim %[[arg0]], %c2 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[dim11:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[dim11:.*]] = dim %[[arg0]], %c3 : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]		// CHECKPARALLEL: %[[aff1:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim8]]]
// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim9]]]		// CHECKPARALLEL: %[[aff2:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim9]]]
// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim10]]]		// CHECKPARALLEL: %[[aff3:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim10]]]
// CHECKPARALLEL: %[[aff4:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim11]]]		// CHECKPARALLEL: %[[aff4:.*]] = affine.apply #[[$stridedConv]](%{{.*}}, %{{.*}})[%[[dim11]]]
// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]], %[[aff4]]] : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[va:.*]] = load %[[arg1]][%[[aff1]], %[[aff2]], %[[aff3]], %[[aff4]]] : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i4]], %[[i5]], %[[i6]], %[[i7]]] : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[vb:.*]] = load %[[arg0]][%[[i4]], %[[i5]], %[[i6]], %[[i7]]] : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>		// CHECKPARALLEL: %[[vc:.*]] = load %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>
// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32		// CHECKPARALLEL: %[[inc:.*]] = mulf %[[va]], %[[vb]] : f32
// CHECKPARALLEL: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32		// CHECKPARALLEL: %[[res:.*]] = addf %[[vc]], %[[inc]] : f32
// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>		// CHECKPARALLEL: store %[[res]], %[[arg2]][%[[i0]], %[[i1]], %[[i2]], %[[i3]]] : memref<?x?x?x?xf32>
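
Note on the updated CHECKPARALLEL bounds: in the right-hand column, the first four loops no longer start at `%c0`. Each one runs from `half` to `size - half`, where `size` is a dimension of `%arg1` and `half` is `#[[$convHalf]]` applied to the matching dimension of `%arg0`. The last four loops still run from zero to the dimensions of `%arg0`. A minimal Python sketch of that arithmetic follows. It assumes `#[[$convHalf]]` is floor division by two (the map's definition is not shown in this part of the diff), and `conv_loop_bounds` is a hypothetical helper name, not a function from the patch.

```python
def conv_loop_bounds(arg1_shape, arg0_shape, conv_half=lambda s: s // 2):
    """Return (lower, upper) bounds for the eight-loop nest checked above.

    arg1_shape: sizes read by `dim %arg1` (dim0..dim3 in the new CHECK lines).
    arg0_shape: sizes read by `dim %arg0` (dim4..dim7 in the new CHECK lines).
    conv_half:  stand-in for #[[$convHalf]]; floor division by two is an
                assumption, since the map is not defined in this excerpt.
    """
    if len(arg1_shape) != len(arg0_shape):
        raise ValueError("both operands must have the same rank")

    # half_k = convHalf(dim of %arg0), one per dimension.
    halves = [conv_half(k) for k in arg0_shape]

    # First group of loops (i0..i3): [half_k, size_k - half_k).
    outer_lower = halves
    outer_upper = [n - h for n, h in zip(arg1_shape, halves)]

    # Second group of loops (i4..i7): [0, dim of %arg0).
    inner_lower = [0] * len(arg0_shape)
    inner_upper = list(arg0_shape)

    return outer_lower + inner_lower, outer_upper + inner_upper


lower, upper = conv_loop_bounds((10, 12, 14, 16), (3, 3, 5, 5))
print(lower)
print(upper)
```

The sketch only mirrors the bounds in the test expectations; it says nothing about how `Loops.cpp` derives them from the indexing maps.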
