This is an archive of the discontinued LLVM Phabricator instance.


Differential D84998

[MLIR] Add affine.parallel folder and AffineParallelNormalizePass
Closed, Public

Authored by flaub on Jul 30 2020, 7:38 PM.


Details

Reviewers

jbruestle
nicolasvasilache
ftynse
rriddle
andydavis1
bondhugula
dcaballe

Commits

rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer

Summary

Add a folder to the affine.parallel op so that loop bounds expressions are canonicalized.

Additionally, a new AffineParallelNormalizePass is added to adjust affine.parallel ops so that the lower bound is always 0 and the upper bound always represents a range with a step size of 1.
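As an illustration of the normalization described above (not code from this revision), here is a minimal MLIR-free sketch for a single dimension with constant bounds. `LoopDim`, `normalize`, `originalIVs`, and `remappedIVs` are hypothetical names chosen for the example; the actual pass works on affine maps and operands rather than plain integers. The sketch assumes a positive step and an exclusive upper bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one dimension of an affine.parallel op with
// constant bounds. This is not an MLIR type.
struct LoopDim {
  int64_t lowerBound;
  int64_t upperBound; // Exclusive.
  int64_t step;
};

// Ceiling division for a non-negative numerator and a positive denominator.
int64_t ceilDiv(int64_t num, int64_t den) {
  assert(num >= 0 && den > 0);
  return (num + den - 1) / den;
}

// Normalized form: lower bound 0, step 1, upper bound equal to the trip count.
LoopDim normalize(const LoopDim &dim) {
  assert(dim.step > 0 && dim.upperBound >= dim.lowerBound);
  return {0, ceilDiv(dim.upperBound - dim.lowerBound, dim.step), 1};
}

// Values taken by the induction variable of the original loop.
std::vector<int64_t> originalIVs(const LoopDim &dim) {
  std::vector<int64_t> ivs;
  for (int64_t iv = dim.lowerBound; iv < dim.upperBound; iv += dim.step)
    ivs.push_back(iv);
  return ivs;
}

// Values recovered inside the body of the normalized loop. The arithmetic
// that used to live in the loop control now lives in the body.
std::vector<int64_t> remappedIVs(const LoopDim &dim) {
  LoopDim norm = normalize(dim);
  std::vector<int64_t> ivs;
  for (int64_t i = norm.lowerBound; i < norm.upperBound; i += norm.step)
    ivs.push_back(dim.lowerBound + i * dim.step);
  return ivs;
}
```

For a dimension running from 2 to 11 with step 3, the sketch yields a normalized upper bound of 3, and the body recovers the original values 2, 5, 8 as `lowerBound + i * step`.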

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

flaub created this revision. Jul 30 2020, 7:38 PM

Herald added a project: Restricted Project. Jul 30 2020, 7:38 PM

Herald added subscribers: msifontes, jurahul, Kayjukh and 12 others.

flaub requested review of this revision. Jul 30 2020, 7:38 PM

Herald added a subscriber: stephenneuendorffer. Jul 30 2020, 7:38 PM

Harbormaster completed remote builds in B66492: Diff 282105. Jul 30 2020, 7:55 PM

Fix pre-merge lint check

I notice auto being used pervasively and it's impacting readability at most places. Please spell the type out.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2522	assert message please.
2688	Doc comment please.
mlir/test/Dialect/Affine/affine-fold.mlir
2 ↗	(On Diff #282119)	Please add a couple of lines here on what these test cases are testing.

This revision now requires changes to proceed. Jul 31 2020, 2:15 AM

Address CR comments

flaub marked 3 inline comments as done. Jul 31 2020, 11:59 AM

@dcaballe or @ftynse - could one of you review this please? I'm quite tied up the next week.

bondhugula edited reviewers, added: andydavis1; removed: bondhugula. Aug 1 2020, 12:09 AM

Thanks for the patch! Just a high-level comment for now. It looks good overall.

Any thoughts on what should go to the canonicalization pass and what should go to an independent pass? Removing dead loops or 1-trip-count loops seems like a good fit to me. However, normalizing the loop bounds and step (SimplifyAffineParallel) is something that might not always be desirable, since it moves complexity from the loop control to the loop body. Would it make sense to have an independent loop normalization pass for that? I'm worried about adding too many transformations to canonicalization and having to go all or nothing with them.

Also, any thoughts on how we can refactor this so that we can also use it for affine.for or even scf? Could it be implemented using the loop interface? Could you give that a try? Otherwise we would end up replicating this 4 times...

rriddle added inline comments. Aug 3 2020, 2:04 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2662	Not sure what you mean here, the rewriter will gladly rollback successfully applied patterns. Just because a pattern was applied doesn't mean it won't get rolled back.

A lot of your patterns (all of them?) are breaking the contract with the pattern rewriter right now.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2620	Drop trivial braces here and below.
2679	All of these transformations are breaking the contract with the pattern rewriter, you can't do these in place. Some you can (setting operands, attribute) but you are required to start a root update.

flaub added inline comments. Aug 3 2020, 3:59 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2662	My understanding was that `arg.replaceAllUsesWith(lowerBoundValue);` was not something the rewriter could rollback. So I was attempting to determine if the transformation was legal before doing something that couldn't be recovered from.
2679	OK, I'll try to think of a different way to do this. Do you have docs or test cases I can refer to that can teach me the right way to do this?

rriddle added inline comments. Aug 3 2020, 4:31 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2662	If something isn't expressible with the PatternRewriter, we generally just add the necessary API for that mutation. It's fine even if not all of the pattern rewrite drivers support it immediately, we just add `llvm_unreachable("Unsupported ...")` in those cases. For this case, you should be able to use the `replaceUsesOfBlockArgumentWith` method, but for some reason that is only on ConversionPatternRewriter. We should move that to the base PatternRewriter class as a new hook, at which point you could use it directly. https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/Transforms/DialectConversion.h#L458 The default implementation should effectively just be this inner loop: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889
2679	Docs on rewriter API are lacking right now, but I've been working on fixing that. If you want to do a root update, you need to use the root update API: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/IR/PatternMatch.h#L343 It allows for updating specific parts of an operation in-place, i.e. attributes/operations/successors, but not others(e.g. things happening in regions).
2679	You can use `op.attrNameAttr(newAttr)` to set a specific attribute, e.g. in this case `op.lowerBoundsMapAttr(AffineMapAttr::get(newLower))`.

@rriddle I've attempted to follow your advice, could you take another look? Is this what you were thinking of?

@dcaballe I think you're right about not including the AffineSimplifyParallel with canonicalization. I've split this out into its own pass, what do you think?

Herald added a subscriber: mgorny. Aug 5 2020, 2:08 PM

flaub updated this revision to Diff 283394. Aug 5 2020, 2:13 PM

flaub marked an inline comment as done.

dcaballe requested changes to this revision. Aug 5 2020, 4:12 PM

Thanks Frank! It looks good to me. Just a few minor comments

mlir/include/mlir/Dialect/Affine/Passes.h
40	Could we use Normalize instead of Simplify? https://en.wikipedia.org/wiki/Normalized_loop We can rename it later on if we add other simplifications in the future.
mlir/include/mlir/Dialect/Affine/Passes.td
122	Probably good to add the previous comment to this description: `/// Simplify affine.parallel ops so that they have a step size of 1 and a lower /// bound of 0.`
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	I don't understand this condition. Why not remove a 0-rank loop if it has a custom attribute?
2688	Doc still missing?
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
22 ↗	(On Diff #283394)	Same, Simplify -> Normalize
38 ↗	(On Diff #283394)	We shouldn't run full blown canonicalization on all the ops as part of this pass. If canonicalization is needed, we can always invoke the canonicalizer after this pass. In that way, we would leave the decision of running canonicalization or not after this pass to the user.
51 ↗	(On Diff #283394)	nit, readability: lbExpr.getValue() -> lbExpr.getValue() != 0
mlir/test/Dialect/Affine/affine-fold.mlir
23 ↗	(On Diff #283394)	no iteration -> only one iteration?

This revision now requires changes to proceed. Aug 5 2020, 4:12 PM

flaub added inline comments. Aug 6 2020, 11:04 AM

mlir/include/mlir/Dialect/Affine/Passes.h
40	Gotcha, will do.
mlir/include/mlir/Dialect/Affine/Passes.td
122	Was trying to keep this short since this is just for the command line. Looking at the other summaries that seems consistent. I can try to make it a little bit more descriptive. How about if we go with the normalize nomenclature, can we just say: "Normalize affine.parallel ops so that lower bounds are 0 and step size is 1."
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	We have cases where we'd like to avoid removing affine.parallel ops even when they have no IVs where the op represents something structural, like it might represent the outermost loop which we might want to do kernel outlining on. This was an attempt to prevent this particular canonicalization from running in this case. We've been 'tagging' ops in our pipeline and this was an easy way to control canonicalization. This isn't the most elegant or general solution so we're open to something better here.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Fair enough I'll remove it and just add it to our pipeline as a separate pass.
mlir/test/Dialect/Affine/affine-fold.mlir
23 ↗	(On Diff #283394)	Actually I meant to say when no IVs exist. But yes, that's right.

dcaballe added a subscriber: bondhugula. Aug 6 2020, 11:56 AM

dcaballe added inline comments.

mlir/include/mlir/Dialect/Affine/Passes.td
122	Sounds good!
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	One option could be to convert your special rank-0 affine.parallel to some other region op that properly models what you want (kernel outlining, in this case) before the canonicalization (or probably use that region op from the beginning? I'm missing how you get to this rank-0 affine.parallel scenario). Another option could be to have a dedicated attribute to prevent the canonicalization. I think I would lean towards the first option since we wouldn't be overloading rank-0 affine.parallel constructs with special semantics. Probably @rriddle, @ftynse, @bondhugula could also help here.

flaub added inline comments. Aug 6 2020, 12:55 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	IIUC, currently kernel outlining is based on `affine.for` and `scf.parallel` ops. Our plan was to lower `affine.parallel` to `scf.parallel` and then use the SCFToGPU pass and then eventually perform kernel outlining. If we were to canonicalize away this empty `affine.parallel`, then any inner `affine.parallel` would be elevated up one level, which would mean kernel outlining would be working on the wrong level. Our model assumes that the outermost `affine.parallel` represents iteration of the workgroup items. It's very possible that we'd have a single kernel launch for a single workgroup item, which would be represented as `affine.parallel () = () to ()`. Regarding adding a new op, I suppose that would work, although it seems redundant. What should we call this op? It would act as a shield for this canonicalization but then we'd need a special lowering for this op into `affine.for` or `scf.parallel`. And it's not really rank-0 that have these special semantics, it's any outermost loop that has special semantics that is imposed by kernel outlining.

bondhugula requested changes to this revision. Aug 6 2020, 1:15 PM

bondhugula added inline comments.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	This looks like a hack / substitute due to a missing abstraction/region op. In fact, std.execute_region exactly models scenarios like this, among several others. (There is the affine.execute_region as well, but you won't need it here since it starts a new affine scope at that point.) Perhaps you may want to leave it the way that's simplest for this revision and put a TODO for the need of a more suitable abstraction?
2616	Nit: Terminate with full stop please.
2650	Likewise and anywhere else.
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
81 ↗	(On Diff #283394)	Avoid `auto` here please.
87 ↗	(On Diff #283394)	Likewise.

bondhugula added inline comments. Aug 6 2020, 1:19 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2671	Nit: prefix `Map` to name.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Note that, where needed, there is also a local version of the canonicalizer to apply on specific ops if you know which ops you want to canonicalize.

bondhugula added inline comments. Aug 6 2020, 1:22 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2607	The comment needs another line to explain the rationale.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
64 ↗	(On Diff #283394)	Nit: steps should always be int64_t.

flaub added inline comments. Aug 6 2020, 1:27 PM

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗	(On Diff #283394)	@bondhugula I'm confused, this file already uses this style for `auto`. How do I know which style to use?

flaub added inline comments. Aug 6 2020, 1:30 PM

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Is there an example of this someplace? That seems like something that would be useful in this case.

flaub added inline comments. Aug 6 2020, 1:39 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	Agreed this is a hack, looking for better alternatives, but it does practically work for now. I've been looking for the `std.execute_region` and `affine.execute_region` but can't find it. Was it a proposal or perhaps renamed?

Address comments.

flaub edited the summary of this revision. Aug 6 2020, 2:13 PM

@dcaballe & @bondhugula Is there anything else you'd like for me to address?

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

dcaballe added inline comments. Aug 10 2020, 2:07 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	To keep this moving while working on a better abstraction and since you are already overusing affine.parallel, would it be possible to remove the check below so that any affine.parallel is optimized and then add some fake bounds to your special affine.parallel so that it's not optimized away? Another option would be to define a specific attribute that we can use to prevent the optimization. We could check for that attribute only. WDYT?

In D84998#2205662, @flaub wrote:

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

Sorry for the delay, was waiting for some of the other discussion on this revision to resolve first.

mlir/include/mlir/IR/PatternMatch.h
318 ↗	(On Diff #283734)	Please use the inner loop here as the default implementation of this hook (https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889). The operation rooted at `to` could be using `from`, using the implementation linked above allows for the use cases to "just work".
322 ↗	(On Diff #283734)	I don't think this is necessary, see the comment at the use of it below.
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2620	This seems like it should just be `rewriter.replaceOp(op, yield.operands());`.
2626	Drop the mlir::
2628	Why is this hook necessary? Seems like you should be using `rewriter.replaceOp(op, yield.operands())`instead of this loop and eraseOp.
2629	This would also need to go through the rewriter.
2674	This would also need to go through the rewriter.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
33 ↗	(On Diff #283394)	Please do not do this. This is placing a full run of the canonicalizer pass inside of your pass, just let the user schedule the canonicalizer in their pipeline.
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
80 ↗	(On Diff #283734)	When you update the default implementation, you can call it from here.

bondhugula added inline comments. Aug 13 2020, 12:52 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2606	The execute_region ops are still pending - the corresponding revisions are dormant on differential. Although there are no unresolved issues w.r.t std.execute_region, I didn't finish it up for the lack of immediate use cases (an `std.yield` terminator needs to be added for it).

bondhugula added inline comments. Aug 13 2020, 12:59 PM

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗	(On Diff #283394)	A lot of the old code was written with an overuse of `auto`. That's fine - it's not a big deal here. In general, avoid `auto` if it doesn't improve readability. The issue is that the type is often obvious to the author of the revision at this point, but it often isn't when reading it later "locally".
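A small standalone sketch of this guideline, using hypothetical names (`collectSteps`, `sumSteps`) that are unrelated to the patch. It only illustrates the trade-off discussed here: spell the type out when the reader cannot see it locally, and keep `auto` where the type is long and adds little.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper; the name and the return type are illustrative only.
std::map<std::string, std::vector<int64_t>> collectSteps() {
  return {{"outer", {1, 2}}, {"inner", {4}}};
}

int64_t sumSteps() {
  // Spelled out: a reader sees the container and element types without
  // having to look up the declaration of collectSteps().
  std::map<std::string, std::vector<int64_t>> table = collectSteps();
  const std::vector<int64_t> &steps = table.at("outer");
  assert(!steps.empty());

  int64_t total = 0;
  for (int64_t step : steps)
    total += step;

  // `auto` is reasonable here: the iterator type is long and the call to
  // find() already tells the reader what kind of value this is.
  auto it = table.find("inner");
  if (it != table.end())
    total += it->second.front();
  return total;
}
```

Had `table` and `steps` been declared with `auto`, the function would compile identically, but the element type `int64_t` would no longer be visible at the point of use.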

After some more discussions and thought, I've removed the complex canonicalizations that:

break the PatternRewriter rules
have hacky special exemptions

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

River

bondhugula added inline comments. Aug 14 2020, 10:13 PM

mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h
77	This doesn't describe what this method does! (but only its return status).
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
23	Nit: bounds -> bound or bounds'

bondhugula added inline comments. Aug 14 2020, 10:19 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2513	Nit: avoid auto.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	The method is called mlir::applyOpPatternsAndFold and is used in these five files:

lib/Dialect/Affine/Utils/Utils.cpp
lib/Dialect/Affine/Transforms/AffineDataCopyGeneration.cpp
lib/Dialect/Affine/Transforms/SimplifyAffineStructures.cpp
lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
lib/Transforms/Utils/LoopUtils.cpp

Thanks for addressing the comments, Frank! LGTM. I'll leave the final approval to Uday and River since I'm OOO next week.

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Actually, having them in a separate pass could be a good idea, since removing a loop (or iteration space dim), even if it has 1 iteration, drops high-level information that can be useful to, or even expected by, some passes. I would actually have problems if we removed 1-iteration loops or dimensions too early in the pipeline. Maybe a LoopSimplifyPass could gather some of these loop optimizations so that we have more control over them.

Unblocking the review from my side. OOO next week.

bondhugula added inline comments. Aug 18 2020, 4:49 AM

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
2	Are you sure you want to run `-canonicalize` here? This would make it an integration test and in many cases may make it hard/non-trivial to maintain test cases due to changes to `-canonicalize`. I understand it makes it much easier/more intuitive for you to write the CHECK lines (more compact IR), but in general combining multiple passes for something that is meant to test just a specific pass is discouraged. Actually, if you really want to run the canonicalizer always after this, you can run it from inside the pass and just drop this `-canonicalize`. But that still gives one a combined effect. What do you think?

bondhugula added inline comments. Aug 18 2020, 4:50 AM

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
24	Could you add a line on whether it's always guaranteed to succeed or when it might fail.

bondhugula requested changes to this revision. Aug 18 2020, 4:57 AM

bondhugula added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
60–62	This looks like a bug. Shouldn't this be ceilDiv?
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
10	Shouldn't there be 4 iterations in the outer loop here?
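To see why floor division would undercount the iterations, consider a small standalone sketch. The bounds used below are made up for illustration and are not taken from the test file, and the function names are hypothetical. The sketch compares floor and ceiling division against a direct simulation of the loop, assuming a positive step and `ub >= lb`.

```cpp
#include <cassert>
#include <cstdint>

// Trip count obtained by actually walking the loop `for iv = lb; iv < ub`.
int64_t tripCountBySimulation(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0);
  int64_t count = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}

// Floor division: drops the final partial stride, so it undercounts
// whenever (ub - lb) is not a multiple of step.
int64_t tripCountFloor(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && ub >= lb);
  return (ub - lb) / step;
}

// Ceiling division: counts the final partial stride as one more iteration.
int64_t tripCountCeil(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && ub >= lb);
  return (ub - lb + step - 1) / step;
}
```

With bounds 0 to 10 and step 3, the loop visits 0, 3, 6, 9, which is four iterations. Floor division gives 3 and ceiling division gives 4; the two only agree when the range is an exact multiple of the step.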

This revision now requires changes to proceed. Aug 18 2020, 4:57 AM

In D84998#2218985, @rriddle wrote:

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

River

No worries River, thanks for the offer. I think we're still exploring the design space here with regards to being able to configure canonicalizations. Perhaps it's just a bad idea and I need to rethink it. I'm happy to back away and take more time to think thru a better solution so that we don't paint ourselves into a corner.

All that said, it would be helpful to see your vision for updating the default implementation of PatternRewriter, it wasn't clear to me how big of a refactor you were looking for, so I was trying to do the most minimal thing. I think there's a separate issue which is how to prevent authors of pattern rewriters from breaking the rules; it seems very easy to do so right now and at present we see no errors or other indications that this has happened.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2513	Seems like this one improves readability, because otherwise it's the lengthy: `AffineParallelOp::operand_range`
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
60–62	Good catch!
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
2	I've dropped the test that relies on canonicalization because it's true that as a unit test, we don't need to also test the combined capability. For context, this pass was originally written as a canonicalization pattern. I think a further refinement would be to provide a way to call the transformation directly without going thru a pass to allow callers to decide whether they want to combine canonicalization with normalization. The combination is what I require in my current use cases. But as a general purpose library I see the value in not restricting other users to that particular use case.
10	Yes! Thanks for catching this!

flaub updated this revision to Diff 286470. Aug 18 2020, 8:52 PM

bondhugula added inline comments. Aug 18 2020, 11:29 PM

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
38	`steps` should all be stored in int64_t for consistency.
40	A comment here and/or one for the `isWorkPending \|= ...` line in the body.
41	auto -> int64_t

Thanks for addressing something. Looks good to me. Please take care of the remaining minor comments for doc.

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
4	Nit: Set -> Normalize

This revision is now accepted and ready to land. Aug 18 2020, 11:33 PM

flaub updated this revision to Diff 286904. Aug 20 2020, 3:15 PM

flaub marked 4 inline comments as done.

Closed by commit rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer (authored by flaub). Aug 20 2020, 3:24 PM

This revision was automatically updated to reflect the committed changes.

flaub added a commit: rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer.

Thanks for the review everyone, I've made tiny tweaks after further feedback. This includes extracting the core transformation of normalization into a separate utility function (in case other users want to integrate this transformation in a different pass), and also adding a helper method AffineParallelOp::getSteps().

Revision Contents

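Path	Size
mlir/include/mlir/Dialect/Affine/IR/AffineOps.td	19 lines
mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h	4 lines
mlir/include/mlir/Dialect/Affine/Passes.h	3 lines
mlir/include/mlir/Dialect/Affine/Passes.td	6 lines
mlir/include/mlir/Dialect/Affine/Utils.h	5 lines
mlir/lib/Dialect/Affine/IR/AffineOps.cpp	99 lines
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp	96 lines
mlir/lib/Dialect/Affine/Transforms/CMakeLists.txt	1 line
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir	25 lines
mlir/test/Dialect/Affine/canonicalize.mlir	23 lines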

Diff 286906

mlir/include/mlir/Dialect/Affine/IR/AffineOps.td

(615 preceding lines not shown; context: OpBuilder<"OpBuilder &builder, OperationState &result, ")
               "AffineMap ubMap, ValueRange ubArgs, "
               "ArrayRef<int64_t> steps">
   ];

   let extraClassDeclaration = [{
     /// Get the number of dimensions.
     unsigned getNumDims();

-    operand_range getLowerBoundsOperands();
-    operand_range getUpperBoundsOperands();
-
-    AffineValueMap getLowerBoundsValueMap();
-    AffineValueMap getUpperBoundsValueMap();
     AffineValueMap getRangesValueMap();

     /// Get ranges as constants, may fail in dynamic case.
     Optional<SmallVector<int64_t, 8>> getConstantRanges();

     Block *getBody();
     OpBuilder getBodyBuilder();
     MutableArrayRef<BlockArgument> getIVs() {
       return getBody()->getArguments();
     }

+    operand_range getLowerBoundsOperands();
+    AffineValueMap getLowerBoundsValueMap();
+    void setLowerBounds(ValueRange operands, AffineMap map);
+    void setLowerBoundsMap(AffineMap map);
+
+    operand_range getUpperBoundsOperands();
+    AffineValueMap getUpperBoundsValueMap();
+    void setUpperBounds(ValueRange operands, AffineMap map);
+    void setUpperBoundsMap(AffineMap map);
+
+    SmallVector<int64_t, 8> getSteps();
     void setSteps(ArrayRef<int64_t> newSteps);

     static StringRef getReductionsAttrName() { return "reductions"; }
     static StringRef getLowerBoundsMapAttrName() { return "lowerBoundsMap"; }
     static StringRef getUpperBoundsMapAttrName() { return "upperBoundsMap"; }
     static StringRef getStepsAttrName() { return "steps"; }
   }];
+
+  let hasFolder = 1;
 }

 def AffinePrefetchOp : Affine_Op<"prefetch"> {
   let summary = "affine prefetch operation";
   let description = [{
     The "affine.prefetch" op prefetches data from a memref location described
     with an affine subscript similar to affine.load, and has three attributes:
     a read/write specifier, a locality hint, and a cache type specifier as shown
(268 following lines not shown)

mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h

(68 preceding lines not shown; context: public:)
   inline unsigned getNumDims() const { return map.getNumDims(); }
   inline unsigned getNumSymbols() const { return map.getNumSymbols(); }
   inline unsigned getNumResults() const { return map.getNumResults(); }

   Value getOperand(unsigned i) const;
   ArrayRef<Value> getOperands() const;
   AffineMap getAffineMap() const;

+  /// Attempts to canonicalize the map and operands. Return success if the map
+  /// and/or operands have been modified.
+  LogicalResult canonicalize();
+
 private:
   // A mutable affine map.
   MutableAffineMap map;

   // TODO: make these trailing objects?
   /// The SSA operands binding to the dim's and symbols of 'map'.
   SmallVector<Value, 4> operands;
   /// The SSA results binding to the results of 'map'.
   SmallVector<Value, 4> results;
 };

 } // namespace mlir

 #endif // MLIR_DIALECT_AFFINE_IR_AFFINEVALUEMAP_H

mlir/include/mlir/Dialect/Affine/Passes.h

(29 preceding lines not shown)
 /// operations out of affine loops.
 std::unique_ptr<OperationPass<FuncOp>>
 createAffineLoopInvariantCodeMotionPass();

 /// Creates a pass to convert all parallel affine.for's into 1-d affine.parallel
 /// ops.
 std::unique_ptr<OperationPass<FuncOp>> createAffineParallelizePass();

+/// Normalize affine.parallel ops so that lower bounds are 0 and steps are 1.
+std::unique_ptr<OperationPass<FuncOp>> createAffineParallelNormalizePass();
+
 /// Performs packing (or explicit copying) of accessed memref regions into
 /// buffers in the specified faster memory space through either pointwise copies
 /// or DMA operations.
 std::unique_ptr<OperationPass<FuncOp>> createAffineDataCopyGenerationPass(
     unsigned slowMemorySpace, unsigned fastMemorySpace,
     unsigned tagMemorySpace = 0, int minDmaTransferSize = 1024,
     uint64_t fastMemCapacityBytes = std::numeric_limits<uint64_t>::max());
 /// Overload relying on pass options for initialization.
(42 following lines not shown)

mlir/include/mlir/Dialect/Affine/Passes.td

(112 preceding lines not shown; context: def AffineVectorize : FunctionPass<"affine-super-vectorize"> {)
   ];
 }

 def AffineParallelize : FunctionPass<"affine-parallelize"> {
   let summary = "Convert affine.for ops into 1-D affine.parallel";
   let constructor = "mlir::createAffineParallelizePass()";
 }

+def AffineParallelNormalize : FunctionPass<"affine-parallel-normalize"> {
+  let summary = "Normalize affine.parallel ops so that lower bounds are 0 and "
+                "steps are 1";
+  let constructor = "mlir::createAffineParallelNormalizePass()";
+}
+
 def SimplifyAffineStructures : FunctionPass<"simplify-affine-structures"> {
   let summary = "Simplify affine expressions in maps/sets and normalize "
                 "memrefs";
   let constructor = "mlir::createSimplifyAffineStructuresPass()";
 }

 #endif // MLIR_DIALECT_AFFINE_PASSES

mlir/include/mlir/Dialect/Affine/Utils.h

	Show All 37 Lines
	/// 'vectorSizes'. By default, each vectorization factor is applied			/// 'vectorSizes'. By default, each vectorization factor is applied
	/// inner-to-outer to the loops of each loop nest. 'fastestVaryingPattern' can			/// inner-to-outer to the loops of each loop nest. 'fastestVaryingPattern' can
	/// be optionally used to provide a different loop vectorization order.			/// be optionally used to provide a different loop vectorization order.
	void vectorizeAffineLoops(			void vectorizeAffineLoops(
	Operation *parentOp,			Operation *parentOp,
	llvm::DenseSet<Operation *, DenseMapInfo<Operation *>> &loops,			llvm::DenseSet<Operation *, DenseMapInfo<Operation *>> &loops,
	ArrayRef<int64_t> vectorSizes, ArrayRef<int64_t> fastestVaryingPattern);			ArrayRef<int64_t> vectorSizes, ArrayRef<int64_t> fastestVaryingPattern);

				/// Normalize an affine.parallel op so that lower bounds are 0 and steps are 1.
				/// As currently implemented, this transformation cannot fail and will return
				/// early if the op is already in a normalized form.
				void normalizeAffineParallel(AffineParallelOp op);

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_AFFINE_UTILS_H			#endif // MLIR_DIALECT_AFFINE_UTILS_H

mlir/lib/Dialect/Affine/IR/AffineOps.cpp

Show First 20 Lines • Show All 2,499 Lines • ▼ Show 20 Lines
}		}

Block *AffineParallelOp::getBody() { return &region().front(); }		Block *AffineParallelOp::getBody() { return &region().front(); }

OpBuilder AffineParallelOp::getBodyBuilder() {		OpBuilder AffineParallelOp::getBodyBuilder() {
return OpBuilder(getBody(), std::prev(getBody()->end()));		return OpBuilder(getBody(), std::prev(getBody()->end()));
}		}

		void AffineParallelOp::setLowerBounds(ValueRange lbOperands, AffineMap map) {
		assert(lbOperands.size() == map.getNumInputs() &&
		"operands to map must match number of inputs");
		assert(map.getNumResults() >= 1 && "bounds map has at least one result");

		auto ubOperands = getUpperBoundsOperands();
		bondhugula (Not Done): Nit: avoid `auto`.
		flaub (Author, Done): Seems like this one improves readability, because otherwise it's the lengthy: `AffineParallelOp::operand_range`

		SmallVector<Value, 4> newOperands(lbOperands);
		newOperands.append(ubOperands.begin(), ubOperands.end());
		getOperation()->setOperands(newOperands);

		lowerBoundsMapAttr(AffineMapAttr::get(map));
		}

		void AffineParallelOp::setUpperBounds(ValueRange ubOperands, AffineMap map) {
		bondhugula (Done): assert message please.
		assert(ubOperands.size() == map.getNumInputs() &&
		"operands to map must match number of inputs");
		assert(map.getNumResults() >= 1 && "bounds map has at least one result");

		SmallVector<Value, 4> newOperands(getLowerBoundsOperands());
		newOperands.append(ubOperands.begin(), ubOperands.end());
		getOperation()->setOperands(newOperands);

		upperBoundsMapAttr(AffineMapAttr::get(map));
		}

		void AffineParallelOp::setLowerBoundsMap(AffineMap map) {
		AffineMap lbMap = lowerBoundsMap();
		assert(lbMap.getNumDims() == map.getNumDims() &&
		lbMap.getNumSymbols() == map.getNumSymbols());
		(void)lbMap;
		lowerBoundsMapAttr(AffineMapAttr::get(map));
		}

		void AffineParallelOp::setUpperBoundsMap(AffineMap map) {
		AffineMap ubMap = upperBoundsMap();
		assert(ubMap.getNumDims() == map.getNumDims() &&
		ubMap.getNumSymbols() == map.getNumSymbols());
		(void)ubMap;
		upperBoundsMapAttr(AffineMapAttr::get(map));
		}

		SmallVector<int64_t, 8> AffineParallelOp::getSteps() {
		SmallVector<int64_t, 8> result;
		for (Attribute attr : steps()) {
		result.push_back(attr.cast<IntegerAttr>().getInt());
		}
		return result;
		}

void AffineParallelOp::setSteps(ArrayRef<int64_t> newSteps) {		void AffineParallelOp::setSteps(ArrayRef<int64_t> newSteps) {
assert(newSteps.size() == getNumDims() && "steps & num dims mismatch");		stepsAttr(getBodyBuilder().getI64ArrayAttr(newSteps));
setAttr(getStepsAttrName(), getBodyBuilder().getI64ArrayAttr(newSteps));
}		}

static LogicalResult verify(AffineParallelOp op) {		static LogicalResult verify(AffineParallelOp op) {
auto numDims = op.getNumDims();		auto numDims = op.getNumDims();
if (op.lowerBoundsMap().getNumResults() != numDims ||		if (op.lowerBoundsMap().getNumResults() != numDims ||
op.upperBoundsMap().getNumResults() != numDims ||		op.upperBoundsMap().getNumResults() != numDims ||
op.steps().size() != numDims ||		op.steps().size() != numDims ||
op.getBody()->getNumArguments() != numDims)		op.getBody()->getNumArguments() != numDims)
Show All 17 Lines	if (failed(verifyDimAndSymbolIdentifiers(op, op.getLowerBoundsOperands(),
return failure();		return failure();
/// Upper bounds.		/// Upper bounds.
if (failed(verifyDimAndSymbolIdentifiers(op, op.getUpperBoundsOperands(),		if (failed(verifyDimAndSymbolIdentifiers(op, op.getUpperBoundsOperands(),
op.upperBoundsMap().getNumDims())))		op.upperBoundsMap().getNumDims())))
return failure();		return failure();
return success();		return success();
}		}

		LogicalResult AffineValueMap::canonicalize() {
		SmallVector<Value, 4> newOperands{operands};
		auto newMap = getAffineMap();
		composeAffineMapAndOperands(&newMap, &newOperands);
		if (newMap == getAffineMap() && newOperands == operands)
		return failure();
		reset(newMap, newOperands);
		return success();
		}

		/// Canonicalize the bounds of the given loop.
		static LogicalResult canonicalizeLoopBounds(AffineParallelOp op) {
		AffineValueMap lb = op.getLowerBoundsValueMap();
		bool lbCanonicalized = succeeded(lb.canonicalize());
		dcaballe (Not Done): I don't understand this condition. Why not remove a 0-rank loop if it has a custom attribute?
		flaub (Author, Done): We have cases where we'd like to avoid removing affine.parallel ops even when they have no IVs, where the op represents something structural; for example, it might represent the outermost loop that we want to do kernel outlining on. This was an attempt to prevent this particular canonicalization from running in that case. We've been 'tagging' ops in our pipeline and this was an easy way to control canonicalization. This isn't the most elegant or general solution, so we're open to something better here.
		dcaballe (Not Done): One option could be to convert your special rank-0 affine.parallel to some other region op that properly models what you want (kernel outlining, in this case) before the canonicalization (or probably use that region op from the beginning? I'm missing how you get to this rank-0 affine.parallel scenario). Another option could be to have a dedicated attribute to prevent the canonicalization. I think I would lean towards the first option, since we wouldn't be overloading rank-0 affine.parallel constructs with special semantics. Probably @rriddle, @ftynse, @bondhugula could also help here.
		flaub (Author, Done): IIUC, currently kernel outlining is based on `affine.for` and `scf.parallel` ops. Our plan was to lower `affine.parallel` to `scf.parallel`, then use the SCFToGPU pass, and eventually perform kernel outlining. If we were to canonicalize away this empty `affine.parallel`, then any inner `affine.parallel` would be elevated up one level, which would mean kernel outlining would be working on the wrong level. Our model assumes that the outermost `affine.parallel` represents iteration over the workgroup items. It's very possible that we'd have a single kernel launch for a single workgroup item, which would be represented as `affine.parallel () = () to ()`. Regarding adding a new op, I suppose that would work, although it seems redundant. What should we call this op? It would act as a shield for this canonicalization, but then we'd need a special lowering for this op into `affine.for` or `scf.parallel`. And it's not really rank-0 loops that have these special semantics; it's any outermost loop, and the semantics are imposed by kernel outlining.
		bondhugula (Not Done): This looks like a hack / substitute due to a missing abstraction/region op. In fact, std.execute_region models exactly scenarios like this, among several others. (There is the affine.execute_region as well, but you won't need it here since it starts a new affine scope at that point.) Perhaps you may want to leave it the way that's simplest for this revision and put a TODO for the need of a more suitable abstraction?
		flaub (Author, Done): Agreed this is a hack; I'm looking for better alternatives, but it does practically work for now. I've been looking for `std.execute_region` and `affine.execute_region` but can't find them. Was it a proposal, or perhaps renamed?
		dcaballe (Not Done): To keep this moving while working on a better abstraction, and since you are already overusing affine.parallel, would it be possible to remove the check below so that any affine.parallel is optimized, and then add some fake bounds to your special affine.parallel so that it's not optimized away? Another option would be to define a specific attribute that we can use to prevent the optimization. We could check for that attribute only. WDYT?
		bondhugula (Done): The execute_region ops are still pending - the corresponding revisions are dormant on differential. Although there are no unresolved issues w.r.t. std.execute_region, I didn't finish it up for the lack of immediate use cases (an `std.yield` terminator needs to be added for it).

		bondhugula (Not Done): The comment needs another line to explain the rationale.
		AffineValueMap ub = op.getUpperBoundsValueMap();
		bool ubCanonicalized = succeeded(ub.canonicalize());

		// Any canonicalization change always leads to updated map(s).
		if (!lbCanonicalized && !ubCanonicalized)
		return failure();

		if (lbCanonicalized)
		op.setLowerBounds(lb.getOperands(), lb.getAffineMap());
		bondhugula (Not Done): Nit: Terminate with full stop please.
		if (ubCanonicalized)
		op.setUpperBounds(ub.getOperands(), ub.getAffineMap());

		return success();
		rriddle (Done): Drop trivial braces here and below.
		rriddle (Not Done): This seems like it should just be `rewriter.replaceOp(op, yield.operands());`.
		}

		LogicalResult AffineParallelOp::fold(ArrayRef<Attribute> operands,
		SmallVectorImpl<OpFoldResult> &results) {
		return canonicalizeLoopBounds(*this);
		}
		rriddle (Not Done): Drop the `mlir::`.

static void print(OpAsmPrinter &p, AffineParallelOp op) {		static void print(OpAsmPrinter &p, AffineParallelOp op) {
		rriddle (Not Done): Why is this hook necessary? Seems like you should be using `rewriter.replaceOp(op, yield.operands())` instead of this loop and eraseOp.
p << op.getOperationName() << " (" << op.getBody()->getArguments() << ") = (";		p << op.getOperationName() << " (" << op.getBody()->getArguments() << ") = (";
		rriddle (Not Done): This would also need to go through the rewriter.
p.printAffineMapOfSSAIds(op.lowerBoundsMapAttr(),		p.printAffineMapOfSSAIds(op.lowerBoundsMapAttr(),
op.getLowerBoundsOperands());		op.getLowerBoundsOperands());
p << ") to (";		p << ") to (";
p.printAffineMapOfSSAIds(op.upperBoundsMapAttr(),		p.printAffineMapOfSSAIds(op.upperBoundsMapAttr(),
op.getUpperBoundsOperands());		op.getUpperBoundsOperands());
p << ')';		p << ')';
SmallVector<int64_t, 4> steps;		SmallVector<int64_t, 8> steps = op.getSteps();
bool elideSteps = true;		bool elideSteps = llvm::all_of(steps, [](int64_t step) { return step == 1; });
for (auto attr : op.steps()) {
auto step = attr.cast<IntegerAttr>().getInt();
elideSteps &= (step == 1);
steps.push_back(step);
}
if (!elideSteps) {		if (!elideSteps) {
p << " step (";		p << " step (";
llvm::interleaveComma(steps, p);		llvm::interleaveComma(steps, p);
p << ')';		p << ')';
}		}
if (op.getNumResults()) {		if (op.getNumResults()) {
p << " reduce (";		p << " reduce (";
llvm::interleaveComma(op.reductions(), p, [&](auto &attr) {		llvm::interleaveComma(op.reductions(), p, [&](auto &attr) {
AtomicRMWKind sym =		AtomicRMWKind sym =
*symbolizeAtomicRMWKind(attr.template cast<IntegerAttr>().getInt());		*symbolizeAtomicRMWKind(attr.template cast<IntegerAttr>().getInt());
p << "\"" << stringifyAtomicRMWKind(sym) << "\"";		p << "\"" << stringifyAtomicRMWKind(sym) << "\"";
});		});
p << ") -> (" << op.getResultTypes() << ")";		p << ") -> (" << op.getResultTypes() << ")";
		bondhugula (Not Done): Likewise and anywhere else.
}		}

p.printRegion(op.region(), /*printEntryBlockArgs=*/false,		p.printRegion(op.region(), /*printEntryBlockArgs=*/false,
/*printBlockTerminators=*/op.getNumResults());		/*printBlockTerminators=*/op.getNumResults());
p.printOptionalAttrDict(		p.printOptionalAttrDict(
op.getAttrs(),		op.getAttrs(),
/*elidedAttrs=*/{AffineParallelOp::getReductionsAttrName(),		/*elidedAttrs=*/{AffineParallelOp::getReductionsAttrName(),
AffineParallelOp::getLowerBoundsMapAttrName(),		AffineParallelOp::getLowerBoundsMapAttrName(),
AffineParallelOp::getUpperBoundsMapAttrName(),		AffineParallelOp::getUpperBoundsMapAttrName(),
AffineParallelOp::getStepsAttrName()});		AffineParallelOp::getStepsAttrName()});
}		}
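
The updated printer above omits the `step` clause when every step equals 1. A minimal standalone sketch of that elision rule, using the standard library in place of `llvm::all_of` and `llvm::interleaveComma` (the function name `formatStepClause` is hypothetical and not part of MLIR):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not an MLIR API: returns the text of the optional
// step clause, or an empty string when all steps are 1 and the clause can
// be elided.
std::string formatStepClause(const std::vector<int64_t> &steps) {
  bool elideSteps = std::all_of(steps.begin(), steps.end(),
                                [](int64_t step) { return step == 1; });
  if (elideSteps)
    return "";

  std::ostringstream os;
  os << " step (";
  for (std::size_t i = 0; i < steps.size(); ++i) {
    if (i != 0)
      os << ", ";
    os << steps[i];
  }
  os << ')';
  return os.str();
}
```

Under these assumptions, steps of `{1, 1}` produce no clause at all, while `{3, 2}` produces ` step (3, 2)`.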

		rriddle (Not Done): Not sure what you mean here, the rewriter will gladly rollback successfully applied patterns. Just because a pattern was applied doesn't mean it won't get rolled back.
		flaub (Author, Done): My understanding was that `arg.replaceAllUsesWith(lowerBoundValue);` was not something the rewriter could rollback. So I was attempting to determine if the transformation was legal before doing something that couldn't be recovered from.
		rriddle (Not Done): If something isn't expressible with the PatternRewriter, we generally just add the necessary API for that mutation. It's fine even if not all of the pattern rewrite drivers support it immediately, we just add `llvm_unreachable("Unsupported ...")` in those cases. For this case, you should be able to use the `replaceUsesOfBlockArgumentWith` method, but for some reason that is only on ConversionPatternRewriter. We should move that to the base PatternRewriter class as a new hook, at which point you could use it directly. https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/Transforms/DialectConversion.h#L458 The default implementation should effectively just be this inner loop: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889
//		//
// operation ::= `affine.parallel` `(` ssa-ids `)` `=` `(` map-of-ssa-ids `)`		// operation ::= `affine.parallel` `(` ssa-ids `)` `=` `(` map-of-ssa-ids `)`
// `to` `(` map-of-ssa-ids `)` steps? region attr-dict?		// `to` `(` map-of-ssa-ids `)` steps? region attr-dict?
// steps ::= `steps` `(` integer-literals `)`		// steps ::= `steps` `(` integer-literals `)`
//		//
static ParseResult parseAffineParallelOp(OpAsmParser &parser,		static ParseResult parseAffineParallelOp(OpAsmParser &parser,
OperationState &result) {		OperationState &result) {
auto &builder = parser.getBuilder();		auto &builder = parser.getBuilder();
auto indexType = builder.getIndexType();		auto indexType = builder.getIndexType();
		bondhugula (Not Done): Nit: prefix `Map` to name.
AffineMapAttr lowerBoundsAttr, upperBoundsAttr;		AffineMapAttr lowerBoundsAttr, upperBoundsAttr;
SmallVector<OpAsmParser::OperandType, 4> ivs;		SmallVector<OpAsmParser::OperandType, 4> ivs;
SmallVector<OpAsmParser::OperandType, 4> lowerBoundsMapOperands;		SmallVector<OpAsmParser::OperandType, 4> lowerBoundsMapOperands;
		rriddle (Not Done): This would also need to go through the rewriter.
SmallVector<OpAsmParser::OperandType, 4> upperBoundsMapOperands;		SmallVector<OpAsmParser::OperandType, 4> upperBoundsMapOperands;
if (parser.parseRegionArgumentList(ivs, /*requiredOperandCount=*/-1,		if (parser.parseRegionArgumentList(ivs, /*requiredOperandCount=*/-1,
OpAsmParser::Delimiter::Paren) ||		OpAsmParser::Delimiter::Paren) ||
parser.parseEqual() ||		parser.parseEqual() ||
parser.parseAffineMapOfSSAIds(		parser.parseAffineMapOfSSAIds(
		rriddle (Not Done): All of these transformations are breaking the contract with the pattern rewriter, you can't do these in place. Some you can (setting operands, attribute) but you are required to start a root update.
		flaub (Author, Done): OK, I'll try to think of a different way to do this. Do you have docs or test cases I can refer to that can teach me the right way to do this?
		rriddle (Not Done): Docs on rewriter API are lacking right now, but I've been working on fixing that. If you want to do a root update, you need to use the root update API: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/IR/PatternMatch.h#L343 It allows for updating specific parts of an operation in-place, i.e. attributes/operations/successors, but not others (e.g. things happening in regions).
		rriddle (Not Done): You can use `op.attrNameAttr(newAttr)` to set a specific attribute, e.g. in this case `op.lowerBoundsMapAttr(AffineMapAttr::get(newLower))`.
lowerBoundsMapOperands, lowerBoundsAttr,		lowerBoundsMapOperands, lowerBoundsAttr,
AffineParallelOp::getLowerBoundsMapAttrName(), result.attributes,		AffineParallelOp::getLowerBoundsMapAttrName(), result.attributes,
OpAsmParser::Delimiter::Paren) ||		OpAsmParser::Delimiter::Paren) ||
parser.resolveOperands(lowerBoundsMapOperands, indexType,		parser.resolveOperands(lowerBoundsMapOperands, indexType,
result.operands) ||		result.operands) ||
parser.parseKeyword("to") ||		parser.parseKeyword("to") ||
parser.parseAffineMapOfSSAIds(		parser.parseAffineMapOfSSAIds(
upperBoundsMapOperands, upperBoundsAttr,		upperBoundsMapOperands, upperBoundsAttr,
AffineParallelOp::getUpperBoundsMapAttrName(), result.attributes,		AffineParallelOp::getUpperBoundsMapAttrName(), result.attributes,
		bondhugula (Done): Doc comment please.
		dcaballe (Not Done): Doc still missing?
OpAsmParser::Delimiter::Paren) ||		OpAsmParser::Delimiter::Paren) ||
parser.resolveOperands(upperBoundsMapOperands, indexType,		parser.resolveOperands(upperBoundsMapOperands, indexType,
result.operands))		result.operands))
return failure();		return failure();

AffineMapAttr stepsMapAttr;		AffineMapAttr stepsMapAttr;
NamedAttrList stepsAttrs;		NamedAttrList stepsAttrs;
SmallVector<OpAsmParser::OperandType, 4> stepsMapOperands;		SmallVector<OpAsmParser::OperandType, 4> stepsMapOperands;
Show All 18 Lines	for (const auto &result : stepsMap.getResults()) {
"steps must be constant integers");		"steps must be constant integers");
steps.push_back(constExpr.getValue());		steps.push_back(constExpr.getValue());
}		}
result.addAttribute(AffineParallelOp::getStepsAttrName(),		result.addAttribute(AffineParallelOp::getStepsAttrName(),
builder.getI64ArrayAttr(steps));		builder.getI64ArrayAttr(steps));
}		}

// Parse optional clause of the form: `reduce ("addf", "maxf")`, where the		// Parse optional clause of the form: `reduce ("addf", "maxf")`, where the
// quoted strings a member of the enum AtomicRMWKind.		// quoted strings are a member of the enum AtomicRMWKind.
SmallVector<Attribute, 4> reductions;		SmallVector<Attribute, 4> reductions;
if (succeeded(parser.parseOptionalKeyword("reduce"))) {		if (succeeded(parser.parseOptionalKeyword("reduce"))) {
if (parser.parseLParen())		if (parser.parseLParen())
return failure();		return failure();
do {		do {
// Parse a single quoted string via the attribute parsing, and then		// Parse a single quoted string via the attribute parsing, and then
// verify it is a member of the enum and convert to it's integer		// verify it is a member of the enum and convert to it's integer
// representation.		// representation.
▲ Show 20 Lines • Show All 183 Lines • Show Last 20 Lines

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp

This file was added.

				//===- AffineParallelNormalize.cpp - AffineParallelNormalize Pass ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a normalizer for affine parallel loops.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/Affine/IR/AffineValueMap.h"
				#include "mlir/Dialect/Affine/Passes.h"
				#include "mlir/IR/PatternMatch.h"

				using namespace mlir;

				void normalizeAffineParallel(AffineParallelOp op) {
				AffineMap lbMap = op.lowerBoundsMap();
				SmallVector<int64_t, 8> steps = op.getSteps();
				// No need to do any work if the parallel op is already normalized.
				bondhugula (Done): Could you add a line on whether it's always guaranteed to succeed or when it might fail.
				bool isAlreadyNormalized =
				llvm::all_of(llvm::zip(steps, lbMap.getResults()), [](auto tuple) {
				int64_t step = std::get<0>(tuple);
				auto lbExpr =
				std::get<1>(tuple).template dyn_cast<AffineConstantExpr>();
				return lbExpr && lbExpr.getValue() == 0 && step == 1;
				});
				if (isAlreadyNormalized)
				return;

				AffineValueMap ranges = op.getRangesValueMap();
				auto builder = OpBuilder::atBlockBegin(op.getBody());
				auto zeroExpr = builder.getAffineConstantExpr(0);
				SmallVector<AffineExpr, 8> lbExprs;
				bondhugula (Done): `steps` should all be stored in int64_t for consistency.
				SmallVector<AffineExpr, 8> ubExprs;
				for (unsigned i = 0, e = steps.size(); i < e; ++i) {
				bondhugula (Done): A comment here and/or one for the `isWorkPending |= ...` line in the body.
				int64_t step = steps[i];
				bondhugula (Done): auto -> int64_t

				// Adjust the lower bound to be 0.
				lbExprs.push_back(zeroExpr);

				// Adjust the upper bound expression: 'range / step'.
				AffineExpr ubExpr = ranges.getResult(i).ceilDiv(step);
				ubExprs.push_back(ubExpr);

				// Adjust the corresponding IV: 'lb + i * step'.
				BlockArgument iv = op.getBody()->getArgument(i);
				AffineExpr lbExpr = lbMap.getResult(i);
				unsigned nDims = lbMap.getNumDims();
				auto expr = lbExpr + builder.getAffineDimExpr(nDims) * step;
				auto map = AffineMap::get(/*dimCount=*/nDims + 1,
				/*symbolCount=*/lbMap.getNumSymbols(), expr);

				// Use an 'affine.apply' op that will be simplified later in subsequent
				// canonicalizations.
				OperandRange lbOperands = op.getLowerBoundsOperands();
				OperandRange dimOperands = lbOperands.take_front(nDims);
				OperandRange symbolOperands = lbOperands.drop_front(nDims);
				bondhugula (Not Done): This looks like a bug. Shouldn't this be ceilDiv?
				flaub (Author, Done): Good catch!
				SmallVector<Value, 8> applyOperands{dimOperands};
				applyOperands.push_back(iv);
				applyOperands.append(symbolOperands.begin(), symbolOperands.end());
				auto apply = builder.create<AffineApplyOp>(op.getLoc(), map, applyOperands);
				iv.replaceAllUsesExcept(apply, SmallPtrSet<Operation *, 1>{apply});
				}

				SmallVector<int64_t, 8> newSteps(op.getNumDims(), 1);
				op.setSteps(newSteps);
				auto newLowerMap = AffineMap::get(
				/*dimCount=*/0, /*symbolCount=*/0, lbExprs, op.getContext());
				op.setLowerBounds({}, newLowerMap);
				auto newUpperMap = AffineMap::get(ranges.getNumDims(), ranges.getNumSymbols(),
				ubExprs, op.getContext());
				op.setUpperBounds(ranges.getOperands(), newUpperMap);
				}

				namespace {

				/// Normalize affine.parallel ops so that lower bounds are 0 and steps are 1.
				/// As currently implemented, this pass cannot fail, but it might skip over ops
				/// that are already in a normalized form.
				struct AffineParallelNormalizePass
				: public AffineParallelNormalizeBase<AffineParallelNormalizePass> {

				void runOnFunction() override { getFunction().walk(normalizeAffineParallel); }
				};

				} // namespace

				std::unique_ptr<OperationPass<FuncOp>>
				mlir::createAffineParallelNormalizePass() {
				return std::make_unique<AffineParallelNormalizePass>();
				}
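
The review thread above flags that the normalized upper bound must use `ceilDiv` rather than a truncating division. A minimal standalone sketch of why, assuming a non-negative range and a positive step (the helpers `ceilDivPositive` and `countIterations` are hypothetical and not part of MLIR):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an MLIR API: ceiling division, valid for a
// non-negative range and a strictly positive step.
int64_t ceilDivPositive(int64_t range, int64_t step) {
  assert(range >= 0 && "sketch assumes a non-negative range");
  assert(step > 0 && "sketch assumes a positive step");
  return (range + step - 1) / step;
}

// Counts the iterations of `for (iv = lb; iv < ub; iv += step)` directly,
// as a reference against which to compare the closed-form trip count.
int64_t countIterations(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "sketch assumes a positive step");
  int64_t count = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}
```

For the range 10 with step 3, the loop visits 0, 3, 6 and 9, so the trip count is 4. Truncating division gives 10 / 3 = 3 and would drop the last iteration, whereas `ceilDivPositive(10, 3)` agrees with `countIterations(0, 10, 3)`.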

mlir/lib/Dialect/Affine/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRAffineTransforms			add_mlir_dialect_library(MLIRAffineTransforms
	AffineDataCopyGeneration.cpp			AffineDataCopyGeneration.cpp
	AffineLoopInvariantCodeMotion.cpp			AffineLoopInvariantCodeMotion.cpp
	AffineParallelize.cpp			AffineParallelize.cpp
				AffineParallelNormalize.cpp
	LoopTiling.cpp			LoopTiling.cpp
	LoopUnroll.cpp			LoopUnroll.cpp
	LoopUnrollAndJam.cpp			LoopUnrollAndJam.cpp
	SuperVectorize.cpp			SuperVectorize.cpp
	SimplifyAffineStructures.cpp			SimplifyAffineStructures.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Affine			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Affine
	Show All 19 Lines

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir

This file was added.

				// RUN: mlir-opt %s -affine-parallel-normalize -split-input-file | FileCheck %s

				bondhugula (Not Done): Are you sure you want to run `-canonicalize` here? This would make it an integration test and in many cases may make it hard/non-trivial to maintain test cases due to changes to `-canonicalize`. I understand it makes it much easier/more intuitive for you to write the CHECK lines (more compact IR), but in general combining multiple passes for something that is meant to test just a specific pass is discouraged. Actually, if you really want to run the canonicalizer always after this, you can run it from inside the pass and just drop this `-canonicalize`. But that still gives one a combined effect. What do you think?
				flaub (Author, Done): I've dropped the test that relies on canonicalization because it's true that as a unit test, we don't need to also test the combined capability. For context, this pass was originally written as a canonicalization pattern. I think a further refinement would be to provide a way to call the transformation directly without going through a pass, to allow callers to decide whether they want to combine canonicalization with normalization. The combination is what I require in my current use cases. But as a general-purpose library I see the value in not restricting other users to that particular use case.
				// Normalize steps to 1 and lower bounds to 0.

				bondhugula (Done): Nit: Set -> Normalize
				// CHECK-DAG: [[$MAP0:#map[0-9]+]] = affine_map<(d0) -> (d0 * 3)>
				// CHECK-DAG: [[$MAP1:#map[0-9]+]] = affine_map<(d0) -> (d0 * 2 + 1)>
				// CHECK-DAG: [[$MAP2:#map[0-9]+]] = affine_map<(d0, d1) -> (d0 + d1)>

				// CHECK-LABEL: func @normalize_parallel()
				func @normalize_parallel() {
				bondhugula (Not Done): Shouldn't there be 4 iterations in the outer loop here?
				flaub (Author, Done): Yes! Thanks for catching this!
				%cst = constant 1.0 : f32
				%0 = alloc() : memref<2x4xf32>
				// CHECK: affine.parallel (%[[i0:.*]], %[[j0:.*]]) = (0, 0) to (4, 2)
				affine.parallel (%i, %j) = (0, 1) to (10, 5) step (3, 2) {
				// CHECK: %[[i1:.*]] = affine.apply [[$MAP0]](%[[i0]])
				// CHECK: %[[j1:.*]] = affine.apply [[$MAP1]](%[[j0]])
				// CHECK: affine.parallel (%[[k0:.*]]) = (0) to (%[[j1]] - %[[i1]])
				affine.parallel (%k) = (%i) to (%j) {
				// CHECK: %[[k1:.*]] = affine.apply [[$MAP2]](%[[i1]], %[[k0]])
				// CHECK: affine.store %{{.*}}, %{{.*}}[%[[i1]], %[[k1]]] : memref<2x4xf32>
				affine.store %cst, %0[%i, %k] : memref<2x4xf32>
				}
				}
				bondhugula [Not Done]: Nit: bounds -> bound or bounds'
				return
				}
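
Note on the expected bounds above: the `(4, 2)` upper bounds and the `d0 * 3` and `d0 * 2 + 1` maps in the CHECK lines follow from the usual trip-count arithmetic. The sketch below reproduces that arithmetic in plain Python. It is only an illustration of the calculation, not the pass implementation, and `normalize_range` is a hypothetical helper name that does not appear in the patch.

```python
def normalize_range(lb, ub, step):
    """Model normalization of the half-open range [lb, ub) with a positive step.

    Returns (trip_count, remap): the normalized loop runs from 0 to trip_count
    with step 1, and remap(iv) recovers the original induction value.
    Hypothetical helper for illustration; not part of the MLIR API.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Ceiling division, clamped at zero for empty ranges.
    trip_count = max(0, -(-(ub - lb) // step))

    def remap(iv):
        return lb + iv * step

    return trip_count, remap


# Outer loop from the test: (%i, %j) = (0, 1) to (10, 5) step (3, 2)
trips_i, remap_i = normalize_range(0, 10, 3)  # remap matches d0 * 3
trips_j, remap_j = normalize_range(1, 5, 2)   # remap matches d0 * 2 + 1

values_i = [remap_i(k) for k in range(trips_i)]  # [0, 3, 6, 9]
values_j = [remap_j(k) for k in range(trips_j)]  # [1, 3]
```

The outer dimension therefore has 4 iterations (0, 3, 6, 9), which is the count asked about in the review comment. For the inner loop the bounds are values rather than constants, so the same formula stays symbolic: with step 1 the new upper bound is `%j1 - %i1` and the original index is recovered as `d0 + d1`.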

mlir/test/Dialect/Affine/canonicalize.mlir


				func @drop_duplicate_bounds(%N : index) {
				// affine.for %i = max #lb(%arg0) to min #ub(%arg0)
				affine.for %i = max affine_map<(d0) -> (d0, d0)>(%N) to min affine_map<(d0) -> (d0 + 2, d0 + 2)>(%N) {
				"foo"() : () -> ()
				}
				return
				}

				// -----

				// Ensure affine.parallel bounds expressions are canonicalized.

				#map3 = affine_map<(d0) -> (d0 * 5)>

				// CHECK-LABEL: func @affine_parallel_const_bounds
				func @affine_parallel_const_bounds() {
				%cst = constant 1.0 : f32
				%c0 = constant 0 : index
				%c4 = constant 4 : index
				%0 = alloc() : memref<4xf32>
				// CHECK: affine.parallel (%{{.*}}) = (0) to (4)
				affine.parallel (%i) = (%c0) to (%c0 + %c4) {
				%1 = affine.apply #map3(%i)
				// CHECK: affine.parallel (%{{.*}}) = (0) to (%{{.*}} * 5)
				affine.parallel (%j) = (%c0) to (%1) {
				affine.store %cst, %0[%j] : memref<4xf32>
				}
				}
				return
				}
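
Note on the folded bounds above: both CHECK lines can be read as substituting known definitions into the bound expression. `%c0 + %c4` collapses to the constant `4`, and the bound `%1` becomes `%i * 5` once the `affine.apply #map3(%i)` that defines it is composed in. The toy model below shows that substitution on simple linear expressions. It is an assumption-laden sketch: `linear` and `substitute` are hypothetical helpers, and this is not MLIR's `AffineExpr` machinery or the folder added by this patch.

```python
def linear(const=0, **coeffs):
    """Build a toy linear expression: const + sum(coeff * value_name)."""
    return {"const": const, "coeffs": dict(coeffs)}


def substitute(expr, env):
    """Replace every value in `expr` that has a known definition in `env`.

    `env` maps a value name to a linear expression. Values without a
    definition stay behind as operands of the result.
    """
    const = expr["const"]
    coeffs = {}
    for name, c in expr["coeffs"].items():
        if name in env:
            definition = env[name]
            const += c * definition["const"]
            for inner_name, inner_c in definition["coeffs"].items():
                coeffs[inner_name] = coeffs.get(inner_name, 0) + c * inner_c
        else:
            coeffs[name] = coeffs.get(name, 0) + c
    # Drop terms whose coefficient cancelled out.
    coeffs = {n: c for n, c in coeffs.items() if c != 0}
    return {"const": const, "coeffs": coeffs}


# Known definitions from the test: %c0 = 0, %c4 = 4, %1 = %i * 5 (named v1 here)
env = {"c0": linear(0), "c4": linear(4), "v1": linear(i=5)}

outer_ub = substitute(linear(c0=1, c4=1), env)  # %c0 + %c4 folds to 4
inner_ub = substitute(linear(v1=1), env)        # %1 folds to %i * 5
```

Here `outer_ub` has no remaining operands and a constant of 4, while `inner_ub` keeps `i` as its only operand with coefficient 5, which mirrors the `(0) to (4)` and `(0) to (%{{.*}} * 5)` expectations.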


Revision Contents

Diff 286906
