This is an archive of the discontinued LLVM Phabricator instance.

Differential D84998

[MLIR] Add affine.parallel folder and AffineParallelNormalizePass
ClosedPublic

Authored by flaub on Jul 30 2020, 7:38 PM.

Details

Reviewers

jbruestle
nicolasvasilache
ftynse
rriddle
andydavis1
bondhugula
dcaballe

Commits

rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer

Summary

Add a folder to the affine.parallel op so that loop bounds expressions are canonicalized.

Additionally, a new AffineParallelNormalizePass is added to adjust affine.parallel ops so that the lower bound is always 0 and the upper bound always represents a range with a step size of 1.
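
The arithmetic behind that normalization can be sketched for a single dimension with constant bounds. This is a minimal standalone model, not the pass itself: `LoopDim`, `ceilDivNonNeg`, `normalize` and `originalIV` are hypothetical names introduced for illustration, and the real pass works on affine maps rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one affine.parallel dimension with constant bounds:
// it iterates iv = lb, lb + step, ... while iv < ub.
struct LoopDim {
  int64_t lb;
  int64_t ub;
  int64_t step;
};

// Ceiling division for a non-negative numerator and a positive denominator.
int64_t ceilDivNonNeg(int64_t num, int64_t den) {
  assert(num >= 0 && den > 0 && "expects num >= 0 and den > 0");
  return (num + den - 1) / den;
}

// Returns the normalized dimension: lower bound 0, step 1, and an upper
// bound equal to the trip count of the original dimension.
LoopDim normalize(const LoopDim &dim) {
  assert(dim.step > 0 && "step must be positive");
  int64_t range = dim.ub > dim.lb ? dim.ub - dim.lb : 0;
  return LoopDim{0, ceilDivNonNeg(range, dim.step), 1};
}

// Recovers the original induction variable from the normalized one. This is
// the computation that moves from the loop control into the loop body.
int64_t originalIV(const LoopDim &dim, int64_t normalizedIV) {
  return dim.lb + dim.step * normalizedIV;
}
```

For example, a dimension running from 2 to 33 with step 4 normalizes to 0 to 8 with step 1, and normalized index 7 maps back to the original value 30.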

Diff Detail

Event Timeline

flaub created this revision. Jul 30 2020, 7:38 PM

Herald added a project: Restricted Project. Jul 30 2020, 7:38 PM

Herald added subscribers: msifontes, jurahul, Kayjukh and 12 others.

flaub requested review of this revision. Jul 30 2020, 7:38 PM

Herald added a subscriber: stephenneuendorffer. Jul 30 2020, 7:38 PM

Harbormaster completed remote builds in B66492: Diff 282105. Jul 30 2020, 7:55 PM

Fix pre-merge lint check

I notice auto being used pervasively and it's impacting readability at most places. Please spell the type out.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2525	assert message please.
2681	Doc comment please.
mlir/test/Dialect/Affine/affine-fold.mlir
3	Please add a couple of lines here on what these test cases are testing.

This revision now requires changes to proceed. Jul 31 2020, 2:15 AM

Address CR comments

flaub marked 3 inline comments as done. Jul 31 2020, 11:59 AM

@dcaballe or @ftynse - could one of you review this please? I'm quite tied up the next week.

bondhugula edited reviewers, added: andydavis1; removed: bondhugula. Aug 1 2020, 12:09 AM

Thanks for the patch! Just a high-level comment for now. It looks good overall.

Any thoughts on what should go to the canonicalization pass and what should go to an independent pass? Removing dead loops or 1-trip-count loops seems like a good fit to me. However, normalizing the loop bounds and step (SimplifyAffineParallel) is something that might not always be desirable, since it moves the complexity from the loop control to the loop body. Would it make sense to have an independent loop normalization pass for that? I'm worried about adding too many transformations to canonicalization, so that we have to go all or nothing with them.

Also, any thoughts on how we can refactor this so that we can also use it for affine.for or even scf? Could it be implemented using the loop interface? Could you give that a try? Otherwise we would end up replicating this 4 times...

rriddle added inline comments. Aug 3 2020, 2:04 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2655	Not sure what you mean here, the rewriter will gladly rollback successfully applied patterns. Just because a pattern was applied doesn't mean it won't get rolled back.

A lot of your patterns (all of them?) are breaking the contract with the pattern rewriter right now.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2613	Drop trivial braces here and below.
2672	All of these transformations are breaking the contract with the pattern rewriter, you can't do these in place. Some you can (setting operands, attribute) but you are required to start a root update.

flaub added inline comments. Aug 3 2020, 3:59 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2655	My understanding was that `arg.replaceAllUsesWith(lowerBoundValue);` was not something the rewriter could rollback. So I was attempting to determine if the transformation was legal before doing something that couldn't be recovered from.
2672	OK, I'll try to think of a different way to do this. Do you have docs or test cases I can refer to that can teach me the right way to do this?

rriddle added inline comments. Aug 3 2020, 4:31 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2655	If something isn't expressible with the PatternRewriter, we generally just add the necessary API for that mutation. It's fine even if not all of the pattern rewrite drivers support it immediately, we just add `llvm_unreachable("Unsupported ...")` in those cases. For this case, you should be able to use the `replaceUsesOfBlockArgumentWith` method, but for some reason that is only on ConversionPatternRewriter. We should move that to the base PatternRewriter class as a new hook, at which point you could use it directly. https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/Transforms/DialectConversion.h#L458 The default implementation should effectively just be this inner loop: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889
2672	Docs on rewriter API are lacking right now, but I've been working on fixing that. If you want to do a root update, you need to use the root update API: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/IR/PatternMatch.h#L343 It allows for updating specific parts of an operation in-place, i.e. attributes/operations/successors, but not others(e.g. things happening in regions).
2672	You can use `op.attrNameAttr(newAttr)` to set a specific attribute, e.g. in this case `op.lowerBoundsMapAttr(AffineMapAttr::get(newLower))`.
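
The contract described in these comments (in-place changes must be announced to the rewriter so a driver can undo them) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOp` and `ToyRewriter` are hypothetical types, and the snapshot-based undo log is a simplification chosen for illustration, not MLIR's actual implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation carrying one attribute.
struct ToyOp {
  std::string lowerBound;
};

// Hypothetical rewriter: every in-place update that goes through it is
// recorded so that it can be undone later.
class ToyRewriter {
public:
  // Snapshots the op, then lets the callback mutate it in place.
  void updateRootInPlace(ToyOp &op,
                         const std::function<void(ToyOp &)> &mutate) {
    assert(mutate && "mutation callback must be set");
    ToyOp saved = op;
    undoLog.push_back([&op, saved]() { op = saved; });
    mutate(op);
  }

  // Undoes all recorded updates, most recent first.
  void rollback() {
    for (auto it = undoLog.rbegin(); it != undoLog.rend(); ++it)
      (*it)();
    undoLog.clear();
  }

private:
  std::vector<std::function<void()>> undoLog;
};
```

In this model, an op changed through `updateRootInPlace` returns to its previous state after `rollback()`, while an op mutated directly behind the rewriter's back keeps the change, which is the kind of inconsistency the review comments warn about.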

@rriddle I've attempted to follow your advice, could you take another look? Is this what you were thinking of?

@dcaballe I think you're right about not including the AffineSimplifyParallel with canonicalization. I've split this out into its own pass, what do you think?

Herald added a subscriber: mgorny. Aug 5 2020, 2:08 PM

flaub updated this revision to Diff 283394. Aug 5 2020, 2:13 PM

flaub marked an inline comment as done.

dcaballe requested changes to this revision. Aug 5 2020, 4:12 PM

Thanks Frank! It looks good to me. Just a few minor comments

mlir/include/mlir/Dialect/Affine/Passes.h
45 ↗	(On Diff #283394)	Could we use Normalize instead of Simplify? https://en.wikipedia.org/wiki/Normalized_loop We can rename it later on if we add other simplifications in the future.
mlir/include/mlir/Dialect/Affine/Passes.td
121 ↗	(On Diff #283394)	Probably good to add the previous comment to this description: / Simplify affine.parallel ops so that they have a step size of 1 and a lower / bound of 0.
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	I don't understand this condition. Why not remove a 0-rank loop if it has a custom attribute?
2681	Doc still missing?
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
22 ↗	(On Diff #283394)	Same, Simplify -> Normalize
38 ↗	(On Diff #283394)	We shouldn't run full blown canonicalization on all the ops as part of this pass. If canonicalization is needed, we can always invoke the canonicalizer after this pass. In that way, we would leave the decision of running canonicalization or not after this pass to the user.
51 ↗	(On Diff #283394)	nit, readability: lbExpr.getValue() -> lbExpr.getValue() != 0
mlir/test/Dialect/Affine/affine-fold.mlir
24	no iteration -> only one iteration?

This revision now requires changes to proceed. Aug 5 2020, 4:12 PM

flaub added inline comments. Aug 6 2020, 11:04 AM

mlir/include/mlir/Dialect/Affine/Passes.h
45 ↗	(On Diff #283394)	Gotcha, will do.
mlir/include/mlir/Dialect/Affine/Passes.td
121 ↗	(On Diff #283394)	Was trying to keep this short since this is just for the command line. Looking at the other summaries that seems consistent. I can try to make it a little bit more descriptive. How about if we go with the normalize nomenclature, can we just say: "Normalize affine.parallel ops so that lower bounds are 0 and step size is 1."
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	We have cases where we'd like to avoid removing affine.parallel ops even when they have no IVs where the op represents something structural, like it might represent the outermost loop which we might want to do kernel outlining on. This was an attempt to prevent this particular canonicalization from running in this case. We've been 'tagging' ops in our pipeline and this was an easy way to control canonicalization. This isn't the most elegant or general solution so we're open to something better here.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Fair enough I'll remove it and just add it to our pipeline as a separate pass.
mlir/test/Dialect/Affine/affine-fold.mlir
24	Actually I meant to say when no IVs exist. But yes, that's right.

dcaballe added a subscriber: bondhugula. Aug 6 2020, 11:56 AM

dcaballe added inline comments.

mlir/include/mlir/Dialect/Affine/Passes.td
121 ↗	(On Diff #283394)	Sounds good!
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	One option could be to convert your special rank-0 affine.parallel to some other region op that properly models what you want (kernel outlining, in this case) before the canonicalization (or probably use that region op from the beginning? I'm missing how you get to this rank-0 affine.parallel scenario). Another option could be to have a dedicated attribute to prevent the canonicalization. I think I would lean towards the first option since we wouldn't be overloading rank-0 affine.parallel constructs with special semantics. Probably @rriddle, @ftynse, @bondhugula could also help here.

flaub added inline comments. Aug 6 2020, 12:55 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	IIUC, currently kernel outlining is based on `affine.for` and `scf.parallel` ops. Our plan was to lower `affine.parallel` to `scf.parallel` and then use the SCFToGPU pass and then eventually perform kernel outlining. If we were to canonicalize away this empty `affine.parallel`, then any inner `affine.parallel` would be elevated up one level, which would mean kernel outlining would be working on the wrong level. Our model assumes that the outermost `affine.parallel` represents iteration of the workgroup items. It's very possible that we'd have a single kernel launch for a single workgroup item, which would be represented as `affine.parallel () = () to ()`. Regarding adding a new op, I suppose that would work, although it seems redundant. What should we call this op? It would act as a shield for this canonicalization but then we'd need a special lowering for this op into `affine.for` or `scf.parallel`. And it's not really rank-0 that have these special semantics, it's any outermost loop that has special semantics that is imposed by kernel outlining.

bondhugula requested changes to this revision. Aug 6 2020, 1:15 PM

bondhugula added inline comments.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	This looks like a hack / substitute due to a missing abstraction/region op. In fact, std.execute_region models exactly such scenarios, among several others. (There is the affine.execute_region as well, but you won't need it here since it starts a new affine scope at that point.) Perhaps you may want to leave it the way that's simplest for this revision and put a TODO for the need of a more suitable abstraction?
2609	Nit: Terminate with full stop please.
2643	Likewise and anywhere else.
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
81 ↗	(On Diff #283394)	Avoid `auto` here please.
87 ↗	(On Diff #283394)	Likewise.

bondhugula added inline comments. Aug 6 2020, 1:19 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2664	Nit: prefix `Map` to name.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Note that, where needed, there is also a local version of the canonicalizer to apply on specific ops if you know which ops you want to canonicalize.

bondhugula added inline comments. Aug 6 2020, 1:22 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2600	The comment needs another line to explain the rationale.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
64 ↗	(On Diff #283394)	Nit: steps should always be int64_t.

flaub added inline comments. Aug 6 2020, 1:27 PM

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗	(On Diff #283394)	@bondhugula I'm confused, this file already uses this style for `auto`. How do I know which style to use?

flaub added inline comments. Aug 6 2020, 1:30 PM

mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	Is there an example of this someplace? That seems like something that would be useful in this case.

flaub added inline comments. Aug 6 2020, 1:39 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	Agreed this is a hack, looking for better alternatives, but it does practically work for now. I've been looking for the `std.execute_region` and `affine.execute_region` but can't find it. Was it a proposal or perhaps renamed?

Address comments.

flaub edited the summary of this revision. Aug 6 2020, 2:13 PM

@dcaballe & @bondhugula Is there anything else you'd like for me to address?

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

dcaballe added inline comments. Aug 10 2020, 2:07 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	To keep this moving while working on a better abstraction and since you are already overusing affine.parallel, would it be possible to remove the check below so that any affine.parallel is optimized and then add some fake bounds to your special affine.parallel so that it's not optimized away? Another option would be to define a specific attribute that we can use to prevent the optimization. We could check for that attribute only. WDYT?

In D84998#2205662, @flaub wrote:

@rriddle I added replaceUsesOfBlockArgument and replaceUsesOfWith to the PatternRewriter, left the default impl as unsupported, and then implemented a simple version for GreedyPatternRewriteDriver, WDYT?

Sorry for the delay, was waiting for some of the other discussion on this revision to resolve first.

mlir/include/mlir/IR/PatternMatch.h
318 ↗	(On Diff #283734)	Please use the inner loop here as the default implementation of this hook (https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889). The operation rooted at `to` could be using `from`, using the implementation linked above allows for the use cases to "just work".
322 ↗	(On Diff #283734)	I don't think this is necessary, see the comment at the use of it below.
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2613	This seems like it should just be `rewriter.replaceOp(op, yield.operands());`.
2619	Drop the mlir::
2621	Why is this hook necessary? Seems like you should be using `rewriter.replaceOp(op, yield.operands())`instead of this loop and eraseOp.
2622	This would also need to go through the rewriter.
2667	This would also need to go through the rewriter.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
33 ↗	(On Diff #283394)	Please do not do this. This is placing a full run of the canonicalizer pass inside of your pass, just let the user schedule the canonicalizer in their pipeline.
mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
80 ↗	(On Diff #283734)	When you update the default implementation, you can call it from here.

bondhugula added inline comments. Aug 13 2020, 12:52 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2599	The execute_region ops are still pending - the corresponding revisions are dormant on differential. Although there are no unresolved issues w.r.t std.execute_region, I didn't finish it up for the lack of immediate use cases (an `std.yield` terminator needs to be added for it).

bondhugula added inline comments. Aug 13 2020, 12:59 PM

mlir/lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
112 ↗	(On Diff #283394)	A lot of the old code was written with an overuse of `auto`. That's fine - it's not a big deal here. In general, avoid `auto` if it doesn't improve readability. The issue is that the type is often obvious to the author of the revision at this point, but it often isn't when reading it later "locally".

After some more discussions and thought, I've removed the complex canonicalizations that:

break the PatternRewriter rules
have hacky special exemptions

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

River

bondhugula added inline comments. Aug 14 2020, 10:13 PM

mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h
77	This doesn't describe what this method does! (but only its return status).
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
22 ↗	(On Diff #283734)	Nit: bounds -> bound or bounds'

bondhugula added inline comments. Aug 14 2020, 10:19 PM

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2516	Nit: avoid auto.
mlir/lib/Dialect/Affine/Transforms/AffineParallelSimplify.cpp
38 ↗	(On Diff #283394)	The method is called mlir::applyOpPatternsAndFold and is used in these five files:

lib/Dialect/Affine/Utils/Utils.cpp
lib/Dialect/Affine/Transforms/AffineDataCopyGeneration.cpp
lib/Dialect/Affine/Transforms/SimplifyAffineStructures.cpp
lib/Transforms/Utils/GreedyPatternRewriteDriver.cpp
lib/Transforms/Utils/LoopUtils.cpp

Thanks for addressing the comments, Frank! LGTM. I'll leave the final approval to Uday and River since I'm OOO next week.

What's left is the folder for affine.parallel (very similar to the one for affine.for) and the separate AffineParallelNormalizePass. The other patterns will be refactored as separate passes that we might submit here at a later time.

Actually, having them in a separate pass could be a good idea, since removing a loop (or iteration space dim), even if it has 1 iteration, drops high-level information that can be useful to, or even expected by, some passes. I would actually have problems if we removed 1-iteration loops or dimensions too early in the pipeline. Maybe a LoopSimplifyPass could gather some of these loop optimizations so that we have more control over them.

Unblocking the review from my side. OOO next week.

bondhugula added inline comments. Aug 18 2020, 4:49 AM

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
1 ↗	(On Diff #285729)	Are you sure you want to run `-canonicalize` here? This would make it an integration test and in many cases may make it hard/non-trivial to maintain test cases due to changes to `-canonicalize`. I understand it makes it much easier/more intuitive for you to write the CHECK lines (more compact IR), but in general combining multiple passes for something that is meant to test just a specific pass is discouraged. Actually, if you really want to run the canonicalizer always after this, you can run it from inside the pass and just drop this `-canonicalize`. But that still gives one a combined effect. What do you think?

bondhugula added inline comments. Aug 18 2020, 4:50 AM

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
23 ↗	(On Diff #285729)	Could you add a line on whether it's always guaranteed to succeed or when it might fail.

bondhugula requested changes to this revision. Aug 18 2020, 4:57 AM

bondhugula added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
59–61 ↗	(On Diff #285729)	This looks like a bug. Shouldn't this be ceilDiv?
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
9 ↗	(On Diff #285729)	Shouldn't there be 4 iterations in the outer loop here?
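
The concern behind the ceilDiv question can be checked with a small standalone comparison. This is a sketch only: the three function names are hypothetical, and the bounds used in the note after the block are illustrative values, not the bounds from the test file under review.

```cpp
#include <cassert>
#include <cstdint>

// Counts iterations of `for (iv = lb; iv < ub; iv += step)` directly.
int64_t countIterations(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && "step must be positive");
  int64_t count = 0;
  for (int64_t iv = lb; iv < ub; iv += step)
    ++count;
  return count;
}

// Trip count computed with floor division: drops a trailing partial step.
int64_t tripCountFloorDiv(int64_t lb, int64_t ub, int64_t step) {
  assert(ub >= lb && step > 0 && "expects ub >= lb and step > 0");
  return (ub - lb) / step;
}

// Trip count computed with ceiling division: keeps a trailing partial step.
int64_t tripCountCeilDiv(int64_t lb, int64_t ub, int64_t step) {
  assert(ub >= lb && step > 0 && "expects ub >= lb and step > 0");
  return (ub - lb + step - 1) / step;
}
```

With bounds 0 to 10 and step 3, the loop visits 0, 3, 6 and 9, so there are 4 iterations. The ceiling form yields 4, while the floor form yields 3 and would silently drop the last iteration. The two agree only when the step divides the range evenly.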

This revision now requires changes to proceed. Aug 18 2020, 4:57 AM

In D84998#2218985, @rriddle wrote:

Hey Frank, didn't mean for this to force you to scale back the patch. If it helps I can implement the pattern rewriter hooks that you would need. Just let me know.

River

No worries River, thanks for the offer. I think we're still exploring the design space here with regards to being able to configure canonicalizations. Perhaps it's just a bad idea and I need to rethink it. I'm happy to back away and take more time to think thru a better solution so that we don't paint ourselves into a corner.

All that said, it would be helpful to see your vision for updating the default implementation of PatternRewriter, it wasn't clear to me how big of a refactor you were looking for, so I was trying to do the most minimal thing. I think there's a separate issue which is how to prevent authors of pattern rewriters from breaking the rules; it seems very easy to do so right now and at present we see no errors or other indications that this has happened.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
2516	Seems like this one improves readability, because otherwise it's the lengthy: `AffineParallelOp::operand_range`
mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
59–61 ↗	(On Diff #285729)	Good catch!
mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
1 ↗	(On Diff #285729)	I've dropped the test that relies on canonicalization because it's true that as a unit test, we don't need to also test the combined capability. For context, this pass was originally written as a canonicalization pattern. I think a further refinement would be to provide a way to call the transformation directly without going thru a pass to allow callers to decide whether they want to combine canonicalization with normalization. The combination is what I require in my current use cases. But as a general purpose library I see the value in not restricting other users to that particular use case.
9 ↗	(On Diff #285729)	Yes! Thanks for catching this!

flaub updated this revision to Diff 286470. Aug 18 2020, 8:52 PM

bondhugula added inline comments. Aug 18 2020, 11:29 PM

mlir/lib/Dialect/Affine/Transforms/AffineParallelNormalize.cpp
37 ↗	(On Diff #286470)	`steps` should all be stored in int64_t for consistency.
39 ↗	(On Diff #286470)	A comment here and/or one for the `isWorkPending |= ...` line in the body.
40 ↗	(On Diff #286470)	auto -> int64_t

Thanks for addressing something. Looks good to me. Please take care of the remaining minor comments for doc.

mlir/test/Dialect/Affine/affine-parallel-normalize.mlir
3 ↗	(On Diff #286470)	Nit: Set -> Normalize

This revision is now accepted and ready to land. Aug 18 2020, 11:33 PM

flaub updated this revision to Diff 286904. Aug 20 2020, 3:15 PM

flaub marked 4 inline comments as done.

Closed by commit rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer (authored by flaub). Aug 20 2020, 3:24 PM

This revision was automatically updated to reflect the committed changes.

flaub added a commit: rGcca3f3dd2681: [MLIR] Add affine.parallel folder and normalizer.

Thanks for the review everyone, I've made tiny tweaks after further feedback. This includes extracting the core transformation of normalization into a separate utility function (in case other users want to integrate this transformation in a different pass), and also adding a helper method AffineParallelOp::getSteps().

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Affine/IR/AffineOps.td	19 lines
mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h	3 lines
mlir/lib/Dialect/Affine/IR/AffineOps.cpp	255 lines
mlir/test/Dialect/Affine/affine-fold.mlir	102 lines

Diff 282105

mlir/include/mlir/Dialect/Affine/IR/AffineOps.td

[615 unchanged lines hidden; enclosing context: OpBuilder<"OpBuilder &builder, OperationState &result, "]
               "AffineMap ubMap, ValueRange ubArgs, "
               "ArrayRef<int64_t> steps">
   ];

   let extraClassDeclaration = [{
     /// Get the number of dimensions.
     unsigned getNumDims();

-    operand_range getLowerBoundsOperands();
-    operand_range getUpperBoundsOperands();
-
-    AffineValueMap getLowerBoundsValueMap();
-    AffineValueMap getUpperBoundsValueMap();
     AffineValueMap getRangesValueMap();

     /// Get ranges as constants, may fail in dynamic case.
     Optional<SmallVector<int64_t, 8>> getConstantRanges();

     Block *getBody();
     OpBuilder getBodyBuilder();
     MutableArrayRef<BlockArgument> getIVs() {
       return getBody()->getArguments();
     }

+    operand_range getLowerBoundsOperands();
+    AffineValueMap getLowerBoundsValueMap();
+    void setLowerBounds(ValueRange operands, AffineMap map);
+    void setLowerBoundsMap(AffineMap map);
+
+    operand_range getUpperBoundsOperands();
+    AffineValueMap getUpperBoundsValueMap();
+    void setUpperBounds(ValueRange operands, AffineMap map);
+    void setUpperBoundsMap(AffineMap map);
+
     void setSteps(ArrayRef<int64_t> newSteps);

     static StringRef getReductionsAttrName() { return "reductions"; }
     static StringRef getLowerBoundsMapAttrName() { return "lowerBoundsMap"; }
     static StringRef getUpperBoundsMapAttrName() { return "upperBoundsMap"; }
     static StringRef getStepsAttrName() { return "steps"; }
   }];
+
+  let hasCanonicalizer = 1;
+  let hasFolder = 1;
 }

 def AffinePrefetchOp : Affine_Op<"prefetch"> {
   let summary = "affine prefetch operation";
   let description = [{
     The "affine.prefetch" op prefetches data from a memref location described
     with an affine subscript similar to affine.load, and has three attributes:
     a read/write specifier, a locality hint, and a cache type specifier as shown
[268 unchanged lines hidden]

mlir/include/mlir/Dialect/Affine/IR/AffineValueMap.h

[68 unchanged lines hidden; enclosing context: public:]
   inline unsigned getNumDims() const { return map.getNumDims(); }
   inline unsigned getNumSymbols() const { return map.getNumSymbols(); }
   inline unsigned getNumResults() const { return map.getNumResults(); }

   Value getOperand(unsigned i) const;
   ArrayRef<Value> getOperands() const;
   AffineMap getAffineMap() const;

+  /// Return success if the map and/or operands have been modified.
bondhugula (Done): This doesn't describe what this method does! (but only its return status).
+  LogicalResult canonicalize();
+
 private:
   // A mutable affine map.
   MutableAffineMap map;

   // TODO: make these trailing objects?
   /// The SSA operands binding to the dim's and symbols of 'map'.
   SmallVector<Value, 4> operands;
   /// The SSA results binding to the results of 'map'.
   SmallVector<Value, 4> results;
 };

 } // namespace mlir

 #endif // MLIR_DIALECT_AFFINE_IR_AFFINEVALUEMAP_H

mlir/lib/Dialect/Affine/IR/AffineOps.cpp

[11 unchanged lines hidden]
 #include "mlir/IR/Function.h"
 #include "mlir/IR/IntegerSet.h"
 #include "mlir/IR/Matchers.h"
 #include "mlir/IR/OpImplementation.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/Transforms/InliningUtils.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallBitVector.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/ADT/TypeSwitch.h"
 #include "llvm/Support/Debug.h"

 using namespace mlir;
 using llvm::dbgs;

 #define DEBUG_TYPE "affine-analysis"

[2,473 unchanged lines hidden]
 }

 Block *AffineParallelOp::getBody() { return &region().front(); }

 OpBuilder AffineParallelOp::getBodyBuilder() {
   return OpBuilder(getBody(), std::prev(getBody()->end()));
 }

+void AffineParallelOp::setLowerBounds(ValueRange lbOperands, AffineMap map) {
+  assert(lbOperands.size() == map.getNumInputs());
+  assert(map.getNumResults() >= 1 && "bounds map has at least one result");
+
+  auto ubOperands = getUpperBoundsOperands();
+
bondhugula (Not Done): Nit: avoid `auto`.
flaub (Author, Done): Seems like this one improves readability, because otherwise it's the lengthy: `AffineParallelOp::operand_range`
+  SmallVector<Value, 4> newOperands(lbOperands);
+  newOperands.append(ubOperands.begin(), ubOperands.end());
+  getOperation()->setOperands(newOperands);
+
+  setAttr(getLowerBoundsMapAttrName(), AffineMapAttr::get(map));
+}
+
+void AffineParallelOp::setUpperBounds(ValueRange ubOperands, AffineMap map) {
+  assert(ubOperands.size() == map.getNumInputs());
bondhugula (Done): assert message please.
+  assert(map.getNumResults() >= 1 && "bounds map has at least one result");
+
+  SmallVector<Value, 4> newOperands(getLowerBoundsOperands());
+  newOperands.append(ubOperands.begin(), ubOperands.end());
+  getOperation()->setOperands(newOperands);
+
+  setAttr(getUpperBoundsMapAttrName(), AffineMapAttr::get(map));
+}
+
+void AffineParallelOp::setLowerBoundsMap(AffineMap map) {
+  auto lbMap = lowerBoundsMap();
+  assert(lbMap.getNumDims() == map.getNumDims() &&
+         lbMap.getNumSymbols() == map.getNumSymbols());
+  (void)lbMap;
+  setAttr(getLowerBoundsMapAttrName(), AffineMapAttr::get(map));
+}
+
+void AffineParallelOp::setUpperBoundsMap(AffineMap map) {
+  auto ubMap = upperBoundsMap();
+  assert(ubMap.getNumDims() == map.getNumDims() &&
+         ubMap.getNumSymbols() == map.getNumSymbols());
+  (void)ubMap;
+  setAttr(getUpperBoundsMapAttrName(), AffineMapAttr::get(map));
+}
+
 void AffineParallelOp::setSteps(ArrayRef<int64_t> newSteps) {
-  assert(newSteps.size() == getNumDims() && "steps & num dims mismatch");
   setAttr(getStepsAttrName(), getBodyBuilder().getI64ArrayAttr(newSteps));
 }

 static LogicalResult verify(AffineParallelOp op) {
   auto numDims = op.getNumDims();
   if (op.lowerBoundsMap().getNumResults() != numDims ||
       op.upperBoundsMap().getNumResults() != numDims ||
       op.steps().size() != numDims ||
[18 unchanged lines hidden; enclosing context: if (failed(verifyDimAndSymbolIdentifiers(op, op.getLowerBoundsOperands(),]
     return failure();
   /// Upper bounds.
   if (failed(verifyDimAndSymbolIdentifiers(op, op.getUpperBoundsOperands(),
                                            op.upperBoundsMap().getNumDims())))
     return failure();
   return success();
 }

		namespace {
		/// This pattern removes affine.parallel ops with no induction variables.
		struct AffineParallelRank0LoopRemover
		: public OpRewritePattern<AffineParallelOp> {
		using OpRewritePattern<AffineParallelOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(AffineParallelOp op,
		PatternRewriter &rewriter) const override {
		// Check that there are no induction variables
		if (op.getNumDims())
		return failure();

		// Only remove ops that don't have any custom attributes (i.e. those not
		// defined by the op itself).
		dcaballe: I don't understand this condition. Why not remove a rank-0 loop if it has a custom attribute?
		flaub: We have cases where we'd like to avoid removing affine.parallel ops even when they have no IVs, because the op represents something structural. For example, it might represent the outermost loop that we want to do kernel outlining on. This was an attempt to prevent this particular canonicalization from running in that case. We've been 'tagging' ops in our pipeline and this was an easy way to control canonicalization. This isn't the most elegant or general solution, so we're open to something better here.
		dcaballe: One option could be to convert your special rank-0 affine.parallel to some other region op that properly models what you want (kernel outlining, in this case) before the canonicalization (or probably use that region op from the beginning? I'm missing how you get to this rank-0 affine.parallel scenario). Another option could be to have a dedicated attribute to prevent the canonicalization. I think I would lean towards the first option since we wouldn't be overloading rank-0 affine.parallel constructs with special semantics. Probably @rriddle, @ftynse, @bondhugula could also help here.
		bondhugula: This looks like a hack / substitute due to a missing abstraction/region op. In fact, std.execute_region models exactly such scenarios, among several others. (There is affine.execute_region as well, but you won't need it here since it starts a new affine scope at that point.) Perhaps you may want to leave it the way that's simplest for this revision and put a TODO for the need of a more suitable abstraction?
		flaub: IIUC, currently kernel outlining is based on `affine.for` and `scf.parallel` ops. Our plan was to lower `affine.parallel` to `scf.parallel`, then use the SCFToGPU pass, and eventually perform kernel outlining. If we were to canonicalize away this empty `affine.parallel`, then any inner `affine.parallel` would be elevated up one level, which would mean kernel outlining would be working on the wrong level. Our model assumes that the outermost `affine.parallel` represents iteration of the workgroup items. It's very possible that we'd have a single kernel launch for a single workgroup item, which would be represented as `affine.parallel () = () to ()`. Regarding adding a new op, I suppose that would work, although it seems redundant. What should we call this op? It would act as a shield for this canonicalization, but then we'd need a special lowering for this op into `affine.for` or `scf.parallel`. And it's not really rank-0 loops that have these special semantics; it's any outermost loop, with semantics imposed by kernel outlining.
		flaub: Agreed this is a hack, looking for better alternatives, but it does practically work for now. I've been looking for `std.execute_region` and `affine.execute_region` but can't find them. Was it a proposal, or perhaps renamed?
		dcaballe: To keep this moving while working on a better abstraction, and since you are already overusing affine.parallel, would it be possible to remove the check below so that any affine.parallel is optimized, and then add some fake bounds to your special affine.parallel so that it's not optimized away? Another option would be to define a specific attribute that we can use to prevent the optimization. We could check for that attribute only. WDYT?
		bondhugula: The execute_region ops are still pending; the corresponding revisions are dormant on Differential. Although there are no unresolved issues w.r.t. std.execute_region, I didn't finish it up for the lack of immediate use cases (an `std.yield` terminator needs to be added for it).
		StringSet<> opAttrs{AffineParallelOp::getReductionsAttrName(),
		bondhugula: The comment needs another line to explain the rationale.
		AffineParallelOp::getLowerBoundsMapAttrName(),
		AffineParallelOp::getUpperBoundsMapAttrName(),
		AffineParallelOp::getStepsAttrName()};
		for (auto attr : op.getAttrs()) {
		if (!opAttrs.count(attr.first.strref()))
		return failure();
		}

		// Remove the affine.parallel wrapper, retain the body in the same location
		bondhugula: Nit: Terminate with full stop please.
		auto &parentOps = rewriter.getInsertionBlock()->getOperations();
		auto &parallelBodyOps = op.region().front().getOperations();
		auto yield = mlir::cast<AffineYieldOp>(std::prev(parallelBodyOps.end()));
		for (auto it : zip(op.getResults(), yield.operands())) {
		rriddle: Drop trivial braces here and below.
		rriddle: This seems like it should just be `rewriter.replaceOp(op, yield.operands());`.
		std::get<0>(it).replaceAllUsesWith(std::get<1>(it));
		}
		parentOps.splice(mlir::Block::iterator(op), parallelBodyOps,
		parallelBodyOps.begin(), std::prev(parallelBodyOps.end()));
		rewriter.eraseOp(op);
		return success();
		rriddle: Drop the mlir::
		}
		};
		rriddle: Why is this hook necessary? Seems like you should be using `rewriter.replaceOp(op, yield.operands())` instead of this loop and eraseOp.

		rriddle: This would also need to go through the rewriter.
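The thread above weighs two ways of shielding a rank-0 `affine.parallel` from removal: the diff's check (any attribute the op does not define blocks the rewrite) and the reviewers' alternative (only one dedicated attribute blocks it). The following is a minimal sketch in plain C++ with no MLIR dependency, just to contrast the two policies. The attribute spellings and the marker name `keep_rank0` are assumptions made for illustration; the diff itself obtains the built-in names from `AffineParallelOp::get*AttrName()`.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Attribute names the op itself defines. These spellings are assumptions
// for this sketch, not taken from the diff.
const std::set<std::string> kBuiltinAttrs = {
    "reductions", "lowerBoundsMap", "upperBoundsMap", "steps"};

// Policy used in the diff: any attribute not defined by the op blocks removal.
bool removableUnderDenyUnknown(const std::vector<std::string> &attrs) {
  return std::all_of(attrs.begin(), attrs.end(), [](const std::string &name) {
    return kBuiltinAttrs.count(name) != 0;
  });
}

// Alternative raised in review: only a dedicated marker blocks removal.
// "keep_rank0" is a hypothetical attribute name.
bool removableUnderOptOut(const std::vector<std::string> &attrs) {
  return std::find(attrs.begin(), attrs.end(), "keep_rank0") == attrs.end();
}
```

Under the first policy an unrelated tag such as a debugging attribute also suppresses the canonicalization, which is the overloading the reviewers object to; under the second, only ops that opt out explicitly are kept.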
		/// This pattern removes indexes that go over an empty range.
		struct AffineParallelTripCount1IndexRemover
		: public OpRewritePattern<AffineParallelOp> {
		using OpRewritePattern<AffineParallelOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(AffineParallelOp op,
		PatternRewriter &rewriter) const override {
		auto ranges = op.getRangesValueMap();
		auto body = op.getBody();
		Lint: Pre-merge checks: clang-tidy: warning: 'auto body' can be declared as 'auto *body' [llvm-qualified-auto]
		SmallVector<AffineExpr, 6> newLowerBounds;
		SmallVector<AffineExpr, 6> newUpperBounds;
		SmallVector<int64_t, 6> newSteps;
		SmallVector<BlockArgument, 6> argsToRemove;
		for (unsigned i = 0, e = body->getNumArguments(); i < e; i++) {
		// Is the range a constant value matching the step size?
		auto constExpr = ranges.getResult(i).dyn_cast<AffineConstantExpr>();
		int64_t step = op.steps()[i].template cast<IntegerAttr>().getInt();
		if (constExpr && constExpr.getValue() == step) {
		// Mark argument for removal and replacement with 0.
		argsToRemove.push_back(body->getArgument(i));
		} else {
		bondhugula: Likewise and anywhere else.
		// Keep argument
		newLowerBounds.push_back(op.lowerBoundsMap().getResult(i));
		newUpperBounds.push_back(op.upperBoundsMap().getResult(i));
		newSteps.push_back(step);
		}
		}

		// If no arguments need removal, return failure to match.
		if (argsToRemove.empty())
		return failure();

		// After this point, there will be no need to rollback the rewriter.
		rriddle: Not sure what you mean here, the rewriter will gladly rollback successfully applied patterns. Just because a pattern was applied doesn't mean it won't get rolled back.
		flaub: My understanding was that `arg.replaceAllUsesWith(lowerBoundValue);` was not something the rewriter could rollback. So I was attempting to determine if the transformation was legal before doing something that couldn't be recovered from.
		rriddle: If something isn't expressible with the PatternRewriter, we generally just add the necessary API for that mutation. It's fine even if not all of the pattern rewrite drivers support it immediately, we just add `llvm_unreachable("Unsupported ...")` in those cases. For this case, you should be able to use the `replaceUsesOfBlockArgumentWith` method, but for some reason that is only on ConversionPatternRewriter. We should move that to the base PatternRewriter class as a new hook, at which point you could use it directly. https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/Transforms/DialectConversion.h#L458 The default implementation should effectively just be this inner loop: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/lib/Transforms/DialectConversion.cpp#L889
		for (auto arg : argsToRemove) {
		auto argNumber = arg.getArgNumber();
		auto lowerBoundValue = rewriter.create<AffineApplyOp>(
		op.getLoc(), op.lowerBoundsMap().getSubMap({argNumber}),
		op.getLowerBoundsOperands());
		arg.replaceAllUsesWith(lowerBoundValue);
		body->eraseArgument(argNumber);
		}

		bondhugula: Nit: prefix `Map` to name.
		// Update attributes and return success
		auto newLower = AffineMap::get(op.lowerBoundsMap().getNumDims(),
		op.lowerBoundsMap().getNumSymbols(),
		rriddle: This would also need to go through the rewriter.
		newLowerBounds, op.getContext());
		auto newUpper = AffineMap::get(op.upperBoundsMap().getNumDims(),
		op.upperBoundsMap().getNumSymbols(),
		newUpperBounds, op.getContext());
		op.setAttr(AffineParallelOp::getLowerBoundsMapAttrName(),
		rriddle: All of these transformations are breaking the contract with the pattern rewriter, you can't do these in place. Some you can (setting operands, attribute) but you are required to start a root update.
		rriddle: You can use `op.attrNameAttr(newAttr)` to set a specific attribute, e.g. in this case `op.lowerBoundsMapAttr(AffineMapAttr::get(newLower))`.
		flaub: OK, I'll try to think of a different way to do this. Do you have docs or test cases I can refer to that can teach me the right way to do this?
		rriddle: Docs on rewriter API are lacking right now, but I've been working on fixing that. If you want to do a root update, you need to use the root update API: https://github.com/llvm/llvm-project/blob/49bbb8b60e451d173c7dd42993592e8aa4d95f24/mlir/include/mlir/IR/PatternMatch.h#L343 It allows for updating specific parts of an operation in-place, i.e. attributes/operations/successors, but not others (e.g. things happening in regions).
		AffineMapAttr::get(newLower));
		op.setAttr(AffineParallelOp::getUpperBoundsMapAttrName(),
		AffineMapAttr::get(newUpper));
		op.setSteps(newSteps);
		return success();
		}
		};
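The matching condition in `AffineParallelTripCount1IndexRemover` is that a dimension's constant range (`ub - lb`) equals its step, so the IV takes exactly one value, its lower bound. Below is a minimal sketch of that selection logic in plain C++ with constant bounds only; `Dim`, `Result`, and `dropSingleIterationDims` are hypothetical names for illustration and do not appear in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One loop dimension with constant bounds: for (iv = lb; iv < ub; iv += step).
struct Dim {
  std::int64_t lb, ub, step;
};

struct Result {
  std::vector<Dim> kept;                 // dimensions that stay as loop IVs
  std::vector<std::int64_t> fixedValues; // lower bounds replacing removed IVs
};

// A dimension whose range equals its step runs exactly once, so its IV can be
// replaced by the lower bound and the dimension dropped.
Result dropSingleIterationDims(const std::vector<Dim> &dims) {
  Result result;
  for (const Dim &d : dims) {
    if (d.ub - d.lb == d.step)
      result.fixedValues.push_back(d.lb);
    else
      result.kept.push_back(d);
  }
  return result;
}
```

Applied to the bounds in the `affine_parallel_partial_range1` test, `(0, 1) to (10, 2)` with unit steps, this keeps the first dimension and substitutes `1` for the second, which matches the `[%{{.*}}, 1]` store index that test checks for.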

		struct SimplifyAffineParallel : public OpRewritePattern<AffineParallelOp> {
		bondhugula: Doc comment please.
		dcaballe: Doc still missing?
		using OpRewritePattern<AffineParallelOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(AffineParallelOp op,
		PatternRewriter &rewriter) const override {

		auto stepsAttrs = op.steps();
		auto lbMap = op.lowerBoundsMap();

		SmallVector<int, 8> steps;
		bool isWorkPending = false;
		for (unsigned i = 0, e = stepsAttrs.size(); i < e; ++i) {
		auto step = stepsAttrs[i].cast<IntegerAttr>().getInt();
		steps.push_back(step);
		auto lbExpr = lbMap.getResult(i).dyn_cast<AffineConstantExpr>();
		isWorkPending |= (!lbExpr || lbExpr.getValue() || step != 1);
		}

		// No need to do any work if the parallel op is already simplified.
		if (!isWorkPending)
		return failure();

		auto ranges = op.getRangesValueMap();
		auto zeroExpr = rewriter.getAffineConstantExpr(0);
		rewriter.setInsertionPointToStart(op.getBody());
		SmallVector<AffineExpr, 8> lbExprs;
		SmallVector<AffineExpr, 8> ubExprs;
		for (unsigned i = 0, e = steps.size(); i < e; ++i) {
		auto step = steps[i];

		// Adjust the lower bound to be 0.
		lbExprs.push_back(zeroExpr);

		// Adjust the upper bound expression: 'range / step'
		auto ubExpr = ranges.getResult(i).floorDiv(step);
		ubExprs.push_back(ubExpr);

		// Adjust the corresponding IV: 'lb + i * step'
		auto iv = op.getBody()->getArgument(i);
		auto lbExpr = lbMap.getResult(i);
		auto nDims = lbMap.getNumDims();
		auto expr = lbExpr + rewriter.getAffineDimExpr(nDims) * step;
		auto map = AffineMap::get(/*dimCount=*/nDims + 1,
		/*symbolCount=*/lbMap.getNumSymbols(), expr);

		// Use an 'affine.apply' op that will be simplified later in subsequent
		// canonicalizations.
		auto lbOperands = op.getLowerBoundsOperands();
		auto dimOperands = lbOperands.take_front(nDims);
		auto symbolOperands = lbOperands.drop_front(nDims);
		SmallVector<Value, 8> applyOperands{dimOperands};
		applyOperands.push_back(iv);
		applyOperands.append(symbolOperands.begin(), symbolOperands.end());
		auto apply =
		rewriter.create<AffineApplyOp>(op.getLoc(), map, applyOperands);
		iv.replaceAllUsesExcept(apply, SmallPtrSet<Operation *, 1>{apply});
		}

		SmallVector<int64_t, 8> newSteps(op.getNumDims(), 1);
		op.setSteps(newSteps);
		auto newLowerMap = AffineMap::get(/*dimCount=*/0, /*symbolCount=*/0,
		lbExprs, rewriter.getContext());
		op.setLowerBounds({}, newLowerMap);
		auto newUpperMap =
		AffineMap::get(ranges.getNumDims(), ranges.getNumSymbols(), ubExprs,
		rewriter.getContext());
		op.setUpperBounds(ranges.getOperands(), newUpperMap);

		return success();
		}
		};
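The rewrite in `SimplifyAffineParallel` replaces each IV by `lb + i * step`, where `i` runs from 0 with step 1. The sketch below checks that arithmetic for constant bounds in plain C++ (no MLIR; all function names are hypothetical). One caveat: the sketch computes the normalized trip count with ceiling division, which is my assumption about the intended semantics. The diff as shown uses `floorDiv`, and the two differ whenever the range is not a multiple of the step: for `0 to 10 step 3`, floor division gives 3 while direct enumeration (0, 3, 6, 9) gives 4 iterations.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trip count of `for (iv = lb; iv < ub; iv += step)`, assuming step > 0.
// Uses ceiling division; this is an assumption of the sketch, not the diff.
std::int64_t tripCount(std::int64_t lb, std::int64_t ub, std::int64_t step) {
  assert(step > 0 && "step must be positive");
  const std::int64_t range = ub - lb;
  return range <= 0 ? 0 : (range + step - 1) / step;
}

// Values the original IV takes, by direct enumeration.
std::vector<std::int64_t> originalIVs(std::int64_t lb, std::int64_t ub,
                                      std::int64_t step) {
  std::vector<std::int64_t> ivs;
  for (std::int64_t iv = lb; iv < ub; iv += step)
    ivs.push_back(iv);
  return ivs;
}

// Values reconstructed from the normalized loop [0, n) with step 1,
// using the same mapping as the pattern: lb + i * step.
std::vector<std::int64_t> normalizedIVs(std::int64_t lb, std::int64_t ub,
                                        std::int64_t step) {
  std::vector<std::int64_t> ivs;
  for (std::int64_t i = 0, n = tripCount(lb, ub, step); i < n; ++i)
    ivs.push_back(lb + i * step);
  return ivs;
}
```

With these helpers, `originalIVs(0, 10, 3)` and `normalizedIVs(0, 10, 3)` produce the same four values, whereas a floor-based count would stop one iteration short for that dimension.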

		} // end anonymous namespace

		LogicalResult AffineValueMap::canonicalize() {
		SmallVector<Value, 4> newOperands{operands};
		auto newMap = getAffineMap();
		composeAffineMapAndOperands(&newMap, &newOperands);
		if (newMap == getAffineMap() && newOperands == operands)
		return failure();
		reset(newMap, newOperands);
		return success();
		}

		/// Canonicalize the bounds of the given loop.
		static LogicalResult canonicalizeLoopBounds(AffineParallelOp op) {
		auto lb = op.getLowerBoundsValueMap();
		auto lbCanonicalized = succeeded(lb.canonicalize());

		auto ub = op.getUpperBoundsValueMap();
		auto ubCanonicalized = succeeded(ub.canonicalize());

		// Any canonicalization change always leads to updated map(s).
		if (!lbCanonicalized && !ubCanonicalized)
		return failure();

		if (lbCanonicalized)
		op.setLowerBounds(lb.getOperands(), lb.getAffineMap());
		if (ubCanonicalized)
		op.setUpperBounds(ub.getOperands(), ub.getAffineMap());

		return success();
		}

		void AffineParallelOp::getCanonicalizationPatterns(
		OwningRewritePatternList &results, MLIRContext *context) {
		results.insert<SimplifyAffineParallel, AffineParallelRank0LoopRemover,
		AffineParallelTripCount1IndexRemover>(context);
		}

		LogicalResult AffineParallelOp::fold(ArrayRef<Attribute> operands,
		SmallVectorImpl<OpFoldResult> &results) {
		return canonicalizeLoopBounds(*this);
		}

		static void print(OpAsmPrinter &p, AffineParallelOp op) {
		p << op.getOperationName() << " (" << op.getBody()->getArguments() << ") = (";
		p.printAffineMapOfSSAIds(op.lowerBoundsMapAttr(),
		op.getLowerBoundsOperands());
		p << ") to (";
		p.printAffineMapOfSSAIds(op.upperBoundsMapAttr(),
		op.getUpperBoundsOperands());
		p << ')';
		[... 84 lines collapsed in the diff view; context shown: for (const auto &result : stepsMap.getResults()) { ...]
		"steps must be constant integers");
		steps.push_back(constExpr.getValue());
		}
		result.addAttribute(AffineParallelOp::getStepsAttrName(),
		builder.getI64ArrayAttr(steps));
		}

		// Parse optional clause of the form: `reduce ("addf", "maxf")`, where the
		- // quoted strings a member of the enum AtomicRMWKind.
		+ // quoted strings are a member of the enum AtomicRMWKind.
		SmallVector<Attribute, 4> reductions;
		if (succeeded(parser.parseOptionalKeyword("reduce"))) {
		if (parser.parseLParen())
		return failure();
		do {
		// Parse a single quoted string via the attribute parsing, and then
		// verify it is a member of the enum and convert to it's integer
		// representation.
		[... 184 lines collapsed in the diff view ...]

mlir/test/Dialect/Affine/affine-fold.mlir

This file was added.

				// RUN: mlir-opt -canonicalize -split-input-file %s | FileCheck %s

				// CHECK: func @affine_parallel_rank0
				bondhugula: Please add a couple of lines here on what these test cases are testing.
				func @affine_parallel_rank0(%out: memref<f32>) {
				// CHECK-NEXT: constant
				%cst = constant 0.0 : f32
				// CHECK-NEXT: affine.store
				affine.parallel () = () to () {
				affine.parallel () = () to () {
				affine.store %cst, %out[] : memref<f32>
				}
				}
				return
				}

				// -----

				// CHECK-LABEL: func @affine_parallel_range1
				func @affine_parallel_range1() {
				// CHECK-NEXT: constant
				%cst = constant 1.0 : f32
				// CHECK-NEXT: alloc
				%0 = alloc() : memref<2x4xf32>
				// CHECK-NEXT: affine.store
				dcaballe: no iteration -> only one iteration?
				flaub: Actually I meant to say when no IVs exist. But yes, that's right.
				affine.parallel (%i, %j) = (0, 1) to (2, 2) step (2, 1) {
				affine.store %cst, %0[%i, %j] : memref<2x4xf32>
				}
				// CHECK-NEXT: return
				return
				}

				// -----

				// CHECK-LABEL: func @affine_parallel_partial_range1
				func @affine_parallel_partial_range1() {
				// CHECK-NEXT: constant
				%cst = constant 1.0 : f32
				// CHECK-NEXT: alloc
				%0 = alloc() : memref<2x4xf32>
				// CHECK-NEXT: affine.parallel (%{{.*}}) = (0) to (10)
				affine.parallel (%i, %j) = (0, 1) to (10, 2) {
				// CHECK-NEXT: affine.store %{{.*}}, %{{.*}}[%{{.*}}, 1]
				affine.store %cst, %0[%i, %j] : memref<2x4xf32>
				}
				// CHECK: return
				return
				}

				// -----

				// CHECK-LABEL: func @simplify_parallel
				func @simplify_parallel() {
				%cst = constant 1.0 : f32
				%0 = alloc() : memref<2x4xf32>
				// CHECK: affine.parallel (%[[i:.*]], %[[j:.*]]) = (0, 0) to (3, 2) {
				affine.parallel (%i, %j) = (0, 1) to (10, 5) step (3, 2) {
				// CHECK: affine.parallel (%[[k:.*]]) = (0) to (%[[j]] * 2 - %[[i]] * 3 + 1) {
				affine.parallel (%k) = (%i) to (%j) {
				// CHECK: affine.store %{{.*}}, %{{.*}}[%[[i]] * 3, %[[i]] * 3 + %[[k]]] : memref<2x4xf32>
				affine.store %cst, %0[%i, %k] : memref<2x4xf32>
				}
				}
				return
				}

				// -----

				// CHECK-LABEL: func @affine_parallel_const_bounds
				func @affine_parallel_const_bounds() {
				%cst = constant 1.0 : f32
				%c0 = constant 0 : index
				%c4 = constant 4 : index
				%0 = alloc() : memref<4xf32>
				// CHECK: affine.parallel (%{{.*}}) = (0) to (4)
				affine.parallel (%i) = (%c0) to (%c0 + %c4) {
				affine.store %cst, %0[%i] : memref<4xf32>
				}
				return
				}

				// -----

				#map0 = affine_map<(d0) -> (d0 * 5)>
				#map1 = affine_map<(d0) -> (d0 * 10)>

				// CHECK-LABEL: func @affine_parallel_fold_bounds
				func @affine_parallel_fold_bounds() {
				%cst = constant 1.0 : f32
				%0 = alloc() : memref<100x100xf32>
				// CHECK: affine.parallel (%[[i0:.*]], %[[j0:.*]]) =
				affine.parallel (%i0, %j0) = (0, 0) to (100, 10) {
				%2 = affine.apply #map0(%i0)
				%3 = affine.apply #map1(%j0)
				// CHECK-NOT: affine.apply
				// CHECK: affine.parallel (%[[i1:.*]], %[[j1:.*]]) = (0, 0) to (5, 10) {
				affine.parallel (%i1, %j1) = (%2, %3) to (%2 + 5, %3 + 10) {
				// CHECK: affine.store %{{.*}}, %{{.*}}[%[[i0]] * 5 + %[[i1]], %[[j0]] * 10 + %[[j1]]]
				affine.store %cst, %0[%i1, %j1] : memref<100x100xf32>
				}
				}
				return
				}
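
The `affine_parallel_fold_bounds` test relies on the inner loop's range being constant once the `affine.apply` results are composed into the bounds: `(%2 + 5) - %2` is 5 regardless of `%i0`. As a sanity check on that reasoning, the sketch below (plain C++, hypothetical function names, no MLIR) enumerates the store indices of the nest as written and of the folded form the CHECK lines expect, so the two sets can be compared.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using Index = std::pair<std::int64_t, std::int64_t>;

// Nest as written in the test: inner bounds are (%2, %3) to (%2 + 5, %3 + 10),
// with %2 = i0 * 5 and %3 = j0 * 10 coming from affine.apply.
std::set<Index> originalNest(std::int64_t ni, std::int64_t nj) {
  std::set<Index> stores;
  for (std::int64_t i0 = 0; i0 < ni; ++i0)
    for (std::int64_t j0 = 0; j0 < nj; ++j0) {
      const std::int64_t a = i0 * 5, b = j0 * 10;
      for (std::int64_t i1 = a; i1 < a + 5; ++i1)
        for (std::int64_t j1 = b; j1 < b + 10; ++j1)
          stores.insert({i1, j1});
    }
  return stores;
}

// Folded form expected by the CHECK lines: inner loop (0, 0) to (5, 10),
// store index [i0 * 5 + i1, j0 * 10 + j1].
std::set<Index> foldedNest(std::int64_t ni, std::int64_t nj) {
  std::set<Index> stores;
  for (std::int64_t i0 = 0; i0 < ni; ++i0)
    for (std::int64_t j0 = 0; j0 < nj; ++j0)
      for (std::int64_t i1 = 0; i1 < 5; ++i1)
        for (std::int64_t j1 = 0; j1 < 10; ++j1)
          stores.insert({i0 * 5 + i1, j0 * 10 + j1});
  return stores;
}
```

Comparing `originalNest(100, 10)` with `foldedNest(100, 10)` uses the outer bounds from the test; the sketch only models index arithmetic and says nothing about how the canonicalizer arrives at the folded form.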
