This is an archive of the discontinued LLVM Phabricator instance.

Differential D101171

[mlir] Affine: parallelize affine loops with reductions
Closed · Public

Authored by ftynse on Apr 23 2021, 8:54 AM.

Details

Reviewers

wsmoses
chelini
kumasento
nicolasvasilache
bondhugula
aartbik

Commits

rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions

Summary

Introduce basic support for parallelizing affine loops with reductions
expressed using iteration arguments. The affine parallelism detector now has a
flag to assume such reductions are parallel. The transformation handles a subset
of parallel reductions that can be expressed using affine.parallel:
integer/float addition and multiplication. This requires detecting the
reduction operation, since affine.parallel only supports a fixed set of
reduction operators.
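
To illustrate why such reductions can be treated as parallel, here is a minimal standalone C++ sketch. It is not MLIR code: `Kind`, `identity`, `combine`, `sequentialReduce`, and `chunkedReduce` are names invented for this example. It assumes the combining operator can be freely reordered, which holds for integer addition and multiplication; for floating point the patch's own doc comment states that this is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reduction kinds mentioned in the summary.
enum class Kind { Add, Mul };

// Identity element used to seed each independent partial accumulator.
long identity(Kind kind) { return kind == Kind::Add ? 0 : 1; }

long combine(Kind kind, long lhs, long rhs) {
  return kind == Kind::Add ? lhs + rhs : lhs * rhs;
}

// Sequential form: a single accumulator is carried from one iteration to the
// next, much like a loop iteration argument.
long sequentialReduce(Kind kind, const std::vector<long> &data, long init) {
  long acc = init;
  for (long v : data)
    acc = combine(kind, acc, v);
  return acc;
}

// Chunked form: every chunk reduces into its own accumulator, so the chunks
// could run independently; partial results are combined at the end.
long chunkedReduce(Kind kind, const std::vector<long> &data, long init,
                   std::size_t numChunks) {
  assert(numChunks > 0 && "need at least one chunk");
  std::vector<long> partial(numChunks, identity(kind));
  for (std::size_t i = 0; i < data.size(); ++i) {
    std::size_t chunk = i % numChunks;
    partial[chunk] = combine(kind, partial[chunk], data[i]);
  }
  long result = init;
  for (long p : partial)
    result = combine(kind, result, p);
  return result;
}
```

Both forms produce the same value for these operators, which is the property the detector relies on when it accepts only a fixed set of reduction kinds.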

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Apr 23 2021, 8:54 AM

Herald added subscribers: dcaballe, cota, teijeong and 17 others. Apr 23 2021, 8:55 AM

ftynse requested review of this revision. Apr 23 2021, 8:55 AM

Herald added a reviewer: nicolasvasilache. Apr 23 2021, 8:55 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

ftynse added a child revision: D101172: [mlir] support max/min lower/upper bounds in affine.parallel. Apr 23 2021, 8:55 AM

Harbormaster completed remote builds in B100589: Diff 340058. Apr 23 2021, 10:34 AM

chelini accepted this revision. Apr 23 2021, 11:19 AM

chelini added inline comments.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
143	Here we expect all the iterators to be valid reductions. I would make this explicit using a comment.
166	I would remove braces.
169	Same here.
215	It may not be clear what this "1" means. Can we add a comment like `/* induction var */` or use an explicit method like `getNumInductionVars()`?

This revision is now accepted and ready to land. Apr 23 2021, 11:19 AM

LGTM!

mlir/lib/Dialect/Affine/Utils/Utils.cpp
166	Just wondering is it possible that both operands can be iter_args?

Great to see this functionality! Some comments.

mlir/include/mlir/Analysis/Utils.h
356–357	Update doc comment please to capture the new argument.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
162–163	Can this check be moved up, and the `isa_and_nonnull` check and `getReductionOperationKind` be unified?
203–207	assert on reductionOp not being null?
mlir/test/Dialect/Affine/parallelize.mlir
169–170	Better to match the whole `affine.parallel` op itself for clarity at least in this one case?

This revision now requires changes to proceed. Apr 25 2021, 12:06 AM

Address review.

ftynse added inline comments. Apr 26 2021, 3:03 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
143	How do you suggest to parallelize reduction loops where only some reductions are parallelizable? Sounds impossible to me, but still added a comment.
166	It is possible in the IR, but wouldn't pass the single-use check above.
215	That's why there is a comment above saying "erase the block arguments that correspond to reductions". The `/*name=*/` comments are used to indicate the argument names of the callee, `llvm::seq` in this case, so the correct name would be `/*Begin=*/`. The linter will justifiably complain about any other comment because it only makes the code more confusing for anyone familiar with the style.
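
The argument-name comment convention discussed in this thread can be illustrated with a small standalone sketch. The helper `seq` below is a hypothetical local stand-in written only for this example (it is not `llvm::seq`, and its parameter names are assumptions), as is `reductionArgPositions`. The point is that the comment names the callee's parameter rather than describing the value.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns the half-open integer range [Begin, End).
std::vector<unsigned> seq(unsigned Begin, unsigned End) {
  assert(Begin <= End && "expected a non-decreasing range");
  std::vector<unsigned> values;
  for (unsigned i = Begin; i < End; ++i)
    values.push_back(i);
  return values;
}

// Positions of the block arguments that follow the induction variable,
// assuming (for this sketch) that position 0 holds the induction variable.
std::vector<unsigned> reductionArgPositions(unsigned numBlockArgs) {
  // The inline comments name the parameters of `seq`; a free-form remark such
  // as "induction var" in their place would not follow the convention.
  return seq(/*Begin=*/1, /*End=*/numBlockArgs);
}
```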

Harbormaster completed remote builds in B100887: Diff 340465. Apr 26 2021, 3:41 AM

chelini added inline comments. Apr 26 2021, 11:33 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
143	I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions.

bondhugula added inline comments. Apr 26 2021, 12:22 PM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
137–138	This part of the comment is now stale I think. There's nothing to keep in sync.
156	Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting.
163–166	This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this `definition` is being yielded at the right position.

bondhugula requested changes to this revision. Apr 26 2021, 7:33 PM

This revision now requires changes to proceed. Apr 26 2021, 7:33 PM

This functionality overlaps with my reduction vectorization patch https://reviews.llvm.org/D100694
We can merge both (our changes to the isLoopParallel function are essentially the same), but having two separate reduction recognizers would be strange, so maybe we should discuss if there is a way to converge them.

Address more review.

Herald added a subscriber: mgorny. Apr 27 2021, 9:57 AM

ftynse added inline comments. Apr 27 2021, 9:57 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–166	> You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg;
	Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.
	> furthermore, you'll also need to check that the result of this definition is being yielded at the right position.
	`definition` is the defining op of the `pos`-th yield operand...

@sgrechanik I am happy to use your version when it lands, or have this code updated if it lands first.

Harbormaster completed remote builds in B101204: Diff 340890. Apr 27 2021, 11:11 AM

This looks good to me. Please take a look at the two more comments below.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–166	Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled"
mlir/test/Dialect/Affine/parallelize.mlir
169	Could you please check the body as well, since you are replacing the old op, if that sounds reasonable?

This revision is now accepted and ready to land. Apr 27 2021, 6:30 PM

bondhugula requested changes to this revision. Apr 27 2021, 6:45 PM

bondhugula added inline comments.

mlir/include/mlir/Analysis/Utils.h
358	I'll have the same concern with this method as with D100694 of @sgrechanik. At this point, we don't even know whether the `forOp`'s iter_args are reduction vars - so this argument naming is weird. Also, the returned result is technically inaccurate when `reductionsAreParallel` is set to true. In fact, `ignoreIterArgs` or `ignoreIterArgDeps` as the name is less confusing FWIW the way @sgrechanik has it. If there are iter_args, can we integrate this method with the reduction detection logic itself that you have? If `reductionsAreParallel` is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.

This revision now requires changes to proceed. Apr 27 2021, 6:45 PM

bondhugula added inline comments. Apr 27 2021, 6:49 PM

mlir/test/Dialect/Affine/parallelize.mlir
206	How about `addf %it1, %it1`? That was the one I was referring to. You could also have:
	    %0 = addf %it1, (%A[%i])
	    %1 = addf %it2, %0
	    yield %0, %1 : f32, f32
	Just making sure the weird cases (non reductions) are guarded.

Address even more review.

mlir/include/mlir/Analysis/Utils.h
358	Changed to `ignoreReductionIterArgs`. We don't ignore _all_ iter args, so anything that doesn't have "reduction" in the name would actually be confusing.
	> If reductionsAreParallel is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.
	This is exactly what the implementation has been doing all along.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–166	You might have missed it, I added a full-blown check in `dependsOnIterArgs` that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering.
mlir/test/Dialect/Affine/parallelize.mlir
206	The first one is easily caught because there are two uses of `%it1`, not one. The second case is caught by the backward slice check I added. Added both as tests.
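
The two checks discussed here (single use of the accumulator, and no dependence of the reduced value on any iteration argument) can be sketched on a toy IR. Everything below is hypothetical and standalone: `ValueId`, `Op`, `Loop`, `countUses`, and `reducedValue` are invented for illustration and are not MLIR types or the patch's code, and the sketch ignores the operation kind that the real detection also inspects.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Toy stand-ins for SSA values and operations; not MLIR types.
using ValueId = int;
struct Op {
  ValueId result;
  std::vector<ValueId> operands;
};
struct Loop {
  std::vector<ValueId> iterArgs; // loop-carried accumulators
  std::vector<Op> body;          // operations in the loop body
  std::vector<ValueId> yielded;  // yielded[i] feeds iterArgs[i]
};

const Op *definingOp(const Loop &loop, ValueId v) {
  for (const Op &op : loop.body)
    if (op.result == v)
      return &op;
  return nullptr;
}

// Counts uses of `v` among body operands and yield operands.
unsigned countUses(const Loop &loop, ValueId v) {
  unsigned n = 0;
  for (const Op &op : loop.body)
    for (ValueId operand : op.operands)
      n += (operand == v) ? 1u : 0u;
  for (ValueId y : loop.yielded)
    n += (y == v) ? 1u : 0u;
  return n;
}

// True if `v` is an iteration argument or is computed transitively from one.
bool dependsOnIterArgs(const Loop &loop, ValueId v) {
  std::set<ValueId> iterArgs(loop.iterArgs.begin(), loop.iterArgs.end());
  std::set<ValueId> visited;
  std::vector<ValueId> worklist = {v};
  while (!worklist.empty()) {
    ValueId cur = worklist.back();
    worklist.pop_back();
    if (!visited.insert(cur).second)
      continue;
    if (iterArgs.count(cur))
      return true;
    if (const Op *def = definingOp(loop, cur))
      for (ValueId operand : def->operands)
        worklist.push_back(operand);
  }
  return false;
}

// Returns the value reduced into iterArgs[pos], or nullopt if the pattern is
// not a simple reduction in the sense of this sketch.
std::optional<ValueId> reducedValue(const Loop &loop, unsigned pos) {
  assert(pos < loop.iterArgs.size() && pos < loop.yielded.size());
  ValueId acc = loop.iterArgs[pos];
  const Op *def = definingOp(loop, loop.yielded[pos]);
  if (!def || def->operands.size() != 2 || countUses(loop, acc) != 1)
    return std::nullopt;
  for (unsigned i = 0; i < 2; ++i)
    if (def->operands[i] == acc &&
        !dependsOnIterArgs(loop, def->operands[1 - i]))
      return def->operands[1 - i];
  return std::nullopt;
}
```

In this model, `addf %it1, %it1` is rejected by the use count alone, while the chained case is rejected at the second position because the other operand is computed from `%it1`.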

Harbormaster completed remote builds in B101343: Diff 341092. Apr 28 2021, 2:01 AM

Sorry, it looks like I wasn't clear or I'm still missing something.

mlir/include/mlir/Analysis/Utils.h
357	I assume you meant `set to false` here? There's no check if it's set to true.
358	Looks like I'm missing something. I only see a single line change to `isLoopParallel` - the logic you refer to is in `affineParallelize` and not `isLoopParallel` unless I'm looking at an older diff! `affineParallelize` was mostly meant to be "just perform the replacement" while I was expecting the check including the one for reductions in `isLoopParallel`.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
134–145	This requires a slight rephrase. This replacement is contingent on the reduction checks you have.
138–147	It's this part that I thought should go into `isLoopParallel`; it's really a reduction check. You can have it return (as output arg) the reduced values?

sgrechanik added inline comments. Apr 28 2021, 6:13 PM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–166	Do the weird examples that can be caught by `dependsOnIterArgs` and cannot be caught by checking `hasOneUse()` really exist? Because it seems to me that checking `hasOneUse()` on every iter arg will exclude ALL weird cases, but maybe I miss something.

ftynse marked 3 inline comments as done. Apr 29 2021, 12:38 AM

ftynse added inline comments.

mlir/include/mlir/Analysis/Utils.h
357	No, I meant what I said. If it is set to true, the parallelism check will ignore iter args that it can prove to be reductions; other iter args still make the loop non-parallel. If it is unset (or set to false if you wish), any iter arg makes loop non-parallel. Let's please not get too much into bikeshedding with names.
358	Yes, you are missing something. Probably because Phabricator only shows you the diff between the latest version and the version you reviewed previously, not the base version. Line 1275 in Analysis/Utils.cpp contains the check and it was there since the first iteration.
mlir/lib/Analysis/Utils.cpp
1273	Here's the early exit @bondhugula ^
mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–166	Yes, see the `strange_butterfly` case in tests. Each iter arg is used exactly once, in the same `addf` operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg.
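
The semantics described in this exchange (detection requested versus not requested) can be summarized with a toy decision function. This is a hedged sketch, not the patch's implementation: `LoopSummary` and `isParallel` are hypothetical names, the per-argument reduction flags are supplied by hand instead of being detected, and a single boolean stands in for the memory dependence analysis that the real check performs afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy summary of a loop; a hypothetical type, not an MLIR class.
struct LoopSummary {
  // One flag per iteration argument: was it proven to be a supported reduction?
  std::vector<bool> iterArgIsReduction;
  // Stand-in for the result of the memory dependence analysis.
  bool memoryAccessesAreIndependent;
};

// If `reductions` is null, any iteration argument makes the loop non-parallel.
// Otherwise it receives the positions of the proven reductions, and only the
// remaining iteration arguments prevent parallelization.
bool isParallel(const LoopSummary &loop,
                std::vector<std::size_t> *reductions = nullptr) {
  std::size_t numIterArgs = loop.iterArgIsReduction.size();
  if (numIterArgs > 0 && !reductions)
    return false;

  if (reductions) {
    assert(reductions->empty() && "sketch expects an empty output vector");
    for (std::size_t i = 0; i < numIterArgs; ++i)
      if (loop.iterArgIsReduction[i])
        reductions->push_back(i);
    // Some iteration argument is not a reduction: the loop is not parallel,
    // but the reductions that were found are still reported to the caller.
    if (reductions->size() != numIterArgs)
      return false;
  }
  return loop.memoryAccessesAreIndependent;
}
```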

Address the hopefully final iteration of review.

Herald added a reviewer: aartbik. Apr 29 2021, 1:34 AM

ftynse added inline comments. Apr 29 2021, 1:34 AM

mlir/include/mlir/Analysis/Utils.h
357	This is no longer relevant as I opted for an explicit list of reductions instead.

ftynse added inline comments. Apr 29 2021, 1:35 AM

mlir/lib/Analysis/AffineAnalysis.cpp
108	Here's the early exit @bondhugula, it has always been here.
mlir/lib/Analysis/Utils.cpp
1273	This has moved to AffineAnalysis.cpp.

Harbormaster completed remote builds in B101578: Diff 341432. Apr 29 2021, 2:04 AM

LGTM. Just minor comments now.

mlir/include/mlir/Analysis/Utils.h
358	Not really. I did notice that one line change. But now I see the whole unification into `isLoopParallel` which was what I was requesting for.
mlir/lib/Analysis/AffineAnalysis.cpp
108	Yes, I knew this had always been there. This wasn't the confusion.
111–158	I assume all of this code beyond the early exit was just added in the recent iteration. This was what I was requesting! I hope I was indeed looking at the whole diff earlier and not the last increment. The change to isLoopParallel in the previous one was just one line. Now it's integrated!
146–148	Nit: The `Inst` suffix is legacy back when they were called instructions. Just `srcOp` and `dstOp` is fine. Likewise for `Insts`.
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp
44	Doc comment for this one please.

This revision is now accepted and ready to land. Apr 29 2021, 3:51 AM

Fix nits.

ftynse marked an inline comment as done. Apr 29 2021, 4:16 AM

ftynse added inline comments.

mlir/include/mlir/Analysis/Utils.h
358	Now you may be able to see it because it moved to a different file.
mlir/lib/Analysis/AffineAnalysis.cpp
111–158	Part of this code (the original contents of isLoopParallel) is moved from lib/Analysis/Utils.cpp, Phabricator indicates this by a light yellow vertical bar next to the lines it can detect as moved.
146–148	I'm just moving this code from a different file and I'm not a fan of sneaking in irrelevant changes, but okay.

This revision was landed with ongoing or failed builds. Apr 29 2021, 4:16 AM

Closed by commit rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions (authored by ftynse).

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions.

Harbormaster completed remote builds in B101600: Diff 341461. Apr 29 2021, 5:04 AM

Revision Contents

Path                                                        Size
mlir/include/mlir/Analysis/AffineAnalysis.h                 20 lines
mlir/include/mlir/Analysis/Utils.h                          3 lines
mlir/include/mlir/Dialect/Affine/Passes.td                  3 lines
mlir/include/mlir/Dialect/Affine/Utils.h                    9 lines
mlir/lib/Analysis/AffineAnalysis.cpp                        128 lines
mlir/lib/Analysis/Utils.cpp                                 43 lines
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp    31 lines
mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp       4 lines
mlir/lib/Dialect/Affine/Utils/CMakeLists.txt                1 line
mlir/lib/Dialect/Affine/Utils/Utils.cpp                     57 lines
mlir/test/Dialect/Affine/parallelize.mlir                   63 lines

Diff 341432

mlir/include/mlir/Analysis/AffineAnalysis.h

(9 lines not shown)
 // involving affine structures (AffineExprStorage, AffineMap, IntegerSet, etc.)
 // and other IR structures that in turn use these.
 //
 //===----------------------------------------------------------------------===//

 #ifndef MLIR_ANALYSIS_AFFINE_ANALYSIS_H
 #define MLIR_ANALYSIS_AFFINE_ANALYSIS_H

+#include "mlir/Dialect/StandardOps/IR/Ops.h"
 #include "mlir/IR/Value.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallVector.h"

 namespace mlir {

 class AffineApplyOp;
 class AffineForOp;
 class AffineValueMap;
 class FlatAffineConstraints;
 class Operation;

+/// A description of a (parallelizable) reduction in an affine loop.
+struct LoopReduction {
+  /// Reduction kind.
+  AtomicRMWKind kind;
+
+  /// Position of the iteration argument that acts as accumulator.
+  unsigned iterArgPosition;
+
+  /// The value being reduced.
+  Value value;
+};
+
+/// Returns true if `forOp' is a parallel loop. If `parallelReductions` is
+/// provided, populates it with descriptors of the parallelizable reductions and
+/// treats them as not preventing parallelization.
+bool isLoopParallel(
+    AffineForOp forOp,
+    SmallVectorImpl<LoopReduction> *parallelReductions = nullptr);
+
 /// Returns in `affineApplyOps`, the sequence of those AffineApplyOp
 /// Operations that are reachable via a search starting from `operands` and
 /// ending at those operands that are not the result of an AffineApplyOp.
 void getReachableAffineApplyOps(ArrayRef<Value> operands,
                                 SmallVectorImpl<Operation *> &affineApplyOps);

 /// Builds a system of constraints with dimensional identifiers corresponding to
 /// the loop IVs of the forOps and AffineIfOp's operands appearing in
(96 lines not shown)

mlir/include/mlir/Analysis/Utils.h

(347 lines not shown)

 /// Returns the number of surrounding loops common to both A and B.
 unsigned getNumCommonSurroundingLoops(Operation &A, Operation &B);

 /// Gets the memory footprint of all data touched in the specified memory space
 /// in bytes; if the memory space is unspecified, considers all memory spaces.
 Optional<int64_t> getMemoryFootprintBytes(AffineForOp forOp,
                                           int memorySpace = -1);

-/// Returns true if `forOp' is a parallel loop.
-bool isLoopParallel(AffineForOp forOp);
-
 /// Simplify the integer set by simplifying the underlying affine expressions by
 /// flattening and some simple inference. Also, drop any duplicate constraints.
 /// Returns the simplified integer set. This method runs in time linear in the
 /// number of constraints.
 IntegerSet simplifyIntegerSet(IntegerSet set);

 /// Returns the innermost common loop depth for the set of operations in 'ops'.
 unsigned getInnermostCommonLoopDepth(
     ArrayRef<Operation *> ops,
     SmallVectorImpl<AffineForOp> *surroundingLoops = nullptr);

 } // end namespace mlir

 #endif // MLIR_ANALYSIS_UTILS_H

mlir/include/mlir/Dialect/Affine/Passes.td

(117 lines not shown)

 def AffineParallelize : FunctionPass<"affine-parallelize"> {
   let summary = "Convert affine.for ops into 1-D affine.parallel";
   let constructor = "mlir::createAffineParallelizePass()";
   let options = [
     Option<"maxNested", "max-nested", "unsigned", /*default=*/"-1u",
            "Maximum number of nested parallel loops to produce. "
            "Defaults to unlimited (UINT_MAX).">,
+    Option<"parallelReductions", "parallel-reductions", "bool",
+           /*default=*/"false",
+           "Whether to parallelize reduction loops. Defaults to false.">
   ];
 }

 def AffineLoopNormalize : FunctionPass<"affine-loop-normalize"> {
   let summary = "Apply normalization transformations to affine loop-like ops";
   let constructor = "mlir::createAffineLoopNormalizePass()";
 }

 def SimplifyAffineStructures : FunctionPass<"simplify-affine-structures"> {
   let summary = "Simplify affine expressions in maps/sets and normalize "
                 "memrefs";
   let constructor = "mlir::createSimplifyAffineStructuresPass()";
 }

 #endif // MLIR_DIALECT_AFFINE_PASSES

mlir/include/mlir/Dialect/Affine/Utils.h

(18 lines not shown)
 #include "llvm/ADT/SmallVector.h"

 namespace mlir {

 class AffineForOp;
 class AffineIfOp;
 class AffineParallelOp;
 struct LogicalResult;
+struct LoopReduction;
 class Operation;

 /// Replaces parallel affine.for op with 1-d affine.parallel op.
-/// mlir::isLoopParallel detect the parallel affine.for ops.
+/// mlir::isLoopParallel detects the parallel affine.for ops.
+/// Parallelizes the specified reductions. Parallelization will fail in presence
+/// of loop iteration arguments that are not listed in `parallelReductions`.
 /// There is no cost model currently used to drive this parallelization.
-void affineParallelize(AffineForOp forOp);
+LogicalResult
+affineParallelize(AffineForOp forOp,
+                  ArrayRef<LoopReduction> parallelReductions = {});

 /// Hoists out affine.if/else to as high as possible, i.e., past all invariant
 /// affine.fors/parallel's. Returns success if any hoisting happened; folded` is
 /// set to true if the op was folded or erased. This hoisting could lead to
 /// significant code expansion in some cases.
 LogicalResult hoistAffineIfOp(AffineIfOp ifOp, bool *folded = nullptr);

 /// Holds parameters to perform n-D vectorization on a single loop nest.
(105 lines not shown)

mlir/lib/Analysis/AffineAnalysis.cpp

 //===- AffineAnalysis.cpp - Affine structures analysis routines -----------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements miscellaneous analysis routines for affine structures
 // (expressions, maps, sets), and other utilities relying on such analysis.
 //
 //===----------------------------------------------------------------------===//

 #include "mlir/Analysis/AffineAnalysis.h"
+#include "mlir/Analysis/SliceAnalysis.h"
 #include "mlir/Analysis/Utils.h"
 #include "mlir/Dialect/Affine/IR/AffineOps.h"
 #include "mlir/Dialect/Affine/IR/AffineValueMap.h"
 #include "mlir/Dialect/StandardOps/IR/Ops.h"
 #include "mlir/IR/AffineExprVisitor.h"
 #include "mlir/IR/BuiltinOps.h"
 #include "mlir/IR/IntegerSet.h"
 #include "mlir/Support/MathExtras.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/TypeSwitch.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"

 #define DEBUG_TYPE "affine-analysis"

 using namespace mlir;

 using llvm::dbgs;

+/// Returns true if `value` (transitively) depends on iteration arguments of the
+/// given `forOp`.
+static bool dependsOnIterArgs(Value value, AffineForOp forOp) {
+  // Compute the backward slice of the value.
+  SetVector<Operation *> slice;
+  getBackwardSlice(value, &slice,
+                   [&](Operation *op) { return !forOp->isAncestor(op); });
+
+  // Check that none of the operands of the operations in the backward slice are
+  // loop iteration arguments, and neither is the value itself.
+  auto argRange = forOp.getRegionIterArgs();
+  llvm::SmallPtrSet<Value, 8> iterArgs(argRange.begin(), argRange.end());
+  if (iterArgs.contains(value))
+    return true;
+
+  for (Operation *op : slice)
+    for (Value operand : op->getOperands())
+      if (iterArgs.contains(operand))
+        return true;
+
+  return false;
+}

				/// Get the value that is being reduced by `pos`-th reduction in the loop if
				/// such a reduction can be performed by affine parallel loops. This assumes
				/// floating-point operations are commutative. On success, `kind` will be the
				/// reduction kind suitable for use in affine parallel loop builder. If the
				/// reduction is not supported, returns null.
				static Value getSupportedReduction(AffineForOp forOp, unsigned pos,
				AtomicRMWKind &kind) {
				auto yieldOp = cast<AffineYieldOp>(forOp.getBody()->back());
				Value yielded = yieldOp.operands()[pos];
				Operation *definition = yielded.getDefiningOp();
				if (!definition)
				return nullptr;
				if (!forOp.getRegionIterArgs()[pos].hasOneUse())
				return nullptr;

				Optional<AtomicRMWKind> maybeKind =
				TypeSwitch<Operation *, Optional<AtomicRMWKind>>(definition)
				.Case<AddFOp>([](Operation *) { return AtomicRMWKind::addf; })
				.Case<MulFOp>([](Operation *) { return AtomicRMWKind::mulf; })
				.Case<AddIOp>([](Operation *) { return AtomicRMWKind::addi; })
				.Case<MulIOp>([](Operation *) { return AtomicRMWKind::muli; })
				.Default([](Operation *) -> Optional<AtomicRMWKind> {
				// TODO: AtomicRMW supports other kinds of reductions this is
				// currently not detecting, add those when the need arises.
				return llvm::None;
				});
				if (!maybeKind)
				return nullptr;

				kind = *maybeKind;
				if (definition->getOperand(0) == forOp.getRegionIterArgs()[pos] &&
				!dependsOnIterArgs(definition->getOperand(1), forOp))
				return definition->getOperand(1);
				if (definition->getOperand(1) == forOp.getRegionIterArgs()[pos] &&
				!dependsOnIterArgs(definition->getOperand(0), forOp))
				return definition->getOperand(0);

				return nullptr;
				}
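The matching logic above can be illustrated in isolation. The following is a simplified sketch under stated assumptions: values are plain integer ids, and `ToyOpKind`, `ToyReductionKind`, `ToyBinaryOp` and both functions are hypothetical names. It omits the single-use check and the backward-slice check that the patch performs.

```cpp
#include <cassert>
#include <optional>

// Hypothetical operation and reduction kinds for illustration only.
enum class ToyOpKind { AddF, MulF, AddI, MulI, SubF };
enum class ToyReductionKind { addf, mulf, addi, muli };

// Maps an operation kind to a reduction kind, or nothing if unsupported.
std::optional<ToyReductionKind> toyGetReductionKind(ToyOpKind op) {
  switch (op) {
  case ToyOpKind::AddF:
    return ToyReductionKind::addf;
  case ToyOpKind::MulF:
    return ToyReductionKind::mulf;
  case ToyOpKind::AddI:
    return ToyReductionKind::addi;
  case ToyOpKind::MulI:
    return ToyReductionKind::muli;
  case ToyOpKind::SubF:
    return std::nullopt;
  }
  return std::nullopt;
}

// A binary operation whose operands are identified by integer ids.
struct ToyBinaryOp {
  ToyOpKind kind;
  int lhs;
  int rhs;
};

// Returns the id of the value being reduced if exactly one operand is the
// iteration argument `iterArg` and the operation kind is supported.
std::optional<int> toyGetReducedOperand(const ToyBinaryOp &op, int iterArg) {
  assert(iterArg >= 0 && "value ids are non-negative in this sketch");
  if (!toyGetReductionKind(op.kind))
    return std::nullopt;
  if (op.lhs == iterArg && op.rhs != iterArg)
    return op.rhs;
  if (op.rhs == iterArg && op.lhs != iterArg)
    return op.lhs;
  return std::nullopt;
}
```

Accepting the iteration argument on either side mirrors the two symmetric operand checks in `getSupportedReduction`.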

				/// Returns true if `forOp` is a parallel loop. If `parallelReductions` is
				/// provided, populates it with descriptors of the parallelizable reductions and
				/// treats them as not preventing parallelization.
				bool mlir::isLoopParallel(AffineForOp forOp,
				SmallVectorImpl<LoopReduction> *parallelReductions) {
				unsigned numIterArgs = forOp.getNumIterOperands();

				// Loop is not parallel if it has SSA loop-carried dependences and reduction
				// detection is not requested.
				if (numIterArgs > 0 && !parallelReductions)
				ftynse: Here's the early exit @bondhugula, it has always been here.
				bondhugula: Yes, I knew this had always been there. This wasn't the confusion.
				return false;

				// Find supported reductions if requested.
				if (parallelReductions) {
				parallelReductions->reserve(forOp.getNumIterOperands());
				for (unsigned i = 0; i < numIterArgs; ++i) {
				AtomicRMWKind kind;
				if (Value value = getSupportedReduction(forOp, i, kind))
				parallelReductions->emplace_back(LoopReduction{kind, i, value});
				}

				// Return later to allow for identifying all parallel reductions even if the
				// loop is not parallel.
				if (parallelReductions->size() != numIterArgs)
				return false;
				}

				// Collect all load and store ops in loop nest rooted at 'forOp'.
				SmallVector<Operation *, 8> loadAndStoreOpInsts;
				auto walkResult = forOp.walk([&](Operation *opInst) -> WalkResult {
				if (isa<AffineReadOpInterface, AffineWriteOpInterface>(opInst))
				loadAndStoreOpInsts.push_back(opInst);
				else if (!isa<AffineForOp, AffineYieldOp, AffineIfOp>(opInst) &&
				!MemoryEffectOpInterface::hasNoEffect(opInst))
				return WalkResult::interrupt();

				return WalkResult::advance();
				});

				// Stop early if the loop has unknown ops with side effects.
				if (walkResult.wasInterrupted())
				return false;

				// Dep check depth would be number of enclosing loops + 1.
				unsigned depth = getNestingDepth(forOp) + 1;

				// Check dependences between all pairs of ops in 'loadAndStoreOpInsts'.
				for (auto *srcOpInst : loadAndStoreOpInsts) {
				MemRefAccess srcAccess(srcOpInst);
				for (auto *dstOpInst : loadAndStoreOpInsts) {
				bondhugula: Nit: The `Inst` suffix is legacy back when they were called instructions. Just `srcOp` and `dstOp` is fine. Likewise for `Insts`.
				ftynse: I'm just moving this code from a different file and I'm not a fan of sneaking in irrelevant changes, but okay.
				MemRefAccess dstAccess(dstOpInst);
				FlatAffineConstraints dependenceConstraints;
				DependenceResult result = checkMemrefAccessDependence(
				srcAccess, dstAccess, depth, &dependenceConstraints,
				/*dependenceComponents=*/nullptr);
				if (result.value != DependenceResult::NoDependence)
				return false;
				}
				}
				return true;
				bondhugula: I assume all of this code beyond the early exit were just added in the recent iteration. This was what I was requesting for! I hope I was indeed looking at the whole diff earlier and not the last increment. The change to isLoopParallel in the previous one was just one line. Now it's integrated!
				ftynse: Part of this code (the original contents of isLoopParallel) is moved from lib/Analysis/Utils.cpp, Phabricator indicates this by a light yellow vertical bar next to the lines it can detect as moved.
				}

	/// Returns the sequence of AffineApplyOp Operations operation in			/// Returns the sequence of AffineApplyOp Operations operation in
	/// 'affineApplyOps', which are reachable via a search starting from 'operands',			/// 'affineApplyOps', which are reachable via a search starting from 'operands',
	/// and ending at operands which are not defined by AffineApplyOps.			/// and ending at operands which are not defined by AffineApplyOps.
	// TODO: Add a method to AffineApplyOp which forward substitutes the			// TODO: Add a method to AffineApplyOp which forward substitutes the
	// AffineApplyOp into any user AffineApplyOps.			// AffineApplyOp into any user AffineApplyOps.
	void mlir::getReachableAffineApplyOps(			void mlir::getReachableAffineApplyOps(
	ArrayRef<Value> operands, SmallVectorImpl<Operation *> &affineApplyOps) {			ArrayRef<Value> operands, SmallVectorImpl<Operation *> &affineApplyOps) {
	struct State {			struct State {
	[...]

mlir/lib/Analysis/Utils.cpp

[...]
void mlir::getSequentialLoops(AffineForOp forOp,		void mlir::getSequentialLoops(AffineForOp forOp,
llvm::SmallDenseSet<Value, 8> *sequentialLoops) {		llvm::SmallDenseSet<Value, 8> *sequentialLoops) {
forOp->walk([&](Operation *op) {		forOp->walk([&](Operation *op) {
if (auto innerFor = dyn_cast<AffineForOp>(op))		if (auto innerFor = dyn_cast<AffineForOp>(op))
if (!isLoopParallel(innerFor))		if (!isLoopParallel(innerFor))
sequentialLoops->insert(innerFor.getInductionVar());		sequentialLoops->insert(innerFor.getInductionVar());
});		});
}		}

/// Returns true if 'forOp' is parallel.
bool mlir::isLoopParallel(AffineForOp forOp) {
// Loop is not parallel if it has SSA loop-carried dependences.
// TODO: Conditionally support reductions and other loop-carried dependences
// that could be handled in the context of a parallel loop.
if (forOp.getNumIterOperands() > 0)
return false;

// Collect all load and store ops in loop nest rooted at 'forOp'.
SmallVector<Operation *, 8> loadAndStoreOpInsts;
auto walkResult = forOp.walk([&](Operation *opInst) -> WalkResult {
if (isa<AffineReadOpInterface, AffineWriteOpInterface>(opInst))
loadAndStoreOpInsts.push_back(opInst);
else if (!isa<AffineForOp, AffineYieldOp, AffineIfOp>(opInst) &&
!MemoryEffectOpInterface::hasNoEffect(opInst))
return WalkResult::interrupt();

return WalkResult::advance();
});

// Stop early if the loop has unknown ops with side effects.
if (walkResult.wasInterrupted())
return false;

// Dep check depth would be number of enclosing loops + 1.
unsigned depth = getNestingDepth(forOp) + 1;

// Check dependences between all pairs of ops in 'loadAndStoreOpInsts'.
for (auto *srcOpInst : loadAndStoreOpInsts) {
MemRefAccess srcAccess(srcOpInst);
for (auto *dstOpInst : loadAndStoreOpInsts) {
MemRefAccess dstAccess(dstOpInst);
FlatAffineConstraints dependenceConstraints;
DependenceResult result = checkMemrefAccessDependence(
srcAccess, dstAccess, depth, &dependenceConstraints,
/*dependenceComponents=*/nullptr);
if (result.value != DependenceResult::NoDependence)
return false;
}
}
return true;
}

IntegerSet mlir::simplifyIntegerSet(IntegerSet set) {		IntegerSet mlir::simplifyIntegerSet(IntegerSet set) {
FlatAffineConstraints fac(set);		FlatAffineConstraints fac(set);
if (fac.isEmpty())		if (fac.isEmpty())
		ftynse: Here's the early exit @bondhugula ^
		ftynse: This has moved to AffineAnalysis.cpp.
return IntegerSet::getEmptySet(set.getNumDims(), set.getNumSymbols(),		return IntegerSet::getEmptySet(set.getNumDims(), set.getNumSymbols(),
set.getContext());		set.getContext());
fac.removeTrivialRedundancy();		fac.removeTrivialRedundancy();

auto simplifiedSet = fac.getAsIntegerSet(set.getContext());		auto simplifiedSet = fac.getAsIntegerSet(set.getContext());
assert(simplifiedSet && "guaranteed to succeed while roundtripping");		assert(simplifiedSet && "guaranteed to succeed while roundtripping");
return simplifiedSet;		return simplifiedSet;
}		}

mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp

	//===- AffineParallelize.cpp - Affineparallelize Pass---------------------===//			//===- AffineParallelize.cpp - Affineparallelize Pass---------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file implements a parallelizer for affine loop nests that is able to			// This file implements a parallelizer for affine loop nests that is able to
	// perform inner or outer loop parallelization.			// perform inner or outer loop parallelization.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "PassDetail.h"			#include "PassDetail.h"
				#include "mlir/Analysis/AffineAnalysis.h"
	#include "mlir/Analysis/AffineStructures.h"			#include "mlir/Analysis/AffineStructures.h"
	#include "mlir/Analysis/LoopAnalysis.h"			#include "mlir/Analysis/LoopAnalysis.h"
	#include "mlir/Analysis/Utils.h"			#include "mlir/Analysis/Utils.h"
	#include "mlir/Dialect/Affine/IR/AffineOps.h"			#include "mlir/Dialect/Affine/IR/AffineOps.h"
	#include "mlir/Dialect/Affine/IR/AffineValueMap.h"			#include "mlir/Dialect/Affine/IR/AffineValueMap.h"
	#include "mlir/Dialect/Affine/Passes.h"			#include "mlir/Dialect/Affine/Passes.h"
	#include "mlir/Dialect/Affine/Passes.h.inc"			#include "mlir/Dialect/Affine/Passes.h.inc"
	#include "mlir/Dialect/Affine/Utils.h"			#include "mlir/Dialect/Affine/Utils.h"
	#include "mlir/Transforms/LoopUtils.h"			#include "mlir/Transforms/LoopUtils.h"
	#include "llvm/Support/Debug.h"			#include "llvm/Support/Debug.h"
	#include <deque>			#include <deque>

	#define DEBUG_TYPE "affine-parallel"			#define DEBUG_TYPE "affine-parallel"

	using namespace mlir;			using namespace mlir;

	namespace {			namespace {
	/// Convert all parallel affine.for op into 1-D affine.parallel op.			/// Convert all parallel affine.for op into 1-D affine.parallel op.
	struct AffineParallelize : public AffineParallelizeBase<AffineParallelize> {			struct AffineParallelize : public AffineParallelizeBase<AffineParallelize> {
	void runOnFunction() override;			void runOnFunction() override;
	};			};

				/// Descriptor of a potentially parallelizable loop.
				struct ParallelizationCandidate {
				ParallelizationCandidate(AffineForOp l, SmallVector<LoopReduction> &&r)
				: loop(l), reductions(std::move(r)) {}

				AffineForOp loop;
				SmallVector<LoopReduction> reductions;
				bondhugula: Doc comment for this one please.
				};
	} // namespace			} // namespace

	void AffineParallelize::runOnFunction() {			void AffineParallelize::runOnFunction() {
	FuncOp f = getFunction();			FuncOp f = getFunction();

	// The walker proceeds in post-order, but we need to process outer loops first			// The walker proceeds in post-order, but we need to process outer loops first
	// to control the number of outer parallel loops, so push candidate loops to			// to control the number of outer parallel loops, so push candidate loops to
	// the front of a deque.			// the front of a deque.
	std::deque<AffineForOp> parallelizableLoops;			std::deque<ParallelizationCandidate> parallelizableLoops;
	f.walk([&](AffineForOp loop) {			f.walk([&](AffineForOp loop) {
	if (isLoopParallel(loop))			SmallVector<LoopReduction> reductions;
	parallelizableLoops.push_front(loop);			if (isLoopParallel(loop, parallelReductions ? &reductions : nullptr))
				parallelizableLoops.emplace_front(loop, std::move(reductions));
	});			});

	for (AffineForOp loop : parallelizableLoops) {			for (const ParallelizationCandidate &candidate : parallelizableLoops) {
	unsigned numParentParallelOps = 0;			unsigned numParentParallelOps = 0;
				AffineForOp loop = candidate.loop;
	for (Operation *op = loop->getParentOp();			for (Operation *op = loop->getParentOp();
	op != nullptr && !op->hasTrait<OpTrait::AffineScope>();			op != nullptr && !op->hasTrait<OpTrait::AffineScope>();
	op = op->getParentOp()) {			op = op->getParentOp()) {
	if (isa<AffineParallelOp>(op))			if (isa<AffineParallelOp>(op))
	++numParentParallelOps;			++numParentParallelOps;
	}			}

	if (numParentParallelOps < maxNested)			if (numParentParallelOps < maxNested) {
	affineParallelize(loop);			if (failed(affineParallelize(loop, candidate.reductions))) {
				LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] failed to parallelize\n"
				<< loop);
				}
				} else {
				LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] too many nested loops\n"
				<< loop);
				}
	}			}
	}			}

	std::unique_ptr<OperationPass<FuncOp>> mlir::createAffineParallelizePass() {			std::unique_ptr<OperationPass<FuncOp>> mlir::createAffineParallelizePass() {
	return std::make_unique<AffineParallelize>();			return std::make_unique<AffineParallelize>();
	}			}
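The comment in `runOnFunction` relies on a property of post-order traversal: pushing each visited loop to the front of a deque yields an outer-loops-first order. A small sketch of that property, assuming a hypothetical `ToyLoop` tree rather than real `AffineForOp` nesting:

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical loop nest node: a name and the loops nested inside it.
struct ToyLoop {
  std::string name;
  std::vector<ToyLoop> children;
};

// Post-order walk: nested loops are visited before their parent. Pushing to
// the front therefore places each parent ahead of all loops nested in it.
void toyCollectOuterFirst(const ToyLoop &loop,
                          std::deque<std::string> &order) {
  for (const ToyLoop &child : loop.children)
    toyCollectOuterFirst(child, order);
  order.push_front(loop.name);
}
```

Appending to the back instead would list inner loops first, so the count of enclosing parallel loops would be taken before the outer loops are converted.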

mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp

	//===- SuperVectorize.cpp - Vectorize Pass Impl ---------------------------===//			//===- SuperVectorize.cpp - Vectorize Pass Impl ---------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file implements vectorization of loops, operations and data types to			// This file implements vectorization of loops, operations and data types to
	// a target-independent, n-D super-vector abstraction.			// a target-independent, n-D super-vector abstraction.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "PassDetail.h"			#include "PassDetail.h"
				#include "mlir/Analysis/AffineAnalysis.h"
	#include "mlir/Analysis/LoopAnalysis.h"			#include "mlir/Analysis/LoopAnalysis.h"
	#include "mlir/Analysis/NestedMatcher.h"			#include "mlir/Analysis/NestedMatcher.h"
	#include "mlir/Analysis/Utils.h"
	#include "mlir/Dialect/Affine/IR/AffineOps.h"			#include "mlir/Dialect/Affine/IR/AffineOps.h"
	#include "mlir/Dialect/Affine/Utils.h"			#include "mlir/Dialect/Affine/Utils.h"
	#include "mlir/Dialect/Vector/VectorOps.h"			#include "mlir/Dialect/Vector/VectorOps.h"
	#include "mlir/Dialect/Vector/VectorUtils.h"			#include "mlir/Dialect/Vector/VectorUtils.h"
	#include "mlir/IR/BlockAndValueMapping.h"			#include "mlir/IR/BlockAndValueMapping.h"
	#include "llvm/Support/Debug.h"
	#include "mlir/Support/LLVM.h"			#include "mlir/Support/LLVM.h"
				#include "llvm/Support/Debug.h"

	using namespace mlir;			using namespace mlir;
	using namespace vector;			using namespace vector;

	///			///
	/// Implements a high-level vectorization strategy on a Function.			/// Implements a high-level vectorization strategy on a Function.
	/// The abstraction used is that of super-vectors, which provide a single,			/// The abstraction used is that of super-vectors, which provide a single,
	/// compact, representation in the vector types, information that is expected			/// compact, representation in the vector types, information that is expected
	[...]

mlir/lib/Dialect/Affine/Utils/CMakeLists.txt

	add_mlir_dialect_library(MLIRAffineUtils			add_mlir_dialect_library(MLIRAffineUtils
	Utils.cpp			Utils.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Affine			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Affine

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRAffine			MLIRAffine
				MLIRAnalysis
	MLIRTransformUtils			MLIRTransformUtils
	)			)

mlir/lib/Dialect/Affine/Utils/Utils.cpp

//===- Utils.cpp ---- Utilities for affine dialect transformation ---------===//		//===- Utils.cpp ---- Utilities for affine dialect transformation ---------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements miscellaneous transformation utilities for the Affine		// This file implements miscellaneous transformation utilities for the Affine
// dialect.		// dialect.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/Affine/Utils.h"		#include "mlir/Dialect/Affine/Utils.h"
		#include "mlir/Analysis/AffineAnalysis.h"
		#include "mlir/Analysis/Utils.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/IR/BlockAndValueMapping.h"		#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/IntegerSet.h"		#include "mlir/IR/IntegerSet.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

[...]
static AffineIfOp hoistAffineIfOp(AffineIfOp ifOp, Operation *hoistOverOp) {
[...]
auto *elseBlock = hoistedIfOp.getElseBlock();		auto *elseBlock = hoistedIfOp.getElseBlock();
elseBlock->getOperations().splice(		elseBlock->getOperations().splice(
elseBlock->begin(), hoistOverOpClone->getBlock()->getOperations(),		elseBlock->begin(), hoistOverOpClone->getBlock()->getOperations(),
Block::iterator(hoistOverOpClone));		Block::iterator(hoistOverOpClone));

return hoistedIfOp;		return hoistedIfOp;
}		}

/// Replace affine.for with a 1-d affine.parallel and clone the former's body		/// Replace affine.for with a 1-d affine.parallel and clone the former's body
/// into the latter while remapping values.		/// into the latter while remapping values. Parallelizes the specified
void mlir::affineParallelize(AffineForOp forOp) {		/// reductions. Parallelization will fail in presence of loop iteration
		/// arguments that are not listed in `parallelReductions`.
		LogicalResult
		bondhugula: This part of the comment is now stale I think. There's nothing to keep in sync.
		mlir::affineParallelize(AffineForOp forOp,
		ArrayRef<LoopReduction> parallelReductions) {
		// Fail early if there are iter arguments that are not reductions.
		unsigned numReductions = parallelReductions.size();
		if (numReductions != forOp.getNumIterOperands())
		chelini: Here we expect all the iterators to be valid reductions. I would make this explicit using a comment.
		ftynse: How do you suggest to parallelize reduction loops where only some reductions are parallelizable? Sounds impossible to me, but still added a comment.
		chelini: I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions.
		return failure();

		bondhugula: This requires a slight rephrase. This replacement is contingent on the reduction checks you have.
Location loc = forOp.getLoc();		Location loc = forOp.getLoc();
OpBuilder outsideBuilder(forOp);		OpBuilder outsideBuilder(forOp);
		bondhugula: It's this part that I thought should go into `isLoopParallel` it's really a reduction check. You can have it return (as output arg) the reduced values?

// If a loop has a 'max' in the lower bound, emit it outside the parallel loop		// If a loop has a 'max' in the lower bound, emit it outside the parallel loop
// as it does not have implicit 'max' behavior.		// as it does not have implicit 'max' behavior.
AffineMap lowerBoundMap = forOp.getLowerBoundMap();		AffineMap lowerBoundMap = forOp.getLowerBoundMap();
ValueRange lowerBoundOperands = forOp.getLowerBoundOperands();		ValueRange lowerBoundOperands = forOp.getLowerBoundOperands();
AffineMap upperBoundMap = forOp.getUpperBoundMap();		AffineMap upperBoundMap = forOp.getUpperBoundMap();
ValueRange upperBoundOperands = forOp.getUpperBoundOperands();		ValueRange upperBoundOperands = forOp.getUpperBoundOperands();

bool needsMax = lowerBoundMap.getNumResults() > 1;		bool needsMax = lowerBoundMap.getNumResults() > 1;
		bondhugula: Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting.
bool needsMin = upperBoundMap.getNumResults() > 1;		bool needsMin = upperBoundMap.getNumResults() > 1;
AffineMap identityMap;		AffineMap identityMap;
if (needsMax || needsMin) {		if (needsMax || needsMin) {
if (forOp->getParentOp() &&		if (forOp->getParentOp() &&
!forOp->getParentOp()->hasTrait<OpTrait::AffineScope>())		!forOp->getParentOp()->hasTrait<OpTrait::AffineScope>())
return;		return failure();

		bondhugula: This check can be moved up and the `isa_and_nonnull` check and the `getReductionOperationKind` can be unified?
identityMap = AffineMap::getMultiDimIdentityMap(1, loc->getContext());		identityMap = AffineMap::getMultiDimIdentityMap(1, loc->getContext());
}		}
if (needsMax) {		if (needsMax) {
		chelini: I would remove braces.
		kumasento: Just wondering is it possible that both operands can be iter_args?
		ftynse: It is possible in the IR, but wouldn't pass the single-use check above.
		bondhugula: This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this `definition` is being yielded at the right position.
		ftynse:
		> You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg;
		Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.
		> furthermore, you'll also need to check that the result of this definition is being yielded at the right position.
		`definition` is the defining op of the `pos`-th yield operand...
		bondhugula: Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled"
		ftynse: You might have missed it, I added a full-blown check in `dependsOnIterArgs` that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering.
		sgrechanik: Do the weird examples that can be caught by `dependsOnIterArgs` and cannot be caught by checking `hasOneUse()` really exist? Because it seems to me that checking `hasOneUse()` on every iter arg will exclude ALL weird cases, but maybe I miss something.
		ftynse: Yes, see the `strange_butterfly` case in tests. Each iter arg is used exactly once, in the same `addf` operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg.
auto maxOp = outsideBuilder.create<AffineMaxOp>(loc, lowerBoundMap,		auto maxOp = outsideBuilder.create<AffineMaxOp>(loc, lowerBoundMap,
lowerBoundOperands);		lowerBoundOperands);
lowerBoundMap = identityMap;		lowerBoundMap = identityMap;
		chelini: Same here.
lowerBoundOperands = maxOp->getResults();		lowerBoundOperands = maxOp->getResults();
}		}

// Same for the upper bound.		// Same for the upper bound.
if (needsMin) {		if (needsMin) {
auto minOp = outsideBuilder.create<AffineMinOp>(loc, upperBoundMap,		auto minOp = outsideBuilder.create<AffineMinOp>(loc, upperBoundMap,
upperBoundOperands);		upperBoundOperands);
upperBoundMap = identityMap;		upperBoundMap = identityMap;
upperBoundOperands = minOp->getResults();		upperBoundOperands = minOp->getResults();
}		}

// Creating empty 1-D affine.parallel op.		// Creating empty 1-D affine.parallel op.
		auto reducedValues = llvm::to_vector<4>(llvm::map_range(
		parallelReductions, [](const LoopReduction &red) { return red.value; }));
		auto reductionKinds = llvm::to_vector<4>(llvm::map_range(
		parallelReductions, [](const LoopReduction &red) { return red.kind; }));
AffineParallelOp newPloop = outsideBuilder.create<AffineParallelOp>(		AffineParallelOp newPloop = outsideBuilder.create<AffineParallelOp>(
loc, llvm::None, llvm::None, lowerBoundMap, lowerBoundOperands,		loc, ValueRange(reducedValues).getTypes(), reductionKinds, lowerBoundMap,
upperBoundMap, upperBoundOperands);		lowerBoundOperands, upperBoundMap, upperBoundOperands);
// Steal the body of the old affine for op and erase it.		// Steal the body of the old affine for op.
newPloop.region().takeBody(forOp.region());		newPloop.region().takeBody(forOp.region());
		Operation *yieldOp = &newPloop.getBody()->back();

		// Handle the initial values of reductions because the parallel loop always
		// starts from the neutral value.
		SmallVector<Value> newResults;
		newResults.reserve(numReductions);
		for (unsigned i = 0; i < numReductions; ++i) {
		Value init = forOp.getIterOperands()[i];
		// This works because we are only handling single-op reductions at the
		// moment. A switch on reduction kind or a mechanism to collect operations
		// participating in the reduction will be necessary for multi-op reductions.
		Operation *reductionOp = yieldOp->getOperand(i).getDefiningOp();
		assert(reductionOp && "yielded value is expected to be produced by an op");
		outsideBuilder.getInsertionBlock()->getOperations().splice(
		outsideBuilder.getInsertionPoint(), newPloop.getBody()->getOperations(),
		reductionOp);
		reductionOp->setOperands({init, newPloop->getResult(i)});
		bondhugula: assert on reductionOp not being null?
		forOp->getResult(i).replaceAllUsesWith(reductionOp->getResult(0));
		}

		// Update the loop terminator to yield reduced values bypassing the reduction
		// operation itself (now moved outside of the loop) and erase the block
		// arguments that correspond to reductions. Note that the loop always has one
		// "main" induction variable when coming from a non-parallel for.
		unsigned numIVs = 1;
		chelini: It may be not clear what this "1" means. Can we add a comment like: `/* induction var */` or use an explicit method like `getNumInductionVars()`?
		ftynse: That's why there is a comment above saying "erase the block arguments that correspond to reductions". The `/*name=*/` comments are used to indicate the argument names of the callee, `llvm::seq` in this case, so the correct name would be `/*Begin=*/`. The linter will justifiably complain about any other comment because it only makes code more confusing for whoever is familiar with the style.
		yieldOp->setOperands(reducedValues);
		newPloop.getBody()->eraseArguments(
		llvm::to_vector<4>(llvm::seq<unsigned>(numIVs, numReductions + numIVs)));

forOp.erase();		forOp.erase();
		return success();
}		}
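The initial-value handling in `affineParallelize` follows from the comment that the parallel loop always starts from the neutral value: the original initial value is combined with the loop result once, outside the loop. A minimal sketch of why both forms agree for integer addition, with hypothetical function names and a chunked loop standing in for parallel execution:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential form: the accumulator starts from `init`, as with iter_args.
int toySequentialSum(const std::vector<int> &data, int init) {
  int acc = init;
  for (int x : data)
    acc += x;
  return acc;
}

// Parallel-style form: every chunk starts from the neutral value 0, partial
// results are combined, and `init` is applied once outside the loop.
int toyParallelStyleSum(const std::vector<int> &data, int init,
                        std::size_t numChunks) {
  assert(numChunks > 0 && "need at least one chunk");
  std::vector<int> partial(numChunks, 0);
  for (std::size_t i = 0; i < data.size(); ++i)
    partial[i % numChunks] += data[i];
  int reduced = std::accumulate(partial.begin(), partial.end(), 0);
  return init + reduced;
}
```

Integers are used here so that regrouping the additions cannot change the result; for floating-point values the patch explicitly assumes reordering is acceptable.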

// Returns success if any hoisting happened.		// Returns success if any hoisting happened.
LogicalResult mlir::hoistAffineIfOp(AffineIfOp ifOp, bool *folded) {		LogicalResult mlir::hoistAffineIfOp(AffineIfOp ifOp, bool *folded) {
// Bail out early if the ifOp returns a result. TODO: Consider how to		// Bail out early if the ifOp returns a result. TODO: Consider how to
// properly support this case.		// properly support this case.
if (ifOp.getNumResults() != 0)		if (ifOp.getNumResults() != 0)
return failure();		return failure();
[... 71 lines not shown ...]
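
The `numIVs` discussion above concerns which block arguments of the new parallel loop body get erased: the induction variable comes first, followed by one argument per reduction. A minimal sketch of that index computation using only the standard library is shown below. The function name `reductionArgIndices` is hypothetical and not part of the patch; it only mimics what `llvm::to_vector<4>(llvm::seq<unsigned>(numIVs, numReductions + numIVs))` produces, assuming `llvm::seq` yields the half-open range `[Begin, End)`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the llvm::seq call in the patch: returns the
// block-argument indices [numIVs, numIVs + numReductions) that correspond to
// reduction iter_args and therefore have to be erased.
std::vector<unsigned> reductionArgIndices(unsigned numIVs,
                                          unsigned numReductions) {
  // A loop converted from a non-parallel affine.for has exactly one IV, so
  // the reduction arguments never start at index 0.
  assert(numIVs >= 1 && "expected at least one induction variable");
  std::vector<unsigned> indices(numReductions);
  // Fill with numIVs, numIVs + 1, ..., numIVs + numReductions - 1.
  std::iota(indices.begin(), indices.end(), numIVs);
  return indices;
}
```

With `numIVs = 1` and two reductions, the sketch returns the indices 1 and 2, leaving argument 0 (the induction variable) in place.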

mlir/test/Dialect/Affine/parallelize.mlir

// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize| FileCheck %s		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize | FileCheck %s
// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='max-nested=1' | FileCheck --check-prefix=MAX-NESTED %s		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='max-nested=1' | FileCheck --check-prefix=MAX-NESTED %s
		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='parallel-reductions=1' | FileCheck --check-prefix=REDUCE %s

// CHECK-LABEL: func @reduce_window_max() {		// CHECK-LABEL: func @reduce_window_max() {
func @reduce_window_max() {		func @reduce_window_max() {
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
%0 = memref.alloc() : memref<1x8x8x64xf32>		%0 = memref.alloc() : memref<1x8x8x64xf32>
%1 = memref.alloc() : memref<1x18x18x64xf32>		%1 = memref.alloc() : memref<1x18x18x64xf32>
affine.for %arg0 = 0 to 1 {		affine.for %arg0 = 0 to 1 {
affine.for %arg1 = 0 to 8 {		affine.for %arg1 = 0 to 8 {
[... 143 lines not shown ...]
affine.for %i = affine_map<(d0) -> (d0)>(%lb0) to affine_map<(d0) -> (d0)>(%ub0) {		affine.for %i = affine_map<(d0) -> (d0)>(%lb0) to affine_map<(d0) -> (d0)>(%ub0) {
// MAX-NESTED: affine.for		// MAX-NESTED: affine.for
affine.for %j = affine_map<(d0) -> (d0)>(%lb1) to affine_map<(d0) -> (d0)>(%ub1) {		affine.for %j = affine_map<(d0) -> (d0)>(%lb1) to affine_map<(d0) -> (d0)>(%ub1) {
affine.load %m[%i, %j] : memref<?x?xf32>		affine.load %m[%i, %j] : memref<?x?xf32>
}		}
}		}
return		return
}		}

// CHECK-LABEL: @unsupported_iter_args		// CHECK-LABEL: @iter_args
func @unsupported_iter_args(%in: memref<10xf32>) {		// REDUCE-LABEL: @iter_args
		func @iter_args(%in: memref<10xf32>) {
		// REDUCE: %[[init:.*]] = constant
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
// CHECK-NOT: affine.parallel		// CHECK-NOT: affine.parallel
		// REDUCE: %[[reduced:.*]] = affine.parallel (%{{.*}}) = (0) to (10) reduce ("addf")
		bondhugula: Could you please check the body as well, since you are replacing the old op, if that sounds reasonable?
%final_red = affine.for %i = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {		%final_red = affine.for %i = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {
		bondhugula: Better to match the whole `affine.parallel` op itself for clarity, at least in this one case?
		// REDUCE: %[[red_value:.*]] = affine.load
%ld = affine.load %in[%i] : memref<10xf32>		%ld = affine.load %in[%i] : memref<10xf32>
		// REDUCE-NOT: addf
%add = addf %red_iter, %ld : f32		%add = addf %red_iter, %ld : f32
		// REDUCE: affine.yield %[[red_value]]
affine.yield %add : f32		affine.yield %add : f32
}		}
		// REDUCE: addf %[[init]], %[[reduced]]
return		return
}		}
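
The REDUCE checks in `@iter_args` describe the shape of the rewrite: the loop body yields the loaded value directly, the `reduce ("addf")` clause combines the yielded values, and a single `addf` after the loop folds in the initial value. The following is a rough C++ analogy of why the two forms agree, not the patch's implementation. The function names are hypothetical, and starting the reduction from `0.0f` is an assumption about the neutral element of `addf`.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sequential form: mirrors
//   affine.for ... iter_args(%red_iter = %cst) { %add = addf %red_iter, %ld }
float sequentialSum(const std::vector<float> &in, float init) {
  float acc = init;
  for (float value : in)
    acc = acc + value;
  return acc;
}

// Rewritten form: the loop only produces the values to reduce, the reduction
// starts from the (assumed) neutral element 0.0f, and the initial value is
// added once after the loop, as in `addf %[[init]], %[[reduced]]`.
float reducedThenCombined(const std::vector<float> &in, float init) {
  float reduced = std::accumulate(in.begin(), in.end(), 0.0f);
  return init + reduced;
}
```

Both functions return the same value for inputs whose partial sums are exactly representable. In general, floating-point addition is not associative, so reordering a reduction can change rounding; this is presumably why the behavior is opt-in through the `parallel-reductions` option.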

// CHECK-LABEL: @unsupported_nested_iter_args		// CHECK-LABEL: @nested_iter_args
func @unsupported_nested_iter_args(%in: memref<20x10xf32>) {		// REDUCE-LABEL: @nested_iter_args
		func @nested_iter_args(%in: memref<20x10xf32>) {
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
// CHECK: affine.parallel		// CHECK: affine.parallel
affine.for %i = 0 to 20 {		affine.for %i = 0 to 20 {
// CHECK: affine.for		// CHECK-NOT: affine.parallel
		// REDUCE: affine.parallel
		// REDUCE: reduce ("addf")
%final_red = affine.for %j = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {		%final_red = affine.for %j = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {
%ld = affine.load %in[%i, %j] : memref<20x10xf32>		%ld = affine.load %in[%i, %j] : memref<20x10xf32>
%add = addf %red_iter, %ld : f32		%add = addf %red_iter, %ld : f32
affine.yield %add : f32		affine.yield %add : f32
}		}
}		}
return		return
}		}

		// REDUCE-LABEL: @strange_butterfly
		func @strange_butterfly() {
		%cst1 = constant 0.0 : f32
		%cst2 = constant 1.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1, %it2 = %cst2) -> (f32, f32) {
		%0 = addf %it1, %it2 : f32
		bondhugula: How about `addf %it1, %it1`? That was the one I was referring to. You could also have:
		    %0 = addf %it1, (%A[%i])
		    %1 = addf %it2, %0
		    yield %0, %1 : f32, f32
		Just making sure the weird cases (non-reductions) are guarded.
		ftynse (author): The first one is easily caught because there are two uses of `%it1`, not one. The second case is caught by the backward slice check I added. Added both as tests.
		affine.yield %0, %0 : f32, f32
		}
		return
		}

		// An iter arg is used more than once. This is not a simple reduction and
		// should not be parallelized.
		// REDUCE-LABEL: @repeated_use
		func @repeated_use() {
		%cst1 = constant 0.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1) -> (f32) {
		%0 = addf %it1, %it1 : f32
		affine.yield %0 : f32
		}
		return
		}

		// An iter arg is used in the chain of operations defining the value being
		// reduced, this is not a simple reduction and should not be parallelized.
		// REDUCE-LABEL: @use_in_backward_slice
		func @use_in_backward_slice() {
		%cst1 = constant 0.0 : f32
		%cst2 = constant 1.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1, %it2 = %cst2) -> (f32, f32) {
		%0 = "test.some_modification"(%it2) : (f32) -> f32
		%1 = addf %it1, %0 : f32
		affine.yield %1, %1 : f32, f32
		}
		return
		}
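
The three negative tests above (`@strange_butterfly`, `@repeated_use`, `@use_in_backward_slice`) exercise the two guards mentioned in the review discussion: each iter arg must have exactly one use, and the value being reduced must not depend on any iter arg through its backward slice. Below is a toy model of those two checks over a made-up, integer-id IR. All types and function names are hypothetical; this is a sketch of the idea and does not reproduce the actual MLIR analysis, which additionally restricts the reduction operation to a supported kind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy IR (hypothetical): every value is an integer id; an op defines one
// result from a list of operand ids.
struct Op {
  int result;
  std::vector<int> operands;
};

struct Loop {
  std::vector<int> iterArgs; // ids of the loop-carried block arguments
  std::vector<Op> body;      // ops in program order
  std::vector<int> yields;   // yielded ids, one per iter arg
};

const Op *definingOp(const Loop &loop, int value) {
  for (const Op &op : loop.body)
    if (op.result == value)
      return &op;
  return nullptr;
}

bool isIterArg(const Loop &loop, int value) {
  return std::find(loop.iterArgs.begin(), loop.iterArgs.end(), value) !=
         loop.iterArgs.end();
}

// Backward-slice check: does `value` transitively depend on any iter arg?
bool dependsOnIterArg(const Loop &loop, int value, std::set<int> &visited) {
  if (isIterArg(loop, value))
    return true;
  if (!visited.insert(value).second)
    return false; // already explored
  const Op *def = definingOp(loop, value);
  if (!def)
    return false; // defined outside the loop body
  for (int operand : def->operands)
    if (dependsOnIterArg(loop, operand, visited))
      return true;
  return false;
}

int countUses(const Loop &loop, int value) {
  int uses = 0;
  for (const Op &op : loop.body)
    uses += static_cast<int>(
        std::count(op.operands.begin(), op.operands.end(), value));
  uses += static_cast<int>(
      std::count(loop.yields.begin(), loop.yields.end(), value));
  return uses;
}

// Returns true if every iter arg forms a simple reduction in this toy model.
bool isSimpleReduction(const Loop &loop) {
  assert(loop.iterArgs.size() == loop.yields.size());
  for (std::size_t i = 0; i < loop.iterArgs.size(); ++i) {
    int arg = loop.iterArgs[i];
    // Guard 1: the iter arg is used exactly once.
    if (countUses(loop, arg) != 1)
      return false;
    // That single use must be the binary op producing the yielded value.
    const Op *reduction = definingOp(loop, loop.yields[i]);
    if (!reduction || reduction->operands.size() != 2 ||
        std::count(reduction->operands.begin(), reduction->operands.end(),
                   arg) != 1)
      return false;
    // Guard 2: the other operand must not depend on any iter arg.
    for (int operand : reduction->operands) {
      if (operand == arg)
        continue;
      std::set<int> visited;
      if (dependsOnIterArg(loop, operand, visited))
        return false;
    }
  }
  return true;
}
```

In this model, a loop shaped like `@iter_args` is accepted, while `@repeated_use` fails the single-use guard and both `@use_in_backward_slice` and `@strange_butterfly` fail the backward-slice guard.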

Revision Contents

Diff 341432
