This is an archive of the discontinued LLVM Phabricator instance.


Differential D101171

[mlir] Affine: parallelize affine loops with reductions
ClosedPublic

Authored by ftynse on Apr 23 2021, 8:54 AM.

Details

Reviewers

wsmoses
chelini
kumasento
nicolasvasilache
bondhugula
aartbik

Commits

rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions

Summary

Introduce basic support for parallelizing affine loops with reductions
expressed using iteration arguments. The affine parallelism detector now has a
flag to assume such reductions are parallel. The transformation handles a subset
of parallel reductions that can be expressed using affine.parallel:
integer/float addition and multiplication. This requires detecting the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
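
To make the idea concrete, here is a minimal standalone sketch of this kind of reduction detection on a toy loop representation. It is not the MLIR API and not the code from this revision: `ToyOp`, `ToyLoop`, `ReductionKind`, `countUses`, and `detectReduction` are hypothetical names, and operations are identified by plain strings. It assumes C++17 (for `std::optional`) and only illustrates the shape of the check: the value yielded at position `pos` must come from an integer/float addition or multiplication that is the single use of the `pos`-th iteration argument.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy IR (not MLIR): values are integer ids. Ids
// 0..numIterArgs-1 are the loop's iteration arguments.
struct ToyOp {
  std::string name; // "addf", "mulf", "addi", "muli", or anything else
  int lhs;          // operand value id, or -1 if unused
  int rhs;          // operand value id, or -1 if unused
  int result;       // value id defined by this op
};

struct ToyLoop {
  int numIterArgs;
  std::vector<ToyOp> body;
  std::vector<int> yielded; // yielded[pos] becomes iter arg `pos` next iteration
};

enum class ReductionKind { AddF, MulF, AddI, MulI };

// Counts how many times `value` is used as an operand or as a yielded value.
static int countUses(const ToyLoop &loop, int value) {
  int uses = 0;
  for (const ToyOp &op : loop.body)
    uses += (op.lhs == value) + (op.rhs == value);
  for (int y : loop.yielded)
    uses += (y == value);
  return uses;
}

// Returns the reduction kind for the `pos`-th iteration argument, or nullopt
// if the pattern is not one of the four supported kinds.
std::optional<ReductionKind> detectReduction(const ToyLoop &loop, int pos) {
  if (pos < 0 || pos >= static_cast<int>(loop.yielded.size()))
    return std::nullopt;

  // Find the op that defines the value yielded at position `pos`.
  const ToyOp *def = nullptr;
  for (const ToyOp &op : loop.body)
    if (op.result == loop.yielded[pos])
      def = &op;
  if (!def)
    return std::nullopt;

  // The iteration argument must have exactly one use, and that use must be
  // the defining op of the yielded value.
  if (countUses(loop, pos) != 1 || (def->lhs != pos && def->rhs != pos))
    return std::nullopt;

  if (def->name == "addf") return ReductionKind::AddF;
  if (def->name == "mulf") return ReductionKind::MulF;
  if (def->name == "addi") return ReductionKind::AddI;
  if (def->name == "muli") return ReductionKind::MulI;
  return std::nullopt; // e.g. subtraction is not a supported reduction here
}
```

For a body equivalent to `%v = load; %s = addf %acc, %v; yield %s`, `detectReduction` reports `AddF`, while `addf %acc, %acc` is rejected because the iteration argument has two uses. The sketch deliberately omits the check that the other operand does not itself depend on an iteration argument, which the review below discusses.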

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Apr 23 2021, 8:54 AM

Herald added subscribers: dcaballe, cota, teijeong and 17 others. Apr 23 2021, 8:55 AM

ftynse requested review of this revision. Apr 23 2021, 8:55 AM

Herald added a reviewer: nicolasvasilache. Apr 23 2021, 8:55 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

ftynse added a child revision: D101172: [mlir] support max/min lower/upper bounds in affine.parallel. Apr 23 2021, 8:55 AM

Harbormaster completed remote builds in B100589: Diff 340058. Apr 23 2021, 10:34 AM

chelini accepted this revision. Apr 23 2021, 11:19 AM

chelini added inline comments.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
167	I would remove braces.
170	Same here.
207	Here we expect all the iterators to be valid reductions. I would make this explicit using a comment.
279	It may not be clear what this "1" means. Can we add a comment like `/* induction var */` or use an explicit method like `getNumInductionVars()`?

This revision is now accepted and ready to land. Apr 23 2021, 11:19 AM

LGTM!

mlir/lib/Dialect/Affine/Utils/Utils.cpp
167	Just wondering is it possible that both operands can be iter_args?

Great to see this functionality! Some comments.

mlir/include/mlir/Analysis/Utils.h
356–357	Update doc comment please to capture the new argument.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–164	This check can be moved up, and the `isa_and_nonnull` check and the `getReductionOperationKind` can be unified?
267–271	assert on reductionOp not being null?
mlir/test/Dialect/Affine/parallelize.mlir
169–170	Better to match the whole `affine.parallel` op itself for clarity at least in this one case?

This revision now requires changes to proceed. Apr 25 2021, 12:06 AM

Address review.

ftynse added inline comments. Apr 26 2021, 3:03 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
167	It is possible in the IR, but wouldn't pass the single-use check above.
207	How do you suggest parallelizing reduction loops where only some reductions are parallelizable? Sounds impossible to me, but still added a comment.
279	That's why there is a comment above saying "erase the block arguments that correspond to reductions". The `/*name=*/` comments are used to indicate the argument names of the callee, `llvm::seq` in this case, so the correct name would be `/*Begin=*/`. The linter will justifiably complain about any other comment because it only makes code more confusing for whoever is familiar with the style.

Harbormaster completed remote builds in B100887: Diff 340465. Apr 26 2021, 3:41 AM

chelini added inline comments. Apr 26 2021, 11:33 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
207	I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions.

bondhugula added inline comments. Apr 26 2021, 12:22 PM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
138–139	This part of the comment is now stale I think. There's nothing to keep in sync.
157	Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting.
164–167	This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this `definition` is being yielded at the right position.

bondhugula requested changes to this revision. Apr 26 2021, 7:33 PM

This revision now requires changes to proceed. Apr 26 2021, 7:33 PM

This functionality overlaps with my reduction vectorization patch https://reviews.llvm.org/D100694
We can merge both (our changes to the isLoopParallel function are essentially the same), but having two separate reduction recognizers would be strange, so maybe we should discuss if there is a way to converge them.

Address more review.

Herald added a subscriber: mgorny. Apr 27 2021, 9:57 AM

ftynse added inline comments. Apr 27 2021, 9:57 AM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167	> You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg;
	Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.
	> furthermore, you'll also need to check that the result of this `definition` is being yielded at the right position.
	`definition` is the defining op of the `pos`-th yield operand...

@sgrechanik I am happy to use your version when it lands, or have this code updated if it lands first.

Harbormaster completed remote builds in B101204: Diff 340890. Apr 27 2021, 11:11 AM

This looks good to me. Please take a look at the two more comments below.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167	Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled"
mlir/test/Dialect/Affine/parallelize.mlir
169	Could you please check the body as well as you are replacing the old op if that sounds reasonable?

This revision is now accepted and ready to land. Apr 27 2021, 6:30 PM

bondhugula requested changes to this revision. Apr 27 2021, 6:45 PM

bondhugula added inline comments.

mlir/include/mlir/Analysis/Utils.h
359	I'll have the same concern with this method as with D100694 of @sgrechanik. At this point, we don't even know whether the `forOp`'s iter_args are reduction vars - so this argument naming is weird. Also, the returned result is technically inaccurate when `reductionsAreParallel` is set to true. In fact, `ignoreIterArgs` or `ignoreIterArgDeps` as the name is less confusing FWIW the way @sgrechanik has it. If there are iter_args, can we integrate this method with the reduction detection logic itself that you have? If `reductionsAreParallel` is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.

This revision now requires changes to proceed. Apr 27 2021, 6:45 PM

bondhugula added inline comments. Apr 27 2021, 6:49 PM

mlir/test/Dialect/Affine/parallelize.mlir
206	How about `addf %it1, %it1`? That was the one I was referring to. You could also have:
	%0 = addf %it1, (%A[%i])
	%1 = addf %it2, %0
	yield %0, %1 : f32, f32
	Just making sure the weird cases (non reductions) are guarded.

Address even more review.

mlir/include/mlir/Analysis/Utils.h
359	Changed to `ignoreReductionIterArgs`. We don't ignore _all_ iter args, so anything that doesn't have "reduction" in the name would actually be confusing.
	> If `reductionsAreParallel` is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.
	This is exactly what the implementation has been doing all along.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167	You might have missed it, I added a full-blown check in `dependsOnIterArgs` that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering.
mlir/test/Dialect/Affine/parallelize.mlir
206	The first one is easily caught because there are two uses of `%it1`, not one. The second case is caught by the backward slice check I added. Added both as tests.

Harbormaster completed remote builds in B101343: Diff 341092. Apr 28 2021, 2:01 AM

Sorry, it looks like I wasn't clear or I'm still missing something.

mlir/include/mlir/Analysis/Utils.h
358	I assume you meant `set to false` here? There's no check if it's set to true.
359	Looks like I'm missing something. I only see a single line change to `isLoopParallel` - the logic you refer to is in `affineParallelize` and not `isLoopParallel` unless I'm looking at an older diff! `affineParallelize` was mostly meant to be "just perform the replacement" while I was expecting the check including the one for reductions in `isLoopParallel`.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
198–199	This requires a slight rephrase. This replacement is contingent on the reduction checks you have.
202–211	It's this part that I thought should go into `isLoopParallel`: it's really a reduction check. You can have it return (as output arg) the reduced values?

sgrechanik added inline comments. Apr 28 2021, 6:13 PM

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167	Do the weird examples that can be caught by `dependsOnIterArgs` and cannot be caught by checking `hasOneUse()` really exist? Because it seems to me that checking `hasOneUse()` on every iter arg will exclude ALL weird cases, but maybe I miss something.

ftynse marked 3 inline comments as done. Apr 29 2021, 12:38 AM

ftynse added inline comments.

mlir/include/mlir/Analysis/Utils.h
358	No, I meant what I said. If it is set to true, the parallelism check will ignore iter args that it can prove to be reductions; other iter args still make the loop non-parallel. If it is unset (or set to false if you wish), any iter arg makes loop non-parallel. Let's please not get too much into bikeshedding with names.
359	Yes, you are missing something. Probably because Phabricator only shows you the diff between the latest version and the version you reviewed previously, not the base version. Line 1275 in Analysis/Utils.cpp contains the check and it was there since the first iteration.
mlir/lib/Analysis/Utils.cpp
1275	Here's the early exit @bondhugula ^
mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167	Yes, see the `strange_butterfly` case in tests. Each iter arg is used exactly once, in the same `addf` operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg.
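
The point about single-use checks not being sufficient can be illustrated with a minimal standalone sketch of a backward dependence check. It is a toy model and not the `dependsOnIterArgs` implementation from this revision: `DefMap`, `exampleBody`, and the integer value ids are hypothetical, and the loop body encoded in `exampleBody` is the second example from the inline comment on parallelize.mlir line 206 (`%0 = addf %it1, %A[%i]`, `%1 = addf %it2, %0`).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy IR (not MLIR): maps a value id to the operand ids of the
// op that defines it. Values without an entry have no definition in the loop.
using DefMap = std::map<int, std::vector<int>>;

// Returns true if `value` is an iteration argument or transitively depends on
// one through the operands of its defining ops.
bool dependsOnIterArgs(int value, const DefMap &defs,
                       const std::set<int> &iterArgs) {
  std::set<int> visited;
  std::vector<int> worklist{value};
  while (!worklist.empty()) {
    int current = worklist.back();
    worklist.pop_back();
    if (!visited.insert(current).second)
      continue; // already processed
    if (iterArgs.count(current))
      return true;
    auto it = defs.find(current);
    if (it == defs.end())
      continue;
    for (int operand : it->second)
      worklist.push_back(operand);
  }
  return false;
}

// Ids: 0 = %it1, 1 = %it2, 2 = %A[%i] (a load), 3 = %0, 4 = %1.
//   %0 = addf %it1, %A[%i]
//   %1 = addf %it2, %0
DefMap exampleBody() {
  return DefMap{{2, {}}, {3, {0, 2}}, {4, {1, 3}}};
}
```

In this body each iteration argument is used exactly once, so a use-count check alone accepts both. The value combined with `%it2` is `%0` (id 3), which depends on `%it1`, so `dependsOnIterArgs(3, exampleBody(), {0, 1})` is true and the second "reduction" is rejected, while the loaded value (id 2) passes.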

Address the hopefully final iteration of review.

Herald added a reviewer: aartbik. Apr 29 2021, 1:34 AM

ftynse added inline comments. Apr 29 2021, 1:34 AM

mlir/include/mlir/Analysis/Utils.h
358	This is no longer relevant as I opted for an explicit list of reductions instead.

ftynse added inline comments. Apr 29 2021, 1:35 AM

mlir/lib/Analysis/AffineAnalysis.cpp
108 ↗	(On Diff #341432)	Here's the early exit @bondhugula, it has always been here.
mlir/lib/Analysis/Utils.cpp
1275	This has moved to AffineAnalysis.cpp.

Harbormaster completed remote builds in B101578: Diff 341432. Apr 29 2021, 2:04 AM

LGTM. Just minor comments now.

mlir/include/mlir/Analysis/Utils.h
359	Not really. I did notice that one line change. But now I see the whole unification into `isLoopParallel`, which is what I was requesting.
mlir/lib/Analysis/AffineAnalysis.cpp
108 ↗	(On Diff #341432)	Yes, I knew this had always been there. This wasn't the confusion.
111–158 ↗	(On Diff #341432)	I assume all of this code beyond the early exit was just added in the recent iteration. This is what I was requesting! I hope I was indeed looking at the whole diff earlier and not the last increment. The change to `isLoopParallel` in the previous one was just one line. Now it's integrated!
146–148 ↗	(On Diff #341432)	Nit: The `Inst` suffix is legacy back when they were called instructions. Just `srcOp` and `dstOp` is fine. Likewise for `Insts`.
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp
43	Doc comment for this one please.

This revision is now accepted and ready to land. Apr 29 2021, 3:51 AM

Fix nits.

ftynse marked an inline comment as done. Apr 29 2021, 4:16 AM

ftynse added inline comments.

mlir/include/mlir/Analysis/Utils.h
359	Now you may be able to see it because it moved to a different file.
mlir/lib/Analysis/AffineAnalysis.cpp
111–158 ↗	(On Diff #341432)	Part of this code (the original contents of isLoopParallel) is moved from lib/Analysis/Utils.cpp, Phabricator indicates this by a light yellow vertical bar next to the lines it can detect as moved.
146–148 ↗	(On Diff #341432)	I'm just moving this code from a different file and I'm not a fan of sneaking in irrelevant changes, but okay.

This revision was landed with ongoing or failed builds. Apr 29 2021, 4:16 AM

Closed by commit rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions (authored by ftynse).

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rG545fa37834ef: [mlir] Affine: parallelize affine loops with reductions.

Harbormaster completed remote builds in B101600: Diff 341461. Apr 29 2021, 5:04 AM

Revision Contents

Path	Size
mlir/include/mlir/Analysis/Utils.h	6 lines
mlir/include/mlir/Dialect/Affine/Passes.td	3 lines
mlir/include/mlir/Dialect/Affine/Utils.h	2 lines
mlir/lib/Analysis/Utils.cpp	9 lines
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp	13 lines
mlir/lib/Dialect/Affine/Utils/CMakeLists.txt	1 line
mlir/lib/Dialect/Affine/Utils/Utils.cpp	121 lines
mlir/test/Dialect/Affine/parallelize.mlir	63 lines

Diff 341092

mlir/include/mlir/Analysis/Utils.h

[...]

 /// Returns the number of surrounding loops common to both A and B.
 unsigned getNumCommonSurroundingLoops(Operation &A, Operation &B);

 /// Gets the memory footprint of all data touched in the specified memory space
 /// in bytes; if the memory space is unspecified, considers all memory spaces.
 Optional<int64_t> getMemoryFootprintBytes(AffineForOp forOp,
                                           int memorySpace = -1);

-/// Returns true if `forOp' is a parallel loop.
-bool isLoopParallel(AffineForOp forOp);
+/// Returns true if `forOp' is a parallel loop. If `ignoreReductionIterArgs` is
+/// set, checks if all loop iteration arguments are actually (known types of)
+/// reductions and treats them as not preventing parallelization.
+bool isLoopParallel(AffineForOp forOp, bool ignoreReductionIterArgs = false);

 /// Simplify the integer set by simplifying the underlying affine expressions by
 /// flattening and some simple inference. Also, drop any duplicate constraints.
 /// Returns the simplified integer set. This method runs in time linear in the
 /// number of constraints.
 IntegerSet simplifyIntegerSet(IntegerSet set);

 /// Returns the innermost common loop depth for the set of operations in 'ops'.
 unsigned getInnermostCommonLoopDepth(
     ArrayRef<Operation *> ops,
     SmallVectorImpl<AffineForOp> *surroundingLoops = nullptr);

 } // end namespace mlir

 #endif // MLIR_ANALYSIS_UTILS_H

mlir/include/mlir/Dialect/Affine/Passes.td

[...]

 def AffineParallelize : FunctionPass<"affine-parallelize"> {
   let summary = "Convert affine.for ops into 1-D affine.parallel";
   let constructor = "mlir::createAffineParallelizePass()";
   let options = [
     Option<"maxNested", "max-nested", "unsigned", /*default=*/"-1u",
            "Maximum number of nested parallel loops to produce. "
            "Defaults to unlimited (UINT_MAX).">,
+    Option<"parallelReductions", "parallel-reductions", "bool",
+           /*default=*/"false",
+           "Whether to parallelize reduction loops. Defaults to false.">
   ];
 }

 def AffineLoopNormalize : FunctionPass<"affine-loop-normalize"> {
   let summary = "Apply normalization transformations to affine loop-like ops";
   let constructor = "mlir::createAffineLoopNormalizePass()";
 }

 def SimplifyAffineStructures : FunctionPass<"simplify-affine-structures"> {
   let summary = "Simplify affine expressions in maps/sets and normalize "
                 "memrefs";
   let constructor = "mlir::createSimplifyAffineStructuresPass()";
 }

 #endif // MLIR_DIALECT_AFFINE_PASSES

mlir/include/mlir/Dialect/Affine/Utils.h

[...]
 class AffineIfOp;
 class AffineParallelOp;
 struct LogicalResult;
 class Operation;

 /// Replaces parallel affine.for op with 1-d affine.parallel op.
 /// mlir::isLoopParallel detect the parallel affine.for ops.
 /// There is no cost model currently used to drive this parallelization.
-void affineParallelize(AffineForOp forOp);
+LogicalResult affineParallelize(AffineForOp forOp);

 /// Hoists out affine.if/else to as high as possible, i.e., past all invariant
 /// affine.fors/parallel's. Returns success if any hoisting happened; folded` is
 /// set to true if the op was folded or erased. This hoisting could lead to
 /// significant code expansion in some cases.
 LogicalResult hoistAffineIfOp(AffineIfOp ifOp, bool *folded = nullptr);

 /// Holds parameters to perform n-D vectorization on a single loop nest.
[...]

mlir/lib/Analysis/Utils.cpp

[...] void mlir::getSequentialLoops(AffineForOp forOp,
   forOp->walk([&](Operation *op) {
     if (auto innerFor = dyn_cast<AffineForOp>(op))
       if (!isLoopParallel(innerFor))
         sequentialLoops->insert(innerFor.getInductionVar());
   });
 }

 /// Returns true if 'forOp' is parallel.
-bool mlir::isLoopParallel(AffineForOp forOp) {
-  // Loop is not parallel if it has SSA loop-carried dependences.
-  // TODO: Conditionally support reductions and other loop-carried dependences
-  // that could be handled in the context of a parallel loop.
-  if (forOp.getNumIterOperands() > 0)
+bool mlir::isLoopParallel(AffineForOp forOp, bool ignoreReductionIterArgs) {
+  // Loop is not parallel if it has SSA loop-carried dependences and reduction
+  // detection is not requested.
+  if (forOp.getNumIterOperands() > 0 && !ignoreReductionIterArgs)
     return false;

   // Collect all load and store ops in loop nest rooted at 'forOp'.
   SmallVector<Operation *, 8> loadAndStoreOpInsts;
   auto walkResult = forOp.walk([&](Operation *opInst) -> WalkResult {
     if (isa<AffineReadOpInterface, AffineWriteOpInterface>(opInst))
       loadAndStoreOpInsts.push_back(opInst);
     else if (!isa<AffineForOp, AffineYieldOp, AffineIfOp>(opInst) &&
[...]

mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp

[...]
 };
 } // namespace

 void AffineParallelize::runOnFunction() {
   FuncOp f = getFunction();

   // The walker proceeds in post-order, but we need to process outer loops first
   // to control the number of outer parallel loops, so push candidate loops to
   // the front of a deque.
   std::deque<AffineForOp> parallelizableLoops;
   f.walk([&](AffineForOp loop) {
-    if (isLoopParallel(loop))
+    if (isLoopParallel(loop, parallelReductions))
       parallelizableLoops.push_front(loop);
   });

   for (AffineForOp loop : parallelizableLoops) {
     unsigned numParentParallelOps = 0;
     for (Operation *op = loop->getParentOp();
          op != nullptr && !op->hasTrait<OpTrait::AffineScope>();
          op = op->getParentOp()) {
       if (isa<AffineParallelOp>(op))
         ++numParentParallelOps;
     }

-    if (numParentParallelOps < maxNested)
-      affineParallelize(loop);
+    if (numParentParallelOps < maxNested) {
+      if (failed(affineParallelize(loop))) {
+        LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] failed to parallelize\n"
+                                << loop);
+      }
+    } else {
+      LLVM_DEBUG(llvm::dbgs() << "[" DEBUG_TYPE "] too many nested loops\n"
+                              << loop);
+    }
   }
 }

 std::unique_ptr<OperationPass<FuncOp>> mlir::createAffineParallelizePass() {
   return std::make_unique<AffineParallelize>();
 }
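
The comment in `runOnFunction` about post-order walks and the deque can be illustrated with a small standalone sketch. This is a toy model, not MLIR code: `ToyLoopNest`, `walkPostOrder`, and `collectOuterFirst` are hypothetical names, and loops are identified by strings. It assumes a post-order walk that visits nested loops before their parent, and shows that pushing each visited loop to the front of a `std::deque` yields an order in which every outer loop precedes the loops nested inside it.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy loop nest (not MLIR): a loop and the loops nested in it.
struct ToyLoopNest {
  std::string name;
  std::vector<ToyLoopNest> children;
};

// Post-order walk: nested loops are visited before the loop containing them.
template <typename Fn>
void walkPostOrder(const ToyLoopNest &loop, Fn &&visit) {
  for (const ToyLoopNest &child : loop.children)
    walkPostOrder(child, visit);
  visit(loop);
}

// Pushing to the front reverses the post-order visitation order, so each
// outer loop ends up before the loops nested inside it.
std::deque<std::string> collectOuterFirst(const ToyLoopNest &root) {
  std::deque<std::string> order;
  walkPostOrder(root, [&](const ToyLoopNest &loop) {
    order.push_front(loop.name);
  });
  return order;
}
```

For a nest `i` containing `j` containing `k`, the walk visits `k`, `j`, `i`, and the deque ends up holding `i`, `j`, `k`. Processing candidates in that order lets the pass count already-created parent `affine.parallel` ops against `maxNested` before it reaches the inner loops.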

mlir/lib/Dialect/Affine/Utils/CMakeLists.txt

 add_mlir_dialect_library(MLIRAffineUtils
   Utils.cpp

   ADDITIONAL_HEADER_DIRS
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Affine

   LINK_LIBS PUBLIC
   MLIRAffine
+  MLIRAnalysis
   MLIRTransformUtils
   )

mlir/lib/Dialect/Affine/Utils/Utils.cpp

//===- Utils.cpp ---- Utilities for affine dialect transformation ---------===//		//===- Utils.cpp ---- Utilities for affine dialect transformation ---------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements miscellaneous transformation utilities for the Affine		// This file implements miscellaneous transformation utilities for the Affine
// dialect.		// dialect.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/Affine/Utils.h"		#include "mlir/Dialect/Affine/Utils.h"
		#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/IR/BlockAndValueMapping.h"		#include "mlir/IR/BlockAndValueMapping.h"
#include "mlir/IR/BuiltinOps.h"		#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/IntegerSet.h"		#include "mlir/IR/IntegerSet.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
		#include "llvm/ADT/SmallPtrSet.h"
		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace mlir;		using namespace mlir;

/// Promotes the `then` or the `else` block of `ifOp` (depending on whether		/// Promotes the `then` or the `else` block of `ifOp` (depending on whether
/// `elseBlock` is false or true) into `ifOp`'s containing block, and discards		/// `elseBlock` is false or true) into `ifOp`'s containing block, and discards
/// the rest of the op.		/// the rest of the op.
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	static AffineIfOp hoistAffineIfOp(AffineIfOp ifOp, Operation *hoistOverOp) {
auto *elseBlock = hoistedIfOp.getElseBlock();		auto *elseBlock = hoistedIfOp.getElseBlock();
elseBlock->getOperations().splice(		elseBlock->getOperations().splice(
elseBlock->begin(), hoistOverOpClone->getBlock()->getOperations(),		elseBlock->begin(), hoistOverOpClone->getBlock()->getOperations(),
Block::iterator(hoistOverOpClone));		Block::iterator(hoistOverOpClone));

return hoistedIfOp;		return hoistedIfOp;
}		}

		/// Returns true if `value` (transitively) depends on iteration arguments of the
		/// given `forOp`.
		static bool dependsOnIterArgs(Value value, AffineForOp forOp) {
		// Compute the backward slice of the value.
		SetVector<Operation *> slice;
		bondhugula: This part of the comment is now stale I think. There's nothing to keep in sync.
		getBackwardSlice(value, &slice,
		[&](Operation *op) { return !forOp->isAncestor(op); });

		// Check that none of the operands of the operations in the backward slice are
		// loop iteration arguments, and neither is the value itself.
		auto argRange = forOp.getRegionIterArgs();
		llvm::SmallPtrSet<Value, 8> iterArgs(argRange.begin(), argRange.end());
		if (iterArgs.contains(value))
		return true;

		for (Operation *op : slice)
		for (Value operand : op->getOperands())
		if (iterArgs.contains(operand))
		return true;

		return false;
		}
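
The check in `dependsOnIterArgs` is a reachability query over the use-def chain. A minimal standalone sketch of the same idea follows; it does not use MLIR, and `ToyValue` and `toyDependsOnIterArgs` are hypothetical names introduced only for illustration. Unlike the patch, the sketch does not restrict the traversal to operations nested in the loop.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SSA value: either a loop iteration argument
// (no operands) or the result of an operation with some operands.
struct ToyValue {
  bool isIterArg = false;
  std::vector<const ToyValue *> operands;
};

// Returns true if `value` is an iteration argument or transitively uses one.
// This walks the backward slice with an explicit worklist.
bool toyDependsOnIterArgs(const ToyValue &value) {
  std::vector<const ToyValue *> worklist{&value};
  std::unordered_set<const ToyValue *> visited;
  while (!worklist.empty()) {
    const ToyValue *current = worklist.back();
    worklist.pop_back();
    // Skip values that were already inspected.
    if (!visited.insert(current).second)
      continue;
    if (current->isIterArg)
      return true;
    for (const ToyValue *operand : current->operands)
      worklist.push_back(operand);
  }
  return false;
}
```

In the spirit of the `use_in_backward_slice` test, a value computed from another iter arg is reported as dependent even though it is not an iter arg itself.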

		bondhugula: Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting.
		/// Get the value that is being reduced by `pos`-th reduction in the loop if
		/// such a reduction can be performed by affine parallel loops. This assumes
		/// floating-point operations are commutative. On success, `kind` will be the
		/// reduction kind suitable for use in affine parallel loop builder. If the
		/// reduction is not supported, returns null.
		static Value getSupportedReduction(AffineForOp forOp, unsigned pos,
		AtomicRMWKind &kind) {
		bondhugula: This check can be moved up and the `isa_and_nonnull` check and the `getReductionOperationKind` can be unified?
		auto yieldOp = cast<AffineYieldOp>(forOp.getBody()->back());
		Value yielded = yieldOp.operands()[pos];
		Operation *definition = yielded.getDefiningOp();
		chelini: I would remove braces.
		kumasento: Just wondering is it possible that both operands can be iter_args?
		ftynse: It is possible in the IR, but wouldn't pass the single-use check above.
		bondhugula: This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this `definition` is being yielded at the right position.
		ftynse:
		> You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg;
		Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.
		> furthermore, you'll also need to check that the result of this definition is being yielded at the right position.
		`definition` is the defining op of the `pos`-th yield operand...
		bondhugula: Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled"
		ftynse: You might have missed it, I added a full-blown check in `dependsOnIterArgs` that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering.
		sgrechanik: Do the weird examples that can be caught by `dependsOnIterArgs` and cannot be caught by checking `hasOneUse()` really exist? Because it seems to me that checking `hasOneUse()` on every iter arg will exclude ALL weird cases, but maybe I'm missing something.
		ftynse: Yes, see the `strange_butterfly` case in tests. Each iter arg is used exactly once, in the same `addf` operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg.
		if (!definition)
		return nullptr;
		if (!forOp.getRegionIterArgs()[pos].hasOneUse())
		chelini: Same here.
		return nullptr;

		Optional<AtomicRMWKind> maybeKind =
		TypeSwitch<Operation *, Optional<AtomicRMWKind>>(definition)
		.Case<AddFOp>([](Operation *) { return AtomicRMWKind::addf; })
		.Case<MulFOp>([](Operation *) { return AtomicRMWKind::mulf; })
		.Case<AddIOp>([](Operation *) { return AtomicRMWKind::addi; })
		.Case<MulIOp>([](Operation *) { return AtomicRMWKind::muli; })
		.Default([](Operation *) -> Optional<AtomicRMWKind> {
		// TODO: AtomicRMW supports other kinds of reductions this is
		// currently not detecting, add those when the need arises.
		return llvm::None;
		});
		if (!maybeKind)
		return nullptr;

		kind = *maybeKind;
		if (definition->getOperand(0) == forOp.getRegionIterArgs()[pos] &&
		!dependsOnIterArgs(definition->getOperand(1), forOp))
		return definition->getOperand(1);
		if (definition->getOperand(1) == forOp.getRegionIterArgs()[pos] &&
		!dependsOnIterArgs(definition->getOperand(0), forOp))
		return definition->getOperand(0);

		return nullptr;
		}
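
The doc comment on `getSupportedReduction` notes that the rewrite relies on an assumption about floating-point operations. The sketch below is a standalone illustration, not part of the patch, of why that matters: regrouping a floating-point sum can change the result. The function names and the two-chunk split are assumptions chosen for illustration; the order a parallel reduction actually uses is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `values` left to right, as the sequential loop would.
float sequentialSum(const std::vector<float> &values) {
  float acc = 0.0f;
  for (float v : values)
    acc += v;
  return acc;
}

// Sums the two halves independently and then combines the partial results,
// one possible grouping a parallel reduction might use (an assumption).
float twoChunkSum(const std::vector<float> &values) {
  std::size_t half = values.size() / 2;
  float lo = 0.0f;
  float hi = 0.0f;
  for (std::size_t i = 0; i < half; ++i)
    lo += values[i];
  for (std::size_t i = half; i < values.size(); ++i)
    hi += values[i];
  return lo + hi;
}
```

For the input `{1e20f, -1e20f, 1.0f}`, with IEEE-754 single precision and no fast-math flags, the sequential sum is `1.0f` while the two-chunk sum is `0.0f`, because `-1e20f + 1.0f` rounds back to `-1e20f`.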

/// Replace affine.for with a 1-d affine.parallel and clone the former's body		/// Replace affine.for with a 1-d affine.parallel and clone the former's body
/// into the latter while remapping values.		/// into the latter while remapping values. Also parallelize reductions if
		bondhugula: This requires a slight rephrase. This replacement is contingent on the reduction checks you have.
void mlir::affineParallelize(AffineForOp forOp) {		/// supported for all reductions.
		LogicalResult mlir::affineParallelize(AffineForOp forOp) {
		unsigned numReductions = forOp.getNumRegionIterArgs();
		SmallVector<AtomicRMWKind> reductionKinds;
		SmallVector<Value> reducedValues;
		reductionKinds.reserve(numReductions);
		reducedValues.reserve(numReductions);
		for (unsigned i = 0; i < numReductions; ++i) {
		chelini: Here we expect all the iterators to be valid reductions. I would make this explicit using a comment.
		ftynse: How do you suggest to parallelize reduction loops where only some reductions are parallelizable? Sounds impossible to me, but still added a comment.
		chelini: I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions.
		reducedValues.push_back(
		getSupportedReduction(forOp, i, reductionKinds.emplace_back()));
		if (!reducedValues.back())
		return failure();
		bondhugula: It's this part that I thought should go into `isLoopParallel`; it's really a reduction check. You can have it return (as an output arg) the reduced values?
		}

Location loc = forOp.getLoc();		Location loc = forOp.getLoc();
OpBuilder outsideBuilder(forOp);		OpBuilder outsideBuilder(forOp);

// If a loop has a 'max' in the lower bound, emit it outside the parallel loop		// If a loop has a 'max' in the lower bound, emit it outside the parallel loop
// as it does not have implicit 'max' behavior.		// as it does not have implicit 'max' behavior.
AffineMap lowerBoundMap = forOp.getLowerBoundMap();		AffineMap lowerBoundMap = forOp.getLowerBoundMap();
ValueRange lowerBoundOperands = forOp.getLowerBoundOperands();		ValueRange lowerBoundOperands = forOp.getLowerBoundOperands();
AffineMap upperBoundMap = forOp.getUpperBoundMap();		AffineMap upperBoundMap = forOp.getUpperBoundMap();
ValueRange upperBoundOperands = forOp.getUpperBoundOperands();		ValueRange upperBoundOperands = forOp.getUpperBoundOperands();

bool needsMax = lowerBoundMap.getNumResults() > 1;		bool needsMax = lowerBoundMap.getNumResults() > 1;
bool needsMin = upperBoundMap.getNumResults() > 1;		bool needsMin = upperBoundMap.getNumResults() > 1;
AffineMap identityMap;		AffineMap identityMap;
if (needsMax || needsMin) {		if (needsMax || needsMin) {
if (forOp->getParentOp() &&		if (forOp->getParentOp() &&
!forOp->getParentOp()->hasTrait<OpTrait::AffineScope>())		!forOp->getParentOp()->hasTrait<OpTrait::AffineScope>())
return;		return failure();

identityMap = AffineMap::getMultiDimIdentityMap(1, loc->getContext());		identityMap = AffineMap::getMultiDimIdentityMap(1, loc->getContext());
}		}
if (needsMax) {		if (needsMax) {
auto maxOp = outsideBuilder.create<AffineMaxOp>(loc, lowerBoundMap,		auto maxOp = outsideBuilder.create<AffineMaxOp>(loc, lowerBoundMap,
lowerBoundOperands);		lowerBoundOperands);
lowerBoundMap = identityMap;		lowerBoundMap = identityMap;
lowerBoundOperands = maxOp->getResults();		lowerBoundOperands = maxOp->getResults();
}		}

// Same for the upper bound.		// Same for the upper bound.
if (needsMin) {		if (needsMin) {
auto minOp = outsideBuilder.create<AffineMinOp>(loc, upperBoundMap,		auto minOp = outsideBuilder.create<AffineMinOp>(loc, upperBoundMap,
upperBoundOperands);		upperBoundOperands);
upperBoundMap = identityMap;		upperBoundMap = identityMap;
upperBoundOperands = minOp->getResults();		upperBoundOperands = minOp->getResults();
}		}

// Creating empty 1-D affine.parallel op.		// Creating empty 1-D affine.parallel op.
AffineParallelOp newPloop = outsideBuilder.create<AffineParallelOp>(		AffineParallelOp newPloop = outsideBuilder.create<AffineParallelOp>(
loc, llvm::None, llvm::None, lowerBoundMap, lowerBoundOperands,		loc, ValueRange(reducedValues).getTypes(), reductionKinds, lowerBoundMap,
upperBoundMap, upperBoundOperands);		lowerBoundOperands, upperBoundMap, upperBoundOperands);
// Steal the body of the old affine for op and erase it.		// Steal the body of the old affine for op.
newPloop.region().takeBody(forOp.region());		newPloop.region().takeBody(forOp.region());
		Operation *yieldOp = &newPloop.getBody()->back();

		// Handle the initial values of reductions because the parallel loop always
		// starts from the neutral value.
		SmallVector<Value> newResults;
		newResults.reserve(numReductions);
		for (unsigned i = 0; i < numReductions; ++i) {
		Value init = forOp.getIterOperands()[i];
		// This works because we are only handling single-op reductions at the
		// moment. A switch on reduction kind or a mechanism to collect operations
		// participating in the reduction will be necessary for multi-op reductions.
		Operation *reductionOp = yieldOp->getOperand(i).getDefiningOp();
		assert(reductionOp && "yielded value is expected to be produced by an op");
		outsideBuilder.getInsertionBlock()->getOperations().splice(
		outsideBuilder.getInsertionPoint(), newPloop.getBody()->getOperations(),
		reductionOp);
		reductionOp->setOperands({init, newPloop->getResult(i)});
		bondhugula: assert on reductionOp not being null?
		forOp->getResult(i).replaceAllUsesWith(reductionOp->getResult(0));
		}

		// Update the loop terminator to yield reduced values bypassing the reduction
		// operation itself (now moved outside of the loop) and erase the block
		// arguments that correspond to reductions. Note that the loop always has one
		// "main" induction variable when coming from a non-parallel for.
		unsigned numIVs = 1;
		chelini: It may be not clear what this "1" means. Can we add a comment like: /* induction var */ or use an explicit method like `getNumInductionVars()`?
		ftynse: That's why there is a comment above saying "erase the block arguments that correspond to reductions". The `/*name=*/` comments are used to indicate the argument names of the callee, `llvm::seq` in this case, so the correct name would be `/*Begin=*/`. The linter will justifiably complain about any other comment because it only makes code more confusing for anyone familiar with the style.
		yieldOp->setOperands(reducedValues);
		newPloop.getBody()->eraseArguments(
		llvm::to_vector<4>(llvm::seq<unsigned>(numIVs, numReductions + numIVs)));

forOp.erase();		forOp.erase();
		return success();
}		}
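
The handling of initial values above can be illustrated outside MLIR. The following is a minimal sketch, assuming an integer sum reduction; `sequentialReduce` and `rewrittenReduce` are hypothetical names and not part of the patch. The sequential loop seeds its accumulator with the initial value, whereas the rewritten form reduces from the neutral element and applies the initial value once, after the loop, which mirrors moving the reduction operation outside of the parallel loop.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sequential form: the accumulator starts from `init`, like an affine.for
// with iter_args(%red_iter = %init).
int sequentialReduce(const std::vector<int> &values, int init) {
  int acc = init;
  for (int v : values)
    acc += v;
  return acc;
}

// Rewritten form: the reduction itself starts from the neutral value 0, and
// `init` is combined with the reduced result once, after the loop.
int rewrittenReduce(const std::vector<int> &values, int init) {
  int reduced = std::accumulate(values.begin(), values.end(), 0);
  return init + reduced;
}
```

Both functions return the same value for integer addition, for example 20 for the values `{1, 2, 3, 4}` with an initial value of 10.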

// Returns success if any hoisting happened.		// Returns success if any hoisting happened.
LogicalResult mlir::hoistAffineIfOp(AffineIfOp ifOp, bool *folded) {		LogicalResult mlir::hoistAffineIfOp(AffineIfOp ifOp, bool *folded) {
// Bail out early if the ifOp returns a result. TODO: Consider how to		// Bail out early if the ifOp returns a result. TODO: Consider how to
// properly support this case.		// properly support this case.
if (ifOp.getNumResults() != 0)		if (ifOp.getNumResults() != 0)
return failure();		return failure();
[71 lines not shown]

mlir/test/Dialect/Affine/parallelize.mlir

// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize| FileCheck %s		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize | FileCheck %s
// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='max-nested=1' | FileCheck --check-prefix=MAX-NESTED %s		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='max-nested=1' | FileCheck --check-prefix=MAX-NESTED %s
		// RUN: mlir-opt %s -allow-unregistered-dialect -affine-parallelize='parallel-reductions=1' | FileCheck --check-prefix=REDUCE %s

// CHECK-LABEL: func @reduce_window_max() {		// CHECK-LABEL: func @reduce_window_max() {
func @reduce_window_max() {		func @reduce_window_max() {
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
%0 = memref.alloc() : memref<1x8x8x64xf32>		%0 = memref.alloc() : memref<1x8x8x64xf32>
%1 = memref.alloc() : memref<1x18x18x64xf32>		%1 = memref.alloc() : memref<1x18x18x64xf32>
affine.for %arg0 = 0 to 1 {		affine.for %arg0 = 0 to 1 {
affine.for %arg1 = 0 to 8 {		affine.for %arg1 = 0 to 8 {
[143 lines not shown]		affine.for %i = affine_map<(d0) -> (d0)>(%lb0) to affine_map<(d0) -> (d0)>(%ub0) {
// MAX-NESTED: affine.for		// MAX-NESTED: affine.for
affine.for %j = affine_map<(d0) -> (d0)>(%lb1) to affine_map<(d0) -> (d0)>(%ub1) {		affine.for %j = affine_map<(d0) -> (d0)>(%lb1) to affine_map<(d0) -> (d0)>(%ub1) {
affine.load %m[%i, %j] : memref<?x?xf32>		affine.load %m[%i, %j] : memref<?x?xf32>
}		}
}		}
return		return
}		}

// CHECK-LABEL: @unsupported_iter_args		// CHECK-LABEL: @iter_args
func @unsupported_iter_args(%in: memref<10xf32>) {		// REDUCE-LABEL: @iter_args
		func @iter_args(%in: memref<10xf32>) {
		// REDUCE: %[[init:.*]] = constant
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
// CHECK-NOT: affine.parallel		// CHECK-NOT: affine.parallel
		// REDUCE: %[[reduced:.*]] = affine.parallel (%{{.*}}) = (0) to (10) reduce ("addf")
		bondhugula: Could you please check the body as well as you are replacing the old op if that sounds reasonable?
%final_red = affine.for %i = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {		%final_red = affine.for %i = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {
		bondhugula: Better to match the whole `affine.parallel` op itself for clarity at least in this one case?
		// REDUCE: %[[red_value:.*]] = affine.load
%ld = affine.load %in[%i] : memref<10xf32>		%ld = affine.load %in[%i] : memref<10xf32>
		// REDUCE-NOT: addf
%add = addf %red_iter, %ld : f32		%add = addf %red_iter, %ld : f32
		// REDUCE: affine.yield %[[red_value]]
affine.yield %add : f32		affine.yield %add : f32
}		}
		// REDUCE: addf %[[init]], %[[reduced]]
return		return
}		}

// CHECK-LABEL: @unsupported_nested_iter_args		// CHECK-LABEL: @nested_iter_args
func @unsupported_nested_iter_args(%in: memref<20x10xf32>) {		// REDUCE-LABEL: @nested_iter_args
		func @nested_iter_args(%in: memref<20x10xf32>) {
%cst = constant 0.000000e+00 : f32		%cst = constant 0.000000e+00 : f32
// CHECK: affine.parallel		// CHECK: affine.parallel
affine.for %i = 0 to 20 {		affine.for %i = 0 to 20 {
// CHECK: affine.for		// CHECK-NOT: affine.parallel
		// REDUCE: affine.parallel
		// REDUCE: reduce ("addf")
%final_red = affine.for %j = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {		%final_red = affine.for %j = 0 to 10 iter_args(%red_iter = %cst) -> (f32) {
%ld = affine.load %in[%i, %j] : memref<20x10xf32>		%ld = affine.load %in[%i, %j] : memref<20x10xf32>
%add = addf %red_iter, %ld : f32		%add = addf %red_iter, %ld : f32
affine.yield %add : f32		affine.yield %add : f32
}		}
}		}
return		return
}		}

		// REDUCE-LABEL: @strange_butterfly
		func @strange_butterfly() {
		%cst1 = constant 0.0 : f32
		%cst2 = constant 1.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1, %it2 = %cst2) -> (f32, f32) {
		%0 = addf %it1, %it2 : f32
		bondhugula: How about `addf %it1, %it1`? That was the one I was referring to. You could also have:
		    %0 = addf %it1, (%A[%i])
		    %1 = addf %it2, %0
		    yield %0, %1 : f32, f32
		Just making sure the weird cases (non reductions) are guarded.
		ftynse: The first one is easily caught because there are two uses of `%it1`, not one. The second case is caught by the backward slice check I added. Added both as tests.
		affine.yield %0, %0 : f32, f32
		}
		return
		}

		// An iter arg is used more than once. This is not a simple reduction and
		// should not be parallelized.
		// REDUCE-LABEL: @repeated_use
		func @repeated_use() {
		%cst1 = constant 0.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1) -> (f32) {
		%0 = addf %it1, %it1 : f32
		affine.yield %0 : f32
		}
		return
		}

		// An iter arg is used in the chain of operations defining the value being
		// reduced, this is not a simple reduction and should not be parallelized.
		// REDUCE-LABEL: @use_in_backward_slice
		func @use_in_backward_slice() {
		%cst1 = constant 0.0 : f32
		%cst2 = constant 1.0 : f32
		// REDUCE-NOT: affine.parallel
		affine.for %i = 0 to 10 iter_args(%it1 = %cst1, %it2 = %cst2) -> (f32, f32) {
		%0 = "test.some_modification"(%it2) : (f32) -> f32
		%1 = addf %it1, %0 : f32
		affine.yield %1, %1 : f32, f32
		}
		return
		}

Revision Contents

Diff 341092

mlir/include/mlir/Analysis/Utils.h

mlir/include/mlir/Dialect/Affine/Passes.td

mlir/include/mlir/Dialect/Affine/Utils.h

mlir/lib/Analysis/Utils.cpp

mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp

mlir/lib/Dialect/Affine/Utils/CMakeLists.txt

mlir/lib/Dialect/Affine/Utils/Utils.cpp

mlir/test/Dialect/Affine/parallelize.mlir
