
Details

Reviewers

bondhugula
dcaballe
qcolombet
rriddle
ftynse
springerm

Commits

rG0fa20ecafe0c: [mlir][Affine] Add helper functions to allow reordering affine.apply operands…

Summary

Care is taken to order operands from least hoistable to most hoistable and to process subexpressions in the same
order.

This exposes more opportunities for LICM, CSE and strength reduction.

Such a step should typically be applied while we still have loops in the IR and just before lowering affine ops to arith.
This is because the affine.apply canonicalization currently tries to maximally compose chains of affine.apply operations
and could undo the effects of these decompositions.
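The ordering step described above can be illustrated without MLIR. The following is a minimal sketch, assuming each operand has already been assigned a count of enclosing loops in which it is invariant; `Reordering` and `reorderByHoistability` are hypothetical names used only for illustration, and `std::stable_sort` is used here to keep ties deterministic (the patch itself calls `std::sort`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical result type (not part of the patch).
// order[i]     : original position of the operand that ends up at position i.
// newSymbol[p] : symbol index that replaces original operand p in the map.
struct Reordering {
  std::vector<int64_t> order;
  std::vector<int64_t> newSymbol;
};

// numInvariant[p] is the number of enclosing loops in which operand p is
// invariant; a smaller count means the operand is less hoistable.
Reordering reorderByHoistability(const std::vector<int64_t> &numInvariant) {
  const int64_t n = static_cast<int64_t>(numInvariant.size());
  Reordering r;
  r.order.resize(n);
  std::iota(r.order.begin(), r.order.end(), 0);
  // Least hoistable operands first; ties keep their original relative order.
  std::stable_sort(r.order.begin(), r.order.end(),
                   [&](int64_t a, int64_t b) {
                     return numInvariant[a] < numInvariant[b];
                   });
  // Invert the permutation so the map's symbols can be renamed accordingly.
  r.newSymbol.resize(n);
  for (int64_t i = 0; i < n; ++i)
    r.newSymbol[r.order[i]] = i;
  assert(r.order.size() == r.newSymbol.size());
  return r;
}
```

For counts `{2, 0, 1}`, the operand originally at position 1 (invariant in no loop) moves to the front and the operand at position 0 moves to the back.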

Depends on: D145784

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. Mar 9 2023, 4:47 AM

Herald added a reviewer: bondhugula. Mar 9 2023, 4:47 AM

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, bzcheeseman, sdasgup3 and 25 others.

nicolasvasilache requested review of this revision. Mar 9 2023, 4:47 AM

Herald added a reviewer: dcaballe. Mar 9 2023, 4:47 AM

Herald added a project: Restricted Project.

Herald added a subscriber: stephenneuendorffer.

nicolasvasilache planned changes to this revision. Mar 9 2023, 4:47 AM

nicolasvasilache added a reviewer: qcolombet.

Harbormaster completed remote builds in B218366: Diff 503734. Mar 9 2023, 5:05 AM

Add test and fix ordering assumptions to account for local variables.

Herald added a reviewer: rriddle. Mar 9 2023, 7:01 AM

Drop debug spew

Harbormaster completed remote builds in B218387: Diff 503762. Mar 9 2023, 7:27 AM

Update tests

Format

Trim deps

Harbormaster completed remote builds in B218394: Diff 503771. Mar 9 2023, 8:31 AM

Rebase

nicolasvasilache mentioned this in D145977: [mlir][Transform] NFC - Various API cleanups and use RewriterBase in lieu of PatternRewriter. Mar 13 2023, 12:36 PM

Add licm and cse to test

nicolasvasilache added reviewers: ftynse, springerm. Mar 13 2023, 2:59 PM

Harbormaster completed remote builds in B219150: Diff 504817. Mar 13 2023, 3:53 PM

ftynse accepted this revision. Mar 13 2023, 5:38 PM

ftynse added inline comments.

mlir/include/mlir/Dialect/Affine/Transforms/Transforms.h
43–47	Looks unnecessary here.
mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp
44	Nit: I think `to_vector` no longer needs the number of stack elements.

This revision is now accepted and ready to land. Mar 13 2023, 5:38 PM

springerm accepted this revision. Mar 14 2023, 1:00 AM

qcolombet accepted this revision. Mar 14 2023, 1:22 AM

qcolombet added inline comments.

mlir/test/lib/Dialect/Affine/TestDecomposeAffineOps.cpp
29 ↗	(On Diff #504817)	For the integration in IREE, I had to "promote" this test pass into an actual pass (see https://github.com/iree-org/iree-llvm-fork/blob/4ef84146ad72a6a5878697daf9658844cefb0a22/mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp#L183). Could you front load this refactoring in this PR? Unless you have a different plan for the IREE integration.

Food for thought:
Since the canonicalization is going to undo the decomposition, should we rewrite the expression (e.g., as part of the canonicalization) to the ordering we want?
The downside is we would still rely on the backend to do the hoisting of the loop invariant stuff.

Right now, what happens is:
Let's say we have:

affine.apply affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()[%loopVariant, %inv1, %inv2]

This decomposes into:

%inv1x32 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv2_ = affine.apply affine_map<()[s0] -> (s0)>()[%inv2]
%inv1x32_plus_inv2 = affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2_]
%loopVariantx1024 = affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
%res = affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%loopVariantx1024, %inv1x32_plus_inv2]

Then we run licm and lower. Hooray, we did the hoisting and produce the code we wanted.
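To make the effect of the hoisting concrete, here is a minimal arithmetic sketch, assuming the final add consumes the scaled loop-variant term and that the loop-variant value is a simple induction variable. The function names `original` and `hoisted` are hypothetical; the local variable names mirror the SSA names in the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The undecomposed expression: s0 * 1024 + s1 * 32 + s2,
// with (s0, s1, s2) = (loopVariant, inv1, inv2).
int64_t original(int64_t loopVariant, int64_t inv1, int64_t inv2) {
  return loopVariant * 1024 + inv1 * 32 + inv2;
}

// The decomposed form after loop-invariant code motion: everything that
// depends only on inv1 and inv2 is computed once, before the loop.
std::vector<int64_t> hoisted(int64_t tripCount, int64_t inv1, int64_t inv2) {
  const int64_t inv1x32 = inv1 * 32;
  const int64_t inv1x32PlusInv2 = inv1x32 + inv2; // hoisted out of the loop
  std::vector<int64_t> results;
  for (int64_t iv = 0; iv < tripCount; ++iv) {
    const int64_t loopVariantx1024 = iv * 1024;   // stays inside the loop
    results.push_back(loopVariantx1024 + inv1x32PlusInv2);
  }
  return results;
}
```

Per iteration, only one multiplication and one addition remain inside the loop, and each result equals what `original` computes for the same inputs.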

Now, suppose we don't lower right away and happen to run canonicalization first. The resulting expression will look like:

affine.apply affine_map<()[s0, s1, s2] -> (s0 * 32 + s1 + s2 * 1024)>()[%inv1, %inv2, %loopVariant]

Now if we lower this expression, the backend is still able to do licm and whatnot.

To summarize, I like the decomposition approach as it is more flexible, but it is easy to undo it (via canonicalization) so I wonder if we should just make the expression with the invariant symbols first the canonical representation.

nicolasvasilache marked 3 inline comments as done. Mar 14 2023, 4:02 AM

nicolasvasilache added inline comments.

mlir/test/lib/Dialect/Affine/TestDecomposeAffineOps.cpp
29 ↗	(On Diff #504817)	Given the interaction with other passes and transforms I was thinking we want this functionality to just be called from a more comprehensive pass. I wouldn't want to chase phase orderings between this, canonicalize, licm, lower affine and stuff related to ldmatrix. I was thinking we'd have a new pass that puts the things properly together on the IREE side (or even upstream once we know exactly what we need) ?

Closed by commit rG0fa20ecafe0c: [mlir][Affine] Add helper functions to allow reordering affine.apply operands… (authored by nicolasvasilache). Mar 14 2023, 4:07 AM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache marked an inline comment as done.

nicolasvasilache added a commit: rG0fa20ecafe0c: [mlir][Affine] Add helper functions to allow reordering affine.apply operands….

nicolasvasilache mentioned this in rG1cff4cbda305: [mlir][Transform] NFC - Various API cleanups and use RewriterBase in lieu of…. Mar 14 2023, 4:23 AM

This patch broke the buildkite bot
https://buildkite.com/llvm-project/upstream-bazel/builds/56515#0186dfcd-e044-4adf-992d-b51e688544dd

Diff 503734

mlir/include/mlir/Dialect/Affine/Passes.h

	[12 lines not shown]

	#ifndef MLIR_DIALECT_AFFINE_PASSES_H			#ifndef MLIR_DIALECT_AFFINE_PASSES_H
	#define MLIR_DIALECT_AFFINE_PASSES_H			#define MLIR_DIALECT_AFFINE_PASSES_H

	#include "mlir/Pass/Pass.h"			#include "mlir/Pass/Pass.h"
	#include <limits>			#include <limits>

	namespace mlir {			namespace mlir {

	namespace func {			namespace func {
	class FuncOp;			class FuncOp;
	} // namespace func			} // namespace func

	class AffineForOp;			class AffineForOp;

	/// Fusion mode to attempt. The default mode `Greedy` does both			/// Fusion mode to attempt. The default mode `Greedy` does both
	/// producer-consumer and sibling fusion.			/// producer-consumer and sibling fusion.
	[76 lines not shown]
	/// line if provided.			/// line if provided.
	std::unique_ptr<OperationPass<func::FuncOp>>			std::unique_ptr<OperationPass<func::FuncOp>>
	createLoopUnrollAndJamPass(int unrollJamFactor = -1);			createLoopUnrollAndJamPass(int unrollJamFactor = -1);

	/// Creates a pass to pipeline explicit movement of data across levels of the			/// Creates a pass to pipeline explicit movement of data across levels of the
	/// memory hierarchy.			/// memory hierarchy.
	std::unique_ptr<OperationPass<func::FuncOp>> createPipelineDataTransferPass();			std::unique_ptr<OperationPass<func::FuncOp>> createPipelineDataTransferPass();

	/// Populate patterns that expand affine index operations into more fundamental
	/// operations (not necessarily restricted to Affine dialect).
	void populateAffineExpandIndexOpsPatterns(RewritePatternSet &patterns);

	/// Creates a pass to expand affine index operations into more fundamental			/// Creates a pass to expand affine index operations into more fundamental
	/// operations (not necessarily restricted to Affine dialect).			/// operations (not necessarily restricted to Affine dialect).
	std::unique_ptr<Pass> createAffineExpandIndexOpsPass();			std::unique_ptr<Pass> createAffineExpandIndexOpsPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/Affine/Passes.h.inc"			#include "mlir/Dialect/Affine/Passes.h.inc"

	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_AFFINE_PASSES_H			#endif // MLIR_DIALECT_AFFINE_PASSES_H

mlir/include/mlir/Dialect/Affine/Transforms/Transforms.h

This file was added.

				//===- Transforms.h - Transforms Entrypoints ---------------------*- C++
				//-*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This header file defines a set of transforms specific for the AffineOps
				// dialect.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_AFFINE_TRANSFORMS_TRANSFORMS_H
				#define MLIR_DIALECT_AFFINE_TRANSFORMS_TRANSFORMS_H

				#include "mlir/Support/LogicalResult.h"

				namespace mlir {
				class RewritePatternSet;
				class RewriterBase;
				class AffineApplyOp;

				/// Populate patterns that expand affine index operations into more fundamental
				/// operations (not necessarily restricted to Affine dialect).
				void populateAffineExpandIndexOpsPatterns(RewritePatternSet &patterns);

				/// Helper function to rewrite `op`'s affine map and reorder its operands such
				/// that they are in increasing order of hoistability (i.e. the least hoistable
				/// operands come first in the operand list).
				void reorderOperandsByHoistability(RewriterBase &rewriter, AffineApplyOp op);

				/// Split an "affine.apply" operation into 2 smaller ops, exhibiting
				/// opportunities for CSE and LICM.
				/// Return the sink AffineApplyOp on success or failure if the op cannot be
				/// decomposed.
				/// Note that this can be currently undone by canonicalization which tries to
				/// maximally compose chains of AffineApplyOps.
				FailureOr<AffineApplyOp> decompose(RewriterBase &rewriter, AffineApplyOp op);

				//===----------------------------------------------------------------------===//
				// Registration
				//===----------------------------------------------------------------------===//

				} // namespace mlir

				#endif // MLIR_DIALECT_AFFINE_TRANSFORMS_TRANSFORMS_H

mlir/lib/Dialect/Affine/Transforms/AffineExpandIndexOps.cpp

	//===- AffineExpandIndexOps.cpp - Affine expand index ops pass ------------===//			//===- AffineExpandIndexOps.cpp - Affine expand index ops pass ------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file implements a pass to expand affine index ops into one or more more			// This file implements a pass to expand affine index ops into one or more more
	// fundamental operations.			// fundamental operations.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "mlir/Dialect/Affine/Passes.h"			#include "mlir/Dialect/Affine/Passes.h"

	#include "mlir/Dialect/Affine/IR/AffineOps.h"			#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/Affine/Transforms/Transforms.h"
	#include "mlir/Dialect/Affine/Utils.h"			#include "mlir/Dialect/Affine/Utils.h"
	#include "mlir/Transforms/GreedyPatternRewriteDriver.h"			#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

	namespace mlir {			namespace mlir {
	#define GEN_PASS_DEF_AFFINEEXPANDINDEXOPS			#define GEN_PASS_DEF_AFFINEEXPANDINDEXOPS
	#include "mlir/Dialect/Affine/Passes.h.inc"			#include "mlir/Dialect/Affine/Passes.h.inc"
	} // namespace mlir			} // namespace mlir

	[44 lines not shown]

mlir/lib/Dialect/Affine/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRAffineTransforms			add_mlir_dialect_library(MLIRAffineTransforms
	AffineDataCopyGeneration.cpp			AffineDataCopyGeneration.cpp
	AffineExpandIndexOps.cpp			AffineExpandIndexOps.cpp
	AffineLoopInvariantCodeMotion.cpp			AffineLoopInvariantCodeMotion.cpp
	AffineLoopNormalize.cpp			AffineLoopNormalize.cpp
	AffineParallelize.cpp			AffineParallelize.cpp
	AffineScalarReplacement.cpp			AffineScalarReplacement.cpp
				DecomposeAffineOps.cpp
	LoopCoalescing.cpp			LoopCoalescing.cpp
	LoopFusion.cpp			LoopFusion.cpp
	LoopTiling.cpp			LoopTiling.cpp
	LoopUnroll.cpp			LoopUnroll.cpp
	LoopUnrollAndJam.cpp			LoopUnrollAndJam.cpp
	PipelineDataTransfer.cpp			PipelineDataTransfer.cpp
	SuperVectorize.cpp			SuperVectorize.cpp
	SimplifyAffineStructures.cpp			SimplifyAffineStructures.cpp
	[25 lines not shown]

mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp

This file was added.

				//===- DecomposeAffineOps.cpp - Decompose affine ops into finer-grained ---===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements functionality to progressively decompose coarse-grained
				// affine ops into finer-grained ops.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Conversion/AffineToStandard/AffineToStandard.h"

				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/Affine/Transforms/Transforms.h"
				#include "mlir/Dialect/Affine/Utils.h"
				#include "mlir/IR/AffineExpr.h"
				#include "mlir/IR/PatternMatch.h"
				#include "mlir/IR/Value.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
				#include "llvm/Support/Debug.h"
				#include <cstdint>

				using namespace mlir;

				#define DEBUG_TYPE "decompose-affine-ops"
				#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE "]: ")
				#define DBGSNL() (llvm::dbgs() << "\n")

				/// Count the number of enclosing loops, walking outwards from the operand's
				/// owner, with respect to which `operand` is invariant.
				static int64_t numEnclosingInvariantLoops(OpOperand &operand) {
				int64_t count = 0;
				Operation *currentOp = operand.getOwner();
				while (auto loopOp = currentOp->getParentOfType<LoopLikeOpInterface>()) {
				if (!loopOp.isDefinedOutsideOfLoop(operand.get()))
				break;
				currentOp = loopOp;
				count++;
				}
				return count;
				}

				/// Reorder operands by the number of loops above which the operand is defined.
				/// This allows us to unambiguously
				void mlir::reorderOperandsByHoistability(RewriterBase &rewriter,
				AffineApplyOp op) {
				SmallVector<int64_t> numInvariant = llvm::to_vector<4>(
				llvm::map_range(op->getOpOperands(), [&](OpOperand &operand) {
				return numEnclosingInvariantLoops(operand);
				}));

				int64_t numOperands = op.getNumOperands();
				SmallVector<int64_t> operandPositions =
				llvm::to_vector<4>(llvm::seq<int64_t>(0, numOperands));
				std::sort(operandPositions.begin(), operandPositions.end(),
				[&numInvariant](size_t i1, size_t i2) {
				return numInvariant[i1] < numInvariant[i2];
				});

				SmallVector<AffineExpr> replacements(numOperands);
				SmallVector<Value> operands(numOperands);
				for (int64_t i = 0; i < numOperands; ++i) {
				operands[i] = op.getOperand(operandPositions[i]);
				replacements[operandPositions[i]] = getAffineSymbolExpr(i, op.getContext());
				}

				AffineMap map = op.getAffineMap();
				ArrayRef<AffineExpr> repls{replacements};
				map = map.replaceDimsAndSymbols(repls.take_front(map.getNumDims()),
				repls.drop_front(map.getNumDims()),
				/*numResultDims=*/0,
				/*numResultSyms=*/numOperands);
				map = AffineMap::get(0, numOperands,
				simplifyAffineExpr(map.getResult(0), 0, numOperands),
				op->getContext());

				rewriter.startRootUpdate(op);
				op.setMap(map);
				op->setOperands(operands);
				rewriter.finalizeRootUpdate(op);
				}

				/// Build an affine.apply that is a subexpression `expr` of `originalOp`'s affine
				/// map and with the same operands.
				/// Canonicalize the map and operands to deduplicate and drop dead operands
				/// before returning but do not perform maximal composition of AffineApplyOp
				/// which would defeat the purpose.
				static AffineApplyOp createSubApply(RewriterBase &rewriter,
				AffineApplyOp originalOp, AffineExpr expr) {
				MLIRContext *ctx = originalOp->getContext();
				AffineMap m = originalOp.getAffineMap();
				auto rhsMap = AffineMap::get(m.getNumDims(), m.getNumSymbols(), expr, ctx);
				SmallVector<Value> rhsOperands = originalOp->getOperands();
				canonicalizeMapAndOperands(&rhsMap, &rhsOperands);
				return rewriter.create<AffineApplyOp>(originalOp.getLoc(), rhsMap,
				rhsOperands);
				}

				/// Split an "affine.apply" operation into 2 smaller ops, exhibiting
				/// opportunities for CSE and LICM.
				/// Note that this can be currently undone by canonicalization which tries to
				/// maximally compose chains of AffineApplyOps.
				FailureOr<AffineApplyOp> mlir::decompose(RewriterBase &rewriter,
				AffineApplyOp op) {
				AffineMap m = op.getAffineMap();
				AffineExpr exp = m.getResult(0);
				auto binExpr = exp.dyn_cast<AffineBinaryOpExpr>();
				if (!binExpr)
				return rewriter.notifyMatchFailure(op, "terminal affine.apply");

				if (!binExpr.getLHS().isa<AffineBinaryOpExpr>() &&
				!binExpr.getRHS().isa<AffineBinaryOpExpr>())
				return rewriter.notifyMatchFailure(op, "terminal affine.apply");

				bool supportedKind = ((binExpr.getKind() == AffineExprKind::Add) ||
				(binExpr.getKind() == AffineExprKind::Mul));
				if (!supportedKind)
				return rewriter.notifyMatchFailure(
				op, "only add or mul binary expr can be reassociated");

				LLVM_DEBUG(DBGS() << "Start decomposeIntoFinerGrainedOps: " << op << "\n");

				// Iteratively extract the RHS while the binary operation does not change.
				// When done, we have an ordered list of affine.apply ops that we can
				// reassociate.
				MLIRContext *ctx = op->getContext();
				SmallVector<AffineApplyOp> rhsOps;
				while (auto currentBinExpr = exp.dyn_cast<AffineBinaryOpExpr>()) {
				if (currentBinExpr.getKind() != binExpr.getKind()) {
				rhsOps.push_back(createSubApply(rewriter, op, currentBinExpr));
				LLVM_DEBUG(DBGS() << "--subapply: " << rhsOps.back() << "\n");
				break;
				}
				rhsOps.push_back(createSubApply(rewriter, op, currentBinExpr.getRHS()));
				LLVM_DEBUG(DBGS() << "--subapply: " << rhsOps.back() << "\n");
				exp = currentBinExpr.getLHS();
				}

				// Merge back iteratively, thus achieving reassociation.
				auto s0 = getAffineSymbolExpr(0, ctx);
				auto s1 = getAffineSymbolExpr(1, ctx);
				AffineMap binMap = AffineMap::get(
				/*dimCount=*/0, /*symbolCount=*/2,
				getAffineBinaryOpExpr(binExpr.getKind(), s0, s1), ctx);
				AffineApplyOp rhs = rhsOps[0];
				for (int64_t i = 0, e = rhsOps.size(); i + 1 < e; ++i) {
				rhs = rewriter.create<AffineApplyOp>(op.getLoc(), binMap,
				ValueRange{rhs, rhsOps[i + 1]});
				LLVM_DEBUG(DBGS() << "--reassociate into: " << rhs << "\n");
				}

				rewriter.replaceOp(op, rhs.getResult());
				return rhs;
				}
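The reassociation performed by `decompose` can be illustrated on a toy expression tree. This is a minimal sketch under stated assumptions: `Expr`, `leaf`, `bin`, `str`, `peel` and `reassociate` are hypothetical stand-ins, not MLIR APIs, and the sketch also records a trailing non-binary leaf as the last term, which goes beyond what the listing above shows. It peels right-hand operands off a left-leaning chain while the node kind matches the root, then merges the terms back starting from the first one peeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature expression tree; not MLIR's AffineExpr.
struct Expr {
  char kind;                     // '+' or '*' for binary nodes, 0 for leaves
  std::string name;              // leaf name, empty for binary nodes
  std::shared_ptr<Expr> lhs, rhs;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr leaf(std::string n) {
  return std::make_shared<Expr>(Expr{0, std::move(n), nullptr, nullptr});
}
ExprPtr bin(char k, ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{k, "", std::move(l), std::move(r)});
}
std::string str(const ExprPtr &e) {
  if (e->kind == 0)
    return e->name;
  return "(" + str(e->lhs) + " " + e->kind + " " + str(e->rhs) + ")";
}

// Peel right-hand operands while the node kind equals the root kind. The
// remaining subexpression (different kind, or a leaf) becomes the last term.
std::vector<ExprPtr> peel(ExprPtr e) {
  assert(e->kind != 0 && "expected a binary root");
  const char rootKind = e->kind;
  std::vector<ExprPtr> terms;
  while (e->kind == rootKind) {
    terms.push_back(e->rhs);
    e = e->lhs;
  }
  terms.push_back(e);
  return terms;
}

// Merge the peeled terms back in peel order, so the terms that were
// outermost on the right are combined first.
ExprPtr reassociate(const ExprPtr &root) {
  std::vector<ExprPtr> terms = peel(root);
  ExprPtr acc = terms[0];
  for (std::size_t i = 1; i < terms.size(); ++i)
    acc = bin(root->kind, acc, terms[i]);
  return acc;
}
```

With operands ordered from least to most hoistable, the most hoistable terms sit on the right of the chain, so they are peeled first and end up combined with each other before the loop-variant term is added.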

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Affine] Add helper functions to allow reordering affine.apply operands and decompose the ops into smaller components
ClosedPublic
