This PR adds a pack_greedily transform operation that infers the packing for gemm
subcomputations embedded within any LinalgOp and packs accordingly.
A normalization step guarantees that the innermost op dimensions appear in one of 8
possible (m, n, k) orders, specified as a parameter, from which all packed forms
can be emitted.
The current implementation takes an arbitrary LinalgOp and tries to pack it along
dimensions ordered as (kk, mm, nn) and with sizes (32, 8, 16).
These are arbitrary defaults that can later be lifted into parameters of a
transform.pack_greedily op.
This achieves a new level of normalization and generalization for any n-D
LinalgOp that contains a contraction hidden within it:
we will always see a predictable packed form for any of these ops.
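As a purely hypothetical sketch (the lifted op does not exist yet per this PR; the op name is taken from the description above, but the attribute names, the dim-order encoding, and the surrounding transform-dialect syntax are all assumptions for illustration), the parameterized form might look like:

```mlir
transform.sequence failures(propagate) {
^bb0(%module: !transform.any_op):
  // Match any LinalgOp that may embed a gemm subcomputation.
  %op = transform.structured.match interface{LinalgOp} in %module
      : (!transform.any_op) -> !transform.any_op
  // Hypothetical lifted op: the packed sizes and the (m, n, k) inner-dim
  // order become parameters instead of the hardcoded (kk, mm, nn) order
  // with sizes (32, 8, 16).
  %packed = transform.structured.pack_greedily %op
      matmul_packed_sizes = [8, 16, 32]
      matmul_inner_dims_order = [2, 0, 1]
      : (!transform.any_op) -> !transform.any_op
}
```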
Wouldn't it just be better to use a bitset instead of a DenseSet for `seen`? It looks like you're iterating `nextPos` from 0 to `permSize` here, so it's pretty bounded!
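To illustrate the reviewer's suggestion, here is a minimal standalone sketch (the function name `completePermutation` and its shape are assumptions, not the PR's actual code): since positions are bounded by `permSize`, a flat bit vector can replace a hash set when completing a partial permutation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: complete a partial permutation of [0, permSize) by
// appending the unused positions in increasing order. Because positions are
// bounded by permSize, a bit vector (here std::vector<bool>) is enough to
// track "seen" positions -- no hash set required.
std::vector<int64_t> completePermutation(std::vector<int64_t> perm,
                                         int64_t permSize) {
  std::vector<bool> seen(permSize, false);
  for (int64_t p : perm) {
    assert(p >= 0 && p < permSize && !seen[p] && "invalid partial perm");
    seen[p] = true;
  }
  // Scan nextPos from 0 to permSize, exactly the bounded loop in question.
  for (int64_t nextPos = 0; nextPos < permSize; ++nextPos)
    if (!seen[nextPos])
      perm.push_back(nextPos);
  return perm;
}
```

The trade-off is the usual one: a bit vector is O(permSize) bits with no hashing or allocation per element, which wins easily when the universe is small and dense, as it is for permutations of op dimensions.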