This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Add ExtractOp folding when fed by a TransposeOp
Closed · Public

Authored by nicolasvasilache on Jul 9 2020, 8:55 AM.


Details

Reviewers

aartbik
ftynse
bkramer
ThomasRaoux
rriddle

Commits

rGa490d387e6e6: [mlir][Vector] Add ExtractOp folding when fed by a TransposeOp

Summary

A TransposeOp is often followed by an ExtractOp.
In certain cases, however, lowering the TransposeOp to either a flat transpose (llvm.matrix intrinsics) or to unrolled scalar insert / extract chains is unnecessary and even detrimental.

Folding the ExtractOp through the TransposeOp removes some of this unnecessary complexity.
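The folding rule can be illustrated without any MLIR dependency. The following is a minimal sketch on plain vectors, not the code from this revision: `foldExtractPosition` is a hypothetical name, `perm` stands for the transpose permutation (result dimension `i` reads source dimension `perm[i]`), and `pos` stands for the position of the extract applied to the transposed value. It assumes `perm` is a valid permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-alone model of the fold (not an MLIR API).
// Returns true and fills `newPos` with the position to extract from the
// transpose's source; returns false when the fold does not apply.
bool foldExtractPosition(const std::vector<unsigned> &perm,
                         const std::vector<int64_t> &pos,
                         std::vector<int64_t> &newPos) {
  if (pos.size() > perm.size())
    return false;
  // The dimensions that survive the extract (the minor ones) must be left
  // in place by the transpose; otherwise the extracted value itself would
  // be permuted and a plain re-indexing is not enough.
  for (std::size_t i = pos.size(); i < perm.size(); ++i)
    if (perm[i] != i)
      return false;
  // Apply the inverse of the major part of the permutation: the index used
  // along result dimension i belongs to source dimension perm[i].
  newPos.assign(pos.size(), 0);
  for (std::size_t i = 0; i < pos.size(); ++i) {
    // Holds for a valid permutation whose minor part is the identity.
    assert(perm[i] < pos.size() && "major part must map into major dims");
    newPos[perm[i]] = pos[i];
  }
  return true;
}
```

Under these assumptions, a transpose with permutation `[1, 2, 0, 3]` followed by an extract at `[0, 1, 2]` becomes an extract at `[2, 0, 1]` on the original vector, while a permutation such as `[0, 2, 3, 1]` is rejected because its innermost dimension is not left in place.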

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. Jul 9 2020, 8:55 AM

Herald added a reviewer: rriddle. Jul 9 2020, 8:55 AM

Herald added a project: Restricted Project.

Herald added subscribers: msifontes, jurahul, Kayjukh and 12 others.

Harbormaster failed remote builds in B63597: Diff 276753! Jul 9 2020, 8:56 AM

aartbik added inline comments. Jul 9 2020, 11:51 AM

mlir/test/Dialect/Vector/canonicalize.mlir
343	is this example right? %0 is used twice, %2 once further down and %4 never? I would expect each transpose to be used by the following extract?

Fix test

mlir/test/Dialect/Vector/canonicalize.mlir
343	yes thanks for catching!

Harbormaster completed remote builds in B63748: Diff 277023. Jul 10 2020, 7:07 AM

ftynse accepted this revision. Jul 10 2020, 7:23 AM

ftynse added inline comments.

mlir/lib/IR/AffineMap.cpp
383	can getSubMap take a range instead of materializing the vector?

This revision is now accepted and ready to land. Jul 10 2020, 7:23 AM

nicolasvasilache marked an inline comment as done. Jul 10 2020, 7:32 AM

nicolasvasilache marked an inline comment as done. Jul 10 2020, 8:09 AM

nicolasvasilache added inline comments.

mlir/lib/IR/AffineMap.cpp
383	not without iterator facading complexity AFAICT ..

Closed by commit rGa490d387e6e6: [mlir][Vector] Add ExtractOp folding when fed by a TransposeOp (authored by nicolasvasilache). Jul 10 2020, 8:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
mlir/include/mlir/IR/AffineMap.h              9 lines
mlir/lib/Dialect/Vector/VectorOps.cpp         60 lines
mlir/lib/IR/AffineMap.cpp                     23 lines
mlir/test/Dialect/Vector/canonicalize.mlir    41 lines

Diff 277050

mlir/include/mlir/IR/AffineMap.h

[164 lines not shown; context: public:]
   ///
   /// Example:
   /// map1: `(d0, d1)[s0, s1] -> (d0 + 1 + s1, d1 - 1 - s0)`
   /// map2: `(d0)[s0] -> (d0 + s0, d0 - s0)`
   /// map1.compose(map2):
   /// `(d0)[s0, s1, s2] -> (d0 + s1 + s2 + 1, d0 - s0 - s2 - 1)`
   AffineMap compose(AffineMap map);

+  /// Applies composition by the dims of `this` to the integer `values` and
+  /// returns the resulting values. `this` must be symbol-less.
+  SmallVector<int64_t, 4> compose(ArrayRef<int64_t> values);
+
   /// Returns true if the AffineMap represents a subset (i.e. a projection) of a
   /// symbol-less permutation map.
   bool isProjectedPermutation();

   /// Returns true if the AffineMap represents a symbol-less permutation map.
   bool isPermutation();

   /// Returns the map consisting of the `resultPos` subset.
   AffineMap getSubMap(ArrayRef<unsigned> resultPos);

+  /// Returns the map consisting of the most major `numResults` results.
+  /// Returns the null AffineMap if `numResults` == 0.
+  /// Returns `*this` if `numResults` >= `this->getNumResults()`.
+  AffineMap getMajorSubMap(unsigned numResults);
+
   /// Returns the map consisting of the most minor `numResults` results.
   /// Returns the null AffineMap if `numResults` == 0.
   /// Returns `*this` if `numResults` >= `this->getNumResults()`.
   AffineMap getMinorSubMap(unsigned numResults);

   friend ::llvm::hash_code hash_value(AffineMap arg);

 private:
[147 lines not shown]

mlir/lib/Dialect/Vector/VectorOps.cpp

[12 lines not shown]

 #include "mlir/Dialect/Vector/VectorOps.h"
 #include "mlir/Dialect/StandardOps/IR/Ops.h"
 #include "mlir/Dialect/Utils/StructuredOpsUtils.h"
 #include "mlir/Dialect/Vector/VectorUtils.h"
 #include "mlir/IR/AffineExpr.h"
 #include "mlir/IR/AffineMap.h"
 #include "mlir/IR/Builders.h"
+#include "mlir/IR/Function.h"
 #include "mlir/IR/OpImplementation.h"
 #include "mlir/IR/PatternMatch.h"
 #include "mlir/IR/TypeUtilities.h"
 #include "mlir/Support/LLVM.h"
 #include "mlir/Support/MathExtras.h"
 #include "llvm/ADT/StringSet.h"
 #include <numeric>

[568 lines not shown; context: static LogicalResult foldExtractOpFromExtractChain(ExtractOp extractOp) {]
   // OpBuilder is only used as a helper to build an I64ArrayAttr.
   OpBuilder b(extractOp.getContext());
   std::reverse(globalPosition.begin(), globalPosition.end());
   extractOp.setAttr(ExtractOp::getPositionAttrName(),
                     b.getI64ArrayAttr(globalPosition));
   return success();
 }

+/// Fold the result of an ExtractOp in place when it comes from a TransposeOp.
+static LogicalResult foldExtractOpFromTranspose(ExtractOp extractOp) {
+  auto transposeOp = extractOp.vector().getDefiningOp<TransposeOp>();
+  if (!transposeOp)
+    return failure();
+
+  auto permutation = extractVector<unsigned>(transposeOp.transp());
+  auto extractedPos = extractVector<int64_t>(extractOp.position());
+
+  // If transposition permutation is larger than the ExtractOp, all minor
+  // dimensions must be an identity for folding to occur. If not, individual
+  // elements within the extracted value are transposed and this is not just a
+  // simple folding.
+  unsigned minorRank = permutation.size() - extractedPos.size();
+  MLIRContext *ctx = extractOp.getContext();
+  AffineMap permutationMap = AffineMap::getPermutationMap(permutation, ctx);
+  AffineMap minorMap = permutationMap.getMinorSubMap(minorRank);
+  if (minorMap && !AffineMap::isMinorIdentity(minorMap))
+    return failure();
+
+  // %1 = transpose %0[x, y, z] : vector<axbxcxf32>
+  // %2 = extract %1[u, v] : vector<..xf32>
+  // may turn into:
+  // %2 = extract %0[w, x] : vector<..xf32>
+  // iff z == 2 and [w, x] = [x, y]^-1 o [u, v] here o denotes composition and
+  // -1 denotes the inverse.
+  permutationMap = permutationMap.getMajorSubMap(extractedPos.size());
+  // The major submap has fewer results but the same number of dims. To compose
+  // cleanly, we need to drop dims to form a "square matrix". This is possible
+  // because:
+  // (a) this is a permutation map and
+  // (b) the minor map has already been checked to be identity.
+  // Therefore, the major map cannot contain dims of position greater or equal
+  // than the number of results.
+  assert(llvm::all_of(permutationMap.getResults(),
+                      [&](AffineExpr e) {
+                        auto dim = e.dyn_cast<AffineDimExpr>();
+                        return dim && dim.getPosition() <
+                                          permutationMap.getNumResults();
+                      }) &&
+         "Unexpected map results depend on higher rank positions");
+  // Project on the first domain dimensions to allow composition.
+  permutationMap = AffineMap::get(permutationMap.getNumResults(), 0,
+                                  permutationMap.getResults(), ctx);
+
+  extractOp.setOperand(transposeOp.vector());
+  // Compose the inverse permutation map with the extractedPos.
+  auto newExtractedPos =
+      inversePermutation(permutationMap).compose(extractedPos);
+  // OpBuilder is only used as a helper to build an I64ArrayAttr.
+  OpBuilder b(extractOp.getContext());
+  extractOp.setAttr(ExtractOp::getPositionAttrName(),
+                    b.getI64ArrayAttr(newExtractedPos));
+
+  return success();
+}
+
 /// Fold an ExtractOp that is fed by a chain of InsertOps and TransposeOps. The
 /// result is always the input to some InsertOp.
 static Value foldExtractOpFromInsertChainAndTranspose(ExtractOp extractOp) {
   MLIRContext *context = extractOp.getContext();
   AffineMap permutationMap;
   auto extractedPos = extractVector<unsigned>(extractOp.position());
   // Walk back a chain of InsertOp/TransposeOp until we hit a match.
   // Compose TransposeOp permutations as we walk back.
[71 lines not shown; context: while (insertOp || transposeOp) {]
     transposeOp = insertionDest.getDefiningOp<vector::TransposeOp>();
   }
   return Value();
 }

 OpFoldResult ExtractOp::fold(ArrayRef<Attribute>) {
   if (succeeded(foldExtractOpFromExtractChain(*this)))
     return getResult();
+  if (succeeded(foldExtractOpFromTranspose(*this)))
+    return getResult();
   if (auto val = foldExtractOpFromInsertChainAndTranspose(*this))
     return val;
   return OpFoldResult();
 }

 //===----------------------------------------------------------------------===//
 // ExtractSlicesOp
 //===----------------------------------------------------------------------===//
[1,496 lines not shown]

mlir/lib/IR/AffineMap.cpp

[324 lines not shown; context: auto newMap =]
       map.replaceDimsAndSymbols(newDims, newSymbols, numDims, numSymbols);
   SmallVector<AffineExpr, 8> exprs;
   exprs.reserve(getResults().size());
   for (auto expr : getResults())
     exprs.push_back(expr.compose(newMap));
   return AffineMap::get(numDims, numSymbols, exprs, map.getContext());
 }

+SmallVector<int64_t, 4> AffineMap::compose(ArrayRef<int64_t> values) {
+  assert(getNumSymbols() == 0 && "Expected symbol-less map");
+  SmallVector<AffineExpr, 4> exprs;
+  exprs.reserve(values.size());
+  MLIRContext *ctx = getContext();
+  for (auto v : values)
+    exprs.push_back(getAffineConstantExpr(v, ctx));
+  auto resMap = compose(AffineMap::get(0, 0, exprs, ctx));
+  SmallVector<int64_t, 4> res;
+  res.reserve(resMap.getNumResults());
+  for (auto e : resMap.getResults())
+    res.push_back(e.cast<AffineConstantExpr>().getValue());
+  return res;
+}
+
 bool AffineMap::isProjectedPermutation() {
   if (getNumSymbols() > 0)
     return false;
   SmallVector<bool, 8> seen(getNumInputs(), false);
   for (auto expr : getResults()) {
     if (auto dim = expr.dyn_cast<AffineDimExpr>()) {
       if (seen[dim.getPosition()])
         return false;
[14 lines not shown]
 AffineMap AffineMap::getSubMap(ArrayRef<unsigned> resultPos) {
   SmallVector<AffineExpr, 4> exprs;
   exprs.reserve(resultPos.size());
   for (auto idx : resultPos)
     exprs.push_back(getResult(idx));
   return AffineMap::get(getNumDims(), getNumSymbols(), exprs, getContext());
 }

+AffineMap AffineMap::getMajorSubMap(unsigned numResults) {
+  if (numResults == 0)
+    return AffineMap();
+  if (numResults > getNumResults())
+    return *this;
+  return getSubMap(llvm::to_vector<4>(llvm::seq<unsigned>(0, numResults)));
+}
+
 AffineMap AffineMap::getMinorSubMap(unsigned numResults) {
   if (numResults == 0)
     return AffineMap();
   if (numResults > getNumResults())
     return *this;
   return getSubMap(llvm::to_vector<4>(
       llvm::seq<unsigned>(getNumResults() - numResults, getNumResults())));
 }
[102 lines not shown]
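For readers without an MLIR build at hand, the behaviour of the new helpers can be approximated on permutation-only maps stored as plain index lists. This is a rough sketch, not the AffineMap implementation: `DimList`, `majorSubMap`, `minorSubMap` and `composeWithValues` are hypothetical names, each entry `results[i]` records which input dimension result `i` reads, and an empty list stands in for the null AffineMap returned when `numResults` is 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a symbol-less map whose results are all plain
// dims: results[i] is the position of the dim read by result i.
using DimList = std::vector<unsigned>;

// Keeps the first (most major) `numResults` results.
DimList majorSubMap(const DimList &results, unsigned numResults) {
  if (numResults >= results.size())
    return results;
  return DimList(results.begin(), results.begin() + numResults);
}

// Keeps the last (most minor) `numResults` results.
DimList minorSubMap(const DimList &results, unsigned numResults) {
  if (numResults >= results.size())
    return results;
  return DimList(results.end() - numResults, results.end());
}

// Substitutes values[d] for every dim d, which is what composing with a map
// of constants amounts to for this restricted kind of map.
std::vector<int64_t> composeWithValues(const DimList &results,
                                       const std::vector<int64_t> &values) {
  std::vector<int64_t> out;
  out.reserve(results.size());
  for (unsigned dim : results) {
    assert(dim < values.size() && "result reads a dim with no value");
    out.push_back(values[dim]);
  }
  return out;
}
```

In this model, composing the index list `{2, 0, 1}` with the values `{10, 20, 30}` yields `{30, 10, 20}`: each result picks the value bound to the dim it reads.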

mlir/test/Dialect/Vector/canonicalize.mlir

[294 lines not shown; context: -> (vector<4xf32>, vector<4xf32>, vector<4xf32>, vector<4xf32>)]
   // CHECK-SAME: vector<4xf32>, vector<4xf32>, vector<4xf32>, vector<4xf32>
   return %r1, %r2, %r3, %r4 : vector<4xf32>, vector<4xf32>, vector<4xf32>, vector<4xf32>
 }

 // -----

 // CHECK-LABEL: fold_extracts
 // CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: vector<3x4x5x6xf32>
-// CHECK-NEXT: vector.extract %[[A]][0, 1, 2, 3] : vector<3x4x5x6xf32>
-// CHECK-NEXT: vector.extract %[[A]][0] : vector<3x4x5x6xf32>
-// CHECK-NEXT: return
 func @fold_extracts(%a : vector<3x4x5x6xf32>) -> (f32, vector<4x5x6xf32>) {
   %b = vector.extract %a[0] : vector<3x4x5x6xf32>
   %c = vector.extract %b[1, 2] : vector<4x5x6xf32>
+  // CHECK-NEXT: vector.extract %[[A]][0, 1, 2, 3] : vector<3x4x5x6xf32>
   %d = vector.extract %c[3] : vector<6xf32>
+
+  // CHECK-NEXT: vector.extract %[[A]][0] : vector<3x4x5x6xf32>
   %e = vector.extract %a[0] : vector<3x4x5x6xf32>
+
+  // CHECK-NEXT: return
   return %d, %e : f32, vector<4x5x6xf32>
 }
+
+// -----
+
+// CHECK-LABEL: fold_extract_transpose
+// CHECK-SAME: %[[A:[a-zA-Z0-9]*]]: vector<3x4x5x6xf32>
+// CHECK-SAME: %[[B:[a-zA-Z0-9]*]]: vector<3x6x5x6xf32>
+func @fold_extract_transpose(
+    %a : vector<3x4x5x6xf32>, %b : vector<3x6x5x6xf32>) -> (
+      vector<6xf32>, vector<6xf32>, vector<6xf32>) {
+  // [3] is a proper most minor identity map in transpose.
+  // Permutation is a self inverse and we have.
+  // [0, 2, 1] ^ -1 o [0, 1, 2] = [0, 2, 1] o [0, 1, 2]
+  // = [0, 2, 1]
+  // CHECK-NEXT: vector.extract %[[A]][0, 2, 1] : vector<3x4x5x6xf32>
+  %0 = vector.transpose %a, [0, 2, 1, 3] : vector<3x4x5x6xf32> to vector<3x5x4x6xf32>
+  %1 = vector.extract %0[0, 1, 2] : vector<3x5x4x6xf32>
+
+  // [3] is a proper most minor identity map in transpose.
+  // Permutation is a not self inverse and we have.
+  // [1, 2, 0] ^ -1 o [0, 1, 2] = [2, 0, 1] o [0, 1, 2]
+  // = [2, 0, 1]
+  // CHECK-NEXT: vector.extract %[[A]][2, 0, 1] : vector<3x4x5x6xf32>
+  %2 = vector.transpose %a, [1, 2, 0, 3] : vector<3x4x5x6xf32> to vector<4x5x3x6xf32>
+  %3 = vector.extract %2[0, 1, 2] : vector<4x5x3x6xf32>
+
+  // Not a minor identity map so intra-vector level has been permuted
+  // CHECK-NEXT: vector.transpose %[[B]], [0, 2, 3, 1]
+  // CHECK-NEXT: vector.extract %{{.*}}[0, 1, 2]
+  %4 = vector.transpose %b, [0, 2, 3, 1] : vector<3x6x5x6xf32> to vector<3x5x6x6xf32>
+  %5 = vector.extract %4[0, 1, 2] : vector<3x5x6x6xf32>
+
+  return %1, %3, %5 : vector<6xf32>, vector<6xf32>, vector<6xf32>
+}