This is an archive of the discontinued LLVM Phabricator instance.

Differential D130135

[mlir][vector] Extend transfer_write to read propagation
Closed · Public

Authored by ThomasRaoux on Jul 19 2022, 5:42 PM.


Details

Reviewers

nicolasvasilache
springerm
aartbik

Commits

rG9f6ba4be2685: [mlir][vector] Extend transfer_write to read propagation

Summary

Folding of transfer_write into transfer_read is already supported, but
it requires the read and the write to have the same permutation map.
After linalg vectorization, it is common to have a write followed by a
read with different permutation maps even though the value could still
be propagated.
This canonicalization handles cases where the permutation maps are
different but the data read and written match, and replaces the transfer
ops with a broadcast and a permutation (vector.broadcast and vector.transpose).
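As a rough illustration of what "the data read and written match" means, the sketch below models only the operands the pattern compares. It is a hypothetical simplification, not MLIR API: `TransferSummary` and `canForwardWriteToRead` are invented names, SSA values are reduced to integer ids, and the in-bounds and ranked-tensor checks that the pattern also performs are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a transfer op, keeping only what the forwarding
// check looks at. SSA values are modeled as integer ids.
struct TransferSummary {
  std::vector<int> indices;           // ids of the index operands
  std::optional<int> mask;            // id of the mask operand, if any
  std::vector<int64_t> chunkAccessed; // upper-bound shape touched in the tensor
};

// Returns true when the vector written by `write` can be forwarded to `read`
// even though their permutation maps differ: both ops must use the same
// indices and mask, and must touch a chunk of the same shape.
bool canForwardWriteToRead(const TransferSummary &write,
                           const TransferSummary &read) {
  if (read.indices != write.indices || read.mask != write.mask)
    return false;
  return read.chunkAccessed == write.chunkAccessed;
}
```

With the shapes from the pattern's doc comment, both ops use indices [%c0, %c0, %c0], have no mask, and touch a chunk of shape [0, 1, 4] in the 4x4x4 tensor, so the check succeeds; changing an index or the size read along one dimension makes it fail.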

Diff Detail

Event Timeline

ThomasRaoux created this revision. Jul 19 2022, 5:42 PM

Herald added a reviewer: aartbik. Jul 19 2022, 5:42 PM

Herald added a project: Restricted Project.

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 18 others.

ThomasRaoux requested review of this revision. Jul 19 2022, 5:42 PM

Herald added a project: Restricted Project. Jul 19 2022, 5:42 PM

Herald added subscribers: limo1996, stephenneuendorffer.

Harbormaster completed remote builds in B176386: Diff 445994. Jul 19 2022, 6:15 PM

nicolasvasilache added inline comments. Jul 20 2022, 1:31 AM

mlir/lib/Dialect/Vector/IR/VectorOps.cpp
3288	Can you please rephrase? I have trouble parsing this comment.
3291	Seems like a good helper to add to the interface itself. Note that this is an upper bound on the size of the chunk.
3341	The while case is not tested, please add both a positive and a negative test.
3367	typo
3373	typo

Address review comments

ThomasRaoux marked 2 inline comments as done. Jul 20 2022, 9:14 AM

ThomasRaoux added inline comments.

mlir/lib/Dialect/Vector/IR/VectorOps.cpp
3288	Rephrased and moved into interface
3291	Moved it to `VectorTransferOpInterface`
3341	Good point, I removed the while loop as `isDisjointTransferIndices` seems to assume minor identity map. We would need to change it to use `getTensorChunkSizeAccessed` for it to work here

Harbormaster completed remote builds in B176525: Diff 446176. Jul 20 2022, 9:35 AM

Thanks!

This revision is now accepted and ready to land. Jul 21 2022, 7:03 AM

Closed by commit rG9f6ba4be2685: [mlir][vector] Extend transfer_write to read propagation (authored by ThomasRaoux). Jul 22 2022, 10:11 AM

This revision was automatically updated to reflect the committed changes.

ThomasRaoux added a commit: rG9f6ba4be2685: [mlir][vector] Extend transfer_write to read propagation.

Revision Contents

Path                                              Size
mlir/include/mlir/Interfaces/VectorInterfaces.td  30 lines
mlir/lib/Dialect/Vector/IR/VectorOps.cpp          82 lines
mlir/test/Dialect/Vector/canonicalize.mlir        39 lines

Diff 446176

mlir/include/mlir/Interfaces/VectorInterfaces.td

[... unchanged lines omitted; context: InterfaceMethod<]
         for (int64_t resultIdx = 0,
                      indicesIdx = $_op.getLeadingShapedRank(),
                      eResult = $_op.getTransferRank();
              resultIdx < eResult;
              ++resultIdx, ++indicesIdx)
           fun(resultIdx, indicesIdx);
       }]
     >,
+    InterfaceMethod<
+      /*desc=*/[{
+        Return an upper-bound shape accessed by the transfer op within the
+        tensor/memref operand.
+        For example:
+        ```
+          vector.transfer %w0[%i, %j] {
+            permutation_map = affine_map<(d0, d1) -> (d1, d0, 0)>} :
+            tensor<?x?xf32>, vector<4x2x6xf32>
+        ```
+        returns a shape [2, 4].
+      }],
+      /*retTy=*/"SmallVector<int64_t>",
+      /*methodName=*/"getTransferChunkAccessed",
+      /*args=*/(ins),
+      /*methodBody=*/"",
+      /*defaultImplementation=*/[{
+        SmallVector<int64_t> dimSizes($_op.getPermutationMap().getNumDims(), 0);
+        for (auto vecDims : llvm::zip($_op.getPermutationMap().getResults(),
+                                      $_op.getVectorType().getShape())) {
+          AffineExpr dim = std::get<0>(vecDims);
+          int64_t size = std::get<1>(vecDims);
+          // Skip broadcast.
+          if (dim.isa<AffineConstantExpr>())
+            continue;
+          dimSizes[dim.cast<AffineDimExpr>().getPosition()] = size;
+        }
+        return dimSizes;
+      }]
+    >,
   ];
 }
 
 #endif // MLIR_INTERFACES_VECTORINTERFACES
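The default implementation of `getTransferChunkAccessed` can be modeled outside of MLIR to check the documented example by hand. This is a sketch under simplifying assumptions: `MapResult` and `transferChunkAccessed` are hypothetical names, and a permutation-map result is reduced to either a dimension position or the broadcast constant 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one permutation-map result: either a dimension
// `d<pos>` or the constant 0 that marks a broadcast dimension.
struct MapResult {
  bool isBroadcast; // true for the constant 0 result
  unsigned pos;     // dimension position, meaningful only when !isBroadcast
};

// Mirrors the default implementation above: each non-broadcast result records
// the vector size it maps to at the position of the dimension it reads;
// dimensions that are never accessed stay 0.
std::vector<int64_t> transferChunkAccessed(unsigned numDims,
                                           const std::vector<MapResult> &results,
                                           const std::vector<int64_t> &vectorShape) {
  assert(results.size() == vectorShape.size());
  std::vector<int64_t> dimSizes(numDims, 0);
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (results[i].isBroadcast)
      continue; // Skip broadcast.
    dimSizes[results[i].pos] = vectorShape[i];
  }
  return dimSizes;
}
```

For the map `(d0, d1) -> (d1, d0, 0)` and `vector<4x2x6xf32>` from the method's description, the model returns [2, 4]: d1 gets size 4, d0 gets size 2, and the broadcast dimension of size 6 is ignored.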

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

[... unchanged lines omitted; context: LogicalResult matchAndRewrite(TransferReadOp xferOp, ...]
     SmallVector<bool> inBounds(xferOp.getTransferRank(), true);
     rewriter.replaceOpWithNewOp<TransferReadOp>(
         xferOp, xferOp.getVectorType(), extractOp.getSource(), newIndices,
         xferOp.getPadding(), ArrayRef<bool>{inBounds});
 
     return success();
   }
 };
 
+/// Store to load forwarding for transfer operations with permuation maps.
+/// Even if the permutation maps are different we can still propagate the store
+/// into the load if the size of the dimensions read and written match. Then we
+/// can replace the transfer_read + transfer_write by vector.broadcast and
+/// vector.transpose.
+/// Example:
+/// ```
+/// %w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0]
+///   {in_bounds = [true, true],
+///    permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :
+///   vector<4x1xf32>, tensor<4x4x4xf32>
+/// %r = vector.transfer_read %w0[%c0, %c0, %c0], %cf0
+///   {in_bounds = [true, true, true, true],
+///    permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :
+///   tensor<4x4x4xf32>, vector<1x100x4x5xf32>
+/// ```
+/// To:
+/// ```
+/// %0 = vector.broadcast %arg1 : vector<4x1xf32> to vector<100x5x4x1xf32>
+/// %r = vector.transpose %0, [3, 0, 2, 1] :
+///   vector<100x5x4x1xf32> to vector<1x100x4x5xf32>
+/// ```
+struct TransferReadAfterWriteToBroadcast
+    : public OpRewritePattern<TransferReadOp> {
+  using OpRewritePattern<TransferReadOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(TransferReadOp readOp,
+                                PatternRewriter &rewriter) const override {
+    if (readOp.hasOutOfBoundsDim() ||
+        !readOp.getShapedType().isa<RankedTensorType>())
+      return failure();
+    auto defWrite = readOp.getSource().getDefiningOp<vector::TransferWriteOp>();
+    if (!defWrite)
+      return failure();
+
+    SmallVector<int64_t> readDims = readOp.getTransferChunkAccessed();
+    Value vec;
+    if (readOp.getIndices() == defWrite.getIndices() &&
+        readOp.getMask() == defWrite.getMask()) {
+      SmallVector<int64_t> writeDims = defWrite.getTransferChunkAccessed();
+      // TODO: If the writeDim is a superset of the read dims we could do an
+      // extract_strided_slice.
+      if (writeDims == readDims)
+        vec = defWrite.getVector();
+    }
+    // TODO: loop through the chain of transfer_write if we can prove that they
+    // don't overlap with the transfer_read. This requires improving
+    // `isDisjointTransferIndices` helper.
+    if (!vec)
+      return failure();
+    SmallVector<unsigned> permutation;
+    AffineMap readMap = compressUnusedDims(readOp.getPermutationMap());
+    AffineMap writeMap = compressUnusedDims(defWrite.getPermutationMap());
+    AffineMap map = readMap.compose(writeMap);
+    if (map.getNumResults() == 0)
+      return failure();
+    // Calculate the permuation to apply to go from the vector stored to the
+    // vector read.
+    if (!map.isPermutationOfMinorIdentityWithBroadcasting(permutation))
+      return failure();
+
+    Location loc = readOp.getLoc();
+    // Calculate the broadcast shape by applying the reverse permuation to the
+    // final shape we want.
+    ArrayRef<int64_t> destShape = readOp.getVectorType().getShape();
+    SmallVector<int64_t> broadcastShape(destShape.size());
+    for (const auto &pos : llvm::enumerate(permutation))
+      broadcastShape[pos.value()] = destShape[pos.index()];
+    VectorType broadcastedType = VectorType::get(
+        broadcastShape, defWrite.getVectorType().getElementType());
+    vec = rewriter.create<vector::BroadcastOp>(loc, broadcastedType, vec);
+    SmallVector<int64_t> transposePerm(permutation.begin(), permutation.end());
+    rewriter.replaceOpWithNewOp<vector::TransposeOp>(readOp, vec,
+                                                     transposePerm);
+    return success();
+  }
+};
 } // namespace
 
 void TransferReadOp::getCanonicalizationPatterns(RewritePatternSet &results,
                                                  MLIRContext *context) {
-  results.add<FoldExtractSliceIntoTransferRead>(context);
+  results
+      .add<FoldExtractSliceIntoTransferRead, TransferReadAfterWriteToBroadcast>(
+          context);
 }

 //===----------------------------------------------------------------------===//
 // TransferWriteOp
 //===----------------------------------------------------------------------===//
 
 /// 1. Builder with type inference.
 void TransferWriteOp::build(OpBuilder &builder, OperationState &result,
                             Value vector, Value dest, ValueRange indices,
                             AffineMapAttr permutationMapAttr,
[... remaining lines omitted]
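The shape arithmetic at the end of `TransferReadAfterWriteToBroadcast` can be checked by hand with a small model. In this sketch, `broadcastShapeFor` and `transposedShape` are hypothetical helpers; the permutation is taken as an input rather than computed, since reproducing `isPermutationOfMinorIdentityWithBroadcasting` is out of scope here. The second helper assumes the `vector.transpose` convention that result dimension i is source dimension perm[i].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same loop as the pattern: scatter the read shape through `permutation`, so
// that transposing the broadcast result by `permutation` gives the read shape.
std::vector<int64_t> broadcastShapeFor(const std::vector<int64_t> &destShape,
                                       const std::vector<unsigned> &permutation) {
  assert(destShape.size() == permutation.size());
  std::vector<int64_t> broadcastShape(destShape.size());
  for (std::size_t i = 0; i < permutation.size(); ++i)
    broadcastShape[permutation[i]] = destShape[i];
  return broadcastShape;
}

// Result shape of vector.transpose: dimension i takes source dimension perm[i].
std::vector<int64_t> transposedShape(const std::vector<int64_t> &shape,
                                     const std::vector<unsigned> &perm) {
  assert(shape.size() == perm.size());
  std::vector<int64_t> result(shape.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    result[i] = shape[perm[i]];
  return result;
}
```

With the doc comment's read shape [1, 100, 4, 5] and permutation [3, 0, 2, 1], the broadcast shape is [100, 5, 4, 1], and transposing it yields [1, 100, 4, 5] again, matching the expected `vector.broadcast` and `vector.transpose` types. Note that the written `vector<4x1xf32>` appears as the trailing dimensions of the broadcast type, which is what a leading-dimension `vector.broadcast` requires.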

mlir/test/Dialect/Vector/canonicalize.mlir

[... unchanged lines omitted; context: %w1 = vector.transfer_write %v0, %w0[%i, %i] {in_bounds = [true, true]} :]
     vector<1x4xf32>, tensor<4x4xf32>
   %0 = vector.transfer_read %w1[%c1, %c0], %cf0 {in_bounds = [true, true]} :
     tensor<4x4xf32>, vector<1x4xf32>
   return %0 : vector<1x4xf32>
 }
 
 // -----
 
+// CHECK-LABEL: func @store_to_load_tensor_broadcast
+// CHECK-SAME: (%[[ARG:.*]]: tensor<4x4xf32>, %[[V0:.*]]: vector<4x2xf32>)
+// CHECK: %[[B:.*]] = vector.broadcast %[[V0]] : vector<4x2xf32> to vector<6x4x2xf32>
+// CHECK: %[[T:.*]] = vector.transpose %[[B]], [1, 2, 0] : vector<6x4x2xf32> to vector<4x2x6xf32>
+// CHECK: return %[[T]] : vector<4x2x6xf32>
+func.func @store_to_load_tensor_broadcast(%arg0 : tensor<4x4xf32>,
+  %v0 : vector<4x2xf32>) -> vector<4x2x6xf32> {
+  %c0 = arith.constant 0 : index
+  %cf0 = arith.constant 0.0 : f32
+  %w0 = vector.transfer_write %v0, %arg0[%c0, %c0] {in_bounds = [true, true]} :
+    vector<4x2xf32>, tensor<4x4xf32>
+  %0 = vector.transfer_read %w0[%c0, %c0], %cf0 {in_bounds = [true, true, true],
+  permutation_map = affine_map<(d0, d1) -> (d0, d1, 0)>} :
+    tensor<4x4xf32>, vector<4x2x6xf32>
+  return %0 : vector<4x2x6xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func @store_to_load_tensor_perm_broadcast
+// CHECK-SAME: (%[[ARG:.*]]: tensor<4x4x4xf32>, %[[V0:.*]]: vector<4x1xf32>)
+// CHECK: %[[B:.*]] = vector.broadcast %[[V0]] : vector<4x1xf32> to vector<100x5x4x1xf32>
+// CHECK: %[[T:.*]] = vector.transpose %[[B]], [3, 0, 2, 1] : vector<100x5x4x1xf32> to vector<1x100x4x5xf32>
+// CHECK: return %[[T]] : vector<1x100x4x5xf32>
+func.func @store_to_load_tensor_perm_broadcast(%arg0 : tensor<4x4x4xf32>,
+  %v0 : vector<4x1xf32>) -> vector<1x100x4x5xf32> {
+  %c0 = arith.constant 0 : index
+  %cf0 = arith.constant 0.0 : f32
+  %w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0] {in_bounds = [true, true],
+  permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :
+    vector<4x1xf32>, tensor<4x4x4xf32>
+  %0 = vector.transfer_read %w0[%c0, %c0, %c0], %cf0 {in_bounds = [true, true, true, true],
+  permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :
+    tensor<4x4x4xf32>, vector<1x100x4x5xf32>
+  return %0 : vector<1x100x4x5xf32>
+}
+
+// -----
+
 // CHECK-LABEL: func @dead_store_tensor
 // CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
 // CHECK-DAG: %[[C1:.*]] = arith.constant 1 : index
 // CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
 // CHECK-NOT: vector.transfer_write {{.*}}, {{.*}}[%[[C1]], %[[C0]]
 // CHECK: vector.transfer_write {{.*}}, {{.*}}[%[[C2]], %[[C0]]
 // CHECK: %[[VTW:.*]] = vector.transfer_write {{.*}}, {{.*}}[%[[C1]], %[[C0]]
[... remaining lines omitted]
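To see why the rewrite preserves values and not only shapes, the `@store_to_load_tensor_broadcast` test can be replayed element by element. This is a hypothetical model: vectors are nested `std::array`s, the tensor is a plain 4x4 array, and the function names are invented for illustration; it is not how MLIR evaluates these ops.

```cpp
#include <array>
#include <cassert>

using Vec4x2 = std::array<std::array<float, 2>, 4>;
using Vec4x2x6 = std::array<std::array<std::array<float, 6>, 2>, 4>;
using Vec6x4x2 = std::array<std::array<std::array<float, 2>, 4>, 6>;

// Distinct sample values so that a misplaced element would be detected.
Vec4x2 sampleInput() {
  Vec4x2 v{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 2; ++j)
      v[i][j] = static_cast<float>(10 * i + j);
  return v;
}

// Original IR: transfer_write %v0 at [%c0, %c0] with the identity map, then
// transfer_read with (d0, d1) -> (d0, d1, 0), which broadcasts the last dim.
Vec4x2x6 readAfterWrite(const Vec4x2 &v) {
  float tensor[4][4] = {}; // %arg0; only the written 4x2 corner is read back
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 2; ++j)
      tensor[i][j] = v[i][j];
  Vec4x2x6 r{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 2; ++j)
      for (int k = 0; k < 6; ++k)
        r[i][j][k] = tensor[i][j];
  return r;
}

// Rewritten IR: vector.broadcast to 6x4x2, then vector.transpose [1, 2, 0],
// where result dim i is source dim perm[i], i.e. t[a0][a1][a2] = b[a2][a0][a1].
Vec4x2x6 broadcastThenTranspose(const Vec4x2 &v) {
  Vec6x4x2 b{};
  for (int k = 0; k < 6; ++k)
    for (int i = 0; i < 4; ++i)
      for (int j = 0; j < 2; ++j)
        b[k][i][j] = v[i][j];
  Vec4x2x6 t{};
  for (int a0 = 0; a0 < 4; ++a0)
    for (int a1 = 0; a1 < 2; ++a1)
      for (int a2 = 0; a2 < 6; ++a2)
        t[a0][a1][a2] = b[a2][a0][a1];
  return t;
}
```

Both functions produce r[i][j][k] == v[i][j] for every k, which is the equivalence the CHECK lines of that test rely on when they expect a `vector.broadcast` to `vector<6x4x2xf32>` followed by `vector.transpose` with [1, 2, 0].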