This is an archive of the discontinued LLVM Phabricator instance.

ThomasRaoux added inline comments.

mlir/lib/Dialect/Vector/VectorOps.cpp
2738–2739	There may be higher dims in the tensor that are ignored. I think we should check the affine map here instead. For instance, what about this case:
  %vread = vector.transfer_read %t[%c0, %c0, %c0, %c0], %cst {
    in_bounds = [true, true, true, true],
    permutation_map = affine_map<(d0, d1, d2, d3) -> (0, d3, d2, 0)>
  } : tensor<4x4x4x5xf32>, vector<8x5x4x2xf32>
You basically want the affine map to be a minor identity.
2766	You also need to handle a potential mask.
2769	I don't understand how you can be sure you'll end up with an identity map. What if some dimensions were missing in the original map, as in this example:
  %vread = vector.transfer_read %t[%c0, %c0, %c0], %cst {
    in_bounds = [true, true, true, true],
    permutation_map = affine_map<(d0, d1, d2) -> (0, d2, d1, 0)>
  } : tensor<4x4x5xf32>, vector<8x5x4x2xf32>
mlir/test/Dialect/Vector/canonicalize.mlir
1039	Note that this case is already handled by this pattern: https://github.com/llvm/llvm-project/blob/7362cc5ef50b5ebcbb11380ab13a179902c7b8be/mlir/lib/Dialect/Vector/VectorTransforms.cpp#L2814 If your pattern is a superset then we should remove that one.
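
One possible reading of the two concerns about lines 2738–2739 and 2769 is that reading with an identity map is only safe when the non-constant results of the permutation map mention every source dimension exactly once. The sketch below checks that condition on a toy model. `MapResult` and `coversEverySourceDimOnce` are hypothetical names made up for illustration and are not MLIR API; a real pattern would inspect the `AffineMap` results instead.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a permutation-map result (assumption, not an MLIR type):
// a source dimension position, or std::nullopt for the constant 0, i.e. a
// broadcast dimension.
using MapResult = std::optional<unsigned>;

// Returns true when the non-constant results mention every source dimension
// exactly once, so that dropping the zeros leaves a permutation of all dims.
bool coversEverySourceDimOnce(const std::vector<MapResult> &results,
                              unsigned numSourceDims) {
  std::vector<bool> seen(numSourceDims, false);
  std::size_t numSeen = 0;
  for (const MapResult &result : results) {
    if (!result)
      continue; // Broadcast dimension: it reads nothing from the source.
    if (*result >= numSourceDims || seen[*result])
      return false; // Out-of-range or repeated source dimension.
    seen[*result] = true;
    ++numSeen;
  }
  return numSeen == numSourceDims;
}
```

Under this model, the map from the patch description, `(d0, d1) -> (0, d1, d0, 0)`, passes the check, while both counterexamples quoted above, `(d0, d1, d2) -> (0, d2, d1, 0)` and `(d0, d1, d2, d3) -> (0, d3, d2, 0)`, fail it because `d0` (and `d1` in the second case) is never read.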

dcaballe added a reviewer: sgrechanik. Sep 30 2021, 6:17 PM

nicolasvasilache requested changes to this revision. Sep 30 2021, 11:47 PM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Vector/VectorOps.cpp
2717	typo: than
2725	Extracting the broadcast information LGTM, but I am not sure why you also extract the permutation in one shot here. Is there a reason not to keep this pattern orthogonal to `populateVectorTransferPermutationMapLoweringPatterns`? I.e. I'd just drop the zeros from the permutation map to produce:
  // %vread = vector.transfer_read %t[%c0, %c0], %cst {
  // in_bounds = [true, true, true, true],
  // permutation_map = affine_map<(d0, d1) -> (d1, d0)>
  // } : tensor<4x5xf32>, vector<5x4xf32>
  // %1 = vector.broadcast %0
  // : vector<5x4xf32> to vector<8x2x5x4xf32>
  // %2 = vector.transpose %1, [0, 2, 3, 1]
  // : vector<8x2x5x4xf32> to vector<8x5x4x2xf32>
Then `populateVectorTransferPermutationMapLoweringPatterns` should kick in to drop the permutation. I haven't tested the interaction specifically myself, but experience says the more orthogonal the patterns are, the less interference they exhibit. This also makes me think that, IIRC, some of the transpose patterns also managed some small amount of broadcast; I'd drop the broadcast-related part from the other patterns and rely on composition + orthogonality.
2738	I am not sure why this constraint is needed. With the approach I described above (i.e. this pattern just drops zeros and adds broadcast + transpose based on the zeros dropped), I think it would work more generally.
2749	Not yet looking at impl. details here until the previous points I made have been addressed or refuted. Still, I'd note that I'd expect to see something "more compositional and reusable" along the lines of:
  auto mapZero = readOp.permutation_map().select(b.getAffineConstantExpr(0));
  // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d1, d2, d4)
  auto mapNonZero = readOp.permutation_map().selectNot(b.getAffineConstantExpr(0));
  // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d3, d5)
  VectorType targetVectorType = Vector::get(.... applyPermutationMap(mapNonZero, originalVectorShape), ...);
  transferRead_clone_with_permutation_map(mapNonZero.compose(permutation_map()))_and_result_type(targetVectorType);
  SmallVector<int> broadcastShape = applyPermutationMap(mapZero, originalVectorShape);
  broadcastShape.append(targetVectorType.shape());
  VectorType broadcastVectorShape = Vector::get(.... broadcastShape, ...);
  b.create<vector::BroadcastOp>(...);
  // Finally transpose the vector in a similar fashion.
where `select`/`selectNot` are new AffineMap utilities that return an AffineMap that is a permutation and can be composed with the original one to keep only the exprs you are interested in:
  AffineMap AffineMap::select(AffineExpr);
  // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4)
  AffineMap AffineMap::select(AffineExpr);
  // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4)
Any better name for `select`/`selectNot`, or better utility functions that recover similar information, are most welcome.
2766	I'd just bail on masks in a first impl. We can transpose + extract from them in a followup CL as long as the TODO is here. The inbounds attribute should be handled properly.
2769	I believe the route I proposed would handle all cases?
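
The "drop zeros, then broadcast, then transpose" decomposition suggested for line 2725 can be sketched on a toy model. This is only an illustration of the shape and permutation bookkeeping, assuming broadcast dimensions are placed first and that result dimension `i` of `vector.transpose` takes source dimension `perm[i]`, which is what the example in that comment implies. `MapResult`, `SplitPlan` and `planSplit` are hypothetical names, not MLIR API and not the `select`/`selectNot` utilities proposed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-in for a permutation-map result (assumption, not an MLIR type):
// a source dimension position, or std::nullopt for the constant 0.
using MapResult = std::optional<unsigned>;

struct SplitPlan {
  std::vector<unsigned> reducedMap;         // Map results left after dropping zeros.
  std::vector<std::int64_t> readShape;      // Vector shape of the reduced read.
  std::vector<std::int64_t> broadcastShape; // Broadcast dims first, then readShape.
  std::vector<std::int64_t> transposePerm;  // Result dim i <- broadcast dim perm[i].
};

SplitPlan planSplit(const std::vector<MapResult> &results,
                    const std::vector<std::int64_t> &vectorShape) {
  assert(results.size() == vectorShape.size() &&
         "expected one map result per vector dimension");
  SplitPlan plan;
  // order[j] is the original vector dimension that ends up at position j of
  // the broadcasted vector.
  std::vector<std::size_t> order;
  // Leading dimensions of the broadcast: the positions holding a constant 0.
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (results[i])
      continue;
    order.push_back(i);
    plan.broadcastShape.push_back(vectorShape[i]);
  }
  // Trailing dimensions: the positions that actually read from the source.
  for (std::size_t i = 0; i < results.size(); ++i) {
    if (!results[i])
      continue;
    order.push_back(i);
    plan.reducedMap.push_back(*results[i]);
    plan.readShape.push_back(vectorShape[i]);
    plan.broadcastShape.push_back(vectorShape[i]);
  }
  // The transpose undoes the reordering, so it is the inverse of `order`.
  plan.transposePerm.resize(order.size());
  for (std::size_t j = 0; j < order.size(); ++j)
    plan.transposePerm[order[j]] = static_cast<std::int64_t>(j);
  return plan;
}
```

For the map `(d0, d1) -> (0, d1, d0, 0)` and `vector<8x5x4x2xf32>`, this yields the reduced map `(d1, d0)`, a `5x4` read, an `8x2x5x4` broadcast and the transpose permutation `[0, 2, 3, 1]`, matching the IR in the comment on line 2725. The remaining `(d1, d0)` permutation is left for the existing permutation-map lowering patterns, which is the orthogonality argument made there.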

This revision now requires changes to proceed. Sep 30 2021, 11:47 PM

Actually, I think the current TransferReadPermutationLowering and TransferOpReduceRank are good enough. I wish I knew about them before implementing this.

Revision Contents

Path	Size
mlir/lib/Dialect/Vector/VectorOps.cpp	73 lines
mlir/test/Dialect/Vector/canonicalize.mlir	81 lines

Diff 376192

mlir/lib/Dialect/Vector/VectorOps.cpp

[2,705 earlier lines not shown; context: LogicalResult matchAndRewrite(TransferReadOp xferOp, ...]
     SmallVector<bool> inBounds(xferOp.getTransferRank(), true);
     rewriter.replaceOpWithNewOp<TransferReadOp>(xferOp, xferOp.getVectorType(),
                                                 extractOp.source(), newIndices,
                                                 xferOp.padding(), inBounds);

     return success();
   }
 };

+// Splits vector.transfer_read into vector.transfer_read + vector.broadcast +
+// vector.transpose if vector.transfer_read reads into a vector of higher rank
+// then the source memref type.
+//
+//
+// %vread = vector.transfer_read %t[%c0, %c0], %cst {
+// in_bounds = [true, true, true, true],
+// permutation_map = affine_map<(d0, d1) -> (0, d1, d0, 0)>
+// } : tensor<4x5xf32>, vector<8x5x4x2xf32>
+//
+// %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]}
+// : tensor<4x5xf32>, vector<4x5xf32>
+// %1 = vector.broadcast %0
+// : vector<4x5xf32> to vector<8x2x4x5xf32>
+// %2 = vector.transpose %1, [0, 3, 2, 1]
+// : vector<8x2x4x5xf32> to vector<8x5x4x2xf32>
+class SplitTransferRead final : public OpRewritePattern<TransferReadOp> {
+public:
+  using OpRewritePattern<TransferReadOp>::OpRewritePattern;
+  LogicalResult matchAndRewrite(TransferReadOp readOp,
+                                PatternRewriter &rewriter) const override {
+    auto srcType = readOp.getShapedType();
+    auto vectorType = readOp.getVectorType();
+    if (vectorType.getRank() <= srcType.getRank())
+      return failure();
+
+    int64_t srcRank = srcType.getRank();
+    int64_t vectorRank = vectorType.getRank();
+    SmallVector<bool> newInbounds(srcRank, false);
+    SmallVector<int64_t, 2> targetBroadcastedShape(vectorRank, 0);
+    SmallVector<int64_t, 2> permutation(vectorRank, 0);
+
+    size_t zeroExprsCount = 0;
+    ArrayAttr inbounds = readOp.in_boundsAttr();
+    for (auto &en : llvm::enumerate(readOp.permutation_map().getResults())) {
+      size_t index = en.index();
+      auto &expr = en.value();
+      if (auto zero = expr.dyn_cast<AffineConstantExpr>()) {
+        permutation[zeroExprsCount] = index;
+        targetBroadcastedShape[zeroExprsCount++] = vectorType.getDimSize(index);
+        continue;
+      }
+      unsigned int dimPos = expr.cast<AffineDimExpr>().getPosition();
+      permutation[srcRank + dimPos] = index;
+      targetBroadcastedShape[srcRank + dimPos] = srcType.getDimSize(dimPos);
+
+      if (!inbounds.empty())
+        newInbounds[dimPos] = inbounds[index].cast<BoolAttr>().getValue();
+    }
+
+    Location loc = readOp.getLoc();
+    Value read = rewriter.create<TransferReadOp>(
+        loc, VectorType::get(srcType.getShape(), srcType.getElementType()),
+        readOp.source(), readOp.indices(),
+        AffineMap::getMultiDimIdentityMap(srcRank, rewriter.getContext()),
+        readOp.padding(), rewriter.getBoolArrayAttr(newInbounds));
+
+    Value result = rewriter.create<BroadcastOp>(
+        loc, VectorType::get(targetBroadcastedShape, srcType.getElementType()),
+        read);
+
+    // Insert TransposeOp if necessary.
+    if (!llvm::is_sorted(permutation))
+      result = rewriter.create<TransposeOp>(loc, result, permutation);
+
+    rewriter.replaceOp(readOp, result);
+    return success();
+  }
+};
+
 } // namespace

 void TransferReadOp::getCanonicalizationPatterns(RewritePatternSet &results,
                                                  MLIRContext *context) {
-  results.add<FoldExtractSliceIntoTransferRead>(context);
+  results.add<FoldExtractSliceIntoTransferRead, SplitTransferRead>(context);
 }

 //===----------------------------------------------------------------------===//
 // TransferWriteOp
 //===----------------------------------------------------------------------===//

 /// Builder that sets permutation map to 'getMinorIdentityMap'.
 void TransferWriteOp::build(OpBuilder &builder, OperationState &result,
[1,156 later lines not shown]

mlir/test/Dialect/Vector/canonicalize.mlir

[1,020 earlier lines not shown]
 // CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>
 // CHECK: return %[[r]]
 func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
   %c0 = constant 0 : index
   %0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
   %1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
   return %1 : tensor<?x?x12xf32>
 }

+// -----
+
+// CHECK-LABEL: func @split_transfer_read_no_transpose
+func @split_transfer_read_no_transpose(%t: tensor<4xf32>) -> vector<2x4xf32> {
+  %c0 = constant 0 : index
+  %cst = constant 0.0 : f32
+  %vread = vector.transfer_read %t[%c0], %cst {
+    in_bounds = [true, true],
+    permutation_map = affine_map<(d0) -> (0, d0)>
+  } : tensor<4xf32>, vector<2x4xf32>
+  return %vread : vector<2x4xf32>
+}
+// CHECK-SAME: %[[IN:.*]]: tensor<4xf32>) -> vector<2x4xf32>
+// CHECK-NEXT: %[[C0_F32:.*]] = constant 0.000000e+00 : f32
+// CHECK-NEXT: %[[C0:.*]] = constant 0 : index
+
+// CHECK-NEXT: %[[READ:.*]] = vector.transfer_read %[[IN]][%[[C0]]], %[[C0_F32]]
+// CHECK-SAME: {in_bounds = [true]} : tensor<4xf32>, vector<4xf32>
+
+// CHECK-NEXT: %[[BCAST:.*]] = vector.broadcast %[[READ]]
+// CHECK-SAME: : vector<4xf32> to vector<2x4xf32>
+
+// CHECK-NEXT: return %[[BCAST]] : vector<2x4xf32>
+
+// -----
+
+// CHECK-LABEL: func @split_transfer_read_with_transpose
+func @split_transfer_read_with_transpose(%t: tensor<4xf32>)
+    -> vector<4x8xf32> {
+  %c0 = constant 0 : index
+  %cst = constant 0.0 : f32
+  %vread = vector.transfer_read %t[%c0], %cst {
+    in_bounds = [true, true],
+    permutation_map = affine_map<(d0) -> (d0, 0)>
+  } : tensor<4xf32>, vector<4x8xf32>
+  return %vread : vector<4x8xf32>
+}
+// CHECK-SAME: %[[IN:.*]]: tensor<4xf32>) -> vector<4x8xf32>
+// CHECK-NEXT: %[[C0_F32:.*]] = constant 0.000000e+00 : f32
+// CHECK-NEXT: %[[C0:.*]] = constant 0 : index
+
+// CHECK-NEXT: %[[READ:.*]] = vector.transfer_read %[[IN]][%[[C0]]], %[[C0_F32]]
+// CHECK-SAME: {in_bounds = [true]} : tensor<4xf32>, vector<4xf32>
+
+// CHECK-NEXT: %[[BCAST:.*]] = vector.broadcast %[[READ]]
+// CHECK-SAME: : vector<4xf32> to vector<8x4xf32>
+
+// CHECK-NEXT: %[[TRANSPOSE:.*]] = vector.transpose %[[BCAST]], [1, 0]
+// CHECK-SAME: : vector<8x4xf32> to vector<4x8xf32>
+
+// CHECK-NEXT: return %[[TRANSPOSE]] : vector<4x8xf32>
+
+// -----
+
+// CHECK-LABEL: func @split_transfer_read_with_transpose_4D
+func @split_transfer_read_with_transpose_4D(%t: tensor<4x5xf32>)
+    -> vector<8x5x4x2xf32> {
+  %c0 = constant 0 : index
+  %cst = constant 0.0 : f32
+  %vread = vector.transfer_read %t[%c0, %c0], %cst {
+    in_bounds = [true, true, true, true],
+    permutation_map = affine_map<(d0, d1) -> (0, d1, d0, 0)>
+  } : tensor<4x5xf32>, vector<8x5x4x2xf32>
+  return %vread : vector<8x5x4x2xf32>
+}
+// CHECK-SAME: %[[IN:.*]]: tensor<4x5xf32>) -> vector<8x5x4x2xf32>
+// CHECK-NEXT: %[[C0_F32:.*]] = constant 0.000000e+00 : f32
+// CHECK-NEXT: %[[C0:.*]] = constant 0 : index
+
+// CHECK-NEXT: %[[READ:.*]] = vector.transfer_read
+// CHECK-SAME: %[[IN]][%[[C0]], %[[C0]]], %[[C0_F32]]
+// CHECK-SAME: {in_bounds = [true, true]} : tensor<4x5xf32>, vector<4x5xf32>
+
+// CHECK-NEXT: %[[BCAST:.*]] = vector.broadcast %[[READ]]
+// CHECK-SAME: : vector<4x5xf32> to vector<8x2x4x5xf32>
+
+// CHECK-NEXT: %[[TRANSPOSE:.*]] = vector.transpose %[[BCAST]], [0, 3, 2, 1]
+// CHECK-SAME: : vector<8x2x4x5xf32> to vector<8x5x4x2xf32>
+
+// CHECK-NEXT: return %[[TRANSPOSE]] : vector<8x5x4x2xf32>
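
The transpose shapes expected by the tests above can be sanity-checked by hand with a small helper. This is a sketch that assumes result dimension `i` of `vector.transpose` takes the size of source dimension `perm[i]`, which is consistent with every `vector.transpose` line on this page; `transposedShape` is a hypothetical helper and not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes the shape after a transpose, assuming result dimension i has the
// size of source dimension perm[i].
std::vector<std::int64_t> transposedShape(const std::vector<std::int64_t> &shape,
                                          const std::vector<std::size_t> &perm) {
  assert(shape.size() == perm.size() && "permutation must cover every dim");
  std::vector<std::int64_t> result;
  result.reserve(perm.size());
  for (std::size_t srcDim : perm) {
    assert(srcDim < shape.size() && "permutation entry out of range");
    result.push_back(shape[srcDim]);
  }
  return result;
}
```

With this helper, `8x2x4x5` under `[0, 3, 2, 1]` gives `8x5x4x2`, and `8x4` under `[1, 0]` gives `4x8`, as the `CHECK-SAME` lines state.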

[mlir] Rewrite transfer_read as transfer_read + broadcast + transpose. (Abandoned, Public)
