This is an archive of the discontinued LLVM Phabricator instance.

I am in favor of this approach! In its current form, however, it may introduce additional pad tensor operations when padding groups of Linalg operations. The reason is that makeComposedPadHighOp can only remove redundant pad tensor operations if the operations are padded in the correct order, from producer to consumer. Up to now this order has been guaranteed, since padding a consumer only works once the extract slice of the padded producer is available. This implicit control needs to be replaced by explicit control of the transformation order. I would thus suggest improving control over code transformations before landing the patch, or at least making sure the change does not affect the performance of the current benchmarks.
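
To make the ordering concern concrete, here is a minimal toy model in C++ (not MLIR code). The function `countPadOps` and its reuse rule are assumptions made purely for illustration: padding an op is assumed to need a new pad operation unless its producer was padded earlier, which only loosely mirrors what makeComposedPadHighOp can fold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, not MLIR: op i consumes the result of op i - 1, and op 0 reads a
// plain slice. Padding an op creates a new pad operation for its operand
// unless the producer has already been padded, in which case the padded
// producer result is assumed to be reused.
inline std::size_t countPadOps(std::size_t chainLength,
                               const std::vector<std::size_t> &paddingOrder) {
  std::vector<bool> padded(chainLength, false);
  std::size_t padOps = 0;
  for (std::size_t op : paddingOrder) {
    assert(op < chainLength && "padding order refers to an op outside the chain");
    bool producerAlreadyPadded = op > 0 && padded[op - 1];
    if (!producerAlreadyPadded)
      ++padOps; // No padded producer to reuse, so a new pad op is needed.
    padded[op] = true;
  }
  return padOps;
}
```

In this model, padding a two-op chain producer first (`{0, 1}`) needs one pad operation, while the reverse order (`{1, 0}`) needs two.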

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
185	You can avoid the loop as well here by using the OpResult result number: while (auto linalgOp = opOperand->get().getDefiningOp<LinalgOp>()) { OpResult result = opOperand->get().cast<OpResult>(); opOperand = &linalgOp.getOutputOperand(result.getResultNumber()); }
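
The walk suggested in this comment can be illustrated outside of MLIR with a small self-contained sketch. The types `ToyValue` and `ToyOp` and the function `walkToChainSource` are hypothetical stand-ins, not MLIR classes; the sketch only assumes that result i of a Linalg-like op is tied to output operand i.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for mlir::Value / mlir::Operation; not the MLIR API.
struct ToyOp;

struct ToyValue {
  const ToyOp *definingOp = nullptr; // nullptr models a block argument.
  std::size_t resultNumber = 0;      // Index among the results of definingOp.
};

struct ToyOp {
  std::string name;              // e.g. "linalg.fill" or "tensor.extract_slice".
  bool isLinalgOp = false;       // Models a successful getDefiningOp<LinalgOp>().
  std::vector<ToyValue> outputs; // Output operands; result i is tied to outputs[i].
};

// Follows the output operand tied to each result until the value is no longer
// produced by a Linalg-like op, then returns that value.
inline ToyValue walkToChainSource(ToyValue value) {
  while (value.definingOp && value.definingOp->isLinalgOp) {
    const ToyOp &op = *value.definingOp;
    assert(value.resultNumber < op.outputs.size() &&
           "each result must have a tied output operand");
    value = op.outputs[value.resultNumber];
  }
  return value;
}
```

For a chain of two fill-like ops whose first output is an extract slice, the walk ends at the value defined by the slice, which is where the static bounding box can be computed.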
mlir/test/Dialect/Linalg/pad.mlir
516	This test shows an interesting side effect of the change :). We see a pad here since the matmul is apparently padded after the generic. Without the change, the generic is padded after the matmul, since only once padding has been applied to the matmul is there an extract slice that enables padding the generic... I actually believe this test and also the test pad_multiple may now fail depending on the pattern application order... I would thus suggest adapting TestLinalgCodegenStrategy.cpp (https://github.com/llvm/llvm-project/blob/35dfa78ff8d44733b1c805309f0bbd4a8c960897/mlir/test/lib/Dialect/Linalg/TestLinalgCodegenStrategy.cpp#L203) to pad only operations with a specific name. It may be possible to do this depending on the fusion flag, or we may want to control it with an additional command line flag (e.g., pad-anchor-op-only). Then it should be possible to write a really simple test that contains a fill and a matmul or matvec where only the matmul / matvec is padded.
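
The name-based restriction proposed here can be sketched in isolation. The helpers `selectPadFilter` and `opsToPad` are hypothetical and exist only for this illustration; the sketch assumes that an empty filter string means "pad every op" and a non-empty one restricts padding to ops with exactly that name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: picks the op-name filter handed to the pad step.
// An empty string is assumed to mean "no restriction".
inline std::string selectPadFilter(bool padAnchorOpOnly,
                                   const std::string &anchorOpName) {
  return padAnchorOpOnly ? anchorOpName : std::string();
}

// Hypothetical helper: returns the names of the ops a pad step with the given
// filter would visit.
inline std::vector<std::string> opsToPad(const std::vector<std::string> &opNames,
                                         const std::string &filter) {
  std::vector<std::string> selected;
  for (const std::string &name : opNames)
    if (filter.empty() || name == filter)
      selected.push_back(name);
  return selected;
}
```

With the flag set and `linalg.matmul` as the anchor op, only the matmul in a fill plus matmul sequence is selected; with the flag unset, both ops are.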

address comments

Harbormaster completed remote builds in B149585: Diff 408702. Feb 14 2022, 9:29 PM

gysit added inline comments. Feb 15 2022, 12:44 AM

mlir/test/Dialect/Linalg/pad.mlir
181	I would not use the `pad-anchor-op-only` option here so that we can test the canonicalization of the inner pad op!
487	It would be nice to shorten the test a bit. Could we use two fill ops in a row connected to a generic op to simplify it?

gysit mentioned this in D122116: [mlir][linalg] Support padding LinalgOps in use-def chain. Mar 21 2022, 1:46 AM

hanchung abandoned this revision. Mar 23 2022, 8:01 PM

Herald added a project: Restricted Project. Mar 23 2022, 8:01 PM

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp	8 lines
mlir/test/Dialect/Linalg/pad.mlir	62 lines
mlir/test/lib/Dialect/Linalg/TestLinalgCodegenStrategy.cpp	7 lines

Diff 408702

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp

[...]	static LogicalResult padOperandToSmallestStaticBoundingBox(
if (shape.empty())		if (shape.empty())
return success();		return success();

// Cannot pad if the padding value is unknown.		// Cannot pad if the padding value is unknown.
FailureOr<Value> paddingValue = paddingFunc(b, *opOperand);		FailureOr<Value> paddingValue = paddingFunc(b, *opOperand);
if (failed(paddingValue))		if (failed(paddingValue))
return failure(hasDynamicShape);		return failure(hasDynamicShape);

		OpOperand *operand = opOperand;
		while (auto linalgOp = operand->get().getDefiningOp<LinalgOp>()) {
		OpResult result = operand->get().cast<OpResult>();
		operand = linalgOp.getOutputOperand(result.getResultNumber());
		}

// Cannot construct a static bounding box if the operand is not defined by an		// Cannot construct a static bounding box if the operand is not defined by an
// ExtractSliceOp.		// ExtractSliceOp.
auto sliceOp = opOperand->get().getDefiningOp<tensor::ExtractSliceOp>();		auto sliceOp = operand->get().getDefiningOp<tensor::ExtractSliceOp>();
if (!sliceOp)		if (!sliceOp)
return failure(hasDynamicShape);		return failure(hasDynamicShape);

// Compute the dropped dimensions if `sliceOp` is rank-reducing.		// Compute the dropped dimensions if `sliceOp` is rank-reducing.
llvm::SmallBitVector droppedDims = sliceOp.getDroppedDims();		llvm::SmallBitVector droppedDims = sliceOp.getDroppedDims();

// Upper bound the `sliceOp` sizes to obtain a static bounding box.		// Upper bound the `sliceOp` sizes to obtain a static bounding box.
SmallVector<int64_t> staticSizes;		SmallVector<int64_t> staticSizes;
[...]

mlir/test/Dialect/Linalg/pad.mlir

// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.matmul pad pack-paddings=1,1,0 run-enable-pass=false" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=MATMUL		// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.matmul pad pack-paddings=1,1,0 run-enable-pass=false" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=MATMUL
// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.fill pad pack-paddings=1,1,0 run-enable-pass=false" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=FILL		// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.fill pad pack-paddings=2,1,0 run-enable-pass=false pad-anchor-op-only" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=FILL
		// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.generic pad pack-paddings=2,1,0 run-enable-pass=false pad-anchor-op-only" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=GENERIC
// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.matmul pad pack-paddings=1,1,0 pad-inputs-only run-enable-pass=false" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=INPUTS-ONLY		// RUN: mlir-opt %s -test-linalg-codegen-strategy="anchor-op=linalg.matmul pad pack-paddings=1,1,0 pad-inputs-only run-enable-pass=false" -cse -canonicalize -split-input-file | FileCheck %s --check-prefix=INPUTS-ONLY

// MATMUL-DAG: #[[MAP0:[0-9a-z]+]] = affine_map<()[s0] -> (7, -s0 + 12)>		// MATMUL-DAG: #[[MAP0:[0-9a-z]+]] = affine_map<()[s0] -> (7, -s0 + 12)>
// MATMUL-DAG: #[[MAP1:[0-9a-z]+]] = affine_map<()[s0] -> (-s0 + 7)>		// MATMUL-DAG: #[[MAP1:[0-9a-z]+]] = affine_map<()[s0] -> (-s0 + 7)>
#map = affine_map<()[s0] -> (7, -s0 + 12)>		#map = affine_map<()[s0] -> (7, -s0 + 12)>

// MATMUL: static_sizes_output_divisible		// MATMUL: static_sizes_output_divisible
// MATMUL-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>		// MATMUL-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<24x12xf32>
[...]
// FILL: pad_multiple		// FILL: pad_multiple
// FILL-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<64x64xf32>		// FILL-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<64x64xf32>
func @pad_multiple(%arg0: tensor<64x64xf32>,		func @pad_multiple(%arg0: tensor<64x64xf32>,
%iv0 : index) -> tensor<?x?xf32> {		%iv0 : index) -> tensor<?x?xf32> {
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%size = affine.min #map0()[%iv0]		%size = affine.min #map0()[%iv0]
%0 = tensor.extract_slice %arg0[0, 0] [%size, %size] [1, 1] : tensor<64x64xf32> to tensor<?x?xf32>		%0 = tensor.extract_slice %arg0[0, 0] [%size, %size] [1, 1] : tensor<64x64xf32> to tensor<?x?xf32>

// Check both fill operations are padded by the same pad tensor operation.		// Check both fill operations are padded by the same source tensor operation.
// FILL: %[[T0:.*]] = tensor.pad		// FILL: %[[T0:.*]] = tensor.pad
// FILL: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])		// FILL: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
// FILL: %[[T2:.*]] = linalg.fill(%{{.*}}, %[[T1]])		// FILL: %[[T2:.*]] = tensor.extract_slice %[[T1]]
// FILL: = tensor.extract_slice %[[T2]]		// FILL: %[[T3:.*]] = tensor.pad %[[T2]]
		// FILL: %[[T4:.*]] = linalg.fill(%{{.*}}, %[[T3]])
		// FILL: = tensor.extract_slice %[[T4]]
%1 = linalg.fill(%cst, %0) : f32, tensor<?x?xf32> -> tensor<?x?xf32>		%1 = linalg.fill(%cst, %0) : f32, tensor<?x?xf32> -> tensor<?x?xf32>
%2 = linalg.fill(%cst, %1) : f32, tensor<?x?xf32> -> tensor<?x?xf32>		%2 = linalg.fill(%cst, %1) : f32, tensor<?x?xf32> -> tensor<?x?xf32>
return %2 : tensor<?x?xf32>		return %2 : tensor<?x?xf32>
}		}

// -----		// -----

#map0 = affine_map<()[s0] -> (64, s0)>		#map0 = affine_map<()[s0] -> (64, s0)>
[...]	func @rank_reducing(%arg0: tensor<1x64x1x64xf32>,
// Check the fill is padded despite the rank-reducing slice operation.		// Check the fill is padded despite the rank-reducing slice operation.
// FILL: %[[T0:.*]] = tensor.pad		// FILL: %[[T0:.*]] = tensor.pad
// FILL: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])		// FILL: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
// FILL-SAME: tensor<1x64x64xf32>		// FILL-SAME: tensor<1x64x64xf32>
// FILL: = tensor.extract_slice %[[T1]]		// FILL: = tensor.extract_slice %[[T1]]
%1 = linalg.fill(%cst, %0) : f32, tensor<1x?x?xf32> -> tensor<1x?x?xf32>		%1 = linalg.fill(%cst, %0) : f32, tensor<1x?x?xf32> -> tensor<1x?x?xf32>
return %1 : tensor<1x?x?xf32>		return %1 : tensor<1x?x?xf32>
}		}

		// -----

		// GENERIC: func @matmul_bias_add(
		// GENERIC-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<25x49xf32>,
		// GENERIC-SAME: %[[ARG1:[0-9a-zA-Z]*]]: tensor<49x33xf32>,
		// GENERIC-SAME: %[[ARG2:[0-9a-zA-Z]*]]: tensor<33xf32>,
		// GENERIC-SAME: %[[ARG3:[0-9a-zA-Z]*]]: tensor<25x33xf32>,
		// GENERIC-SAME: %[[ARG4:[0-9a-zA-Z]*]]: tensor<25x33xf32>)
		func @matmul_bias_add(%arg0: tensor<25x49xf32>,
		%arg1: tensor<49x33xf32>,
		%arg2: tensor<33xf32>,
		%arg3: tensor<25x33xf32>,
		%arg4: tensor<25x33xf32>) -> tensor<25x33xf32> {
		%cst = arith.constant 0.000000e+00 : f32
		%c0 = arith.constant 0 : index
		%c33 = arith.constant 33 : index
		%c25 = arith.constant 25 : index
		%c16 = arith.constant 16 : index
		%c8 = arith.constant 8 : index
		%0 = scf.for %arg5 = %c0 to %c25 step %c8 iter_args(%arg6 = %arg4) -> (tensor<25x33xf32>) {
		%1 = affine.min affine_map<(d0) -> (-d0 + 25, 8)>(%arg5)
		%2 = tensor.extract_slice %arg0[%arg5, 0] [%1, 49] [1, 1] : tensor<25x49xf32> to tensor<?x49xf32>
		%3 = affine.min affine_map<(d0) -> (8, -d0 + 25)>(%arg5)
		%4 = scf.for %arg7 = %c0 to %c33 step %c16 iter_args(%arg8 = %arg6) -> (tensor<25x33xf32>) {
		%5 = affine.min affine_map<(d0) -> (-d0 + 33, 16)>(%arg7)
		%6 = tensor.extract_slice %arg1[0, %arg7] [49, %5] [1, 1] : tensor<49x33xf32> to tensor<49x?xf32>
		%7 = tensor.extract_slice %arg3[%arg5, %arg7] [%1, %5] [1, 1] : tensor<25x33xf32> to tensor<?x?xf32>
		%8 = linalg.fill(%cst, %7) : f32, tensor<?x?xf32> -> tensor<?x?xf32>
		%9 = linalg.matmul ins(%2, %6 : tensor<?x49xf32>, tensor<49x?xf32>) outs(%8 : tensor<?x?xf32>) -> tensor<?x?xf32>
		%10 = affine.min affine_map<(d0) -> (16, -d0 + 33)>(%arg7)
		%11 = tensor.extract_slice %arg2[%arg7] [%10] [1] : tensor<33xf32> to tensor<?xf32>
		%12 = tensor.extract_slice %arg8[%arg5, %arg7] [%3, %10] [1, 1] : tensor<25x33xf32> to tensor<?x?xf32>

		// GENERIC: %[[T0:.+]] = linalg.matmul
		// GENERIC: %[[PAD:.+]] = tensor.pad %[[T0]]
		// GENERIC: %{{.+}} = linalg.generic
		// GENERIC-SAME: ins(%[[PAD]]

		%13 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d1)>, affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} ins(%9, %11 : tensor<?x?xf32>, tensor<?xf32>) outs(%12 : tensor<?x?xf32>) {
		^bb0(%arg9: f32, %arg10: f32, %arg11: f32):
		%15 = arith.addf %arg9, %arg10 : f32
		linalg.yield %15 : f32
		} -> tensor<?x?xf32>
		%14 = tensor.insert_slice %13 into %arg8[%arg5, %arg7] [%3, %10] [1, 1] : tensor<?x?xf32> into tensor<25x33xf32>
		scf.yield %14 : tensor<25x33xf32>
		}
		scf.yield %4 : tensor<25x33xf32>
		}
		return %0 : tensor<25x33xf32>
		}

mlir/test/lib/Dialect/Linalg/TestLinalgCodegenStrategy.cpp

[...]	Option<bool> registerPromoteFullTile{
llvm::cl::desc("Pad the small aligned memory buffer to the tile sizes."),		llvm::cl::desc("Pad the small aligned memory buffer to the tile sizes."),
llvm::cl::init(false)};		llvm::cl::init(false)};
Option<bool> pad{*this, "pad", llvm::cl::desc("Pad the operands."),		Option<bool> pad{*this, "pad", llvm::cl::desc("Pad the operands."),
llvm::cl::init(false)};		llvm::cl::init(false)};
Option<bool> padInputsOnly{		Option<bool> padInputsOnly{
*this, "pad-inputs-only",		*this, "pad-inputs-only",
llvm::cl::desc("Only pad input operands when test-pad-pattern"),		llvm::cl::desc("Only pad input operands when test-pad-pattern"),
llvm::cl::init(false)};		llvm::cl::init(false)};
		Option<bool> padAnchorOpOnly{
		*this, "pad-anchor-op-only",
		llvm::cl::desc("Only pad anchor op operands when test-pad-pattern"),
		llvm::cl::init(false)};
ListOption<int64_t> packPaddings{		ListOption<int64_t> packPaddings{
*this, "pack-paddings",		*this, "pack-paddings",
llvm::cl::desc("Operand packing flags when test-pad-pattern."),		llvm::cl::desc("Operand packing flags when test-pad-pattern."),
llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated};		llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated};
ListOption<int64_t> hoistPaddings{		ListOption<int64_t> hoistPaddings{
*this, "hoist-paddings",		*this, "hoist-paddings",
llvm::cl::desc("Operand hoisting depths when test-pad-pattern."),		llvm::cl::desc("Operand hoisting depths when test-pad-pattern."),
llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated};		llvm::cl::ZeroOrMore, llvm::cl::MiscFlags::CommaSeparated};
[...]	strategy
.setAlignment(16)		.setAlignment(16)
.setUseFullTileBuffersByDefault(promoteFullTile))		.setUseFullTileBuffersByDefault(promoteFullTile))
.tileIf(!fuse && !registerTileSizes.empty(), anchorOpName,		.tileIf(!fuse && !registerTileSizes.empty(), anchorOpName,
registerTilingOptions)		registerTilingOptions)
.promoteIf(!fuse && registerPromote, anchorOpName,		.promoteIf(!fuse && registerPromote, anchorOpName,
LinalgPromotionOptions()		LinalgPromotionOptions()
.setAlignment(16)		.setAlignment(16)
.setUseFullTileBuffersByDefault(registerPromoteFullTile))		.setUseFullTileBuffersByDefault(registerPromoteFullTile))
.padIf(pad, "", std::move(paddingOptions))		.padIf(pad, padAnchorOpOnly ? anchorOpName : std::string(),
		std::move(paddingOptions))
.decomposeIf(decompose)		.decomposeIf(decompose)
.generalizeIf(generalize, "")		.generalizeIf(generalize, "")
.interchangeIf(!iteratorInterchange.empty(), iteratorInterchange)		.interchangeIf(!iteratorInterchange.empty(), iteratorInterchange)
.vectorizeIf(vectorize, "", nullptr, vectorizePadding)		.vectorizeIf(vectorize, "", nullptr, vectorizePadding)
.vectorLowering(		.vectorLowering(
LinalgVectorLoweringOptions()		LinalgVectorLoweringOptions()
.setVectorTransformsOptions(		.setVectorTransformsOptions(
vector::VectorTransformsOptions()		vector::VectorTransformsOptions()
[...]


[mlir][linalg] Propagate static padding informations through Linalg ops. (Abandoned, Public)
