This is an archive of the discontinued LLVM Phabricator instance.


Differential D110849

[mlir][linalg] Fix incorrect bound calculation for tiling conv
Closed (Public)

Authored by antiagainst on Sep 30 2021, 9:40 AM.


Details

Reviewers

mravishankar
nicolasvasilache
springerm

Commits

rGcb2e6518000c: [mlir][linalg] Fix incorrect bound calculation for tiling conv

Summary

For convolution, the access affine map of an input window dimension has the form (d0 * s0 + d1), where d0 is an output window dimension, d1 is a filter window dimension, and s0 is the stride.

When tiling, https://reviews.llvm.org/D109267 changed the way dimension sizes are acquired. Instead of querying them directly with *.dim ops on the original convolution op, we now obtain them by applying the access affine map to the loop upper bounds. This is fine for dimensions with single-dimension affine maps, as in matmul, but not for the convolution input, where it causes incorrect computation and out-of-bounds accesses. As a concrete example, take a 1x225x225x3 (NHWC) input, a 3x3x3x32 (HWCF) filter, and a 1x112x112x32 (NHWC) output with stride 2: applying the map to the upper bounds gives 112 * 2 + 3 = 227, which differs from the correct input window dimension size of 225.

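Instead, we should first calculate the max index of each loop (its upper bound minus one), apply the affine map to those max indices, and then add one to get the dimension size. For the example above, this gives (111 * 2 + 2) + 1 = 225. Note this makes no difference for matmul-like ops, since their single-dimension maps effectively compute (d0 - 1) + 1 = d0.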
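A minimal C++ sketch of the two bound computations for one input window dimension, assuming constant loop upper bounds and the (d0 * s0 + d1) map from the summary without dilation. The function names are hypothetical illustrations, not MLIR APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not MLIR APIs) for one conv input window dimension
// accessed as (d0 * stride + d1): d0 runs over [0, outUB) (output window)
// and d1 over [0, filterUB) (filter window). Dilation is ignored, matching
// the (d0 * s0 + d1) form in the summary.

// Pre-fix behavior: apply the access map to the exclusive upper bounds.
// This overshoots the true extent by exactly `stride`.
std::int64_t naiveInputExtent(std::int64_t outUB, std::int64_t filterUB,
                              std::int64_t stride) {
  return outUB * stride + filterUB;
}

// Fixed behavior: exclusive -> inclusive (ub - 1 is the max index), apply
// the access map to the max indices, then inclusive -> exclusive (+ 1).
std::int64_t inputExtent(std::int64_t outUB, std::int64_t filterUB,
                         std::int64_t stride) {
  assert(outUB > 0 && filterUB > 0 && "empty loops have no max index");
  std::int64_t maxOut = outUB - 1;
  std::int64_t maxFilter = filterUB - 1;
  std::int64_t maxInputIndex = maxOut * stride + maxFilter;
  return maxInputIndex + 1;
}
```

With the summary's numbers, naiveInputExtent(112, 3, 2) returns 227 while inputExtent(112, 3, 2) returns 225. A matmul-like map d0 corresponds to filterUB = 1 and stride = 1, and inputExtent(24, 1, 1) is (23 + 0) + 1 = 24, so nothing changes there.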

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

antiagainst created this revision. Sep 30 2021, 9:40 AM

Herald added a reviewer: mravishankar. Sep 30 2021, 9:40 AM

Herald added subscribers: Groverkss, wenzhicui, wrengr and 20 others.

antiagainst requested review of this revision. Sep 30 2021, 9:40 AM

Herald added a reviewer: nicolasvasilache. Sep 30 2021, 9:40 AM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B126595: Diff 376251. Sep 30 2021, 9:40 AM

antiagainst added a reviewer: springerm. Sep 30 2021, 9:43 AM

Yes, exclusive -> inclusive -> apply -> exclusive is the proper way, thanks!

Just note however that linalg.conv is going to be retired soon.

This revision is now accepted and ready to land. Sep 30 2021, 9:45 AM

Also @gysit, who is looking at retiring conv.

> Just note however that linalg.conv is going to be retired soon.

+100. It has needed cleaning up for a long time! Thanks for that, @gysit!

Closed by commit rGcb2e6518000c: [mlir][linalg] Fix incorrect bound calculation for tiling conv (authored by antiagainst). Sep 30 2021, 10:54 AM

This revision was automatically updated to reflect the committed changes.

antiagainst added a commit: rGcb2e6518000c: [mlir][linalg] Fix incorrect bound calculation for tiling conv.

gysit added inline comments. Sep 30 2021, 11:14 AM

mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir
Line 237: @antiagainst good catch and thanks for fixing. 10 + 8 is not always = 18 :). I hope this second 18 disappears if I adapt the sizes of %arg0. Will have a look at it tomorrow. Otherwise, there may be a similar error somewhere.

gysit mentioned this in D110906: [mli][linalg] Change tensor size in unit test (NFC). Sep 30 2021, 11:34 PM

gysit mentioned this in rG32a7d6051633: [mli][linalg] Change tensor size in unit test (NFC). Oct 3 2021, 11:44 PM

Revision Contents

Path                                                     Size
mlir/lib/Dialect/Linalg/Utils/Utils.cpp                  29 lines
mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir   12 lines
mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir       2 lines
mlir/test/Dialect/Linalg/tile-conv.mlir                   2 lines
mlir/test/Dialect/Linalg/tile-simple-conv.mlir            4 lines

Diff 376287

mlir/lib/Dialect/Linalg/Utils/Utils.cpp

Hunk within: for (unsigned r = 0; r < rank; ++r) {

     auto sizeCst = size.getDefiningOp<ConstantIndexOp>();
     auto hasTileSizeOne = sizeCst && sizeCst.getValue() == 1;
     auto dividesEvenly = sizeCst && !ShapedType::isDynamic(shapeSize) &&
                          ((shapeSize % sizeCst.getValue()) == 0);
     if (!hasTileSizeOne && !dividesEvenly) {
       LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: shapeSize=" << shapeSize
                               << ", size: " << size
                               << ": make sure in bound with affine.min\n");

       AffineExpr dim0, dim1, dim2;
       bindDims(builder.getContext(), dim0, dim1, dim2);

+      // Get the dimension size for this dimension. We need to first calculate
+      // the max index and then plus one. This is important because for
+      // convolution ops, we have its input window dimension's affine map of the
+      // form `(d0 * s0 + d1)`, where `d0`/`d1 is an output/filter window
+      // dimension and `s0` is stride. Directly use the dimension size of
+      // output/filer window dimensions will cause incorrect calculation.
+      AffineMap minusOneMap =
+          AffineMap::inferFromExprList({ArrayRef<AffineExpr>{dim0 - 1}})
+              .front();
+      AffineMap plusOneMap =
+          AffineMap::inferFromExprList({ArrayRef<AffineExpr>{dim0 + 1}})
+              .front();
+      auto maxIndices = llvm::to_vector<8>(llvm::map_range(ubs, [&](Value ub) {
+        return makeComposedAffineApply(builder, loc, minusOneMap, {ub})
+            .getResult();
+      }));
+      Value maxIndex = applyMapToValues(builder, loc, m, maxIndices).front();
+      Value d = makeComposedAffineApply(builder, loc, plusOneMap, {maxIndex});
+
       // Compute min(size, dim - offset) to avoid out-of-bounds accesses.
-      AffineMap minMap =
-          AffineMap::inferFromExprList(
-              ArrayRef<ArrayRef<AffineExpr>>{{dim0, dim1 - dim2}})
+      AffineMap minMap = AffineMap::inferFromExprList(
+                             {ArrayRef<AffineExpr>{dim0, dim1 - dim2}})
                              .front();
-      Value d = applyMapToValues(builder, loc, m, ubs).front();
       SmallVector<Value, 4> operands{size, d, offset};
       fullyComposeAffineMapAndOperands(&minMap, &operands);
       canonicalizeMapAndOperands(&minMap, &operands);
       size = builder.create<AffineMinOp>(loc, builder.getIndexType(), minMap,
                                          operands);
     }

     sizes.push_back(size);
...
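The affine.min built at the end of this hunk clamps each tile to min(size, dim - offset), so the extent fed into it decides whether the last tile stays in bounds. A minimal C++ sketch with constant sizes, assuming a stride of 2 and a 3-wide filter so that an output tile of 5 starting at output index 110 maps to input offset 220. Both helper names are hypothetical, not MLIR APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not MLIR APIs) working on constant sizes only.

// Nominal (unclamped) input tile size for an output tile of `outTile`
// elements, using the max-index-plus-one rule:
// ((outTile - 1) * stride + (filter - 1)) + 1.
std::int64_t inputTileSize(std::int64_t outTile, std::int64_t stride,
                           std::int64_t filter) {
  return (outTile - 1) * stride + (filter - 1) + 1;
}

// Mirrors the affine.min emitted by the patch: min(size, dim - offset), so
// a tile starting at `offset` never runs past a dimension of extent `dim`.
std::int64_t clampTileSize(std::int64_t size, std::int64_t dim,
                           std::int64_t offset) {
  assert(offset >= 0 && offset < dim && "tile must start inside the dimension");
  return std::min(size, dim - offset);
}
```

Here inputTileSize(5, 2, 3) is 11. With the fixed extent, clampTileSize(11, 225, 220) gives 5 (indices 220 to 224). With the pre-fix extent of 227 it gives 7, which would touch indices 225 and 226 of a 225-wide input.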

mlir/test/Dialect/Linalg/tile-and-fuse-on-tensors.mlir

Hunk within: builtin.func @fuse_indexed(%arg0: tensor<24x12xi32>,

   // CHECK: %{{.*}} = addi %[[IDX0_SHIFTED]], %[[IDX1_SHIFTED]]
   %1 = linalg.matmul ins(%arg0, %0 : tensor<24x12xi32>, tensor<12x25xi32>) outs(%arg2 : tensor<24x25xi32>) -> tensor<24x25xi32>
   return %1 : tensor<24x25xi32>
 }

 // -----

 // CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d0 + d1)>
-// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (8, -d0 - d1 + 18)>
+// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (8, -d0 - d1 + 17)>
 // CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1, d2) -> (d0, -d1 - d2 + 18)>
 #map0 = affine_map<(d0, d1) -> (d0, d0 + d1)>
 #map1 = affine_map<(d0, d1) -> (d0, d1)>

 // CHECK: fuse_non_rectangular
 // CHECK-SAME: %[[ARG0:[0-9a-zA-Z]*]]: tensor<10x18xf32>
 func @fuse_non_rectangular(%arg0: tensor<10x18xf32>,
     %arg1: tensor<10x8xf32>) -> tensor<10x8xf32> {
   %cst = constant 0.000000e+00 : f32
   %0 = linalg.fill(%cst, %arg0) : f32, tensor<10x18xf32> -> tensor<10x18xf32>

-  // CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] =
-  // CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] =
+  // CHECK: scf.for %[[IV0:[0-9a-zA-Z]*]] = %c0 to %c8 step %c4
+  // CHECK: scf.for %[[IV1:[0-9a-zA-Z]*]] = %c0 to %c10 step %c5

   // Compute producer on a hyper rectangular bounding box. Along the second dimenson,
-  // the offset is set to the sum of the induction variables and the upper bound
-  // to either eight (sum of the tile sizes) or eighteen (sum of the domain sizes)
-  // minus the induction variables.
+  // the offset is set to the sum of the induction variables, and the upper bound
+  // to either 8 (tile size) or 17 (sum of max indices (9+7) then + 1) minus the
+  // induction variables.
   // CHECK: %[[SUM:.*]] = affine.apply #[[MAP0]](%[[IV1]], %[[IV0]]
   // CHECK: %[[TS1:.*]] = affine.min #[[MAP1]](%[[IV1]], %[[IV0]]
   // CHECK: %[[UB1:.*]] = affine.min #[[MAP2]](%[[TS1]], %[[IV1]], %[[IV0]]
   // CHECK: %[[T0:.*]] = tensor.extract_slice %[[ARG0]]
   // CHECK-SAME: %[[IV1]], %[[SUM]]
   // CHECK-SAME: , %[[UB1]]
   // CHECK: %[[T1:.*]] = linalg.fill(%{{.*}}, %[[T0]])
   %1 = linalg.generic {indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel"]} ins(%0 : tensor<10x18xf32>) outs(%arg1 : tensor<10x8xf32>) {
   ^bb0(%arg2: f32, %arg3: f32): // no predecessors
     %2 = addf %arg2, %arg3 : f32
     linalg.yield %2 : f32
   } -> tensor<10x8xf32>
   return %1 : tensor<10x8xf32>
 }
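The new constant 17 in MAP1 can be checked with a minimal C++ sketch that applies the same max-index rule to an access expression sum(c_i * d_i). It assumes non-negative integer coefficients and no constant term; with negative coefficients the maximum would no longer sit at the max indices. The helper names are hypothetical, not MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not MLIR APIs). coeffs[i] multiplies loop dim d_i and
// ubs[i] is that loop's exclusive upper bound. Coefficients are assumed
// non-negative, so the expression reaches its maximum at the max indices.

// Pre-fix behavior: evaluate the expression at the exclusive upper bounds.
std::int64_t naiveExtent(const std::vector<std::int64_t>& coeffs,
                         const std::vector<std::int64_t>& ubs) {
  assert(coeffs.size() == ubs.size());
  std::int64_t sum = 0;
  for (std::size_t i = 0; i < coeffs.size(); ++i) sum += coeffs[i] * ubs[i];
  return sum;
}

// Fixed behavior: evaluate at the max indices (ub - 1), then add one.
std::int64_t extent(const std::vector<std::int64_t>& coeffs,
                    const std::vector<std::int64_t>& ubs) {
  assert(coeffs.size() == ubs.size());
  std::int64_t maxIndex = 0;
  for (std::size_t i = 0; i < coeffs.size(); ++i) {
    assert(coeffs[i] >= 0 && ubs[i] > 0);
    maxIndex += coeffs[i] * (ubs[i] - 1);  // exclusive -> inclusive, apply
  }
  return maxIndex + 1;  // inclusive -> exclusive
}
```

For the second result of #map0, d0 + d1 with loop bounds 10 and 8, extent({1, 1}, {10, 8}) is (9 + 7) + 1 = 17, while naiveExtent({1, 1}, {10, 8}) gives the 18 that the old CHECK line expected. The convolution case from the summary is extent({2, 1}, {112, 3}) = 225.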

mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir

Hunk within: func @conv_tensors_dynamic(%input: tensor<?x?x?x?xf32>, %filter: tensor<?x?x?x?xf32>, %elementwise: tensor<?x?x?x?xf32>) -> tensor<?x?x?x?xf32> {

   }
   return %for0 : tensor<?x?x?x?xf32>
 }

 // CHECK: #[[BOUND8_MAP:.+]] = affine_map<(d0)[s0] -> (8, -d0 + s0)>
 // CHECK: #[[BOUND8_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 8, -d0 + s1)>
 // CHECK: #[[BOUND16_MAP:.+]] = affine_map<(d0)[s0] -> (16, -d0 + s0)>
 // CHECK: #[[X2_MAP:.+]] = affine_map<(d0) -> (d0 * 2)>
-// CHECK: #[[INPUT_BOUND:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * 2 + s0 - 2, d1 * -2 + s0 + s1 * 2)>
+// CHECK: #[[INPUT_BOUND:.+]] = affine_map<(d0, d1)[s0, s1] -> (d0 * 2 + s0 - 2, d1 * -2 + s0 + s1 * 2 - 2)>
 // CHECK: #[[BOUND16_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 16, -d0 + s1)>
 // CHECK: #[[BOUND4_MAP:.+]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>
 // CHECK: #[[BOUND2_MAP:.+]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>
 // CHECK: #[[BOUND4_MAP_2:.+]] = affine_map<(d0)[s0, s1] -> (-d0 + s0, 4, -d0 + s1)>
 // CHECK: #[[BOUND2_MAP_2:.+]] = affine_map<(d0, d1)[s0, s1] -> (-d0 + s0, 2, -d1 + s1)>

 // CHECK: func @conv_tensors_dynamic
 // CHECK-SAME: (%[[INPUT]]: tensor<?x?x?x?xf32>, %[[FILTER]]: tensor<?x?x?x?xf32>, %[[ELEM]]: tensor<?x?x?x?xf32>)
...
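The extra - 2 in INPUT_BOUND follows from the max-index rule with stride 2. As a worked check, read s0 as the filter window size, s1 as the output window size, and d1 as the induction variable whose input offset is d1 * 2 (X2_MAP). This symbol assignment is inferred from the shape of the map rather than stated in the test:

$$
\mathrm{dim} = \bigl(2(s_1 - 1) + (s_0 - 1)\bigr) + 1 = 2s_1 + s_0 - 2,
\qquad
\mathrm{dim} - 2d_1 = -2d_1 + s_0 + 2s_1 - 2.
$$

The pre-fix computation applied the map to the exclusive bounds, giving 2s_1 + s_0 and hence the old component d1 * -2 + s0 + s1 * 2. The unchanged first component, d0 * 2 + s0 - 2, is the same rule applied to an output tile of d0 elements.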

mlir/test/Dialect/Linalg/tile-conv.mlir

 // RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3,0,0,4" | FileCheck %s -check-prefix=TILE-23004

 // TILE-23004-DAG: #[[$D0x30pS0x10:.*]] = affine_map<(d0) -> (d0 * 30)>
-// TILE-23004-DAG: #[[$S0x10p90D0x30pS1:.*]] = affine_map<(d0)[s0, s1] -> (s0 * 10 + 51, d0 * -30 + s0 * 10 + s1 * 30)>
+// TILE-23004-DAG: #[[$S0x10p90D0x30pS1:.*]] = affine_map<(d0)[s0, s1] -> (s0 * 10 + 51, d0 * -30 + s0 * 10 + s1 * 30 - 39)>
 // TILE-23004-DAG: #[[$strided4D:.*]] = affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3 + d3)>
 // TILE-23004-DAG: #[[$bound_map_2:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>
 // TILE-23004-DAG: #[[$bound_map_3:.*]] = affine_map<(d0)[s0] -> (3, -d0 + s0)>
 // TILE-23004-DAG: #[[$bound_map_4:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>

 func @conv(%arg0: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg1: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, %arg2: memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>) {
   linalg.conv(%arg0, %arg1, %arg2) {dilations = [10, 20], strides = [30, 40]} : memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>, memref<?x?x?x?xf32, offset: ?, strides: [?, ?, ?, 1]>
   return
...
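In tile-conv.mlir the first window dimension has stride 30 and dilation 10 (from strides = [30, 40] and dilations = [10, 20]), so its access expression is d0 * 30 + d1 * 10. Reading s1 as the output window size, s0 as the filter window size, and d0 * 30 as the tile offset (an inference from the map's terms, not stated in the test), the max-index rule gives:

$$
\mathrm{dim} = \bigl(30(s_1 - 1) + 10(s_0 - 1)\bigr) + 1 = 30s_1 + 10s_0 - 39,
\qquad
\mathrm{dim} - 30d_0 = -30d_0 + 10s_0 + 30s_1 - 39.
$$

The pre-fix value 30s_1 + 10s_0 overshot by stride + dilation - 1 = 39. The unchanged first component is the same rule applied to an output tile of 3 (the second entry of tile-sizes=2,3,0,0,4): 30 * 2 + 10(s_0 - 1) + 1 = 10s_0 + 51.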

mlir/test/Dialect/Linalg/tile-simple-conv.mlir

 // RUN: mlir-opt %s -linalg-tile="tile-sizes=2,3,4" | FileCheck %s

 // CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0)[s0] -> (2, -d0 + s0)>
-// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 2, -d0 + s0 + s1)>
-// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 3, -d0 + s0 + s1)>
+// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 2, -d0 + s0 + s1 - 1)>
+// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0)[s0, s1] -> (s0 + 3, -d0 + s0 + s1 - 1)>
 // CHECK-DAG: #[[MAP4:.*]] = affine_map<(d0)[s0] -> (3, -d0 + s0)>
 // CHECK-DAG: #[[MAP5:.*]] = affine_map<(d0)[s0] -> (4, -d0 + s0)>

 func @conv(%arg0 : memref<?x?x?x?xf32>, %arg1 : memref<?x?x?x?xf32>, %arg2 : memref<?x?x?x?xf32>) {
   linalg.conv(%arg0, %arg1, %arg2) : memref<?x?x?x?xf32>, memref<?x?x?x?xf32>, memref<?x?x?x?xf32>
   return
 }

...
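With no explicit strides or dilations, both factors are 1, so the same rule explains the new - 1 in MAP1 and MAP2. This reads s1 as the output window size, s0 as the filter window size, and d0 as the tile offset, which is inferred from the maps rather than stated in the test:

$$
\mathrm{dim} = \bigl((s_1 - 1) + (s_0 - 1)\bigr) + 1 = s_0 + s_1 - 1,
\qquad
\mathrm{dim} - d_0 = -d_0 + s_0 + s_1 - 1.
$$

The first components follow from output tiles of 3 and 4 (the tile sizes 3 and 4 in tile-sizes=2,3,4): (3 - 1) + (s_0 - 1) + 1 = s_0 + 2 and (4 - 1) + (s_0 - 1) + 1 = s_0 + 3.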