The generalization pattern for tensor.pack was inverting the
innerDimsPos permutation when normalizing, so the transpose op
produced during generalization did not perform the correct transpose.
This can be observed in the following example by comparing the IR
generated with and without data layout op (pack/unpack) propagation.
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
builtin.module {
  func.func @forward(%arg0: tensor<3x5x7xf32>, %arg1: tensor<3x5x7xf32>) -> tensor<1x1x1x5x7x3xf32> {
    %empty = tensor.empty() : tensor<3x5x7xf32>
    %elementwise = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel", "parallel"]} ins(%arg0, %arg1 : tensor<3x5x7xf32>, tensor<3x5x7xf32>) outs(%empty : tensor<3x5x7xf32>) {
    ^bb0(%in: f32, %in_1: f32, %out: f32):
      %add = arith.addf %in, %in_1 : f32
      linalg.yield %add : f32
    } -> tensor<3x5x7xf32>
    %pack_empty = tensor.empty() : tensor<1x1x1x5x7x3xf32>
    %pack = tensor.pack %elementwise inner_dims_pos = [1, 2, 0] inner_tiles = [5, 7, 3] into %pack_empty : tensor<3x5x7xf32> -> tensor<1x1x1x5x7x3xf32>
    return %pack : tensor<1x1x1x5x7x3xf32>
  }
}
```
With the data layout propagation patterns through elementwise ops:
```mlir
#map = affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d2, d3, d4, d5)>
module {
  func.func @forward(%arg0: tensor<3x5x7xf32>, %arg1: tensor<3x5x7xf32>) -> tensor<1x1x1x5x7x3xf32> {
    %0 = tensor.empty() : tensor<1x1x1x5x7x3xf32>
    %1 = tensor.empty() : tensor<1x1x1x5x7x3xf32>
    %2 = tensor.empty() : tensor<7x3x5xf32>
    %transposed = linalg.transpose ins(%arg0 : tensor<3x5x7xf32>) outs(%2 : tensor<7x3x5xf32>) permutation = [2, 0, 1]
    %inserted_slice = tensor.insert_slice %transposed into %1[0, 0, 0, 0, 0, 0] [1, 1, 1, 7, 3, 5] [1, 1, 1, 1, 1, 1] : tensor<7x3x5xf32> into tensor<1x1x1x5x7x3xf32>
    %3 = tensor.empty() : tensor<1x1x1x5x7x3xf32>
    %4 = tensor.empty() : tensor<7x3x5xf32>
    %transposed_0 = linalg.transpose ins(%arg1 : tensor<3x5x7xf32>) outs(%4 : tensor<7x3x5xf32>) permutation = [2, 0, 1]
    %inserted_slice_1 = tensor.insert_slice %transposed_0 into %3[0, 0, 0, 0, 0, 0] [1, 1, 1, 7, 3, 5] [1, 1, 1, 1, 1, 1] : tensor<7x3x5xf32> into tensor<1x1x1x5x7x3xf32>
    %5 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel", "parallel"]} ins(%inserted_slice, %inserted_slice_1 : tensor<1x1x1x5x7x3xf32>, tensor<1x1x1x5x7x3xf32>) outs(%0 : tensor<1x1x1x5x7x3xf32>) {
    ^bb0(%in: f32, %in_2: f32, %out: f32):
      %6 = arith.addf %in, %in_2 : f32
      linalg.yield %6 : f32
    } -> tensor<1x1x1x5x7x3xf32>
    return %5 : tensor<1x1x1x5x7x3xf32>
  }
}
```
Without propagation:
```mlir
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
module {
  func.func @forward(%arg0: tensor<3x5x7xf32>, %arg1: tensor<3x5x7xf32>) -> tensor<1x1x1x5x7x3xf32> {
    %0 = tensor.empty() : tensor<3x5x7xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel", "parallel"]} ins(%arg0, %arg1 : tensor<3x5x7xf32>, tensor<3x5x7xf32>) outs(%0 : tensor<3x5x7xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %4 = arith.addf %in, %in_0 : f32
      linalg.yield %4 : f32
    } -> tensor<3x5x7xf32>
    %2 = tensor.empty() : tensor<1x1x1x5x7x3xf32>
    %3 = tensor.empty() : tensor<7x3x5xf32>
    %transposed = linalg.transpose ins(%1 : tensor<3x5x7xf32>) outs(%3 : tensor<7x3x5xf32>) permutation = [2, 0, 1]
    %inserted_slice = tensor.insert_slice %transposed into %2[0, 0, 0, 0, 0, 0] [1, 1, 1, 7, 3, 5] [1, 1, 1, 1, 1, 1] : tensor<7x3x5xf32> into tensor<1x1x1x5x7x3xf32>
    return %inserted_slice : tensor<1x1x1x5x7x3xf32>
  }
}
```
Here, data layout propagation ends up performing a different transpose
than the one generalization produces.
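To make the mismatch concrete, here is a small standalone C++ sketch (not MLIR code; the invertPermutation and applyPermutation helpers are local to the sketch) that reproduces the permutation arithmetic for the example above. With inner_dims_pos = [1, 2, 0] and a 3x5x7 source, the full-size tile of the packed result should be 5x7x3 (matching the trailing dims of tensor<1x1x1x5x7x3xf32>), i.e. the transpose permutation should be inner_dims_pos itself; using its inverse gives the [2, 0, 1] / 7x3x5 transpose seen in the IR dumps.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Invert a permutation: inverse[perm[i]] = i.
static std::vector<int64_t> invertPermutation(const std::vector<int64_t> &perm) {
  std::vector<int64_t> inverse(perm.size());
  for (int64_t i = 0, e = perm.size(); i < e; ++i)
    inverse[perm[i]] = i;
  return inverse;
}

// Apply a linalg.transpose-style permutation: result dim i = shape[perm[i]].
static std::vector<int64_t> applyPermutation(const std::vector<int64_t> &shape,
                                             const std::vector<int64_t> &perm) {
  std::vector<int64_t> result;
  for (int64_t p : perm)
    result.push_back(shape[p]);
  return result;
}

int main() {
  std::vector<int64_t> srcShape = {3, 5, 7};
  std::vector<int64_t> innerDimsPos = {1, 2, 0};

  // Expected tile shape: source dims in inner_dims_pos order -> 5 7 3.
  auto expected = applyPermutation(srcShape, innerDimsPos);
  // Permuting by the inverse ([2, 0, 1]) instead -> 7 3 5, as in the IR above.
  auto inverted = applyPermutation(srcShape, invertPermutation(innerDimsPos));

  for (int64_t d : expected) std::cout << d << ' ';
  std::cout << '\n';
  for (int64_t d : inverted) std::cout << d << ' ';
  std::cout << '\n';
  return 0;
}
```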
The implementation of getPackNormalizedInnerPerm looks odd to me: it calls getPackUnpackNormalizedInnerPerm and then generates the inverse permutation itself.
Why not just use the same method and call invertPermutationVector here?
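For illustration, a minimal sketch of the simplification suggested above, not the actual in-tree code: the helper names are taken from the comment, and the (rank, innerDimsPos) parameters are assumed, so the real signatures may differ.

```cpp
#include "mlir/Dialect/Utils/IndexingUtils.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

// Existing shared helper (assumed signature), used for both pack and unpack.
static llvm::SmallVector<int64_t>
getPackUnpackNormalizedInnerPerm(int rank, llvm::ArrayRef<int64_t> innerDimsPos);

static llvm::SmallVector<int64_t>
getPackNormalizedInnerPerm(int rank, llvm::ArrayRef<int64_t> innerDimsPos) {
  llvm::SmallVector<int64_t> perm =
      getPackUnpackNormalizedInnerPerm(rank, innerDimsPos);
  // Reuse the existing utility instead of generating the inverse here.
  return mlir::invertPermutationVector(perm);
}
```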