This is an archive of the discontinued LLVM Phabricator instance.

Differential D86314

[mlir][Linalg] Enhance Linalg fusion on generic op and tensor_reshape op.
Closed · Public

Authored by hanchung on Aug 20 2020, 12:45 PM.

Details

Reviewers

mravishankar
nicolasvasilache

Commits

rGeb4efa883212: [mlir][Linalg] Enhance Linalg fusion on generic op and tensor_reshape op.

Summary

Previously, the tensor_reshape op was fusible only in the collapsing case. Now we
propagate the op to all the operands so there is a further chance to fuse it
with the generic op. The preconditions are:

The producer is not an indexed_generic op.
All the shapes of the operands are the same.
All the indexing maps are identity.
All the loops are parallel loops.

It is also possible to fuse the ops when the producer is an indexed_generic op,
because we can still compute the original indices. E.g., if the reshape op
collapses d0 and d1, we can use DimOp to get the width of d1, calculate the
index d0 * width + d1, and then replace all the uses with it. However, this
pattern is not implemented in this patch.
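
For illustration only (this is not code from the patch): a minimal C++ sketch of the index computation described above. The helper name `linearizeIndex` is hypothetical, and `width` stands in for the extent of d1 that DimOp would provide in the IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of MLIR: recovers the index in the collapsed
// dimension from the two original indices d0 and d1. `width` is the extent of
// d1 (the value DimOp would return in the IR).
inline int64_t linearizeIndex(int64_t d0, int64_t d1, int64_t width) {
  assert(width > 0 && d1 >= 0 && d1 < width && "d1 must lie in [0, width)");
  return d0 * width + d1;
}
```

With a width of 33, for example, the pair (d0, d1) = (2, 5) maps to collapsed index 71.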

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hanchung created this revision. Aug 20 2020, 12:45 PM

hanchung requested review of this revision. Aug 20 2020, 12:45 PM

Harbormaster completed remote builds in B69067: Diff 286881. Aug 20 2020, 1:01 PM

Fix test

Harbormaster completed remote builds in B69069: Diff 286884. Aug 20 2020, 1:32 PM

mravishankar requested changes to this revision. Aug 20 2020, 11:32 PM

mravishankar added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
845	You could combine all the checks into a single if with `(cond1) || (cond2) || ...`
850	I think you can assert `!types.empty()` here.
855	This could be llvm::any_of(producer.getIndexingMaps(), [](AffineMap map) { return map.isIdentity(); })
861	This could be llvm::any_of(producer.iterator_types(), [](Attribute attr) { return attr.cast<StringAttr>().getValue() != getParallelIteratorTypeName(); }))
873	I don't think you need to do this. There is a separate pattern that will fold the `constant -> tensor_reshape` into a `constant`.
mlir/test/Dialect/Linalg/fusion-tensor.mlir
248	I think it would be better to check the shape, indexing maps, etc. here as well, because those are generated by the pattern being applied.
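
For illustration only: a standard-library sketch of what the combined-check style suggested in the comments on lines 845 to 861 could look like for the four preconditions listed in the summary. `ProducerInfo` and `satisfiesPreconditions` are hypothetical stand-ins, not MLIR types or functions from the patch, and `std::all_of` is used here in place of the `llvm::any_of` calls mentioned in the comments.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the properties of a producer op; not an MLIR type.
struct ProducerInfo {
  bool isIndexedGeneric;
  std::vector<std::vector<long>> operandShapes;
  std::vector<bool> mapIsIdentity;        // one flag per indexing map
  std::vector<std::string> iteratorTypes; // e.g. "parallel", "reduction"
};

// Combines the four preconditions from the summary into one expression.
inline bool satisfiesPreconditions(const ProducerInfo &p) {
  assert(!p.operandShapes.empty() && "expected at least one operand");
  const std::vector<long> &first = p.operandShapes.front();
  return
      // The producer is not an indexed_generic op.
      !p.isIndexedGeneric &&
      // All the shapes of the operands are the same.
      std::all_of(p.operandShapes.begin(), p.operandShapes.end(),
                  [&](const std::vector<long> &s) { return s == first; }) &&
      // All the indexing maps are identity.
      std::all_of(p.mapIsIdentity.begin(), p.mapIsIdentity.end(),
                  [](bool isIdentity) { return isIdentity; }) &&
      // All the loops are parallel loops.
      std::all_of(p.iteratorTypes.begin(), p.iteratorTypes.end(),
                  [](const std::string &t) { return t == "parallel"; });
}
```

Each clause carries its own one-line comment, which is one way to keep a single combined condition readable.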

This revision now requires changes to proceed. Aug 20 2020, 11:32 PM

Address comments

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
845	I actually prefer to keep them separate because they are independent. I usually put checks into a single if when they are logically related, e.g., to check whether a value is in the range [l, r], I would write `if (l <= val && val <= r)`. What do you think?
873	This needs an update to ReshapeOp::fold and the use of createOrFold. I fixed it in this patch as well.

Harbormaster completed remote builds in B69283: Diff 287312. Aug 24 2020, 2:24 AM

mravishankar requested changes to this revision. Aug 24 2020, 12:58 PM

mravishankar added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
840–841

The way this was set up, `isFusible` was supposed to be a collection of all the checks that tell you whether the producer-consumer pair is fusible. It's OK to do this in a future change, but it would be good to retain this structure. This could be refactored to

static bool isFusibleCase1(...)
static LinalgOp fuseCase1(...)
static bool isFusibleCase2(...)
static LinalgOp fuseCase2(...)

static bool isFusible(...) {
  return isFusibleCase1(...) || isFusibleCase2(...);
}

static LinalgOp fuse(...) {
  if (isFusibleCase1(...)) {
    return fuseCase1(...);
  }
  if (isFusibleCase2(...)) {
    return fuseCase2(...);
  }
  return nullptr;
}

845

I think it is better to combine these. It makes the code less verbose. Combining all the checks, with a comment describing what each check is, should be clear enough.

864

Maybe we need to add a couple more checks to this.

1) The producer `linalg.generic` op has a single user (the `linalg.tensor_reshape` op). When these operations are converted to buffers, the reshape ideally just becomes a view modifier. So

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB>

in buffer world would be

linalg.generic %0 .... : ...., memref<typeA>
%1 = linalg.reshape %0 ... : memref<typeA> into memref<typeB>

With a single use in tensor world, there won't be an increase in "memory usage" when converted to buffers, as the modified code would just lower to

linalg.generic %0 .... : ...., memref<typeB>

If the generic op had two uses

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB>
%2 = linalg.generic %0 ... : tensor<typeA> ...

fusion would result in

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.generic ... : .... -> tensor<typeB>
%2 = linalg.generic %0 ... : tensor<typeA>

which, when converted to buffers, is

linalg.generic .... %0 : ... memref<typeA>
linalg.generic .... %1 : ... memref<typeB>
linalg.generic %0 ... : memref<typeA>

This has an extra `memref<typeA>`.

2) The operating theory has been that it is better to convert to "higher" dimensionality. The test case below is converting the producer op to use a higher dimensionality. Maybe make that requirement explicit, i.e. check that the `tensor_reshape` result is of higher rank than the `tensor_reshape` source.

This revision now requires changes to proceed. Aug 24 2020, 12:58 PM

Address comments

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp

845

I tried this in isFusibleCase2, but it looks bad to me. :(

We would need to count the conditions and map each one to its description.

I've read that code should be descriptive on its own rather than relying on a comment somewhere. But maybe that is not the style to follow here. That is why I wrote the conditions separately in the first place.

864

Added the second check.

Regarding the first check, I think in this case we would lose a chance to fuse these two generic ops. The reshape op would be propagated and become

tensor_reshape
generic
generic

And then it would fuse into the first generic op. In the end two generic ops have a chance to fuse.

I tested with this case

#map0 = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d2)>
#map3 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>

func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>)
                                            -> tensor<8x33x4xf32> {
  %cst = constant dense<2.000000e+00> : tensor<264x4xf32>
  %0 = linalg.generic
    {args_in = 2 : i64, args_out = 1 : i64,
     indexing_maps = [#map0, #map0, #map0],
     iterator_types = ["parallel", "parallel"]}
    %arg0, %cst {
    ^bb0(%arg1: f32, %arg2: f32):  // no predecessors
      %2 = mulf %arg1, %arg2 : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32>, tensor<264x4xf32> -> tensor<264x4xf32>
  %1 = linalg.tensor_reshape %0 [#map1, #map2] :
    tensor<264x4xf32> into tensor<8x33x4xf32>
  %2 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64,
     indexing_maps = [#map3, #map3],
     iterator_types = ["parallel", "parallel", "parallel"]}
    %1 {
    ^bb0(%arg1: f32):  // no predecessors
      %2 = mulf %arg1, %arg1 : f32
      linalg.yield %2 : f32
    }: tensor<8x33x4xf32> -> tensor<8x33x4xf32>

  return %2 : tensor<8x33x4xf32>
}

And it would be fused to

#map0 = affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>


module {
  func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> tensor<8x33x4xf32> {
    %cst = constant 2.000000e+00 : f32
    %0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel", "parallel"]} %arg0 {
    ^bb0(%arg1: f32):  // no predecessors
      %1 = mulf %arg1, %cst : f32
      %2 = mulf %1, %1 : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32> -> tensor<8x33x4xf32>
    return %0 : tensor<8x33x4xf32>
  }
}

Which looks good to me. Even if not all of the ops are fused, won't it result in

linalg.reshape
linalg.generic
linalg.generic

in buffers world?
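
For illustration only: a small C++ sanity check of the fused indexing map (d0, d1, d2) -> (d0 * 33 + d1, d2) from the output above. It iterates the 8x33x4 result space and confirms that every element of the 264x4 source is addressed exactly once. The function names are hypothetical and the shapes are hard-coded from this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Applies the fused indexing map (d0, d1, d2) -> (d0 * 33 + d1, d2).
inline std::array<int64_t, 2> applyFusedMap(int64_t d0, int64_t d1,
                                            int64_t d2) {
  assert(d1 >= 0 && d1 < 33 && "d1 must lie in [0, 33)");
  return {d0 * 33 + d1, d2};
}

// Returns true if iterating the 8x33x4 result space touches every element of
// the 264x4 source exactly once, i.e. the map is a bijection on these shapes.
inline bool coversSourceExactlyOnce() {
  std::vector<int> hits(264 * 4, 0);
  for (int64_t d0 = 0; d0 < 8; ++d0) {
    for (int64_t d1 = 0; d1 < 33; ++d1) {
      for (int64_t d2 = 0; d2 < 4; ++d2) {
        const std::array<int64_t, 2> idx = applyFusedMap(d0, d1, d2);
        if (idx[0] < 0 || idx[0] >= 264 || idx[1] < 0 || idx[1] >= 4)
          return false; // out of bounds for tensor<264x4xf32>
        ++hits[idx[0] * 4 + idx[1]];
      }
    }
  }
  for (int h : hits)
    if (h != 1)
      return false;
  return true;
}
```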

Harbormaster completed remote builds in B69753: Diff 288259. Aug 27 2020, 3:31 AM

hanchung added inline comments. Aug 27 2020, 7:05 AM

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp

864

Oh, I was using the wrong example.

It should be

from

#map0 = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d2)>
#map3 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>

func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>)
                                            -> (tensor<8x33x4xf32>, tensor<264x4xf32>) {
  %cst = constant dense<2.000000e+00> : tensor<264x4xf32>
  %0 = linalg.generic
    {args_in = 2 : i64, args_out = 1 : i64,
     indexing_maps = [#map0, #map0, #map0],
     iterator_types = ["parallel", "parallel"]}
    %arg0, %cst {
    ^bb0(%arg1: f32, %arg2: f32):  // no predecessors
      %2 = mulf %arg1, %arg2 : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32>, tensor<264x4xf32> -> tensor<264x4xf32>
  %1 = linalg.tensor_reshape %0 [#map1, #map2] :
    tensor<264x4xf32> into tensor<8x33x4xf32>
  %2 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64,
     indexing_maps = [#map0, #map0],
     iterator_types = ["parallel", "parallel"]}
    %0 {
    ^bb0(%arg1: f32):  // no predecessors
      %2 = mulf %arg1, %arg1 : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32> -> tensor<264x4xf32>

  return %1, %2 : tensor<8x33x4xf32>, tensor<264x4xf32>
}

to

#map0 = affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>


module {
  func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> (tensor<8x33x4xf32>, tensor<264x4xf32>) {
    %cst = constant 2.000000e+00 : f32
    %0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel", "parallel"]} %arg0 {
    ^bb0(%arg1: f32):  // no predecessors
      %2 = mulf %arg1, %cst : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32> -> tensor<8x33x4xf32>
    %1 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map2, #map2], iterator_types = ["parallel", "parallel"]} %arg0 {
    ^bb0(%arg1: f32):  // no predecessors
      %2 = mulf %arg1, %cst : f32
      %3 = mulf %2, %2 : f32
      linalg.yield %3 : f32
    }: tensor<264x4xf32> -> tensor<264x4xf32>
    return %0, %1 : tensor<8x33x4xf32>, tensor<264x4xf32>
  }
}

I think it would still be

linalg.generic
linalg.generic

in buffers world?

Why would

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB>
%2 = linalg.generic %0 ... : tensor<typeA> ...

become

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.generic ... : .... -> tensor<typeB>
%2 = linalg.generic %0 ... : tensor<typeA>

I think the tensor reshape op would be propagated up and eventually either be lowered to linalg.reshape or fuse with the next generic (i.e., %0)?

rebase

Harbormaster completed remote builds in B69785: Diff 288332. Aug 27 2020, 8:24 AM

Looks good. A few minor comments. Please address before submitting.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
832–834

Nit: I was hoping you would be able to pick a better name than `isFusibleCase1` (I used it because I couldn't think of one). Same for `isFusibleCase2`. Maybe make `Case1` -> `WhenCollapsing` and `Case2` -> `WhenExpanding`?

864

I don't disagree with what you are saying. But I think more experimentation/data is needed here, and it's better to go incremental instead of solving a general case that might have unintended consequences. For the current use cases, checking that the `tensor_reshape` has a single use and only then applying the transformation is safer. The example you gave is fine, and in that case the `tensor_reshape` has a single use. But it is easy to adapt that example into a case where the tensor_reshape has multiple uses. You are right that this case wouldn't be handled right now. Let's revisit that if we need to?

Regarding your question,

%0 = linalg.generic ... : ... -> tensor<typeA>
%1 = linalg.generic ... : .... -> tensor<typeB>
%2 = linalg.generic %0 ... : tensor<typeA>

I am not referring to the `linalg.reshape` that exists above the snippet. We can discuss this offline. FWIW, if your use case currently has multiple uses of the reshape op, then it's OK to not add that check.

888

Maybe a matter of preference, but this looks clean to me :)

901

Nit: This should be `consumer.getSrcType().getRank() < consumer.getResultType().getRank()`. `==` is illegal by op definition.
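
For illustration only: a minimal C++ sketch of the rank comparison discussed in the comment on line 901. `ReshapeKind` and `classifyReshape` are hypothetical names, and the assertion encodes the statement above that equal ranks are illegal by op definition.

```cpp
#include <cassert>
#include <cstddef>

enum class ReshapeKind { Collapsing, Expanding };

// Hypothetical classifier, not part of MLIR. A reshape whose result rank is
// strictly greater than its source rank is expanding; otherwise it is
// collapsing. Equal ranks are rejected, following the review comment.
inline ReshapeKind classifyReshape(std::size_t srcRank,
                                   std::size_t resultRank) {
  assert(srcRank != resultRank && "equal ranks are illegal for the reshape");
  return srcRank < resultRank ? ReshapeKind::Expanding
                              : ReshapeKind::Collapsing;
}
```

Because equality cannot occur, a strict `<` comparison is enough to identify the expanding case.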

This revision is now accepted and ready to land. Aug 27 2020, 10:51 AM

Address comments

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
832–834	Oh, I missed this. Let's use isCollapsingAndFusible, fuseCollapsingCase, etc.
864	Thanks for the detailed explanation. I agree with you, let's do it incrementally.

Harbormaster completed remote builds in B69888: Diff 288534. Aug 27 2020, 11:29 PM

Closed by commit rGeb4efa883212: [mlir][Linalg] Enhance Linalg fusion on generic op and tensor_reshape op. (authored by hanchung). Aug 28 2020, 1:56 AM

This revision was automatically updated to reflect the committed changes.

hanchung added a commit: rGeb4efa883212: [mlir][Linalg] Enhance Linalg fusion on generic op and tensor_reshape op.

Revision Contents

Path                                             Size
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp         15 lines
mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp    89 lines
mlir/test/Dialect/Linalg/fusion-tensor.mlir      34 lines

Diff 288554

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp

[390 lines not shown; enclosing context: else if (areReshapeOpsFoldable(srcReshapeOp.getSrcType(),]

       return success();
     }
     return failure();
   }
 };
 } // namespace

 template <typename ReshapeOpTy>
-static OpFoldResult foldReshapeOp(ReshapeOpTy reshapeOp) {
+static OpFoldResult foldReshapeOp(ReshapeOpTy reshapeOp,
+                                  ArrayRef<Attribute> operands) {
   // Fold producer-consumer reshape ops that where the operand type of the
   // producer is same as the return type of the consumer. This can only be
   // verified if the shapes in question are static.
   ReshapeOpTy reshapeSrcOp =
       reshapeOp.src().template getDefiningOp<ReshapeOpTy>();
   if (reshapeSrcOp && reshapeSrcOp.getSrcType().hasStaticShape() &&
       reshapeOp.getResultType().hasStaticShape() &&
       reshapeSrcOp.getSrcType() == reshapeOp.getResultType())
     return reshapeSrcOp.src();
+  if (auto elements = operands.front().dyn_cast_or_null<DenseElementsAttr>()) {
+    return elements.reshape(
+        reshapeOp.getResult().getType().template cast<ShapedType>());
+  }
   return nullptr;
 }

 /// Return true if the reassociation specification is valid, false otherwise.
 /// When false, the `invalidIndex` integer pointer is optionally filled with the
 /// index of the offending reassociation map.
 static bool isReassociationValid(ArrayRef<AffineMap> reassociation,
                                  int *invalidIndex = nullptr) {

[753 lines not shown; enclosing context: llvm::interleave(]

       types.begin(), types.end(), [&](Type t) { appendMangledType(ss, t); },
       [&]() { ss << "_"; });
   return ss.str();
 }

 // TODO: Consider making all this boilerplate easy to autogenerate
 // with Tablegen. This seems a desirable property in the context of OpInterfaces
 // where a Linalg "named" op isa LinalgOp.
-OpFoldResult ReshapeOp::fold(ArrayRef<Attribute>) {
+OpFoldResult ReshapeOp::fold(ArrayRef<Attribute> operands) {
   if (succeeded(foldMemRefCast(*this)))
     return getResult();
-  return foldReshapeOp(*this);
+  return foldReshapeOp(*this, operands);
 }
 OpFoldResult SliceOp::fold(ArrayRef<Attribute>) {
   if (succeeded(foldMemRefCast(*this)))
     return getResult();
   return {};
 }
-OpFoldResult TensorReshapeOp::fold(ArrayRef<Attribute>) {
-  return foldReshapeOp(*this);
+OpFoldResult TensorReshapeOp::fold(ArrayRef<Attribute> operands) {
+  return foldReshapeOp(*this, operands);
 }
 OpFoldResult TransposeOp::fold(ArrayRef<Attribute>) {
   if (succeeded(foldMemRefCast(*this)))
     return getResult();
   return {};
 }

 //===----------------------------------------------------------------------===//

[158 lines not shown]

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp

[767 lines not shown; enclosing context: return isa<GenericOp, IndexedGenericOp>(consumer.getOperation()) &&]

            isTensorReshapeOpFusible(producer,
                                     consumer.getInputIndexingMap(consumerIdx),
                                     /*asProducer=*/true);
   }

   static LinalgOp fuse(TensorReshapeOp producer, LinalgOp consumer,
                        unsigned consumerIdx, PatternRewriter &rewriter,
                        OperationFolder *folder = nullptr) {
+    if (producer.src().getDefiningOp<ConstantOp>())
+      return nullptr;
+
     if (!isFusible(producer, consumer, consumerIdx))
       return nullptr;

     // Compute the fused operands list,
     Operation *consumerOp = consumer.getOperation();
     SmallVector<Value, 2> fusedOperands(consumerOp->getOperands());
     fusedOperands[consumerIdx] = producer.src();

[37 lines not shown; enclosing context: static LinalgOp fuse(TensorReshapeOp producer, LinalgOp consumer,]

     rewriter.cloneRegionBefore(consumerOp->getRegion(0), fusedRegion,
                                fusedRegion.begin());
     return fusedOp;
   }
 };

 /// Implementation of fusion on tensor ops when consumer is a TensorReshapeOp.
 struct FuseTensorReshapeOpAsConsumer {
-  static bool isFusible(LinalgOp producer, TensorReshapeOp consumer,
-                        unsigned consumerIdx) {
+  static bool isCollapsingAndFusible(LinalgOp producer,
+                                     TensorReshapeOp consumer,
+                                     unsigned consumerIdx) {
     return isa<GenericOp, IndexedGenericOp>(producer.getOperation()) &&
            producer.hasTensorSemantics() &&
            isTensorReshapeOpFusible(consumer, producer.getOutputIndexingMap(0),
                                     /*asProducer=*/false);
   }

-  static LinalgOp fuse(LinalgOp producer, TensorReshapeOp consumer,
-                       unsigned consumerIdx, PatternRewriter &rewriter,
-                       OperationFolder *folder = nullptr) {
-    if (!isFusible(producer, consumer, consumerIdx))
-      return nullptr;
-
+  static LinalgOp fuseCollapsingCase(LinalgOp producer,
+                                     TensorReshapeOp consumer,
+                                     unsigned consumerIdx,
+                                     PatternRewriter &rewriter) {
     // The indexing_maps for the operands of the fused operation are same as
     // those for the operands of the producer.
     SmallVector<AffineMap, 4> fusedIndexMaps =
         llvm::to_vector<4>(llvm::map_range(
             producer.indexing_maps(), [](Attribute attr) -> AffineMap {
               return attr.cast<AffineMapAttr>().getValue();
             }));
     // Compute the indexing map to use for the operand of the producer.
     AffineMap modifiedMap = linearizeCollapsedDims(
         producer.getOutputIndexingMap(0), consumer.getSrcType().getShape(),
         consumer.getReassociationMaps());
     for (AffineExpr expr : modifiedMap.getResults()) {
       if (!expr.isPureAffine())
         return nullptr;
     }
     fusedIndexMaps.back() = modifiedMap;

     // Further check that the resulting index maps can be fused and
     // inverted. Without this the resultant op is not legal.
     if (!inversePermutation(concatAffineMaps(fusedIndexMaps)))
		mravishankarUnsubmitted Done Reply Inline Actions Maybe we need to add a couple more checks to this. The producer `linalg.generic` op has a single user (the `linalg.tensor_reshape` op). WHen these operations are converted to buffers the reshape ideally just becomes a view modifier. So %0 = linalg.generic ... : ... -> tensor<typeA> %1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB> in buffer world would be linalg.generic %0 .... : ...., memref<typeA> %1 = linalg.reshape %0 ... : memref<typeA> into memref<typeB> With a single use in tensor world, there wont be an increases in "memory usage" when converted to buffers as the modified code would just lower to linalg.generic %0 .... : ...., memref<typeB> If the generic op had two uses %0 = linalg.generic ... : ... -> tensor<typeA> %1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB> %2 = linalg.generic %0 ... : tensor<typeA> ... Fusion would result in %0 = linalg.generic ... : ... -> tensor<typeA> %1 = linalg.generic ... : .... -> tensor<typeB> %2 = linalg.generic %0 ... : tensor<typeA> this when converted to buffers linalg.generic .... %0 : ... memref<typeA> linalg.generic .... %1 : ... memref<typeB> linalg.generic %0 ... : memref<typeA> This has an extra `memref<typeA>` . The operating theory has been that is better to convert to "higher"-dimensionality. The test case below is converting the producer op to use a higher dimensionality. Maybe have that requirement explicit, i.e. check that the `tensor_reshape` result is of higher rank than the `tensor_reshape` source. mravishankar: Maybe we need to add a couple more checks to this. 1) The producer `linalg.generic` op has a…
		hanchungAuthorUnsubmitted Done Reply Inline Actions Added the second check. Regarding the first check, I think in this case we would lose a chance to fuse these two generic op. The reshape op would be propagate and become tensor_reshape generic generic And then it would fuse into the first generic op. In the end two generic ops have a chance to fuse. I tested with this case #map0 = affine_map<(d0, d1) -> (d0, d1)> #map1 = affine_map<(d0, d1, d2) -> (d0, d1)> #map2 = affine_map<(d0, d1, d2) -> (d2)> #map3 = affine_map<(d0, d1, d2) -> (d0, d1, d2)> func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> tensor<8x33x4xf32> { %cst = constant dense<2.000000e+00> : tensor<264x4xf32> %0 = linalg.generic {args_in = 2 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0, #map0], iterator_types = ["parallel", "parallel"]} %arg0, %cst { ^bb0(%arg1: f32, %arg2: f32): // no predecessors %2 = mulf %arg1, %arg2 : f32 linalg.yield %2 : f32 }: tensor<264x4xf32>, tensor<264x4xf32> -> tensor<264x4xf32> %1 = linalg.tensor_reshape %0 [#map1, #map2] : tensor<264x4xf32> into tensor<8x33x4xf32> %2 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map3, #map3], iterator_types = ["parallel", "parallel", "parallel"]} %1 { ^bb0(%arg1: f32): // no predecessors %2 = mulf %arg1, %arg1 : f32 linalg.yield %2 : f32 }: tensor<8x33x4xf32> -> tensor<8x33x4xf32> return %2 : tensor<8x33x4xf32> } And it would be fused to #map0 = affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)> #map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)> module { func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> tensor<8x33x4xf32> { %cst = constant 2.000000e+00 : f32 %0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel", "parallel"]} %arg0 { ^bb0(%arg1: f32): // no predecessors %1 = mulf %arg1, %cst : f32 %2 = mulf %1, %1 : f32 linalg.yield %2 : f32 }: tensor<264x4xf32> -> 
tensor<8x33x4xf32> return %0 : tensor<8x33x4xf32> } } Which looks good to me. Even the all of the ops are not fused, won't it result in linalg.reshape linalg.generic linalg.generic in buffers world? hanchung: Added the second check. Regarding the first check, I think in this case we would lose a chance…
hanchung (Author): Oh, I was using the wrong example. It should be from

    #map0 = affine_map<(d0, d1) -> (d0, d1)>
    #map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
    #map2 = affine_map<(d0, d1, d2) -> (d2)>
    #map3 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
    func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> (tensor<8x33x4xf32>, tensor<264x4xf32>) {
      %cst = constant dense<2.000000e+00> : tensor<264x4xf32>
      %0 = linalg.generic {args_in = 2 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0, #map0], iterator_types = ["parallel", "parallel"]} %arg0, %cst {
      ^bb0(%arg1: f32, %arg2: f32): // no predecessors
        %2 = mulf %arg1, %arg2 : f32
        linalg.yield %2 : f32
      }: tensor<264x4xf32>, tensor<264x4xf32> -> tensor<264x4xf32>
      %1 = linalg.tensor_reshape %0 [#map1, #map2] : tensor<264x4xf32> into tensor<8x33x4xf32>
      %2 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map0], iterator_types = ["parallel", "parallel"]} %0 {
      ^bb0(%arg1: f32): // no predecessors
        %2 = mulf %arg1, %arg1 : f32
        linalg.yield %2 : f32
      }: tensor<264x4xf32> -> tensor<264x4xf32>
      return %1, %2 : tensor<8x33x4xf32>, tensor<264x4xf32>
    }

to

    #map0 = affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)>
    #map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
    #map2 = affine_map<(d0, d1) -> (d0, d1)>
    module {
      func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>) -> (tensor<8x33x4xf32>, tensor<264x4xf32>) {
        %cst = constant 2.000000e+00 : f32
        %0 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map0, #map1], iterator_types = ["parallel", "parallel", "parallel"]} %arg0 {
        ^bb0(%arg1: f32): // no predecessors
          %2 = mulf %arg1, %cst : f32
          linalg.yield %2 : f32
        }: tensor<264x4xf32> -> tensor<8x33x4xf32>
        %1 = linalg.generic {args_in = 1 : i64, args_out = 1 : i64, indexing_maps = [#map2, #map2], iterator_types = ["parallel", "parallel"]} %arg0 {
        ^bb0(%arg1: f32): // no predecessors
          %2 = mulf %arg1, %cst : f32
          %3 = mulf %2, %2 : f32
          linalg.yield %3 : f32
        }: tensor<264x4xf32> -> tensor<264x4xf32>
        return %0, %1 : tensor<8x33x4xf32>, tensor<264x4xf32>
      }
    }

I think it would still be linalg.generic, linalg.generic in the buffers world? Why would

    %0 = linalg.generic ... : ... -> tensor<typeA>
    %1 = linalg.tensor_reshape %0 -> tensor<typeA> to tensor<typeB>
    %2 = linalg.generic %0 ... : tensor<typeA> ...

become

    %0 = linalg.generic ... : ... -> tensor<typeA>
    %1 = linalg.generic ... : .... -> tensor<typeB>
    %2 = linalg.generic %0 ... : tensor<typeA>

I think the tensor reshape op would be propagated up and eventually either be lowered to `linalg.reshape` or fused with the next generic (i.e. %0)?
mravishankar: I don't disagree with what you are saying. But I think more experimentation/data is needed here, and it's better to go incrementally instead of solving a general case that might have unintended consequences. For the current use cases, checking that the `tensor_reshape` has a single use and only then applying the transformation is safer. The example you gave is fine, and in that case the `tensor_reshape` has a single use. But it is easy to adapt that example into a case where the tensor_reshape has multiple uses. You are right that this case wouldn't be handled right now. Let's revisit that if we need to?

Regarding your question,

    %0 = linalg.generic ... : ... -> tensor<typeA>
    %1 = linalg.generic ... : .... -> tensor<typeB>
    %2 = linalg.generic %0 ... : tensor<typeA>

I am not referring to the `linalg.reshape` that exists above the snippet. We can discuss this offline. FWIW, if your use case currently has multiple uses of the reshape op, then it's OK to not add that check.
hanchung (Author): Thanks for the detailed explanation. I agree with you; let's do it incrementally.
      return nullptr;

    SmallVector<Attribute, 4> indexMapAttrs = llvm::to_vector<4>(
        llvm::map_range(fusedIndexMaps, [](AffineMap map) -> Attribute {
          return AffineMapAttr::get(map);
        }));

    Operation *producerOp = producer.getOperation();
    LinalgOp fusedOp = createLinalgOpOfSameType(
mravishankar: I don't think you need to do this. There is a separate pattern that will fold the `constant -> tensor_reshape` into a `constant`.
hanchung (Author): That requires updating ReshapeOp::fold and using createOrFold. I fixed it in this patch as well.
        producer, rewriter, rewriter.getUnknownLoc(), consumer.getResultType(),
        producerOp->getOperands(),
        rewriter.getI64IntegerAttr(producerOp->getNumOperands()),
        rewriter.getI64IntegerAttr(1), rewriter.getArrayAttr(indexMapAttrs),
        producer.iterator_types(),
        /*doc=*/nullptr,
        /*library_call=*/nullptr,
        /*symbol_source=*/nullptr);
    auto &fusedRegion = fusedOp.getOperation()->getRegion(0);
    rewriter.cloneRegionBefore(producerOp->getRegion(0), fusedRegion,
                               fusedRegion.begin());
    return fusedOp;
  }

  static bool isExpandingAndFusible(LinalgOp producer, TensorReshapeOp consumer,
mravishankar: Maybe a matter of preference, but this looks clean to me :)
                                    unsigned consumerIdx) {
    // Is fusible only if:
    //   1) The producer is a generic op.
    //   2) The producer has tensor semantics.
    //   3) The tensor reshape op is an expanding case.
    //   4) All the shapes are the same for the generic op.
    //   5) All the indexing maps in producer are identity.
    //   6) All the loops in producer are parallel loops.
    //   7) The producer has a single user.
    auto types = producer.getInputOutputShapedTypes();
    assert(!types.empty());
    return isa<GenericOp>(producer.getOperation()) &&
           producer.hasTensorSemantics() &&
mravishankar: Nit: This should be `consumer.getSrcType().getRank() < consumer.getResultType().getRank()`. `==` is illegal by op definition.
           consumer.getSrcType().getRank() <
               consumer.getResultType().getRank() &&
           std::equal(types.begin() + 1, types.end(), types.begin()) &&
           llvm::all_of(producer.getIndexingMaps(),
                        [](AffineMap map) { return map.isIdentity(); }) &&
           llvm::all_of(producer.iterator_types(),
                        [](Attribute attr) {
                          return attr.cast<StringAttr>().getValue() ==
                                 getParallelIteratorTypeName();
                        }) &&
           producer.getOperation()->hasOneUse();
  }

  static LinalgOp fuseExpandingCase(LinalgOp producer, TensorReshapeOp consumer,
                                    unsigned consumerIdx,
                                    PatternRewriter &rewriter) {
    Location loc = producer.getLoc();
    auto dstShape = consumer.getResultType().cast<ShapedType>().getShape();
    SmallVector<Value, 4> args;
    for (auto arg : producer.getOperation()->getOperands()) {
      auto type = RankedTensorType::get(
          dstShape, arg.getType().cast<ShapedType>().getElementType());
      args.push_back(rewriter.createOrFold<linalg::TensorReshapeOp>(
          loc, type, arg, consumer.reassociation()));
    }

    SmallVector<Type, 4> resultTypes;
    for (auto t : producer.getOutputTensorTypes()) {
      Type type = RankedTensorType::get(dstShape,
                                        t.cast<ShapedType>().getElementType());
      resultTypes.push_back(type);
    }

    int rank = dstShape.size();
    int numArgsIn = producer.getNumInputs();
    int numArgsOut = producer.getNumOutputs();
    auto genericOp = rewriter.create<linalg::GenericOp>(
        loc, resultTypes, args, numArgsIn, numArgsOut,
        SmallVector<AffineMap, 3>(args.size() + resultTypes.size(),
                                  rewriter.getMultiDimIdentityMap(rank)),
        SmallVector<StringRef, 3>(rank, getParallelIteratorTypeName()));
    Region &region = genericOp.getRegion();
    rewriter.cloneRegionBefore(producer.getOperation()->getRegion(0), region,
                               region.begin());
    return cast<LinalgOp>(genericOp.getOperation());
  }

  static LinalgOp fuse(LinalgOp producer, TensorReshapeOp consumer,
                       unsigned consumerIdx, PatternRewriter &rewriter,
                       OperationFolder *folder = nullptr) {
    if (isCollapsingAndFusible(producer, consumer, consumerIdx))
      return fuseCollapsingCase(producer, consumer, consumerIdx, rewriter);
    if (isExpandingAndFusible(producer, consumer, consumerIdx))
      return fuseExpandingCase(producer, consumer, consumerIdx, rewriter);
    return nullptr;
  }
};

/// Implementation of fusion on tensor ops when producer is a splat constant.
struct FuseConstantOpAsProducer {
  static bool isFusible(ConstantOp producer, LinalgOp consumer,
                        unsigned consumerIdx) {
    return isa<GenericOp, IndexedGenericOp>(consumer.getOperation()) &&
           consumer.hasTensorSemantics() &&
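The seven preconditions listed in `isExpandingAndFusible` can be modeled outside MLIR to see which inputs pass and which are rejected. What follows is only a minimal sketch under stated assumptions: `ProducerModel` and `isExpandingAndFusibleModel` are hypothetical stand-ins invented for illustration, not MLIR types and not part of this patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a linalg.generic producer; not an MLIR type.
struct ProducerModel {
  bool isGenericOp;
  bool hasTensorSemantics;
  std::vector<std::vector<int64_t>> operandShapes; // inputs, then outputs
  std::vector<bool> mapIsIdentity;                 // one flag per indexing map
  std::vector<std::string> iteratorTypes;          // e.g. "parallel"
  bool hasOneUse;
};

// Mirrors the seven conditions from the patch on the simplified model.
// srcRank/dstRank are the ranks of the reshape's source and result types.
bool isExpandingAndFusibleModel(const ProducerModel &p, int srcRank,
                                int dstRank) {
  assert(!p.operandShapes.empty());
  const std::vector<int64_t> &first = p.operandShapes.front();
  bool sameShapes = std::all_of(
      p.operandShapes.begin(), p.operandShapes.end(),
      [&first](const std::vector<int64_t> &s) { return s == first; });
  bool allIdentity = std::all_of(p.mapIsIdentity.begin(),
                                 p.mapIsIdentity.end(),
                                 [](bool b) { return b; });
  bool allParallel = std::all_of(
      p.iteratorTypes.begin(), p.iteratorTypes.end(),
      [](const std::string &t) { return t == "parallel"; });
  // Expanding means the reshape result has strictly higher rank.
  return p.isGenericOp && p.hasTensorSemantics && srcRank < dstRank &&
         sameShapes && allIdentity && allParallel && p.hasOneUse;
}
```

With a model of the producer from the new test (three `264x4` operands, identity maps, two parallel loops, one use), the predicate accepts a rank 2 to rank 3 reshape and rejects the same producer once it has more than one use.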

mlir/test/Dialect/Linalg/fusion-tensor.mlir

func @generic_op_reshape_consumer_nofusion(%arg0 : tensor<?x?x?x5xf32>,
  ...
  return %1 : tensor<?x?xf32>
}

// CHECK-LABEL: func @generic_op_reshape_consumer_nofusion
// CHECK: linalg.tensor_reshape

// -----

#map0 = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d2)>

func @generic_op_reshape_consumer_expanding(%arg0: tensor<264x4xf32>)
                                            -> tensor<8x33x4xf32> {
  %cst = constant dense<2.000000e+00> : tensor<264x4xf32>
  %0 = linalg.generic
    {args_in = 2 : i64, args_out = 1 : i64,
     indexing_maps = [#map0, #map0, #map0],
     iterator_types = ["parallel", "parallel"]}
    %arg0, %cst {
    ^bb0(%arg1: f32, %arg2: f32): // no predecessors
      %2 = mulf %arg1, %arg2 : f32
      linalg.yield %2 : f32
    }: tensor<264x4xf32>, tensor<264x4xf32> -> tensor<264x4xf32>
  %1 = linalg.tensor_reshape %0 [#map1, #map2] :
    tensor<264x4xf32> into tensor<8x33x4xf32>
  return %1 : tensor<8x33x4xf32>
}

// The reshape op in `%arg0` is folded into the indexing map of generic op.
// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)>
// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
mravishankar: I think it would be better to check the shape, indexing maps, etc. here as well, because those are generated by the pattern being applied.
// CHECK: func @generic_op_reshape_consumer_expanding
// CHECK-NOT: linalg.tensor_reshape
// CHECK: %[[CST:.*]] = constant {{.*}} : f32
// CHECK: linalg.generic
// CHECK-SAME: indexing_maps = [#[[MAP0]], #[[MAP1]]]
// CHECK: tensor<264x4xf32> -> tensor<8x33x4xf32>
// CHECK-NOT: linalg.tensor_reshape

// -----

#map0 = affine_map<(d0, d1, d2) -> (d0)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
func @generic_op_constant_fusion(%arg0 : tensor<5x?x?xf32>) -> tensor<5x?x?xf32>
{
  %0 = constant dense<42.0> : tensor<5xf32>
  %1 = linalg.generic
    {args_in = 2 : i64, args_out = 1 : i64,
     indexing_maps = [#map0, #map1, #map1],
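The `(d0 * 33 + d1, d2)` indexing map checked in the expanding test can be read as plain index arithmetic: each element `(d0, d1, d2)` of the `8x33x4` result is read from row `d0 * 33 + d1`, column `d2` of the `264x4` input. The sketch below illustrates that mapping only; `expandedToSourceIndex` is a hypothetical helper name, and the default extent of 33 is taken from this particular test rather than from any MLIR API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Maps an index (d0, d1, d2) of the expanded tensor to the (row, col) index
// of the 2-D source, mirroring
//   affine_map<(d0, d1, d2) -> (d0 * 33 + d1, d2)>.
// d1Extent is the size of the d1 dimension (33 in the test above).
std::array<int64_t, 2> expandedToSourceIndex(int64_t d0, int64_t d1,
                                             int64_t d2,
                                             int64_t d1Extent = 33) {
  assert(d1 >= 0 && d1 < d1Extent);
  return {d0 * d1Extent + d1, d2};
}
```

For the last element of the result, `(7, 32, 3)`, this gives row `7 * 33 + 32 = 263`, which is the last row of the `264x4` input, so the expanded iteration space covers the source exactly.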

Revision Contents

Diff 288554

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp

mlir/test/Dialect/Linalg/fusion-tensor.mlir
