This is an archive of the discontinued LLVM Phabricator instance.

Differential D151117

Lowering for 'tosa.scatter'
ClosedPublic

Authored by rafaelubalmw on May 22 2023, 9:26 AM.

Details

Reviewers

silvas
mehdi_amini
rriddle
jpienaar
rsuderman
eric-k256

Commits

rG6b4b63a832f1: Lowering for 'tosa.scatter'

Summary

This patch adds support for tosa.scatter lowering in the --tosa-to-scf pass. Here's an example for this lowering:

func.func @tosa(
                %valuesIn : tensor<3x7x5xi32>,
                %indices : tensor<3x6xi32>,
                %input : tensor<3x6x5xi32>) ->
                tensor<3x7x5xi32> {
        %0 = "tosa.scatter"(%valuesIn, %indices, %input) :
                        (tensor<3x7x5xi32>,
                        tensor<3x6xi32>,
                        tensor<3x6x5xi32>) ->
                        (tensor<3x7x5xi32>)
        return %0 : tensor<3x7x5xi32>
}

translates to

func.func @tosa(%arg0: tensor<3x7x5xi32>, %arg1: tensor<3x6xi32>, %arg2: tensor<3x6x5xi32>) -> tensor<3x7x5xi32> {
  %c0 = arith.constant 0 : index
  %c3 = arith.constant 3 : index
  %c1 = arith.constant 1 : index
  %c6 = arith.constant 6 : index
  %c2 = arith.constant 2 : index
  %c5 = arith.constant 5 : index
  %c0_0 = arith.constant 0 : index
  %c1_1 = arith.constant 1 : index
  %0 = scf.for %arg3 = %c0_0 to %c3 step %c1_1 iter_args(%arg4 = %arg0) -> (tensor<3x7x5xi32>) {
    %1 = scf.for %arg5 = %c0_0 to %c6 step %c1_1 iter_args(%arg6 = %arg4) -> (tensor<3x7x5xi32>) {
      %extracted = tensor.extract %arg1[%arg3, %arg5] : tensor<3x6xi32>
      %2 = arith.index_cast %extracted : i32 to index
      %extracted_slice = tensor.extract_slice %arg2[%arg3, %arg5, %c0_0] [%c1_1, %c1_1, %c5] [%c1_1, %c1_1, %c1_1] : tensor<3x6x5xi32> to tensor<?x?x?xi32>
      %inserted_slice = tensor.insert_slice %extracted_slice into %arg6[%arg3, %2, %c0_0] [%c1_1, %c1_1, %c5] [%c1_1, %c1_1, %c1_1] : tensor<?x?x?xi32> into tensor<3x7x5xi32>
      scf.yield %inserted_slice : tensor<3x7x5xi32>
    }
    scf.yield %1 : tensor<3x7x5xi32>
  }
  return %0 : tensor<3x7x5xi32>
}
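As a reading aid, the loop nest above can be sketched in plain C++ over fixed-size arrays. This is a hypothetical reference written for this example only, not code from the patch: the name `scatterReference`, the type aliases, and the dimension name `K` (the size of the scattered-into dimension) are assumptions, and the shapes are hard-coded to match the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Shapes taken from the example: N = 3 batches, K = 7 output rows,
// W = 6 updates per batch, C = 5 channels.
constexpr int N = 3, K = 7, W = 6, C = 5;

using Values = std::array<std::array<std::array<int32_t, C>, K>, N>;
using Indices = std::array<std::array<int32_t, W>, N>;
using Input = std::array<std::array<std::array<int32_t, C>, W>, N>;

// Hypothetical reference for the generated loop nest: for every (n, w),
// copy the C-element row input[n][w] over row indices[n][w] of batch n.
// valuesIn is taken by value and plays the role of the iter_args accumulator.
Values scatterReference(Values valuesIn, const Indices &indices,
                        const Input &input) {
  for (int n = 0; n < N; ++n) {
    for (int w = 0; w < W; ++w) {
      // Corresponds to tensor.extract followed by arith.index_cast.
      int32_t k = indices[n][w];
      assert(k >= 0 && k < K && "scatter index out of range");
      // Corresponds to tensor.extract_slice followed by tensor.insert_slice.
      valuesIn[n][k] = input[n][w];
    }
  }
  return valuesIn;
}
```

Rows of `valuesIn` that no index refers to are returned unchanged, which matches the IR above, where the accumulator starts out as `%arg0`.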

We have attempted an alternative lowering pass that uses `tensor.scatter` as an intermediate step. However, we opted to aim straight at the `scf` dialect for the following reasons:

- The `tensor.scatter` op doesn't seem to be used anywhere. There is no available lowering pass for this op (although we have one that we'll upstream soon).
- The `tosa.scatter` and `tensor.scatter` ops have different indexing semantics. The `indices` argument of `tosa.scatter` must be non-trivially modified and restructured (e.g. with a `linalg.generic` op) to adapt to the needs of `tensor.scatter`. While this overhead may be simplified and fused after a subsequent `tensor.scatter` lowering, it adds complex logic and an obscure intermediate state. Unless there is a good reason to go through the `tensor` dialect that we're missing, this additional complexity may not be justified.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rafaelubalmw created this revision. (May 22 2023, 9:26 AM)

Herald added a project: Restricted Project. (May 22 2023, 9:26 AM)

Herald added subscribers: bviyer, Moerafaat, zero9178 and 25 others.

rafaelubalmw requested review of this revision. (May 22 2023, 9:26 AM)

Herald added a project: Restricted Project. (May 22 2023, 9:26 AM)

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rafaelubalmw edited the summary of this revision. (May 22 2023, 9:45 AM)

rafaelubalmw added a reviewer: Restricted Project.

Herald added a subscriber: limo1996. (May 22 2023, 9:45 AM)

rafaelubalmw edited reviewers, added: silvas, mehdi_amini, rriddle, jpienaar; removed: Restricted Project. (May 22 2023, 10:02 AM)

Harbormaster completed remote builds in B233613: Diff 524353. (May 22 2023, 10:56 AM)

dcaballe added a reviewer: rsuderman. (May 22 2023, 11:02 AM)

Thanks for taking this on. Scatter is a complicated op, and it is good to have a working legalization to linalg.

This revision is now accepted and ready to land. (May 25 2023, 9:06 PM)

to have a working legalization to linalg.

Is there a path from there to linalg? It's not clear to me how it works?
And actually I'm wondering if this lowering could be expressed with a linalg.generic indeed?

In D151117#4375142, @mehdi_amini wrote:

to have a working legalization to linalg.

Is there a path from there to linalg? It's not clear to me how it works?
And actually I'm wondering if this lowering could be expressed with a linalg.generic indeed?

Sorry, I misspoke and wrote linalg while meaning scf/tensor.
With scatter, I didn't think it would be possible to use linalg.generic as you don't have an affine map for the output because you are dependent on the values in the index tensor. Perhaps I'm missing an implementation option.
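The distinction drawn here can be illustrated with a small standalone C++ sketch. It is an illustration only and does not use MLIR; the function names `stridedWrite` and `scatterWrite` are made up for this example. In the first function the destination position is a fixed function of the loop index, which is the kind of access an affine map can describe. In the second it is loaded from a separate index array, so it is not known until the data is read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Affine-style write: the destination position (2 * j) depends only on the
// loop index, so it can be described without looking at any data.
std::vector<int> stridedWrite(const std::vector<int> &src,
                              std::size_t outSize) {
  std::vector<int> out(outSize, 0);
  for (std::size_t j = 0; j < src.size(); ++j) {
    assert(2 * j < outSize);
    out[2 * j] = src[j];
  }
  return out;
}

// Scatter-style write: the destination position is read from idx, so it
// depends on run-time values rather than on the loop index alone.
std::vector<int> scatterWrite(const std::vector<int> &src,
                              const std::vector<std::size_t> &idx,
                              std::size_t outSize) {
  assert(idx.size() == src.size());
  std::vector<int> out(outSize, 0);
  for (std::size_t j = 0; j < src.size(); ++j) {
    assert(idx[j] < outSize);
    out[idx[j]] = src[j];
  }
  return out;
}
```

For example, `scatterWrite({1, 2, 3}, {4, 0, 2}, 6)` places the three values at positions 4, 0 and 2, an order that only the contents of the index array determine.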

In D151117#4375142, @mehdi_amini wrote:

to have a working legalization to linalg.

Is there a path from there to linalg? It's not clear to me how it works?
And actually I'm wondering if this lowering could be expressed with a linalg.generic indeed?

We were unable to devise a lowering to linalg.generic directly. As @eric-k256 noted, this is due to the fact that the indexing on the output is based on the indices tensor rather than an affine map.

The IREE project created their own op in their 'LinalgExt' dialect to accommodate a more gradual lowering: https://github.com/openxla/iree/blob/97779d7f494660f88864b035475ec77a1e54c6c8/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/LinalgExt/IR/LinalgExtOps.td#L62

The tensor.scatter operation is an obvious lowering target for tosa.scatter, but

1. There is no lowering out of tensor.scatter at the moment (discussed a bit here: https://discourse.llvm.org/t/lowering-of-scatter-operations/70535)
2. The tosa.scatter -> tensor.scatter lowering is not as straightforward as the names would suggest

In D151117#4375830, @sabauma wrote:

In D151117#4375142, @mehdi_amini wrote:

to have a working legalization to linalg.

Is there a path from there to linalg? It's not clear to me how it works?
And actually I'm wondering if this lowering could be expressed with a linalg.generic indeed?

We were unable to devise a lowering to linalg.generic directly. As @eric-k256 noted, this is due to the fact that the indexing on the output is based on the indices tensor rather than an affine map.

The IREE project created their own op in their 'LinalgExt' dialect to accommodate a more gradual lowering: https://github.com/openxla/iree/blob/97779d7f494660f88864b035475ec77a1e54c6c8/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/LinalgExt/IR/LinalgExtOps.td#L62

The tensor.scatter operation is an obvious lowering target for tosa.scatter, but

1. There is no lowering out of tensor.scatter at the moment (discussed a bit here: https://discourse.llvm.org/t/lowering-of-scatter-operations/70535)
2. The tosa.scatter -> tensor.scatter lowering is not as straightforward as the names would suggest

Just to elaborate on (2), this was our attempt to grab a tosa.scatter instance and lower it to tensor.scatter. The conversion of the indexing semantics involved introducing an intermediate linalg.generic step:

func.func @main(
		%valuesIn : tensor<3x7x5xi32>,
		%indices : tensor<3x6xi32>,
		%input : tensor<3x6x5xi32>) ->
		tensor<3x7x5xi32> {
	%0 = "tosa.scatter"(%valuesIn, %indices, %input) :
			(tensor<3x7x5xi32>,
			tensor<3x6xi32>,
			tensor<3x6x5xi32>) ->
			(tensor<3x7x5xi32>)
	return %0 : tensor<3x7x5xi32>
}

lowers to tensor.scatter as

func.func @main(
		%valuesIn : tensor<3x7x5xi32>,
		%indices : tensor<3x6xi32>,
		%input : tensor<3x6x5xi32>) ->
		tensor<3x7x5xi32> {
	%reshapedInput = tensor.expand_shape %input [[0], [1], [2, 3, 4]] : tensor<3x6x5xi32> into tensor<3x6x1x1x5xi32>
	%emptyNewIndices = tensor.empty() : tensor<3x6x2xindex>
	%newIndices = linalg.generic {
		indexing_maps = [
			affine_map<(i, j, k) -> (i, j)>,
			affine_map<(i, j, k) -> (i, j, k)>
		],
		iterator_types = ["parallel", "parallel", "parallel"]
	} ins(%indices : tensor<3x6xi32>) outs(%emptyNewIndices : tensor<3x6x2xindex>) {
	^bb0(%index : i32, %out : index):
		// k == 0 selects the batch coordinate, k == 1 the scattered row.
		%i = linalg.index 0 : index
		%k = linalg.index 2 : index
		%zero = arith.constant 0 : index
		%isZero = arith.cmpi eq, %k, %zero : index
		%castIndex = arith.index_cast %index : i32 to index
		%ret = arith.select %isZero, %i, %castIndex : index
		linalg.yield %ret : index
	} -> tensor<3x6x2xindex>
	%result = tensor.scatter %reshapedInput into %valuesIn[%newIndices]
			scatter_dims([0, 1]) unique :
			(tensor<3x6x1x1x5xi32>,
			tensor<3x7x5xi32>,
			tensor<3x6x2xindex>) ->
			tensor<3x7x5xi32>
	return %result : tensor<3x7x5xi32>
}

In our exploration of this path, we also created a lowering pattern for tensor.scatter -> scf (which we might upstream soon anyway). But we concluded that the additional overhead of the index conversion did not justify using tensor.scatter as an intermediate representation when lowering tosa.scatter.
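The index conversion described in this comment can be sketched in standalone C++ as follows. This is a minimal illustration under stated assumptions, not code from the patch or from MLIR: the name `toCoordinateIndices` and the type aliases are hypothetical, and the shapes are fixed to the 3x6 example. Each scalar row index from the tosa.scatter form becomes a (batch, row) coordinate pair, which is the layout the 3x6x2 index tensor in the tensor.scatter form above is meant to hold.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int N = 3, W = 6;  // Batch count and updates per batch.

using TosaIndices = std::array<std::array<int32_t, W>, N>;
using CoordIndices = std::array<std::array<std::array<int64_t, 2>, W>, N>;

// Hypothetical helper: expand each row index indices[n][w] into the
// coordinate pair (n, indices[n][w]).
CoordIndices toCoordinateIndices(const TosaIndices &indices) {
  CoordIndices coords{};
  for (int n = 0; n < N; ++n) {
    for (int w = 0; w < W; ++w) {
      assert(indices[n][w] >= 0 && "row index must be non-negative");
      coords[n][w][0] = n;              // Batch coordinate (the k == 0 case).
      coords[n][w][1] = indices[n][w];  // Row coordinate (the k == 1 case).
    }
  }
  return coords;
}
```

The direct scf lowering avoids this step because the batch coordinate is already available as the outer loop induction variable.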

Thanks for the detailed information!

Closed by commit rG6b4b63a832f1: Lowering for 'tosa.scatter' (authored by rafaelubalmw, committed by eric-k256). (May 30 2023, 2:34 PM)

This revision was automatically updated to reflect the committed changes.

eric-k256 added a commit: rG6b4b63a832f1: Lowering for 'tosa.scatter'.

Revision Contents

Path                                               Size
mlir/lib/Conversion/TosaToSCF/TosaToSCF.cpp        73 lines
mlir/lib/Conversion/TosaToSCF/TosaToSCFPass.cpp    2 lines
mlir/test/Conversion/TosaToSCF/tosa-to-scf.mlir    30 lines

Diff 526799

mlir/lib/Conversion/TosaToSCF/TosaToSCF.cpp

(76 lines not shown; enclosing context: LogicalResult matchAndRewrite(tosa::IfOp op,)

     inlineIfCase(op.getElseBranch(), newIf.getElseRegion(), op.getInputs(),
                  rewriter);

     rewriter.replaceOp(op, newIf.getResults());
     return success();
   }
 };

+class ScatterOpConverter : public OpRewritePattern<tosa::ScatterOp> {
+  static Value createTensorDim(OpBuilder &builder, Location loc, Value tensor,
+                               int64_t dim) {
+    return builder.createOrFold<tensor::DimOp>(loc, tensor, dim);
+  }
+
+  static Value createIndexConst(OpBuilder &builder, Location loc,
+                                int64_t value) {
+    return builder.create<arith::ConstantIndexOp>(loc, value);
+  }
+
+public:
+  using OpRewritePattern<tosa::ScatterOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(tosa::ScatterOp scatter,
+                                PatternRewriter &rewriter) const final {
+    auto valuesIn = scatter.getValuesIn();
+    auto indices = scatter.getIndices();
+    auto input = scatter.getInput();
+    auto loc = scatter.getLoc();
+
+    // N, W, C are chosen to match the TOSA spec
+    auto dimN = createTensorDim(rewriter, loc, input, 0);
+    auto dimW = createTensorDim(rewriter, loc, input, 1);
+    auto dimC = createTensorDim(rewriter, loc, input, 2);
+
+    auto zero = createIndexConst(rewriter, loc, 0);
+    auto one = createIndexConst(rewriter, loc, 1);
+
+    // Loop bounds
+    auto lbs = llvm::SmallVector<Value>(2, zero);
+    auto steps = llvm::SmallVector<Value>(2, one);
+    auto ubs = llvm::SmallVector<Value>{{dimN, dimW}};
+
+    auto buildBody = [&](OpBuilder &builder, Location loc, ValueRange ivs,
+                         ValueRange args) -> scf::ValueVector {
+      auto n = ivs[0];
+
+      // Read the index and cast it to index type
+      auto index = builder.create<tensor::ExtractOp>(loc, indices, ivs);
+      auto castIndex = builder.create<arith::IndexCastOp>(
+          loc, builder.getIndexType(), index);
+
+      // Offset, sizes, and strides for the input tensor
+      auto inputOffset = llvm::to_vector(ivs);
+      inputOffset.push_back(zero);
+
+      llvm::SmallVector<Value> sizes = {one, one, dimC};
+      llvm::SmallVector<Value> strides = {one, one, one};
+
+      auto slice = builder.create<tensor::ExtractSliceOp>(
+          loc, input, inputOffset, sizes, strides);
+
+      // Insert the slice into the output accumulator tensor.
+      llvm::SmallVector<Value> outputOffset = {n, castIndex, zero};
+      auto updated = builder.create<tensor::InsertSliceOp>(
+          loc, slice, args[0], outputOffset, sizes, strides);
+
+      return {updated};
+    };
+
+    auto loops = scf::buildLoopNest(rewriter, loc, lbs, ubs, steps,
+                                    ValueRange{valuesIn}, buildBody);
+    rewriter.replaceOp(scatter, loops.results);
+
+    return success();
+  }
+};
+
 class WhileOpConverter : public OpRewritePattern<tosa::WhileOp> {
 public:
   using OpRewritePattern<tosa::WhileOp>::OpRewritePattern;

   LogicalResult matchAndRewrite(tosa::WhileOp op,
                                 PatternRewriter &rewriter) const final {
     auto newWhile = rewriter.create<scf::WhileOp>(
         op.getLoc(), op.getResultTypes(), op.getInputs());
     rewriter.createBlock(&newWhile.getBefore());
     rewriter.createBlock(&newWhile.getAfter());

     inlineWhileCase(op.getCond(), newWhile.getBefore(), rewriter, true);
     inlineWhileCase(op.getBody(), newWhile.getAfter(), rewriter, false);

     rewriter.replaceOp(op, newWhile.getResults());

     return success();
   }
 };

 } // namespace

 void mlir::tosa::populateTosaToSCFConversionPatterns(
     RewritePatternSet *patterns) {
-  patterns->add<IfOpConverter>(patterns->getContext());
-  patterns->add<WhileOpConverter>(patterns->getContext());
+  patterns->add<IfOpConverter, ScatterOpConverter, WhileOpConverter>(
+      patterns->getContext());
 }

mlir/lib/Conversion/TosaToSCF/TosaToSCFPass.cpp

(31 lines not shown)

 namespace {
 struct TosaToSCF : public impl::TosaToSCFBase<TosaToSCF> {
 public:
   void runOnOperation() override {
     RewritePatternSet patterns(&getContext());
     ConversionTarget target(getContext());
     target.addLegalDialect<tensor::TensorDialect, scf::SCFDialect>();
-    target.addIllegalOp<tosa::IfOp, tosa::WhileOp>();
+    target.addIllegalOp<tosa::IfOp, tosa::ScatterOp, tosa::WhileOp>();
     target.markUnknownOpDynamicallyLegal([](Operation *) { return true; });

     auto *op = getOperation();
     mlir::tosa::populateTosaToSCFConversionPatterns(&patterns);
     if (failed(applyPartialConversion(op, target, std::move(patterns))))
       signalPassFailure();
   }
 };

(9 lines not shown)

mlir/test/Conversion/TosaToSCF/tosa-to-scf.mlir

(50 lines not shown; enclosing context: ^bb1(%arg5 : tensor<f32>, %arg6 : tensor<f32>):)

     "tosa.yield"(%arg6) : (tensor<f32>) -> ()

   // CHECK: }
   // CHECK: return [[IF]]
   }) : (tensor<i1>, tensor<f32>, tensor<f32>) -> (tensor<f32>)

   return %0 : tensor<f32>
 }

+// -----
+
+// CHECK-LABEL: func @scatter_test
+// CHECK-SAME: ([[VALUES_IN:%.+]]: tensor<3x7x5xi32>, [[INDICES:%.+]]: tensor<3x6xi32>, [[INPUT:%.+]]: tensor<3x6x5xi32>)
+func.func @scatter_test(%values_in: tensor<3x7x5xi32>, %indices : tensor<3x6xi32>, %input: tensor<3x6x5xi32>) -> tensor<3x7x5xi32> {
+
+  // CHECK-DAG: [[C_0:%.+]] = arith.constant 0 : index
+  // CHECK-DAG: [[C_1:%.+]] = arith.constant 1 : index
+  // CHECK-DAG: [[C_2:%.+]] = arith.constant 2 : index
+  // CHECK-DAG: [[C_3:%.+]] = arith.constant 3 : index
+  // CHECK-DAG: [[C_5:%.+]] = arith.constant 5 : index
+  // CHECK-DAG: [[C_6:%.+]] = arith.constant 6 : index
+  // CHECK-DAG: [[C_0_0:%.+]] = arith.constant 0 : index
+  // CHECK-DAG: [[C_1_0:%.+]] = arith.constant 1 : index
+  // CHECK: [[RESULT_0:%.+]] = scf.for [[ITER_VAR_0:%.+]] = [[C_0_0]] to [[C_3]] step [[C_1_0]] iter_args([[ITER_ARG_0:%.+]] = [[VALUES_IN]]) -> (tensor<3x7x5xi32>) {
+  // CHECK: [[RESULT_1:%.+]] = scf.for [[ITER_VAR_1:%.+]] = [[C_0_0]] to [[C_6]] step [[C_1_0]] iter_args([[ITER_ARG_1:%.+]] = [[ITER_ARG_0]]) -> (tensor<3x7x5xi32>) {
+  // CHECK-DAG: [[EXTRACTED:%.+]] = tensor.extract [[INDICES]][[[ITER_VAR_0]], [[ITER_VAR_1]]] : tensor<3x6xi32>
+  // CHECK-DAG: [[EXTRACTED_CAST:%.+]] = arith.index_cast [[EXTRACTED]] : i32 to index
+  // CHECK-DAG: [[EXTRACTED_SLICE:%.+]] = tensor.extract_slice [[INPUT]][[[ITER_VAR_0]], [[ITER_VAR_1]], [[C_0_0]]] [[[C_1_0]], [[C_1_0]], [[C_5]]] [[[C_1_0]], [[C_1_0]], [[C_1_0]]] : tensor<3x6x5xi32> to tensor<?x?x?xi32>
+  // CHECK-DAG: [[INSERTED_SLICE:%.+]] = tensor.insert_slice [[EXTRACTED_SLICE]] into [[ITER_ARG_1]][[[ITER_VAR_0]], [[EXTRACTED_CAST]], [[C_0_0]]] [[[C_1_0]], [[C_1_0]], [[C_5]]] [[[C_1_0]], [[C_1_0]], [[C_1_0]]] : tensor<?x?x?xi32> into tensor<3x7x5xi32>
+  // CHECK: scf.yield [[INSERTED_SLICE]] : tensor<3x7x5xi32>
+  // CHECK: }
+  // CHECK: scf.yield [[RESULT_1]] : tensor<3x7x5xi32>
+  // CHECK: }
+  %0 = "tosa.scatter"(%values_in, %indices, %input) : (tensor<3x7x5xi32>, tensor<3x6xi32>, tensor<3x6x5xi32>) -> (tensor<3x7x5xi32>)
+
+  // CHECK: return [[RESULT_0]] : tensor<3x7x5xi32>
+  return %0 : tensor<3x7x5xi32>
+}