This is an archive of the discontinued LLVM Phabricator instance.

Paths

- mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp
- mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir

Differential D131126

[mlir][sparse] replace zero yield generic op with copy in allocation
Closed, Public

Authored by aartbik on Aug 3 2022, 3:52 PM.


Details

Reviewers

springerm
bixia
Peiming
yinying-lisa-li
wrengr

Commits

rGc7bb69bc7546: [mlir][sparse] replace zero yield generic op with copy in allocation

Summary

This rewrites patterns that are sometimes generated by the front-end
and that would otherwise prevent fusion of SDDMM-flavored kernels.
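The rewrite rests on a simple observation: when a `linalg.generic` only yields zero, it overwrites every element of its output without reading the old contents, so its result equals an allocation initialized by copying a dense zero constant. The C++ sketch below models this with `std::vector<double>` standing in for a dense, statically shaped tensor; `beforeRewrite` and `afterRewrite` are hypothetical names, and nothing here is MLIR API.

```cpp
// Toy model of the zero-yield rewrite; this is not MLIR code.
// A dense, statically shaped tensor is modeled as a flat std::vector<double>.
#include <cassert>
#include <cstddef>
#include <vector>

using DenseTensor = std::vector<double>;

// Before: a fresh allocation with unspecified contents (modeled here by a
// caller-chosen "garbage" value) is fully overwritten by a generic op whose
// body only yields the scalar zero.
DenseTensor beforeRewrite(std::size_t n, double garbage) {
  DenseTensor out(n, garbage);  // stand-in for an uninitialized allocation
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 0.0;  // generic body: yield zero (old contents are never read)
  return out;
}

// After: the generic op is gone; the allocation is instead initialized by
// copying a dense zero constant, analogous to `alloc_tensor() copy(%zero)`.
DenseTensor afterRewrite(std::size_t n) {
  const DenseTensor zeroConstant(n, 0.0);  // analogous to arith.constant dense<0.0>
  DenseTensor out(zeroConstant);           // allocation initialized by copy
  return out;
}
```

Because the result of `beforeRewrite` does not depend on `garbage`, the loop carries no information beyond "this buffer starts out as zero", which is exactly what the copy into the allocation expresses.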

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aartbik created this revision. Aug 3 2022, 3:52 PM

Herald added a project: Restricted Project. Aug 3 2022, 3:52 PM

Herald added subscribers: anlunx, bzcheeseman, sdasgup3 and 19 others.

aartbik requested review of this revision. Aug 3 2022, 3:52 PM

Herald added a project: Restricted Project. Aug 3 2022, 3:52 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

aartbik added reviewers: bixia, Peiming, yinying-lisa-li, wrengr. Aug 3 2022, 3:55 PM

Harbormaster completed remote builds in B179154: Diff 449815. Aug 3 2022, 4:18 PM

Peiming added inline comments. Aug 3 2022, 4:32 PM

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp
120	Why only fold zero?

aartbik added inline comments. Aug 3 2022, 5:22 PM

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp
120	For now, this is the only situation that appears in automatically generated code.

springerm added inline comments. Aug 4 2022, 12:45 AM

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp
102	What if the op has multiple outputs?
102	In addition, you could also check if `yield.getOperand(0)` is a zero constant.
120	initial
120	converts
134–139	Is this safe? What if `a` already has a `copy` operand and `a` has multiple users?
138	Should be wrapped in `rewriter.updateRootInPlace`.
139	Can you use `zero` here? `arith.constant` bufferizes to a `memref.get_global` and is "non-writable". I.e., if some operation tries to write to it, the bufferization will automatically insert an `alloc_tensor` op.

aartbik marked 6 inline comments as done. Aug 4 2022, 8:10 AM

aartbik added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp
102	That is taken care of in L125. Or do you mean you want an assert here?
102	Extended this to direct values. Like Peiming suggested, for now I restrict this to zero (since that is the only useful case in this set of rewriting rules), but we may extend it to other constants in the future.
134–139	We guarantee there is no copy at L127 (isZero == false). And even if there are several uses, the contents will be filled either way, right?
138	I have changed it, also below. Is this the right idiom?
139	But that is why we replace the copy in the already existing bufferization allocation, right?
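The discussion above settles when the pattern fires: a single result, an output produced by an `alloc_tensor` that does not already carry a `copy` operand (per the reply about `isZero == false`), a zero yield (either a block argument bound to a zero operand or a zero constant yielded directly), and a dense output with a static shape. The C++ sketch below restates those checks on a deliberately simplified toy IR. All types and functions are hypothetical stand-ins, not the MLIR `GenericOp`, `isAlloc`, or `isZeroYield` code, and the tensor-semantics check is omitted because the toy only models tensors.

```cpp
// Hypothetical toy IR; these are not MLIR data structures.
#include <cassert>
#include <optional>
#include <vector>

// A value is either produced by a scalar constant or is a block argument
// of the generic op's body (blockArgNumber >= 0).
struct ToyValue {
  std::optional<double> constant;  // set when defined by a constant op
  int blockArgNumber = -1;         // >= 0 for a block argument
};

// Models the op defining output 0; only the presence of a copy matters here.
struct ToyAlloc {
  bool hasCopy = false;
};

struct ToyGenericOp {
  std::vector<ToyValue> operands;       // inputs followed by outputs
  int numResults = 1;
  ToyValue yielded;                     // operand of the terminating yield
  std::optional<ToyAlloc> outputAlloc;  // set if output 0 comes from an alloc
  bool staticShape = true;
  bool sparseOutput = false;
};

static bool isZeroConstant(const ToyValue &v) {
  // In C++, -0.0 == 0.0 holds, so negative zero also counts as zero here.
  return v.constant.has_value() && *v.constant == 0.0;
}

// Two cases: a block argument bound to a zero operand, or a zero constant
// yielded directly.
bool isZeroYieldToy(const ToyGenericOp &op) {
  const ToyValue &y = op.yielded;
  if (y.blockArgNumber >= 0) {
    if (y.blockArgNumber >= static_cast<int>(op.operands.size()))
      return false;
    return isZeroConstant(op.operands[y.blockArgNumber]);
  }
  return isZeroConstant(y);
}

// Preconditions for folding the generic op into the allocation's copy.
bool canFoldToy(const ToyGenericOp &op) {
  if (op.numResults != 1)
    return false;  // ops with multiple outputs are left alone
  if (!op.outputAlloc || op.outputAlloc->hasCopy)
    return false;  // need an allocation without an existing copy operand
  if (!isZeroYieldToy(op))
    return false;
  return op.staticShape && !op.sparseOutput;  // dense, static shape only
}
```

Rejecting an allocation that already has a `copy` is what makes it safe to overwrite that operand unconditionally, which is the concern raised about `a` having multiple users.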

addressed comments

Harbormaster completed remote builds in B179285: Diff 449988. Aug 4 2022, 8:26 AM

springerm accepted this revision. Aug 4 2022, 8:39 AM

This revision is now accepted and ready to land. Aug 4 2022, 8:39 AM

more precise update root call

Harbormaster completed remote builds in B179300: Diff 450007. Aug 4 2022, 9:24 AM

Closed by commit rGc7bb69bc7546: [mlir][sparse] replace zero yield generic op with copy in allocation (authored by aartbik). Aug 4 2022, 9:34 AM

This revision was automatically updated to reflect the committed changes.

aartbik added a commit: rGc7bb69bc7546: [mlir][sparse] replace zero yield generic op with copy in allocation.

Revision Contents

Path	Size
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp	56 lines
mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir	36 lines

Diff 450019

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorRewriting.cpp

//===- SparseTensorRewriting.cpp - Sparse tensor rewriting rules ----------===//		//===- SparseTensorRewriting.cpp - Sparse tensor rewriting rules ----------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements rewriting rules that are specific to sparse tensors.		// This file implements rewriting rules that are specific to sparse tensors.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "CodegenUtils.h"

#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"		#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"
#include "mlir/Dialect/Bufferization/IR/Bufferization.h"		#include "mlir/Dialect/Bufferization/IR/Bufferization.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"		#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/SparseTensor/IR/SparseTensor.h"		#include "mlir/Dialect/SparseTensor/IR/SparseTensor.h"
#include "mlir/Dialect/SparseTensor/Transforms/Passes.h"		#include "mlir/Dialect/SparseTensor/Transforms/Passes.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
[68 unchanged lines not shown]	if (isa<arith::AddFOp>(def) || isa<arith::AddIOp>(def)) {
Value x = op.getBlock()->getArguments().back();		Value x = op.getBlock()->getArguments().back();
return (def->getOperand(0) == x && isMulChain(def->getOperand(1), x)) ||		return (def->getOperand(0) == x && isMulChain(def->getOperand(1), x)) ||
(def->getOperand(1) == x && isMulChain(def->getOperand(0), x));		(def->getOperand(1) == x && isMulChain(def->getOperand(0), x));
}		}
}		}
return false;		return false;
}		}

		// Helper to detect direct yield of a zero value.
		static bool isZeroYield(GenericOp op) {
		auto yieldOp = cast<linalg::YieldOp>(op.region().front().getTerminator());
		if (auto arg = yieldOp.getOperand(0).dyn_cast<BlockArgument>()) {
		if (arg.getOwner()->getParentOp() == op) {
		OpOperand *t = op.getInputAndOutputOperands()[arg.getArgNumber()];
		return matchPattern(t->get(), m_Zero()) ||
		matchPattern(t->get(), m_AnyZeroFloat());
		}
		} else if (auto *def = yieldOp.getOperand(0).getDefiningOp()) {
		return matchPattern(def, m_Zero()) || matchPattern(def, m_AnyZeroFloat());
		}
		return false;
		}

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// The actual sparse tensor rewriting rules.		// The actual sparse tensor rewriting rules.
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//

namespace {		namespace {

		/// Rewriting rule that converts direct yield of zero with initial allocation.
		struct FoldInvariantYield : public OpRewritePattern<GenericOp> {
		public:
		using OpRewritePattern<GenericOp>::OpRewritePattern;

		LogicalResult matchAndRewrite(GenericOp op,
		PatternRewriter &rewriter) const override {
		if (!op.hasTensorSemantics() || op.getNumResults() != 1 ||
		!isAlloc(op.getOutputOperand(0), /*isZero=*/false) || !isZeroYield(op))
		return failure();
		auto outputType = op.getResult(0).getType().cast<RankedTensorType>();
		if (!outputType.hasStaticShape() || getSparseTensorEncoding(outputType))
		return failure();
		// Incorporate zero value into allocation copy.
		Value zero = constantZero(rewriter, op.getLoc(), op.getResult(0).getType());
		AllocTensorOp a =
		op.getOutputOperand(0)->get().getDefiningOp<AllocTensorOp>();
		rewriter.updateRootInPlace(a, [&]() { a.getCopyMutable().assign(zero); });
		rewriter.replaceOp(op, op.getOutputOperand(0)->get());
		return success();
		}
		};

/// Rewriting rule that converts two kernels:		/// Rewriting rule that converts two kernels:
///		///
/// T(i,j) = SUM(k, A(i,j,k) * B(i,j,k) * ... )		/// T(i,j) = SUM(k, A(i,j,k) * B(i,j,k) * ... )
/// X(i,j) = S(i,j) * T(i,j)		/// X(i,j) = S(i,j) * T(i,j)
///		///
/// into a single kernel, using distributive law:		/// into a single kernel, using distributive law:
///		///
/// X(i,j) = SUM(k, S(i,j) * A(i,j,k) * B(i,j,k) * ... )		/// X(i,j) = SUM(k, S(i,j) * A(i,j,k) * B(i,j,k) * ... )
[71 unchanged lines not shown]	for (auto &op : prodBlock.without_terminator())
rewriter.clone(op, mapper);		rewriter.clone(op, mapper);
}		}
mapper.map(consBlock.getArgument(other), fusedBlock->back().getResult(0));		mapper.map(consBlock.getArgument(other), fusedBlock->back().getResult(0));
mapper.map(last, rewriter.clone(*sampler, mapper)->getResult(0));		mapper.map(last, rewriter.clone(*sampler, mapper)->getResult(0));
last = rewriter.clone(*acc, mapper)->getResult(0);		last = rewriter.clone(*acc, mapper)->getResult(0);
rewriter.create<linalg::YieldOp>(loc, last);		rewriter.create<linalg::YieldOp>(loc, last);
// Force initial value on merged allocation for dense outputs.		// Force initial value on merged allocation for dense outputs.
if (!getSparseTensorEncoding(op.getResult(0).getType())) {		if (!getSparseTensorEncoding(op.getResult(0).getType())) {
AllocTensorOp a1 =		Value init = prod.getOutputOperand(0)
prod.getOutputOperand(0)->get().getDefiningOp<AllocTensorOp>();		->get()
AllocTensorOp a2 =		.getDefiningOp<AllocTensorOp>()
		.getCopy();
		AllocTensorOp a =
op.getOutputOperand(0)->get().getDefiningOp<AllocTensorOp>();		op.getOutputOperand(0)->get().getDefiningOp<AllocTensorOp>();
a2.getCopyMutable().assign(a1.getCopy());		rewriter.updateRootInPlace(a, [&]() { a.getCopyMutable().assign(init); });
}		}
// Replace consumer with fused operation. Old producer		// Replace consumer with fused operation. Old producer
// and consumer ops will be removed by DCE.		// and consumer ops will be removed by DCE.
rewriter.replaceOp(op, fusedOp->getResults());		rewriter.replaceOp(op, fusedOp->getResults());
return success();		return success();
}		}

private:		private:
[45 unchanged lines not shown]

} // namespace		} // namespace

//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//
// Methods that add patterns described in this file to a pattern list.		// Methods that add patterns described in this file to a pattern list.
//===---------------------------------------------------------------------===//		//===---------------------------------------------------------------------===//

void mlir::populateSparseTensorRewriting(RewritePatternSet &patterns) {		void mlir::populateSparseTensorRewriting(RewritePatternSet &patterns) {
patterns		patterns.add<FoldInvariantYield, FuseSparseMultiplyOverAdd,
.add<FuseSparseMultiplyOverAdd, ReshapeRewriter<tensor::ExpandShapeOp>,		ReshapeRewriter<tensor::ExpandShapeOp>,
ReshapeRewriter<tensor::CollapseShapeOp>>(patterns.getContext());		ReshapeRewriter<tensor::CollapseShapeOp>>(patterns.getContext());
}		}
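The `FuseSparseMultiplyOverAdd` rule in this file rests on the distributive law stated in its comment: multiplying a reduction by S(i,j) equals reducing terms that each carry the factor S(i,j). The C++ sketch below checks that law on small dense matrices, specialized to the usual SDDMM indexing A(i,k) * B(k,j) rather than the general A(i,j,k) * B(i,j,k) form; `sddmmUnfused` and `sddmmFused` are hypothetical names and the code does not use MLIR.

```cpp
// Hypothetical sketch; dense matrices stored as nested std::vector.
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Unfused: T(i,j) = SUM(k, A(i,k) * B(k,j)), then X(i,j) = S(i,j) * T(i,j).
// Every T(i,j) is computed, even where S(i,j) is zero.
Matrix sddmmUnfused(const Matrix &S, const Matrix &A, const Matrix &B) {
  const std::size_t m = S.size(), n = S[0].size(), kDim = B.size();
  Matrix T(m, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t k = 0; k < kDim; ++k)
        T[i][j] += A[i][k] * B[k][j];
  Matrix X(m, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      X[i][j] = S[i][j] * T[i][j];
  return X;
}

// Fused via distributivity: X(i,j) = SUM(k, S(i,j) * A(i,k) * B(k,j)).
// S(i,j) is a factor of every term, so entries where S(i,j) == 0 can be
// skipped entirely; with a sparse S this avoids most of the dense work.
Matrix sddmmFused(const Matrix &S, const Matrix &A, const Matrix &B) {
  const std::size_t m = S.size(), n = S[0].size(), kDim = B.size();
  Matrix X(m, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j) {
      if (S[i][j] == 0.0)
        continue;  // sampled out: the result entry stays zero
      double acc = 0.0;
      for (std::size_t k = 0; k < kDim; ++k)
        acc += S[i][j] * A[i][k] * B[k][j];
      X[i][j] = acc;
    }
  return X;
}
```

The fused form reassociates the products, so for general floating-point inputs the two versions can differ in rounding, and skipping zero entries assumes T(i,j) is finite (0 times infinity is NaN). The check below uses small integer values, for which both versions agree exactly.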

mlir/test/Dialect/SparseTensor/sparse_sddmm.mlir

Property	Old Value	New Value
File Mode	100644	100755

[14 unchanged lines not shown]	#trait_scale = {
indexing_maps = [		indexing_maps = [
affine_map<(d0, d1) -> (d0, d1)>,		affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> (d0, d1)>,		affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> (d0, d1)>		affine_map<(d0, d1) -> (d0, d1)>
],		],
iterator_types = ["parallel", "parallel"]		iterator_types = ["parallel", "parallel"]
}		}

		// CHECK-LABEL: func.func @fold_yield_arg_zero() -> tensor<1024x1024xf64> {
		// CHECK: %[[VAL_0:.*]] = arith.constant dense<0.000000e+00> : tensor<1024x1024xf64>
		// CHECK: %[[VAL_1:.*]] = bufferization.alloc_tensor() copy(%[[VAL_0]]) {bufferization.escape = [false], memory_space = 0 : ui64} : tensor<1024x1024xf64>
		// CHECK: return %[[VAL_1]] : tensor<1024x1024xf64>
		// CHECK: }
		func.func @fold_yield_arg_zero() -> tensor<1024x1024xf64> {
		%cst = arith.constant 0.000000e+00 : f64
		%0 = linalg.init_tensor [1024, 1024] : tensor<1024x1024xf64>
		%1 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> ()>,
		affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"]}
		ins(%cst : f64)
		outs(%0 : tensor<1024x1024xf64>) {
		^bb0(%a: f64, %x: f64):
		linalg.yield %a : f64
		} -> tensor<1024x1024xf64>
		return %1 : tensor<1024x1024xf64>
		}

		// CHECK-LABEL: func.func @fold_yield_direct_zero() -> tensor<32xf64> {
		// CHECK: %[[VAL_0:.*]] = arith.constant dense<0.000000e+00> : tensor<32xf64>
		// CHECK: %[[VAL_1:.*]] = bufferization.alloc_tensor() copy(%[[VAL_0]]) {bufferization.escape = [false], memory_space = 0 : ui64} : tensor<32xf64>
		// CHECK: return %[[VAL_1]] : tensor<32xf64>
		// CHECK: }
		func.func @fold_yield_direct_zero() -> tensor<32xf64> {
		%cst = arith.constant 0.000000e+00 : f64
		%0 = linalg.init_tensor [32] : tensor<32xf64>
		%1 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>],
		iterator_types = ["parallel"]}
		outs(%0 : tensor<32xf64>) {
		^bb0(%x: f64):
		linalg.yield %cst : f64
		} -> tensor<32xf64>
		return %1 : tensor<32xf64>
		}

// CHECK-LABEL: func.func @sampled_dd_unfused(		// CHECK-LABEL: func.func @sampled_dd_unfused(
// CHECK-SAME: %[[VAL_0:.*]]: tensor<8x8xf64, #sparse_tensor.encoding<{{.*}}>>,		// CHECK-SAME: %[[VAL_0:.*]]: tensor<8x8xf64, #sparse_tensor.encoding<{{.*}}>>,
// CHECK-SAME: %[[VAL_1:.*]]: tensor<8x8xf64>,		// CHECK-SAME: %[[VAL_1:.*]]: tensor<8x8xf64>,
// CHECK-SAME: %[[VAL_2:.*]]: tensor<8x8xf64>) -> tensor<8x8xf64> {		// CHECK-SAME: %[[VAL_2:.*]]: tensor<8x8xf64>) -> tensor<8x8xf64> {
// CHECK-DAG: %[[VAL_3:.*]] = arith.constant 8 : index		// CHECK-DAG: %[[VAL_3:.*]] = arith.constant 8 : index
// CHECK-DAG: %[[VAL_4:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[VAL_4:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[VAL_5:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[VAL_5:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[VAL_6:.*]] = arith.constant dense<0.000000e+00> : tensor<8x8xf64>		// CHECK-DAG: %[[VAL_6:.*]] = arith.constant dense<0.000000e+00> : tensor<8x8xf64>
[132 unchanged lines not shown]