This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] auto-insertion of conversion to resolve cycles
ClosedPublic

Authored by aartbik on Jun 29 2022, 12:44 PM.

Details

Reviewers

bixia
Peiming
wrengr
yinying-lisa-li
penpornk

Commits

rGe057f25dee59: [mlir][sparse] auto-insertion of conversion to resolve cycles

Summary

When the iteration graph is cyclic (even after several attempts using fewer and fewer constraints), the current sparse compiler bails out, and no rewriting happens. This revision adds new logic in which the sparse compiler tries to find a single sparse input tensor that breaks the cycle, and then inserts a sparse conversion operation that puts that tensor into an order adhering to the iteration graph. This way, more incoming kernels can be handled!

Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
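The "fewer and fewer constraints" attempts mentioned above can be sketched in isolation as follows. This is only an illustrative sketch: the `SortMask` values and the order of the attempts mirror the diff below, but `firstAcyclicMask` and its `trySort` callback are hypothetical stand-ins for the repeated calls to `computeIterationGraph`, not part of the patch.

```cpp
#include <functional>

// Same values as the SortMask enum in the diff.
enum SortMask : unsigned {
  kSparseOnly = 0x0,
  kIncludeDense = 0x1,
  kIncludeUndef = 0x2,
  kIncludeAll = 0x3
};

// Hypothetical helper: tries the masks from most to least constrained and
// returns the first one for which `trySort` reports an acyclic graph.
// Returns -1 when every attempt fails, which is the point where the patch
// falls back to resolving the cycle with a conversion.
int firstAcyclicMask(const std::function<bool(unsigned)> &trySort) {
  const unsigned masks[] = {kIncludeAll, kIncludeUndef, kIncludeDense,
                            kSparseOnly};
  for (unsigned mask : masks)
    if (trySort(mask))
      return static_cast<int>(mask);
  return -1;
}
```

A caller would pass a callback that builds the iteration graph under the given mask and returns whether the topological sort succeeded.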

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aartbik created this revision. Jun 29 2022, 12:44 PM

Herald added a project: Restricted Project. Jun 29 2022, 12:44 PM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 19 others.

aartbik requested review of this revision. Jun 29 2022, 12:44 PM

Herald added a project: Restricted Project. Jun 29 2022, 12:44 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

aartbik added reviewers: bixia, Peiming, wrengr, yinying-lisa-li, penpornk. Jun 29 2022, 12:46 PM

aartbik edited the summary of this revision. Jun 29 2022, 12:50 PM

Harbormaster completed remote builds in B172851: Diff 441132. Jun 29 2022, 2:34 PM

rebased with main

Harbormaster completed remote builds in B172873: Diff 441171. Jun 29 2022, 4:04 PM

bixia accepted this revision. Jun 29 2022, 4:25 PM

bixia added inline comments.

mlir/test/Dialect/SparseTensor/sparse_transpose.mlir
2	Maybe we shouldn't even need such a test as the generated code is so complicated and we already have an integration test to show that it works?
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir
32–49	Would it be better to just replace this old function with the new one without the convert op?

This revision is now accepted and ready to land. Jun 29 2022, 4:25 PM

aartbik marked 2 inline comments as done. Jun 29 2022, 5:33 PM

aartbik added inline comments.

mlir/test/Dialect/SparseTensor/sparse_transpose.mlir
2	This was really to show the IR as part of the review too. Also, it shows that we should really address the TODO, since converting and then yielding is a bit strange. When we make that fix, this test will be triggered as being changed.
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir
32–49	I wanted to contrast the old with the new, but I agree that the comment is now a bit out of date. I have updated the comments to show the manual vs. auto insertion.

addressed comments

Harbormaster completed remote builds in B172912: Diff 441218. Jun 29 2022, 6:18 PM

Closed by commit rGe057f25dee59: [mlir][sparse] auto-insertion of conversion to resolve cycles (authored by aartbik). Jun 29 2022, 6:28 PM

This revision was automatically updated to reflect the committed changes.

aartbik added a commit: rGe057f25dee59: [mlir][sparse] auto-insertion of conversion to resolve cycles.

wrengr mentioned this in D129011: [mlir][sparse] Reducing computational complexity. Jul 1 2022, 12:02 PM

wrengr mentioned this in rG875ee0ed1c5a: [mlir][sparse] Reducing computational complexity. Jul 1 2022, 12:55 PM

Revision Contents

Path                                                                    Size
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp             85 lines
mlir/test/Dialect/SparseTensor/sparse_transpose.mlir                    62 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir    57 lines

Diff 441220

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp

[35 lines not shown]

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Declarations of data structures.		// Declarations of data structures.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {

// Iteration graph sorting.		// Iteration graph sorting.
enum SortMask { kSparseOnly = 0x0, kIncludeDense = 0x1, kIncludeUndef = 0x2 };		enum SortMask {
		kSparseOnly = 0x0,
		kIncludeDense = 0x1,
		kIncludeUndef = 0x2,
		kIncludeAll = 0x3
		};

// Reduction kinds.		// Reduction kinds.
enum Reduction { kNoReduc, kSum, kProduct, kAnd, kOr, kXor };		enum Reduction { kNoReduc, kSum, kProduct, kAnd, kOr, kXor };

// Code generation.		// Code generation.
struct CodeGen {		struct CodeGen {
CodeGen(SparsificationOptions o, unsigned numTensors, unsigned numLoops,		CodeGen(SparsificationOptions o, unsigned numTensors, unsigned numLoops,
OpOperand *op, unsigned nest)		OpOperand *op, unsigned nest)
[47 lines not shown]
};		};

} // namespace		} // namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Sparse compiler analysis methods.		// Sparse compiler analysis methods.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		/// Helper method to construct a permuted dimension ordering
		/// that adheres to the given topological sort.
		static AffineMap permute(MLIRContext *context, AffineMap m,
		std::vector<unsigned> &topSort) {
		unsigned sz = topSort.size();
		SmallVector<unsigned, 4> perm(sz);
		for (unsigned i = 0; i < sz; i++)
		perm[i] = m.getPermutedPosition(topSort[i]);
		return AffineMap::getPermutationMap(perm, context);
		}

/// Helper method to apply dimension ordering permutation.		/// Helper method to apply dimension ordering permutation.
static unsigned perm(const SparseTensorEncodingAttr &enc, unsigned d) {		static unsigned perm(const SparseTensorEncodingAttr &enc, unsigned d) {
if (enc) {		if (enc) {
auto order = enc.getDimOrdering();		auto order = enc.getDimOrdering();
if (order) {		if (order) {
assert(order.isPermutation());		assert(order.isPermutation());
return order.getDimPosition(d);		return order.getDimPosition(d);
}		}
[110 lines not shown]
}		}

/// Computes a topologically sorted iteration graph for the linalg operation.		/// Computes a topologically sorted iteration graph for the linalg operation.
/// Ensures all tensors are visited in natural index order. This is essential		/// Ensures all tensors are visited in natural index order. This is essential
/// for sparse storage formats since these only support access along fixed		/// for sparse storage formats since these only support access along fixed
/// dimensions. Even for dense storage formats, however, the natural index		/// dimensions. Even for dense storage formats, however, the natural index
/// order yields innermost unit-stride access with better spatial locality.		/// order yields innermost unit-stride access with better spatial locality.
static bool computeIterationGraph(Merger &merger, linalg::GenericOp op,		static bool computeIterationGraph(Merger &merger, linalg::GenericOp op,
std::vector<unsigned> &topSort,		std::vector<unsigned> &topSort, unsigned mask,
unsigned mask) {		OpOperand *skip = nullptr) {
// Set up an n x n from/to adjacency matrix of the iteration graph		// Set up an n x n from/to adjacency matrix of the iteration graph
// for the implicit loop indices i_0 .. i_n-1.		// for the implicit loop indices i_0 .. i_n-1.
unsigned n = op.getNumLoops();		unsigned n = op.getNumLoops();
std::vector<std::vector<bool>> adjM(n, std::vector<bool>(n, false));		std::vector<std::vector<bool>> adjM(n, std::vector<bool>(n, false));

// Iterate over the indexing maps of every tensor in the tensor expression.		// Iterate over the indexing maps of every tensor in the tensor expression.
for (OpOperand *t : op.getInputAndOutputOperands()) {		for (OpOperand *t : op.getInputAndOutputOperands()) {
		// Skip tensor during cycle resolution.
		if (t == skip)
		continue;
		// Get map and encoding.
auto map = op.getTiedIndexingMap(t);		auto map = op.getTiedIndexingMap(t);
auto enc = getSparseTensorEncoding(t->get().getType());		auto enc = getSparseTensorEncoding(t->get().getType());
assert(map.getNumDims() == n);		assert(map.getNumDims() == n);
// Skip dense tensor constraints when not requested.		// Skip dense tensor constraints when not requested.
if (!(mask & SortMask::kIncludeDense) && !enc)		if (!(mask & SortMask::kIncludeDense) && !enc)
continue;		continue;
// Each tensor expression and optional dimension ordering (row-major		// Each tensor expression and optional dimension ordering (row-major
// by default) puts an ordering constraint on the loop indices. For		// by default) puts an ordering constraint on the loop indices. For
[72 lines not shown; enclosing context: if (merger.isDim(tensor, i, Dim::kSparse)) {]
allDense = false;		allDense = false;
break;		break;
}		}
if (allDense)		if (allDense)
return true;		return true;
// A tensor expression with a sparse output tensor that changes its values		// A tensor expression with a sparse output tensor that changes its values
// but not its nonzero structure, an operation called "simply dynamic" in		// but not its nonzero structure, an operation called "simply dynamic" in
// [Bik96,Ch9], is also admissable without special codegen, provided		// [Bik96,Ch9], is also admissable without special codegen, provided
// the tensor's underlying sparse storage scheme can be modified in place.		// the tensor's underlying sparse storage scheme can be modified in-place.
if (merger.isSingleCondition(tensor, exp) && isInPlace(lhs->get()))		if (merger.isSingleCondition(tensor, exp) && isInPlace(lhs->get()))
return true;		return true;
// Accept "truly dynamic" if the output tensor materializes uninitialized		// Accept "truly dynamic" if the output tensor materializes uninitialized
// into the computation and insertions occur in lexicographic index order.		// into the computation and insertions occur in lexicographic index order.
if (isMaterializing(lhs->get())) {		if (isMaterializing(lhs->get())) {
unsigned nest = 0;		unsigned nest = 0;
for (unsigned i = 0; i < numLoops; i++) {		for (unsigned i = 0; i < numLoops; i++) {
if (isReductionIterator(iteratorTypes[topSort[i]]))		if (isReductionIterator(iteratorTypes[topSort[i]]))
[1,380 lines not shown; enclosing context: LogicalResult matchAndRewrite(linalg::GenericOp op,]
// information for all tensors to loop indices in the kernel.		// information for all tensors to loop indices in the kernel.
assert(op.getNumOutputs() == 1);		assert(op.getNumOutputs() == 1);
unsigned numTensors = op.getNumInputsAndOutputs();		unsigned numTensors = op.getNumInputsAndOutputs();
unsigned numLoops = op.iterator_types().getValue().size();		unsigned numLoops = op.iterator_types().getValue().size();
Merger merger(numTensors, numLoops);		Merger merger(numTensors, numLoops);
if (!findSparseAnnotations(merger, op))		if (!findSparseAnnotations(merger, op))
return failure();		return failure();

// Computes a topologically sorted iteration graph to ensure		// Computes a topologically sorted iteration graph to ensure tensors
// tensors are visited in natural index order. Fails on cycles.		// are visited in natural index order. Gradually relaxes the considered
// This assumes that higher-level passes have already put the		// constraints until an acyclic iteration graph results, such that sparse
// tensors in each tensor expression in a feasible order.		// code generation can proceed. As a last resort, an attempt is made
		// to resolve cycles by inserting a conversion.
std::vector<unsigned> topSort;		std::vector<unsigned> topSort;
if (!computeIterationGraph(merger, op, topSort,		if (!computeIterationGraph(merger, op, topSort, SortMask::kIncludeAll) &&
SortMask::kIncludeUndef \|
SortMask::kIncludeDense) &&
!computeIterationGraph(merger, op, topSort, SortMask::kIncludeUndef) &&		!computeIterationGraph(merger, op, topSort, SortMask::kIncludeUndef) &&
!computeIterationGraph(merger, op, topSort, SortMask::kIncludeDense) &&		!computeIterationGraph(merger, op, topSort, SortMask::kIncludeDense) &&
!computeIterationGraph(merger, op, topSort, SortMask::kSparseOnly))		!computeIterationGraph(merger, op, topSort, SortMask::kSparseOnly)) {
return failure();		return resolveCycle(merger, rewriter, op);
		}

// Builds the tensor expression for the Linalg operation in SSA form.		// Builds the tensor expression for the Linalg operation in SSA form.
Optional<unsigned> optExp = merger.buildTensorExpFromLinalg(op);		Optional<unsigned> optExp = merger.buildTensorExpFromLinalg(op);
if (!optExp.hasValue())		if (!optExp.hasValue())
return failure();		return failure();
unsigned exp = optExp.getValue();		unsigned exp = optExp.getValue();

// Rejects an inadmissable tensor expression.		// Rejects an inadmissable tensor expression.
OpOperand *sparseOut = nullptr;		OpOperand *sparseOut = nullptr;
unsigned outerParNest = 0;		unsigned outerParNest = 0;
if (!isAdmissableTensorExp(merger, op, topSort, exp, &sparseOut,		if (!isAdmissableTensorExp(merger, op, topSort, exp, &sparseOut,
outerParNest))		outerParNest))
return failure();		return failure();

// Recursively generates code.		// Recursively generates code.
merger.setHasSparseOut(sparseOut != nullptr);		merger.setHasSparseOut(sparseOut != nullptr);
CodeGen codegen(options, numTensors, numLoops, sparseOut, outerParNest);		CodeGen codegen(options, numTensors, numLoops, sparseOut, outerParNest);
genBuffers(merger, codegen, rewriter, op);		genBuffers(merger, codegen, rewriter, op);
genStmt(merger, codegen, rewriter, op, topSort, exp, 0);		genStmt(merger, codegen, rewriter, op, topSort, exp, 0);
genResult(merger, codegen, rewriter, op);		genResult(merger, codegen, rewriter, op);
return success();		return success();
}		}

private:		private:
		// Last resort cycle resolution.
		LogicalResult resolveCycle(Merger &merger, PatternRewriter &rewriter,
		linalg::GenericOp op) const {
		// Compute topological sort while leaving out every
		// sparse input tensor in succession until an acyclic
		// iteration graph results.
		std::vector<unsigned> topSort;
		for (OpOperand *t : op.getInputOperands()) {
		unsigned tensor = t->getOperandNumber();
		Value tval = t->get();
		auto srcEnc = getSparseTensorEncoding(tval.getType());
		if (!srcEnc \|\|
		!computeIterationGraph(merger, op, topSort, SortMask::kSparseOnly, t))
		continue;
		// Found an input tensor that resolves the cycle by inserting a
		// conversion into a sparse tensor that adheres to the iteration
		// graph order. Also releases the temporary sparse tensor.
		//
		// TODO: investigate fusing the conversion with computation,
		// especially if it is a direct yield!
		//
		auto srcTp = tval.getType().cast<RankedTensorType>();
		auto dstEnc = SparseTensorEncodingAttr::get(
		op->getContext(), srcEnc.getDimLevelType(),
		permute(getContext(), op.getTiedIndexingMap(t), topSort), // new order
		srcEnc.getPointerBitWidth(), srcEnc.getIndexBitWidth());
		auto dstTp = RankedTensorType::get(srcTp.getShape(),
		srcTp.getElementType(), dstEnc);
		auto convert = rewriter.create<ConvertOp>(tval.getLoc(), dstTp, tval);
		op->setOperand(tensor, convert);
		rewriter.setInsertionPointAfter(op);
		rewriter.create<ReleaseOp>(tval.getLoc(), convert);
		return success();
		}
		// Cannot be resolved with a single conversion.
		// TODO: convert more than one?
		return failure();
		}

/// Options to control sparse code generation.		/// Options to control sparse code generation.
SparsificationOptions options;		SparsificationOptions options;
};		};

} // namespace		} // namespace

/// Populates the given patterns list with rewriting rules required for		/// Populates the given patterns list with rewriting rules required for
/// the sparsification of linear algebra operations.		/// the sparsification of linear algebra operations.
void mlir::populateSparsificationPatterns(		void mlir::populateSparsificationPatterns(
RewritePatternSet &patterns, const SparsificationOptions &options) {		RewritePatternSet &patterns, const SparsificationOptions &options) {
patterns.add<GenericOpSparsifier>(patterns.getContext(), options);		patterns.add<GenericOpSparsifier>(patterns.getContext(), options);
}		}
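The last-resort search in resolveCycle can be illustrated outside MLIR with a toy iteration graph. The sketch below is not the patch's code: `Operand`, `sortLoops`, and `findOperandToConvert` are hypothetical names, each operand's ordering constraints are given directly as edges (an edge {a, b} means loop a must enclose loop b), and the topological sort uses Kahn's algorithm, which the patch does not necessarily use.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One tensor operand of a toy kernel (hypothetical, not the MLIR API).
struct Operand {
  std::vector<std::pair<unsigned, unsigned>> edges; // loop-order constraints
  bool sparseInput;                                 // candidate for conversion
};

// Topologically sorts loops 0..n-1 under the constraints of all operands
// except the one at index `skip`. Returns false when the graph is cyclic.
bool sortLoops(unsigned n, const std::vector<Operand> &ops,
               std::vector<unsigned> &order, int skip = -1) {
  order.clear();
  std::vector<std::vector<unsigned>> succ(n);
  std::vector<unsigned> inDegree(n, 0);
  for (int t = 0; t < static_cast<int>(ops.size()); ++t) {
    if (t == skip)
      continue; // leave this operand out during cycle resolution
    for (const auto &e : ops[t].edges) {
      assert(e.first < n && e.second < n);
      succ[e.first].push_back(e.second);
      ++inDegree[e.second];
    }
  }
  std::vector<unsigned> ready;
  for (unsigned i = 0; i < n; ++i)
    if (inDegree[i] == 0)
      ready.push_back(i);
  while (!ready.empty()) {
    unsigned i = ready.back();
    ready.pop_back();
    order.push_back(i);
    for (unsigned j : succ[i])
      if (--inDegree[j] == 0)
        ready.push_back(j);
  }
  return order.size() == n;
}

// Returns the index of the first sparse input whose removal makes the graph
// acyclic (the operand that would receive a conversion), or -1 if no single
// operand resolves the cycle. On success, `order` holds the loop order.
int findOperandToConvert(unsigned n, const std::vector<Operand> &ops,
                         std::vector<unsigned> &order) {
  for (int t = 0; t < static_cast<int>(ops.size()); ++t)
    if (ops[t].sparseInput && sortLoops(n, ops, order, t))
      return t;
  return -1;
}
```

For the transpose kernel X(i,j) = A(j,i) with both tensors stored row-wise, A contributes the edge {j, i} and X the edge {i, j}, so the full graph is cyclic; leaving out A yields the order (i, j), and A is the operand to convert.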

mlir/test/Dialect/SparseTensor/sparse_transpose.mlir

This file was added.

				// RUN: mlir-opt %s -sparsification \| FileCheck %s

				bixia: Maybe we shouldn't even need such a test as the generated code is so complicated and we already have an integration test to show that it works?
				aartbik (author): This was really to show the IR as part of the review too. Also, it shows that we should really address the TODO, since converting and then yielding is a bit strange. When we make that fix, this test will be triggered as being changed.
				#DCSR = #sparse_tensor.encoding<{
				dimLevelType = [ "compressed", "compressed" ]
				}>

				#transpose_trait = {
				indexing_maps = [
				affine_map<(i,j) -> (j,i)>, // A
				affine_map<(i,j) -> (i,j)> // X
				],
				iterator_types = ["parallel", "parallel"],
				doc = "X(i,j) = A(j,i)"
				}

				// TODO: improve auto-conversion followed by yield

				// CHECK-LABEL: func.func @sparse_transpose_auto(
				// CHECK-SAME: %[[VAL_0:.*]]: tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>) -> tensor<4x3xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>> {
				// CHECK-DAG: %[[VAL_1:.*]] = arith.constant 0 : index
				// CHECK-DAG: %[[VAL_2:.*]] = arith.constant 1 : index
				// CHECK-DAG: %[[VAL_3:.*]] = arith.constant 2 : index
				// CHECK: %[[VAL_4:.*]] = bufferization.alloc_tensor() : tensor<4x3xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>
				// CHECK: %[[VAL_5:.*]] = sparse_tensor.convert %[[VAL_0]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>> to tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>>
				// CHECK: %[[VAL_6:.*]] = sparse_tensor.pointers %[[VAL_5]], %[[VAL_1]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>> to memref<?xindex>
				// CHECK: %[[VAL_7:.*]] = sparse_tensor.indices %[[VAL_5]], %[[VAL_1]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>> to memref<?xindex>
				// CHECK: %[[VAL_8:.*]] = sparse_tensor.pointers %[[VAL_5]], %[[VAL_2]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>> to memref<?xindex>
				// CHECK: %[[VAL_9:.*]] = sparse_tensor.indices %[[VAL_5]], %[[VAL_2]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>> to memref<?xindex>
				// CHECK: %[[VAL_10:.*]] = sparse_tensor.values %[[VAL_5]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>> to memref<?xf64>
				// CHECK: %[[VAL_11:.*]] = memref.alloca(%[[VAL_3]]) : memref<?xindex>
				// CHECK: %[[VAL_12:.*]] = memref.alloca() : memref<f64>
				// CHECK: %[[VAL_13:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_1]]] : memref<?xindex>
				// CHECK: %[[VAL_14:.*]] = memref.load %[[VAL_6]]{{\[}}%[[VAL_2]]] : memref<?xindex>
				// CHECK: scf.for %[[VAL_15:.*]] = %[[VAL_13]] to %[[VAL_14]] step %[[VAL_2]] {
				// CHECK: %[[VAL_16:.*]] = memref.load %[[VAL_7]]{{\[}}%[[VAL_15]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_16]], %[[VAL_11]]{{\[}}%[[VAL_1]]] : memref<?xindex>
				// CHECK: %[[VAL_17:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_15]]] : memref<?xindex>
				// CHECK: %[[VAL_18:.*]] = arith.addi %[[VAL_15]], %[[VAL_2]] : index
				// CHECK: %[[VAL_19:.*]] = memref.load %[[VAL_8]]{{\[}}%[[VAL_18]]] : memref<?xindex>
				// CHECK: scf.for %[[VAL_20:.*]] = %[[VAL_17]] to %[[VAL_19]] step %[[VAL_2]] {
				// CHECK: %[[VAL_21:.*]] = memref.load %[[VAL_9]]{{\[}}%[[VAL_20]]] : memref<?xindex>
				// CHECK: memref.store %[[VAL_21]], %[[VAL_11]]{{\[}}%[[VAL_2]]] : memref<?xindex>
				// CHECK: %[[VAL_22:.*]] = memref.load %[[VAL_10]]{{\[}}%[[VAL_20]]] : memref<?xf64>
				// CHECK: memref.store %[[VAL_22]], %[[VAL_12]][] : memref<f64>
				// CHECK: sparse_tensor.lex_insert %[[VAL_4]], %[[VAL_11]], %[[VAL_12]] : tensor<4x3xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>, memref<?xindex>, memref<f64>
				// CHECK: }
				// CHECK: }
				// CHECK: %[[VAL_23:.*]] = sparse_tensor.load %[[VAL_4]] hasInserts : tensor<4x3xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>
				// CHECK: sparse_tensor.release %[[VAL_5]] : tensor<3x4xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 0, indexBitWidth = 0 }>>
				// CHECK: return %[[VAL_23]] : tensor<4x3xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed" ], pointerBitWidth = 0, indexBitWidth = 0 }>>
				// CHECK: }
				func.func @sparse_transpose_auto(%arga: tensor<3x4xf64, #DCSR>)
				-> tensor<4x3xf64, #DCSR> {
				%i = bufferization.alloc_tensor() : tensor<4x3xf64, #DCSR>
				%0 = linalg.generic #transpose_trait
				ins(%arga: tensor<3x4xf64, #DCSR>)
				outs(%i: tensor<4x3xf64, #DCSR>) {
				^bb(%a: f64, %x: f64):
				linalg.yield %a : f64
				} -> tensor<4x3xf64, #DCSR>
				return %0 : tensor<4x3xf64, #DCSR>
				}
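The dimOrdering (d0, d1) -> (d1, d0) that appears on the converted tensor in the CHECK lines above follows from the new permute helper. A minimal standalone sketch of that computation is given below, assuming a permutation map is represented simply as the list of dimension indices on its right-hand side; `PermutationMap`, `permutedPosition`, and `permuteToLoopOrder` are hypothetical stand-ins for AffineMap, getPermutedPosition, and permute.

```cpp
#include <cassert>
#include <vector>

// A permutation map (d0, ..., dn-1) -> (d_map[0], ..., d_map[n-1]).
using PermutationMap = std::vector<unsigned>;

// Returns the result position at which dimension `dim` occurs in `map`.
unsigned permutedPosition(const PermutationMap &map, unsigned dim) {
  for (unsigned pos = 0; pos < map.size(); ++pos)
    if (map[pos] == dim)
      return pos;
  assert(false && "dim does not occur in the permutation map");
  return 0;
}

// For each loop in topological order, looks up which tensor dimension that
// loop indexes. The result is a dimension ordering under which the tensor
// is traversed in the same order as the loops.
PermutationMap permuteToLoopOrder(const PermutationMap &indexingMap,
                                  const std::vector<unsigned> &topSort) {
  PermutationMap perm(topSort.size());
  for (unsigned i = 0; i < topSort.size(); ++i)
    perm[i] = permutedPosition(indexingMap, topSort[i]);
  return perm;
}
```

For this test, A is accessed through (i, j) -> (j, i), i.e. the map {1, 0}, and the loop order with A left out is (i, j), i.e. {0, 1}. Loop i indexes dimension 1 of A and loop j indexes dimension 0, so the computed ordering is {1, 0}.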

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir

[19 lines not shown; enclosing context: #transpose_trait = {]
iterator_types = ["parallel", "parallel"],		iterator_types = ["parallel", "parallel"],
doc = "X(i,j) = A(j,i)"		doc = "X(i,j) = A(j,i)"
}		}

module {		module {

//		//
// Transposing a sparse row-wise matrix into another sparse row-wise		// Transposing a sparse row-wise matrix into another sparse row-wise
// matrix would fail direct codegen, since it introduces a cycle in		// matrix introduces a cycle in the iteration graph. This complication
// the iteration graph. This can be avoided by converting the incoming		// can be avoided by manually inserting a conversion of the incoming
// matrix into a sparse column-wise matrix first.		// matrix into a sparse column-wise matrix first.
//		//
func.func @sparse_transpose(%arga: tensor<3x4xf64, #DCSR>) -> tensor<4x3xf64, #DCSR> {		func.func @sparse_transpose(%arga: tensor<3x4xf64, #DCSR>)
%t = sparse_tensor.convert %arga : tensor<3x4xf64, #DCSR> to tensor<3x4xf64, #DCSC>		-> tensor<4x3xf64, #DCSR> {
		%t = sparse_tensor.convert %arga
		: tensor<3x4xf64, #DCSR> to tensor<3x4xf64, #DCSC>

%i = bufferization.alloc_tensor() : tensor<4x3xf64, #DCSR>		%i = bufferization.alloc_tensor() : tensor<4x3xf64, #DCSR>

%0 = linalg.generic #transpose_trait		%0 = linalg.generic #transpose_trait
ins(%t: tensor<3x4xf64, #DCSC>)		ins(%t: tensor<3x4xf64, #DCSC>)
outs(%i: tensor<4x3xf64, #DCSR>) {		outs(%i: tensor<4x3xf64, #DCSR>) {
^bb(%a: f64, %x: f64):		^bb(%a: f64, %x: f64):
linalg.yield %a : f64		linalg.yield %a : f64
} -> tensor<4x3xf64, #DCSR>		} -> tensor<4x3xf64, #DCSR>

sparse_tensor.release %t : tensor<3x4xf64, #DCSC>		sparse_tensor.release %t : tensor<3x4xf64, #DCSC>

return %0 : tensor<4x3xf64, #DCSR>		return %0 : tensor<4x3xf64, #DCSR>
}		}

		bixia: Would it be better to just replace this old function with the new one without the convert op?
		aartbik (author): I wanted to contrast the old with the new, but I agree that the comment is now a bit out of date. I have updated the comments to show the manual vs. auto insertion.
//		//
		// However, even better, the sparse compiler is able to insert such a
		// conversion automatically to resolve a cycle in the iteration graph!
		//
		func.func @sparse_transpose_auto(%arga: tensor<3x4xf64, #DCSR>)
		-> tensor<4x3xf64, #DCSR> {
		%i = bufferization.alloc_tensor() : tensor<4x3xf64, #DCSR>
		%0 = linalg.generic #transpose_trait
		ins(%arga: tensor<3x4xf64, #DCSR>)
		outs(%i: tensor<4x3xf64, #DCSR>) {
		^bb(%a: f64, %x: f64):
		linalg.yield %a : f64
		} -> tensor<4x3xf64, #DCSR>
		return %0 : tensor<4x3xf64, #DCSR>
		}

		//
// Main driver.		// Main driver.
//		//
func.func @entry() {		func.func @entry() {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%du = arith.constant 0.0 : f64		%du = arith.constant 0.0 : f64

// Setup input sparse matrix from compressed constant.		// Setup input sparse matrix from compressed constant.
%d = arith.constant dense <[		%d = arith.constant dense <[
[ 1.1, 1.2, 0.0, 1.4 ],		[ 1.1, 1.2, 0.0, 1.4 ],
[ 0.0, 0.0, 0.0, 0.0 ],		[ 0.0, 0.0, 0.0, 0.0 ],
[ 3.1, 0.0, 3.3, 3.4 ]		[ 3.1, 0.0, 3.3, 3.4 ]
]> : tensor<3x4xf64>		]> : tensor<3x4xf64>
%a = sparse_tensor.convert %d : tensor<3x4xf64> to tensor<3x4xf64, #DCSR>		%a = sparse_tensor.convert %d : tensor<3x4xf64> to tensor<3x4xf64, #DCSR>

// Call the kernel.		// Call the kernels.
%0 = call @sparse_transpose(%a) : (tensor<3x4xf64, #DCSR>) -> tensor<4x3xf64, #DCSR>		%0 = call @sparse_transpose(%a)
		: (tensor<3x4xf64, #DCSR>) -> tensor<4x3xf64, #DCSR>
		%1 = call @sparse_transpose_auto(%a)
		: (tensor<3x4xf64, #DCSR>) -> tensor<4x3xf64, #DCSR>

//		//
// Verify result.		// Verify result.
//		//
// CHECK: ( 1.1, 0, 3.1 )		// CHECK: ( 1.1, 0, 3.1 )
// CHECK-NEXT: ( 1.2, 0, 0 )		// CHECK-NEXT: ( 1.2, 0, 0 )
// CHECK-NEXT: ( 0, 0, 3.3 )		// CHECK-NEXT: ( 0, 0, 3.3 )
// CHECK-NEXT: ( 1.4, 0, 3.4 )		// CHECK-NEXT: ( 1.4, 0, 3.4 )
//		//
		// CHECK-NEXT: ( 1.1, 0, 3.1 )
		// CHECK-NEXT: ( 1.2, 0, 0 )
		// CHECK-NEXT: ( 0, 0, 3.3 )
		// CHECK-NEXT: ( 1.4, 0, 3.4 )
		//
%x = sparse_tensor.convert %0 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>		%x = sparse_tensor.convert %0 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>
%m = bufferization.to_memref %x : memref<4x3xf64>		%m = bufferization.to_memref %x : memref<4x3xf64>
scf.for %i = %c0 to %c4 step %c1 {		scf.for %i = %c0 to %c4 step %c1 {
%v = vector.transfer_read %m[%i, %c0], %du: memref<4x3xf64>, vector<3xf64>		%v1 = vector.transfer_read %m[%i, %c0], %du: memref<4x3xf64>, vector<3xf64>
vector.print %v : vector<3xf64>		vector.print %v1 : vector<3xf64>
		}
		%y = sparse_tensor.convert %1 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>
		%n = bufferization.to_memref %y : memref<4x3xf64>
		scf.for %i = %c0 to %c4 step %c1 {
		%v2 = vector.transfer_read %n[%i, %c0], %du: memref<4x3xf64>, vector<3xf64>
		vector.print %v2 : vector<3xf64>
}		}

// Release resources.		// Release resources.
sparse_tensor.release %a : tensor<3x4xf64, #DCSR>		sparse_tensor.release %a : tensor<3x4xf64, #DCSR>
sparse_tensor.release %0 : tensor<4x3xf64, #DCSR>		sparse_tensor.release %0 : tensor<4x3xf64, #DCSR>
		sparse_tensor.release %1 : tensor<4x3xf64, #DCSR>
memref.dealloc %m : memref<4x3xf64>		memref.dealloc %m : memref<4x3xf64>
		memref.dealloc %n : memref<4x3xf64>

return		return
}		}
}		}
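Why a column-wise copy of the input removes the problem can be seen on a plain coordinate list. The sketch below is only an analogy to the convert-then-insert sequence, not the generated code: `Entry`, `transposeInOrder`, and `toDense` are hypothetical names, and a sort stands in for the conversion. Once the entries are ordered column-first, the transposed entries come out in lexicographic order of the output, which is what in-order insertion into a row-wise sparse result needs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One stored element of a sparse matrix in coordinate form.
struct Entry {
  unsigned row, col;
  double val;
};

// Reorders the entries so that columns vary slowest (the role played by the
// conversion to a column-wise format), then swaps the coordinates. The
// result is sorted lexicographically by (row, col) of the transposed matrix.
std::vector<Entry> transposeInOrder(std::vector<Entry> entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const Entry &a, const Entry &b) {
                     return a.col != b.col ? a.col < b.col : a.row < b.row;
                   });
  std::vector<Entry> out;
  out.reserve(entries.size());
  for (const Entry &e : entries)
    out.push_back({e.col, e.row, e.val});
  return out;
}

// Expands a coordinate list into a dense rows x cols matrix for inspection.
std::vector<std::vector<double>> toDense(const std::vector<Entry> &entries,
                                         unsigned rows, unsigned cols) {
  std::vector<std::vector<double>> dense(rows,
                                         std::vector<double>(cols, 0.0));
  for (const Entry &e : entries) {
    assert(e.row < rows && e.col < cols);
    dense[e.row][e.col] = e.val;
  }
  return dense;
}
```

Applied to the six nonzeros of the 3x4 input matrix in the driver above, the dense form of the result has the same four rows that the CHECK lines expect.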
