This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Add rewrite rule for the sort operator.
ClosedPublic

Authored by bixia on Sep 25 2022, 11:27 PM.

Details

Summary

Add a sparse-buffer-rewrite pass that rewrites sparse primitives on buffers into an MLIR implementation.

Add sparse rewrite rule for the sort operator.

Add FileCheck test and integration test.

Diff Detail

Event Timeline

bixia created this revision.Sep 25 2022, 11:27 PM
Herald added a project: Restricted Project.
bixia requested review of this revision.Sep 25 2022, 11:27 PM
Peiming added inline comments.Sep 26 2022, 9:10 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
328 ↗(On Diff #462817)

Why is this (and the one above) not static? Is it used outside this file?

419 ↗(On Diff #462817)

Will this tail recursion be optimized (by any pass in MLIR)?

573 ↗(On Diff #462817)

Do you think it is a good idea to decouple SortOp from these functions? That is, they would instead take two ValueRanges, one for the arrays being sorted and one for the parallel arrays (it will probably make Aart's life easier for the compress operator later).

You could also make the compare function an extra callback argument, to make it extensible for other orderings.

We could also move these functions into CodegenUtils.cpp, maybe?

Peiming added inline comments.Sep 26 2022, 9:30 AM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
382–383 ↗(On Diff #462817)

Just out of curiosity, is there any reason why you chose to generate a function here? (You could do it fully inline, right?)

bixia updated this revision to Diff 462981.Sep 26 2022, 11:47 AM
bixia marked 2 inline comments as done.

Decouple SortOp from the utility routines that generate sorting code.
Add missing static keyword to a routine.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
328 ↗(On Diff #462817)

Was a mistake.

573 ↗(On Diff #462817)

Removed SortOp from those routines. Per offline discussion, we will try progressive lowering and avoid the need to generate sort code without going through the sort op.

aartbik added inline comments.Sep 26 2022, 1:07 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
1031 ↗(On Diff #462981)

As discussed offline, let's put this in a rewriter pass (that runs after codegen).
So we will have:

rewriting: pre-rewriting

sparsification

conversion: "codegen", which can introduce sort and push_back etc.

rewriting: post-rewriting (which deals with sort, push_back, foreach? etc)

wrengr added inline comments.Sep 26 2022, 3:12 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
18 ↗(On Diff #462981)

Usually we put angle-bracket includes after all the double-quote includes

321–324 ↗(On Diff #462981)

This is already available as drop_front (inherited from llvm::detail::indexed_accessor_range_base)

Since this is a sizable chunk of code that's rather independent of the rest of the codegen, I think it should be moved off to a separate file. (Anticipating there being a number of other similar large chunks of codegen code, I'd suggest adding a subdirectory and calling it lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen/Sort.cpp)

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
341 ↗(On Diff #462981)

I think this needs a better name. When reading the definition itself it looks fine, but later on when it's called I got a bit confused because the name suggests (to me) that it's returning the main "sort" function itself, rather than being reused to do the name mangling for any helper function associated with the main sort function.

If you move all the sorting codegen off to its own file, then we could name this getMangledFunc (or getTypeMangledFunc, getFuncInstanceForType, etc). Whereas if you keep it in this file, then it'd need something more explicit like getMangledSortingHelperFunc to keep it clear that it's only for sorting helper functions.

345 ↗(On Diff #462981)

I think the namePrefix should be moved earlier in the arguments. To closely match the argument ordering used elsewhere in MLIR it should be (builder, insertPoint, resultTypes, namePrefix, dim, operands, createFunc); that is, so it matches the (returnType, name, operands) ordering used elsewhere. But regardless of the ordering of the other arguments, the point is to make sure the createFunc argument comes last, since doing so allows for nicer code formatting when the createFunc is a lambda (cf., the "To take best advantage of this formatting" paragraph at https://llvm.org/docs/CodingStandards.html#format-lambdas-like-blocks-of-code )
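For illustration, here is a minimal C++ sketch of the suggested ordering; the name getFuncInstanceForType and its parameters are hypothetical stand-ins, not the actual patch. Putting the callback last lets a lambda argument format like a block:

```cpp
#include <functional>
#include <string>

// Hypothetical helper whose createFunc callback comes last, matching the
// suggested (returnType, name, operands, createFunc) ordering.
static int getFuncInstanceForType(const std::string &namePrefix, int dim,
                                  const std::function<int(int)> &createFunc) {
  // The real pass would look up or create the mangled function here;
  // this sketch just invokes the callback.
  return createFunc(dim);
}

// Because createFunc is the trailing argument, the lambda body formats
// like an ordinary block of code:
int example() {
  return getFuncInstanceForType("_sort_helper", 3, [](int dim) {
    return dim * 2; // stand-in for emitting the function body
  });
}
```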

373–374 ↗(On Diff #462981)

I'm curious why you have the i != j conditional as part of the generated function, rather than having it at the callsites?

421 ↗(On Diff #462981)

This should be spelled "than", not "then". Ditto for createLessThanFunc, etc.

522 ↗(On Diff #462981)

ditto: "lessThan"

524 ↗(On Diff #462981)

ditto: "less_than"

419 ↗(On Diff #462817)

Also, I think it would probably be better to convert this recursion into a for-loop instead. That way we can leave it up to downstream lowerings to decide whether and how much to unroll the loop. (Or this pass could be given a configuration setting to allow the client to decide whether they want the loop unrolled or not.)

bixia updated this revision to Diff 463254.Sep 27 2022, 9:12 AM
bixia marked 7 inline comments as done.

Move the implementation to sparse-buffer-rewrite.

bixia added a comment.Sep 27 2022, 9:13 AM

Since this is a sizable chunk of code that's rather independent of the rest of the codegen, I think it should be moved off to a separate file. (Anticipating there being a number of other similar large chunks of codegen code, I'd suggest adding a subdirectory and calling it lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen/Sort.cpp)

This is now moved to a file for rewriting primitives that use buffers.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
341 ↗(On Diff #462981)

Changed to use getMangledSortHelperFunc.

345 ↗(On Diff #462981)

Used this order: (builder, insertPoint, resultTypes, namePrefix, dim, operands, createFunc).

373–374 ↗(On Diff #462981)

Renamed this to createMaySwapFunc to clarify the purpose of the function and hopefully avoid the confusion.
It has two call sites, which can share the code that creates the conditional block.

419 ↗(On Diff #462817)

This part doesn't generate recursive calls in MLIR; only the compiler codegen algorithm itself is recursive.
But I rewrote the compiler codegen into a loop anyway.

bixia retitled this revision from [mlir][sparse] Add codegen rule for the sort operator. to [mlir][sparse] Add rewrite rule for the sort operator..Sep 27 2022, 9:13 AM
bixia edited the summary of this revision. (Show Details)
bixia updated this revision to Diff 463337.Sep 27 2022, 2:28 PM

Rebase.

wrengr added inline comments.Sep 27 2022, 4:10 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
419 ↗(On Diff #462817)

Yeah, I know the generated MLIR wasn't making recursive function calls. (If it were then lower-level passes would have a much harder time determining if it's safe to unroll.) And I'm totally fine with the C++ function being recursive (since the recursion has bounded depth).

Rather, my concern was about the MLIR-code bloat of always unrolling the loop. I'm not convinced there's always a performance benefit to unrolling the loop, hence my suggestion to generate an MLIR-loop and leave it to subsequent passes to decide whether and how much to unroll it (or vectorize it, etc).

bixia added inline comments.Sep 27 2022, 4:23 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorCodegen.cpp
419 ↗(On Diff #462817)

This is NOT an unrolled loop in MLIR. The generated code compares two tuples of len(xs) values. It is not a loop because the two tuples are NOT stored in two memrefs that we could loop through.
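For illustration, a hedged C++ model (all names hypothetical, not the actual patch) of what that comparison chain computes: a lexicographic less-than over the i-th and j-th tuples drawn from k parallel buffers, emitted once per dimension rather than looped:

```cpp
#include <cstddef>

// Models the emitted comparison chain: compares the tuples
// (xs[0][i], ..., xs[k-1][i]) and (xs[0][j], ..., xs[k-1][j])
// lexicographically. In the generated MLIR the loop below does not
// exist; the per-dimension comparisons appear as straight-line code,
// since the tuple values live in separate buffers and are never
// materialized as a single collection to loop over.
static bool tupleLessThan(const int *const *xs, std::size_t k,
                          std::size_t i, std::size_t j) {
  for (std::size_t d = 0; d < k; ++d) {
    if (xs[d][i] < xs[d][j])
      return true;
    if (xs[d][i] > xs[d][j])
      return false;
  }
  return false; // the tuples are equal
}
```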

wrengr added inline comments.Sep 27 2022, 5:08 PM
mlir/lib/Dialect/SparseTensor/Pipelines/SparseTensorPipelines.cpp
67

Does this need to be a global pass, or can you use addNestedPass<FuncOp> instead? (It's unclear to me if that's valid for the way getMangledSortHelperFunc is defined)

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
93

iirc, you can just omit the variable name when it's unused. Though I don't recall if the MLIR style-guide has a stance on the syntax to use here

110

Why not use for (auto arg : args.drop_front(xStartIdx))?

127–128

why call these vi/vj instead of xi/xj?

170

The indentation here doesn't match the preceding if/else if

171

should be x1

171–174

As I mentioned on the comment thread for the previous version, I don't mind the C++ function being recursive. If you're going to unroll the MLIR-loop anyways, then I think the recursive C++-function was cleaner/easier to read. But since you've already made the change, it's up to you which implementation to go with.

FWIW, I only just noticed that the loop is over a ValueRange rather than over a memref or some other runtime collection. So there isn't any clean way to convert this into an MLIR-loop like I'd been suggesting. Since the number of dimensions is generally going to be small, the unrolled loop is surely better than allocating and initializing a collection just to loop over it. Though if we do notice code bloat becoming an issue, then at that point we could always switch over to allocating such a collection (since the same collection would be reused for all the different loops throughout the sorting algorithm, so the cost of constructing it can be amortized).

242

why not boolType or i1Type?

283–292

I haven't checked, but is the current implementation a stable-sort? If so, then should add that to the documentation. If not, then should update the partitioning to make it stable (and then document that fact).

350–361

I think I missed it in the chat but, what's the reason for wanting to cast things to dynamic shapes?

bixia updated this revision to Diff 463572.Sep 28 2022, 8:33 AM
bixia marked 3 inline comments as done and an inline comment as not done.

Address review comments.

mlir/lib/Dialect/SparseTensor/Pipelines/SparseTensorPipelines.cpp
67

Per offline discussion, this needs to be a global pass for adding new funcs.

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
93

I thought about this, but found an MLIR example

127–128

We use x0, x1 to refer to the indices for dim0, dim1. So I call them valuei, valuej, and vi, vj for short. I use the same naming in the integration test.

171–174

Acknowledge.

283–292

We can't make quicksort stable without using extra storage.
Per offline discussion, we will add an attribute for requesting a stable sort and implement another algorithm for that.
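As a hedged sketch of the extra-storage point (plain C++, not the MLIR the pass would emit, and stableOrder is a hypothetical name): one standard way to get stability is to stable-sort an auxiliary permutation by key and then apply it to every parallel array:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// The permutation vector is the extra storage that an in-place quicksort
// lacks: stable-sorting it by key preserves the relative order of equal
// keys, which plain quicksort partitioning does not guarantee.
static std::vector<int> stableOrder(const std::vector<int> &keys) {
  std::vector<int> perm(keys.size());
  std::iota(perm.begin(), perm.end(), 0); // identity permutation
  std::stable_sort(perm.begin(), perm.end(),
                   [&keys](int a, int b) { return keys[a] < keys[b]; });
  return perm; // apply perm to each parallel array to reorder stably
}
```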

350–361

Reusing the same MLIR routine for different static shapes with the same element types.

This is looking pretty good! A few minor comments and suggestions

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
54

Can we make this a typedef, so that createFunc can appear on the same line in the argument list?

85

Rather than generating a function call for swapping, have you tried the parallel assignment ("tuples" in and out) for swapping, generating it inline?
I have not checked this fully, but hopefully that generates more efficient code.

if (i != j)
  x0[i], x1[i], ... = x0[j], x1[j], ...
93

I don't think that is the right example (the "unusedDims" are used to indicate the ones that are not used, right?).
But there was some discussion on this a while back, with most people in favor of keeping the variable names in the definition (omitting them in a declaration is accepted).

mlir/test/Dialect/SparseTensor/buffer_rewriting.mlir
7

do you want to add some CHECKing of the generated code?

I realize that being too pattern specific may be brittle, but you could perhaps test for some basic structure?

bixia updated this revision to Diff 463659.Sep 28 2022, 12:59 PM
bixia marked an inline comment as done.

Address review comments.

mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
85

Parallel assignment is C-like syntax. In MLIR, we need to load from and store to memrefs. That is what the code currently generates.
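A hedged C++ model of that point (the name maySwap is a hypothetical stand-in): at the memref level, the swap over k parallel buffers has to be spelled out as explicit load/store pairs, guarded by i != j:

```cpp
#include <cstddef>

// Models the generated "may swap" over k parallel buffers: for each
// buffer, two loads followed by two stores, skipped when i == j.
static void maySwap(int *const *xs, std::size_t k,
                    std::size_t i, std::size_t j) {
  if (i == j)
    return;
  for (std::size_t d = 0; d < k; ++d) {
    int vi = xs[d][i]; // load xs[d][i]
    int vj = xs[d][j]; // load xs[d][j]
    xs[d][i] = vj;     // store swapped value
    xs[d][j] = vi;     // store swapped value
  }
}
```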

93

Thanks! Keep the name then.

bixia marked an inline comment as done and an inline comment as not done.Sep 28 2022, 1:01 PM
bixia marked an inline comment as done.Sep 29 2022, 9:41 AM
aartbik accepted this revision.Sep 29 2022, 11:08 AM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/SparseBufferRewriting.cpp
85

I was thinking of another compiler with the parallel assignment ;-)
But my major point of course was inlining the swaps vs. introducing a new function.

But we can always revisit that based on perf analysis.

mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorPasses.cpp
162

This should be in already (rebase with main); also, the comment on L160 changed ;-)

This revision is now accepted and ready to land.Sep 29 2022, 11:08 AM
This revision was automatically updated to reflect the committed changes.