See https://www.tensorflow.org/xla/operation_semantics#concatenate for the operator semantics
Nice work!
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
177–179 | Maybe we can simply say "The output tensor and all input tensors must be of the same rank."? | |
179–180 | How about this? The concatenation happens on the specified dimension. The result dimension size is the sum of all the input dimension sizes, while all the other dimensions should have the same size in the input and output tensors. | |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
147 | Please add documentation for this routine, similar to the surrounding routines. | |
155 | Please add an empty line after the routine, similar to the surrounding routines. | |
181–182 | s/use/using/ | |
204 | NIT: s/sum/Sum/ s/sizes/sizes./ | |
626–627 | Why don't we just do this simplification? Similar simplification can be done for genDenseX. | |
662–663 | I think these two lines describe something that is done inside bodyBuilder, isn't it? | |
669 | s/finish/Finish/ | |
672 | s/free/Free/ | |
681–687 | s/iterate/iterates/ I would move this comment out as documentation for the whole function. | |
1106 | Unintentional change? | |
1301 | Please add a period to the end. | |
1390–1391 | Maybe we can just delete this comment line, otherwise, | |
1404 | What is this comment for? | |
1435 | Please add a period at the end. | |
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate.mlir | ||
26 | Very comprehensive testing! | |
31 | s/Concat/Concats/ There are a few similar places below. | |
45 | What is "mix types"? Do you mean "mix sparse and dense matrix"? |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
626–627 | Because all other functions in the file now only accept an Operation *, I would need to change them all to make it work. So I decided that maybe it is better to split them into two CLs. |
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate.mlir | ||
---|---|---|
26 | I am not sure about it. Isn't it always good to have more test cases? I was trying to test not only sparse/dense but also different sparse encodings. (Although the codegen for different types of sparse tensors is the same, the runtime library is implemented differently) |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
177–179 | You should add a new trait to verify this requirement. For an example of how to do so, see def SameOperandsAndResultShape in OpBase.td, template class SameOperandsAndResultShape in include/mlir/IR/OpDefinition.h, and OpTrait::impl::verifySameOperandsAndResultShape in lib/IR/Operation.cpp. The trait you'll define is basically the same, just differing on the specified $dimension. This is important not just for catching errors in user programs, but the same trait will be desired for the analogous dense-tensor op (if it doesn't already exist). | |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
148 | spelling | |
156 | This wasn't in the original sizesFromPtr function, since generally we only need to construct an array of the dynamic sizes (since static sizes are already noted in the types). So why the change of the semantics for sizesFromPtr? And is that change strictly necessary? I would much rather have the implementation of concatSizesFromInputs be adjusted to deal with static dimensions directly, since there are many other users of sizesFromPtr | |
181 | I think it would be cleaner to float this conditional out of the loop. The for(i=0;i<dim;++i) and for(i=dim+1;i<rank;++i) loops would have the same loop body, but... If you combine what I said about sizeFromPtr only constructing the array of dynamic sizes, with what I said about never introducing runtime type checks, then the before/after "loops" are really quite trivial. If the output dimension is dynamic, then store one of the input dimensions (just pick any one, since we know they must all be static and equal); otherwise do nothing. It's only for the concatenated dimension that this function actually needs to do any work. | |
184–185 | We should statically verify that all input tensors already have a compatible shape. The sparse_tensor dialect intentionally avoids all situations where we would be forced to introduce a dynamic/runtime check due to dynamic sizes. (That is, we are allowed to lose information by converting static dimensions into dynamic dimensions, because such information-loss can be performed at compile-time; but we may only gain information about dynamic shapes for the mere purpose of threading that information through to places that need an SSA-value but are entirely parametric in the actual numeric quantity, since any other usage could result in run-time type errors or difficult-to-diagnose performance issues.) Therefore, for all dimensions other than the one being concatenated: all inputs must be static and equal, since if any of them are dynamic then that would require introducing assertions that they match; and the output must be either static-and-equal or else be dynamic (since we're allowed to forget the static-and-equal knowledge after the fact). Whereas for the dimension being concatenated: if all inputs are static, then the output will be either static and equal to the sum, or be dynamic; whereas if any inputs are dynamic, then the output must also be dynamic (though we are permitted to compute the runtime summation-value for the purposes of threading through). (A rough C++ sketch of these rules follows this table.) | |
300–313 | +1 for factoring this out :) |
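To make the shape rules from the 177–179 and 184–185 comments above concrete, here is a minimal C++ sketch of the kind of check the suggested trait or verifier would perform. It is an illustration only, not code from this revision: the helper name verifyConcatShapes, its signature, and the assumption that equal ranks were already checked are mine.

```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Support/LogicalResult.h"

using namespace mlir;

// Illustrative only: the static shape rules described above, assuming all
// ranks have already been verified to be equal.
static LogicalResult verifyConcatShapes(ArrayRef<ShapedType> inputs,
                                        ShapedType output,
                                        unsigned concatDim) {
  for (unsigned d = 0, rank = output.getRank(); d < rank; ++d) {
    if (d == concatDim) {
      // Concatenated dimension: sum up the static input sizes. If any input
      // is dynamic here, the output must be dynamic as well; if all inputs
      // are static, the output is either dynamic or exactly the sum.
      bool anyDynamic = false;
      int64_t sum = 0;
      for (ShapedType tp : inputs) {
        if (tp.isDynamicDim(d))
          anyDynamic = true;
        else
          sum += tp.getDimSize(d);
      }
      if (output.isDynamicDim(d))
        continue;
      if (anyDynamic || output.getDimSize(d) != sum)
        return failure();
      continue;
    }
    // Every other dimension: all inputs must be static and equal; the output
    // may either match that size or forget it by being dynamic.
    int64_t expected = inputs.front().getDimSize(d);
    for (ShapedType tp : inputs)
      if (tp.isDynamicDim(d) || tp.getDimSize(d) != expected)
        return failure();
    if (!output.isDynamicDim(d) && output.getDimSize(d) != expected)
      return failure();
  }
  return success();
}
```

Under these rules, concatenating tensor<3x4xf64> and tensor<5x4xf64> along dimension 0 may produce tensor<8x4xf64> or tensor<?x4xf64>, while a dynamic size in any non-concatenated input dimension is rejected.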
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
425 | Please document this function | |
427 | Avoid meaningless variable names like ret. Instead, use the same names we use elsewhere for this sort of value: namely ind or ivs (which mean different things). In this case you want ivs. (In situations where you have multiple ind things or multiple ivs things, then they should be qualified to indicate what their meaning is in that context; e.g., srcInd, dstInd, targetInd, etc) | |
432 | Style-wise, you should swap the order of the conjuncts: since the primary thing in question is whether the current dimension is the concatDim, and only after that do we care to avoid the AddIOp when there is no offset. | |
439 | please document this function | |
444 | This variable should be named ivs same as everywhere else in the code. | |
445 | Ditto, re order of conjuncts. | |
445–447 | Style guide says no braces here | |
453 | Please retain the documentation that was here before | |
460–471 | Is this really worth factoring out, rather than simply inlining? Is this even called anywhere? | |
626–627 | +1 for just doing this, and +1 for splitting it out into a separate CL. Is that CL uploaded for review? | |
715 | (1) This should be moved to be an actual ConcatenateOp::verify method, by doing let hasVerifier = 1; in the td file where the ConcatenateOp is defined. That way it is guaranteed to be called in the right places at the right time. (2) Most of the logic here should be factored out into a trait, as mentioned earlier. (A sketch of the hasVerifier wiring follows this table.) | |
716 | This can be made auto: since the cast op fixes the type, and does so explicitly, there's no legibility benefit to repeating the type name again. | |
718 | should be unsigned. (I'm a huge advocate of using auto wherever possible, but doing so here only serves to obfuscate things rather than improving legibility) | |
739 | No. As mentioned before, we intentionally avoid ever introducing dynamic checks | |
755–758 | There's no need for the isLegal variable. Instead you should just directly assert(allEqual && "...") here. And once you refactor things to do a proper verifier, you'd just return the LogicalResult directly. |
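As a rough illustration of point (1) in the 715 comment above, the wiring could look like the sketch below once the op declares let hasVerifier = 1; in SparseTensorOps.td. This is not the revision's actual code: the accessor names (getInputs, getDimension) and the verifyConcatShapes helper from the earlier sketch are assumptions, and the snippet is meant to live in SparseTensorDialect.cpp alongside its existing includes.

```cpp
// Illustrative only: the hook generated by `let hasVerifier = 1;`, implemented
// in terms of the verifyConcatShapes sketch above. Accessor names and the
// exact type returned by getDimension() depend on the ODS definition.
LogicalResult ConcatenateOp::verify() {
  SmallVector<ShapedType> inputTypes;
  for (Value input : getInputs())
    inputTypes.push_back(input.getType().cast<ShapedType>());
  auto dstTp = getResult().getType().cast<ShapedType>();
  unsigned concatDim = getDimension(); // assumed to yield a plain integer
  if (failed(verifyConcatShapes(inputTypes, dstTp, concatDim)))
    return emitOpError("incompatible input/output shapes for concatenation");
  return success();
}
```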
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
156 | It is actually in the original sizeFromPtr function (sizes.push_back(constantIndex(builder, loc, shape[i]));). I simply split the original function into two functions to avoid code repetition |
Address some comments from Wren
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
460–471 | Yes, it was previously used in the sparse=>dense conversion, which first uses a pointer to the COO to load the index vector and then inserts the scalar into the dense tensor. When concatenating a dense tensor to a dense tensor, we do not need to convert the COO to an index vector and can directly insert the scalar into the dense tensor. By factoring it out, we can reuse the common part (inserting a scalar into the dense tensor using an index vector). | |
716 | Good piece of advice! Will follow it in the future! | |
739 | Okay! |
Address styling issues
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
626–627 | No, I will submit it after the current CL is accepted, to avoid conflicts. |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
156 | Aha, the diff just got totally messed up so it was hard to see | |
460–471 | In this CL, the only callsite I see for a function named insertScalarIntoDenseTensor is on line 1395; however, that is actually a call to the function defined on lines 456–461. Prior to this CL the only call to a function of this name is on original line 770, which calls the function this CL defines on lines 466–474. However, since the function defined on lines 466–474 is only two lines long and is only ever called from one place, I don't see the value of defining it as a new function rather than inlining it at the callsite on original line 770. |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
181 | Since I was mistaken about sizesFromPtr storing the static dimensions, the before/after loops aren't quite as I described in the above comment. The code actually becomes even simpler, since they'd just always push the constant index from one of the inputs, without needing to care whether the output is static or dynamic |
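A tiny sketch of the simplification described in the 181 comment above: since every non-concatenated dimension is statically verified to agree across all inputs, its size can be taken straight from the first input's type. The helper name fillStaticSizes and its signature are invented for illustration; only the constantIndex utility is taken from this thread, and the snippet assumes the surrounding file's usual includes.

```cpp
// Illustrative only: with every non-concatenated dimension statically verified
// to agree across all inputs, its size can be pushed as a constant taken from
// the first input's type. Assumes `sizes` was already resized to `rank`.
static void fillStaticSizes(OpBuilder &builder, Location loc,
                            SmallVectorImpl<Value> &sizes, ShapedType srcTp,
                            unsigned concatDim) {
  for (unsigned d = 0, rank = srcTp.getRank(); d < rank; ++d)
    if (d != concatDim)
      sizes[d] = constantIndex(builder, loc, srcTp.getDimSize(d));
}
```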
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
177–179 | Fwiw, so long as you add the let hasVerifier = 1; to this op definition and rephrase the verifyConcatShape function into the ConcatenateOp::verify method, I'd be fine with you doing all the trait stuff in a separate CL, to help break things up. | |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
177–179 | I decided to add a verifier instead of a trait to verify the shape. | |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
181 | Done! It indeed makes the code much simpler! |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
---|---|---|
378 ↗ | (On Diff #447733) | spelling |
383 ↗ | (On Diff #447733) | This is incorrect. If any of the inputs are dynamically sized in this dimension, then the output must also be dynamically sized! Consider the simple example of concatenating some tensor<?xi64> with tensor<1xi64>. We cannot say the output has whatever size the user wants, because the size they pick could be both too large and too small. If the tensor that gets passed as the first argument actually has size 5, then the output must have size exactly equal to 6. What would we do if the user assumed the output had size 2? or 17? Moreover, this concatenate could be called multiple times (e.g., because it's in a function), and every single time it's called it can be called with different sized inputs, so there's no fixed size that's going to be correct for the output. If MLIR had a more sophisticated type system then the ? runtime variable could be given some name like ?n. In which case we could say that the output has size ?m with the extra constraint that ?m == ?n + 1; this will always be true regardless of what runtime value ?n happens to take, and even if it takes several different values. Alas, MLIR cannot handle such sophisticated details; so in practice those variable names get erased. The output must still have dynamic size ?, since the output size must still be equal to ? + 1 but that expression has no static/fixed value. |
391–399 ↗ | (On Diff #447733) | No, again. As I said before, if any (non-concatDim) input dimension has a dynamic size then the verifier must fail. For clarity of exposition, let's pretend again that the ? runtime variables are given names. Now, consider the case of concatenating tensor<?mx1xi64> with tensor<?nx2xi64> along the second dimension. In order for that to be well-typed, we must require that ?m == ?n; otherwise how would we handle a ragged concatenation? But there's no way to guarantee that constraint at compile-time, thus the verifier must reject this concatenation as ill-formed. The situation doesn't change when mixing static with dynamic. Concatenating tensor<?mx1xi64> with tensor<3x2xi64> on the second dimension would still require the constraint ?m == 3, but that still cannot be guaranteed at compile-time. |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
---|---|---|
218 | please rename this variable to clarify which type this is supposed to be exactly | |
233 | Shouldn't call .size() method repeatedly in the condition; cf., the style guide. | |
234 | please use a more meaningful name like srcSize or srcSz. Then the comment is unnecessary since the variable name is self-documenting. Also, this declaration ought to be combined with the conditional that assigns to it; i.e., Value srcSize = encSrc ? sizeFromPtrAtDim(...) : linalg::createOrFoldDimOp(...);. Since there's no clarity gained from separating it out. | |
242 | I think the generated code would be easier to follow if the order of these two arguments was swapped. (Currently this generates the summation in the reverse order from the input tensors.) A small sketch of this summation, in input order, follows this table. | |
245–248 | It'd be clearer to do this first (i.e., if (shape[dim] != ShapedType::kDynamicSize) { sizes[dim] = constantIndex(builder, loc, shape[dim]); return; }) since this case is very short whereas the case for dynamic sizes is long. | |
425 | This would be better named loadIndices | |
428 | Would be better named offsetDim, since this is the dimension where the offset is applied (regardless of why the caller wants to apply that offset). | |
456 | And this would be better named storeIndices | |
458 | ditto re renaming to offsetDim |
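For the dynamic case discussed in the 234/242/245–248 comments above, here is a small sketch that sums the input sizes in input order. The helper name sumConcatDimSizes is invented for illustration, only the dense path via tensor.dim is shown (a sparse input would instead query its runtime descriptor, e.g. the sizeFromPtrAtDim mentioned above), and the snippet assumes the arith and tensor dialects are already linked in via the surrounding file's includes.

```cpp
// Illustrative only: compute the runtime size of the concatenated output
// dimension by summing the input sizes, in the same order as the inputs.
static Value sumConcatDimSizes(OpBuilder &builder, Location loc,
                               ValueRange srcs, int64_t dim) {
  Value sum;
  for (Value src : srcs) {
    // Dense path only; a sparse input would query its descriptor instead.
    Value sz = builder.createOrFold<tensor::DimOp>(loc, src, dim);
    sum = sum ? builder.createOrFold<arith::AddIOp>(loc, sum, sz) : sz;
  }
  return sum;
}
```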
Address comments from Wren
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
---|---|---|
383 ↗ | (On Diff #447733) | Makes sense. |
391–399 ↗ | (On Diff #447733) | What about concatenating tensor<?x?xi64> and tensor<?x?xi64>? Wouldn't it be too strict? We can explicitly say that programmers should guarantee the shaping rules when using dynamic-sized tensors; otherwise it is undefined behavior. Similar to
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
233 | Good to know! |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
---|---|---|
391–399 ↗ | (On Diff #447733) | It is indeed strict, but as I said before that's the only way to keep from introducing runtime assertions. I don't see anything in memref::CollapseShapeOp that suggests any differently. In fact, that op is a bit stricter than what I've said before, since they also prohibit having an output dimension be dynamic when the corresponding input dimension is fixed. Actually I think that's a very good decision on their part, since it means that any intentional loss of information must be made explicit via some casting op, which in turn simplifies doing further analysis/lowering. Personally I think our concat op should do the same thing. If you want to argue that it's essential we support runtime-polymorphic concat ops, then you'll need to convince Aart that it's worth relaxing the restriction that we never introduce runtime assertions. If you do convince him it's worth it, then the right way to introduce such assertions is via the Shape dialect, since that's what it was designed for. Of course, setting up the proper interop between the SparseTensor dialect and the Shape dialect would be a whole project in and of itself; thus, that would be done in a series of CLs separate from this one anyways. |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
---|---|---|
391–399 ↗ | (On Diff #447733) | I did not intend to argue that we should introduce runtime assertions. I was trying to argue that we should accept the instruction if we can NOT statically prove it is WRONG, similar to this: https://mlir.llvm.org/docs/Dialects/MemRef/#memrefexpand_shape-mlirmemrefexpandshapeop But I am okay with the strict rules too, because I currently do not know how the instruction will be used by the user in practice. |
mlir/include/mlir/Dialect/SparseTensor/IR/SparseTensorOps.td | ||
---|---|---|
167 | nit: we do not use the //=== header for individual ops, only for classes of ops (three in this file, general ops, memory ops, semi ring ops). Since it is in the regular ops section, no need for L166-168 | |
175 | Concatenates, i.e. use the active "s" form in the summary, as done elsewhere | |
177 | The resulting dimension (add "ing") | |
179 | Also, add constraint that, for all src in inputs :: dest-rank == src-rank? | |
182 | empty line after example | |
mlir/lib/Dialect/SparseTensor/IR/SparseTensorDialect.cpp | ||
371 ↗ | (On Diff #448157) | We did not verify that ranks are the same? This can go out of bounds! |
mlir/lib/Dialect/SparseTensor/Transforms/SparseTensorConversion.cpp | ||
64 | hmm, this worries me; this utility has been there for a very long time, and I am not sure breaking it open like this is a good idea; do you have some more details on when this goes wrong? | |
300–313 | I like the refactoring too, but just for future revisions, it is often better to break the revision up into two parts, one that does the preparatory refactoring in existing code, and then one that adds the new functionality, like this one; right now, we have a lot of moving parts to keep track of, making careful review a bit harder. | |
mlir/test/Dialect/SparseTensor/roundtrip.mlir | ||
308 | maybe break this part onto separate, aligned lines | |
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate.mlir | ||
26 | I am okay with keeping more test cases, as long as runtime is not excessive, good to be exhaustive! | |
33 | can we break and align parameters for readability of the file? | |
46 | Here and below, period at end of comment |
For future reference, I would probably have broken up this revision into three parts
(1) add concat op to IR, with roundtrip, verifier, and invalid test (note, latter seems missing here)
(2) the util refactoring in existing code
(3) the actual concat conversion, with CHECK and integration test