This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1808–1810	I think it'd be better to just inline these into the definition of `GPU_MatTransposeMode` (see `SparseTensorSortKindEnum` for an example using that style)
1818	It would be better to use `GPU_Dialect.cppNamespace` instead of this string. It's best to avoid using literals whenever possible, both for future-proofing (not that they're likely to change this one any time soon) and to help guide the reader (i.e., to make explicit when things are definitionally equal, rather than just happening to be contingently equal).
1848	I think it'd be better to name this `modeA` (or more ideally `transposeModeA` if there's any chance of there being other "modes" in the future). The term "op" is generally used for operations, which this isn't. And ditto for the other ops.
1931	Since the `GPU_MatTransposeModeAttr` arguments seem to be attached to particular `GPU_SparseHandle` arguments, I think it'd be better to use a syntax that helps make that more clear. For example, you could use something like `gpu.spmm_buffersize async [%dep] %env, %spmatA{%opA}, %dnmatB{%opB}, %dnmatC` (where I chose to use curly braces since that's what's typically used for attributes). And ditto for the other ops.

K-Wu added inline comments.May 23 2023, 2:54 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1848	I found it hard to choose when I wrote this code. The reason for using op is that the cusparse documentation names this argument opA. Should I still use modeA? Do you think it is necessary to add a comment here saying that this corresponds to opA in the cusparse documentation?

Harbormaster completed remote builds in B233985: Diff 524886.May 23 2023, 3:17 PM

wrengr added inline comments.May 23 2023, 3:21 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1931	Also, you may want to make these attributes optional, where they default to `NON_TRANSPOSE` when not provided. Since doing so will help avoid needing to make most of the changes to Conversion/GPUCommon/GPUToLLVMConversion.cpp and the various mlir test files. I think the cleanest way to do that would be to modify the definition of `GPU_MatTransposeModeAttr` to specify the default (so it can be shared by all the ops). Though you'll have to make sure to update the assemblyFormat of each op to allow dropping the curly braces when the mode isn't provided. And you'll probably need to add some additional builders for each op, so that they can be defaulted on the C++ side. (Ideally the standard builders should be able to handle this automatically, but I don't recall whether they do or not for this particular case.)
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1402–1403	To avoid repeating this idiom all over, you should define a helper function (cf., SparseTensor/Transforms/CodegenUtils for some examples of us doing this elsewhere). I'd say ditto for `dType` idiom, though those aren't touched by this patch; so feel free to leave this cleanup for a separate CL
1403	Style-guide says to use `static_cast` here, instead of the C-style cast.
1487	What does this todo actually mean? Cuz it looks like you are passing the transpose modes...
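
For readers following the two remarks on lines 1402–1403, here is a minimal standalone C++ sketch of the suggested shape: a single helper owns the enum-to-integer conversion and uses `static_cast` rather than a C-style cast. `TransposeMode` is a local stand-in for the generated enum (values taken from the `I32EnumAttrCase` list in the diff), and `toInt32`/`fromInt32` are hypothetical names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the TableGen-generated enum; the values mirror the
// I32EnumAttrCase entries shown in the diff.
enum class TransposeMode : std::uint32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2,
};

// Hypothetical helper: the only place that converts the enum to the 32-bit
// integer passed on to the runtime, so call sites do not repeat the cast.
constexpr std::int32_t toInt32(TransposeMode mode) {
  return static_cast<std::int32_t>(mode);
}

// Hypothetical inverse, checking the range before casting back.
inline TransposeMode fromInt32(std::int32_t raw) {
  assert(raw >= 0 && raw <= 2 && "not a valid transpose mode");
  return static_cast<TransposeMode>(raw);
}
```

In the real lowering the helper would presumably build an LLVM constant from this integer; that part is left out because it depends on MLIR APIs not shown in this thread.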

aartbik added inline comments.May 23 2023, 3:28 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1842	This will not be an SSA value, %opA, but an actual "literal" attribute, as in NON_TRANSPOSE, right? So please update the examples accordingly

addressing wren's comments

rm TODOs already done

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1487	Good catch! I forgot to remove this after finishing this item

K-Wu marked 2 inline comments as done.May 23 2023, 3:59 PM

static_cast

K-Wu marked an inline comment as done.May 23 2023, 4:02 PM

K-Wu marked an inline comment as done.May 23 2023, 4:09 PM

reformat

upd examples in GPUOps.td

K-Wu marked an inline comment as done.May 23 2023, 4:26 PM

Harbormaster completed remote builds in B234016: Diff 524928.May 23 2023, 4:44 PM

rename to TransposeOperator to address concern

K-Wu marked an inline comment as done.May 23 2023, 5:25 PM

renaming opA|B to tOpA|B

K-Wu added inline comments.May 23 2023, 5:33 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1848	Okay let's see if this is better. I now use transposeOpA|B and tOpA|B.

Harbormaster completed remote builds in B234032: Diff 524949.May 23 2023, 5:43 PM

fix compile error

add MatTransposeOp default value

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1931	This is somewhat addressed by the use of Default_GPU_MatTransposeOpAttr in the last commit

Harbormaster completed remote builds in B234037: Diff 524955.May 23 2023, 6:19 PM

fix stale assemblyFormat in gpu.spmm

Harbormaster completed remote builds in B234048: Diff 524967.May 23 2023, 6:58 PM

presumably closing two comments. Please reopen/make new comments if you would like to suggest otherwise.

rebase origin/main

Harbormaster completed remote builds in B234573: Diff 525700.May 25 2023, 11:09 AM

fixing unresolved merge conflict

Harbormaster completed remote builds in B234584: Diff 525714.May 25 2023, 11:45 AM

K-Wu updated this revision to Diff 525728.May 25 2023, 11:46 AM

fixing broken build

Harbormaster completed remote builds in B234595: Diff 525728.May 25 2023, 12:20 PM

aartbik added inline comments.May 25 2023, 12:29 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1813	Note that these should always be kept in sync with `typedef enum { CUSPARSE_OPERATION_NON_TRANSPOSE = 0, CUSPARSE_OPERATION_TRANSPOSE = 1, CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE = 2 } cusparseOperation_t;`. I had a similar dilemma for the index types and the data types. We definitely do not want to pull in cusparse.h values in here, but we also don't want to introduce inconsistencies later. I have no good solution for this yet, but at least document the requirement for keeping these consistent here.
1837	add NON_TRANSPOSE (the default), ... since the doc page does not see the code above
1842	I like this syntax! Very neat! Note that you get bonus points for a syntax where the default is never printed, only the non-default cases (okay for a follow-up, but we have similar policies for the sparse tensor type features).
1873	same on default
1910	default
1948	default
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
745	you can remove the transpose form TODO from L740 ;-) since you have this now

fixing test error

K-Wu added inline comments.May 25 2023, 1:13 PM

mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
745	This diff introduces the transpose attribute, but I haven't added rewrite rules to recognize these patterns in loop nests and mark the matrix as transposed. Should I keep this comment and address it in new diffs?

wrengr added inline comments.May 25 2023, 1:15 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1808	Even if you stick with the "op" name for the operation arguments, you should absolutely not use "op" here— because the "Op" suffix is explicitly and exclusively reserved for operations. Personally I'd go with `TransposeMode` since the "mat" part doesn't add any information/disambiguation. (If the "mat" prefix is there for consistency with cuSparse names, then I suppose it's okay)
1821–1822	You should instead add `let defaultValue = "MatTransposeOp::NON_TRANSPOSE";` to the definition of `GPU_MatTransposeOpAttr` itself, since it's the natural default value for all uses of that EnumAttr. Whereas DefaultValuedAttr is more for situations where either (a) you don't have control over the original attribute definition and therefore can't change it, or (b) you have some op-specific default that's different from the usual default (or the usual lack thereof).
1848	I'm not a big fan of single-letter abbreviations like that. Single letters like "A"/"X" are okay because those are in some sense their full names, and the "spmat"/"dn" parts are adjectives applying to those names. (And, arguably, the "spmat"/"dn" parts can be dropped since that's already captured by the types themselves and so there would be no ambiguity when seeing `getX` or `getA` in the C++ code; though if that's what cuSparse calls them, then there's no harm in keeping the longer names.) Whereas "t" could mean anything, isn't part of a standardized naming scheme (like "A"/"X" are), and doesn't (imo) help to disambiguate what "op" means. I totally get your motivation for going with "op", and I like where your mind's at :) It's just unfortunate that "op" means something totally different in MLIR, so there's no good way to be consistent with both cuSparse and MLIR at the same time. Personally, I'd side with MLIR for this case, though I'd defer to our noble TL if he feels otherwise. My reasoning for resolving this case comes from a two-pronged argument about which side readers would be most familiar with, and hence is least likely to cause confusion. Since this is part of the MLIR ecosystem, folks must be at least somewhat familiar with MLIR. And since MLIR isn't ubiquitously popular, that (ironically) makes readers even more likely to be familiar with MLIR or trying to become so. (Whereas things which are ubiquitously popular often have dabblers who care more about The Other Thing.) Conversely, and I may well be wrong about this part, I don't think readers are as likely to be familiar with cuSparse, or rather as likely to have greater familiarity/comfortability with cuSparse than with MLIR. Though now that you've mentioned that the name comes from cuSparse, I'm thinking I might know where they're coming from.
Namely, I'm guessing they mean "op" in the sense of the category theory notation for duality ("A^{op}"), which means different things depending on what the "A" being dualized is: namely for real-matrices it's transpose, but for complex-matrices it's conjugate-transpose. But if that is indeed where they're coming from, then I feel like that's all the more reason to side with MLIR— since anyone who's familiar with the category theory notation will also be familiar with several other notations for these operations, since linear algebraists can't decide between the half-dozen different traditions for what to name them. That said, if we do side with MLIR over cuSparse, I'm now thinking the better short-name would be "transA" rather than "modeA" (and if you're feeling more verbose, then the better middling-/long-names would be "transposeA" and "transposeModeA")
1856	You should make this whole chunk into an optional-group. I believe "`(` `{` $tOpA^ `}`)?" should be the correct syntax for that, though you may need to play around with it since sometimes the optional-group stuff doesn't parse in the way it seems like it should. For more information, cf: https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/#optional-and-default-valued-parameters
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1203	I think it'd read nicer to use the prefix "constant" rather than "genConstFrom". And this function is a prime example of why the enum must not be named with "Op": since when reading this I got very confused about why the argument was already an operation rather than the enum/attr, even despite the fact that I'd just re-read through the code above!
1204	Why not just have this function use `llvmInt32Type` directly? (Since that's the type that's used everywhere, and since this function doesn't do any validation that the parameter is indeed a 32-bit integer type, and since no other type would really make much sense)
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
462	Do you actually need to pass this argument since it's the default value? (i.e., do the automatically generated builders allow skipping it) If so, then you should add a new builder definition which uses the default in lieu of requiring an argument here. (And ditto for all the other ops, of course)
mlir/test/Dialect/GPU/ops.mlir
338	If you adjust the assemblyFormat to make the transpose-mode optional like I showed, then you should be able to revert all these changes to the test files (since the generated printers ought to avoid printing the attribute whenever it's the default value)
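
The behaviour the optional-group suggestion is aiming for (omit `{MODE}` when the mode is the default, and read a missing `{MODE}` as the default) can be sketched outside MLIR. The following is a toy C++ illustration only: `printOperand` and `parseMode` are hypothetical helpers that mimic what the generated printer and parser are expected to do, and they are not part of the patch or of any MLIR API.

```cpp
#include <cassert>
#include <initializer_list>
#include <optional>
#include <string>

// Stand-in for the generated enum.
enum class TransposeMode { NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE };

inline const char *stringify(TransposeMode mode) {
  switch (mode) {
  case TransposeMode::NON_TRANSPOSE:
    return "NON_TRANSPOSE";
  case TransposeMode::TRANSPOSE:
    return "TRANSPOSE";
  case TransposeMode::CONJUGATE_TRANSPOSE:
    return "CONJUGATE_TRANSPOSE";
  }
  assert(false && "unknown transpose mode");
  return "";
}

// Printer side: the `{MODE}` suffix is emitted only for non-default modes.
inline std::string printOperand(const std::string &name, TransposeMode mode) {
  if (mode == TransposeMode::NON_TRANSPOSE)
    return name;
  return name + "{" + stringify(mode) + "}";
}

// Parser side: no braces means the default; an unknown keyword or a missing
// closing brace is reported as std::nullopt.
inline std::optional<TransposeMode> parseMode(const std::string &operand) {
  const std::size_t open = operand.find('{');
  if (open == std::string::npos)
    return TransposeMode::NON_TRANSPOSE;
  const std::size_t close = operand.find('}', open);
  if (close == std::string::npos)
    return std::nullopt;
  const std::string keyword = operand.substr(open + 1, close - open - 1);
  for (TransposeMode mode :
       {TransposeMode::NON_TRANSPOSE, TransposeMode::TRANSPOSE,
        TransposeMode::CONJUGATE_TRANSPOSE})
    if (keyword == stringify(mode))
      return mode;
  return std::nullopt;
}
```

With this shape, existing tests that never mention a mode keep parsing and printing unchanged, which is the reason given above for being able to revert the test-file edits.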

Harbormaster completed remote builds in B234621: Diff 525768.May 25 2023, 1:39 PM

attempting to remove Default_GPU_MatTransposeOpAttr

K-Wu marked an inline comment as done.May 25 2023, 2:09 PM

K-Wu added inline comments.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1848	That makes a lot of sense! I didn't know the category theory notation and was confused when I saw cusparse uses op in their documentation. Thanks a lot for this very detailed reasoning. I will follow the MLIR naming convention as you mentioned then.
1856	Noted with great thanks! I really appreciate your mentioning the links and examples in all of your comments. These are a great help to me.

K-Wu marked 4 inline comments as done.May 25 2023, 2:12 PM

explaining default value

K-Wu marked an inline comment as done.May 25 2023, 2:17 PM

cleaning up int32type

fixing error and cleaning up style

aartbik added inline comments.May 25 2023, 2:54 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1808	nit, even though we will think about this, I would phrase it a bit differently just in case this remains, something like: To avoid coupling this dialect with cusparse.h specifics, we hardcoded magic literals in this enum. Note that this should be kept in sync with cusparseOperation_t in cusparse.h: typedef enum ..... // todo: find a proper way to keep them in sync?

fixing compile error

fixing compile errors and formatting

K-Wu marked 5 inline comments as done.May 25 2023, 3:45 PM

wrengr added inline comments.May 25 2023, 3:47 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1815	"todo" should be all-caps :) As for the content of the todo itself: Assuming cusparse.h is part of the cuSparse library itself, at what point does that library become coupled with the mlir libraries? Is it passed at runtime like our sparse-tensor runtime-library, or is it linked in at some earlier stage? If it's linked in before runtime, then in the file that brings them together we can always add a static_assert to ensure they're consistent at that point. Whereas if it's dynamic-linked at runtime, then I'm not sure there's any good way of ensuring consistency that's worth the cost...
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1202	If this works, that's cool; but if the callsites don't require knowing that this is the specific operation being returned, then it would probably be better to return `Value` (both to reduce coupling, and because there are a few places where the use of templates gets a bit squirrely about implicitly converting operations to their result values.)
1202–1204	I'm guessing this isn't going to pass the clang-format checks, so be sure to re-run `git clang-format HEAD^`
1203	This function doesn't need the specifics of `ConversionPatternRewriter`, so you should change this parameter to `OpBuilder &builder` instead.
1206	This doesn't look right to me. The `getContext` method already returns `MLIRContext*`, so you don't need the address-of operator there
1207	You should just use `rewriter.getIntegerType(32)` instead.
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
445	MLIR-style says variables should start with lowercase
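
The `static_assert` idea from the comment on line 1815 could look roughly like the sketch below. It assumes there is some translation unit where both the MLIR-side enum and `cusparse.h` are visible, which this thread does not confirm. To stay self-contained, the `cusparseOperation_t` declaration here is a stand-in copied from the review comment above rather than the real header, and `sameValue`/`toCusparse` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the declaration in cusparse.h, copied from the review comment;
// real code would include the header instead of redeclaring it.
typedef enum {
  CUSPARSE_OPERATION_NON_TRANSPOSE = 0,
  CUSPARSE_OPERATION_TRANSPOSE = 1,
  CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE = 2
} cusparseOperation_t;

// Stand-in for the MLIR-side enum with its hardcoded literals.
enum class TransposeMode : std::uint32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2,
};

constexpr bool sameValue(TransposeMode mode, cusparseOperation_t op) {
  return static_cast<std::int64_t>(mode) == static_cast<std::int64_t>(op);
}

// Compile-time consistency checks: the build fails if either side drifts.
static_assert(sameValue(TransposeMode::NON_TRANSPOSE,
                        CUSPARSE_OPERATION_NON_TRANSPOSE),
              "NON_TRANSPOSE is out of sync with cusparseOperation_t");
static_assert(sameValue(TransposeMode::TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE),
              "TRANSPOSE is out of sync with cusparseOperation_t");
static_assert(sameValue(TransposeMode::CONJUGATE_TRANSPOSE,
                        CUSPARSE_OPERATION_CONJUGATE_TRANSPOSE),
              "CONJUGATE_TRANSPOSE is out of sync with cusparseOperation_t");

// Hypothetical conversion at the boundary; safe only because of the checks above.
inline cusparseOperation_t toCusparse(TransposeMode mode) {
  assert(static_cast<std::uint32_t>(mode) <= 2u && "invalid transpose mode");
  return static_cast<cusparseOperation_t>(static_cast<std::uint32_t>(mode));
}
```

As the comment notes, this only helps if the two definitions meet before runtime; if the library is bound dynamically, a compile-time check of this kind has nothing to compare against.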

Harbormaster completed remote builds in B234685: Diff 525845.May 25 2023, 3:53 PM

aartbik added inline comments.May 25 2023, 4:03 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1808	please avoid breaking the 80-col. Just break, and continue on next line (okay to have the Note... more to the right)

fixing compiling errors and addressing comments

K-Wu marked 2 inline comments as done.May 25 2023, 4:08 PM

K-Wu marked 4 inline comments as done.May 25 2023, 4:22 PM

Harbormaster completed remote builds in B234697: Diff 525857.May 25 2023, 4:32 PM

try optional attr in builder

Harbormaster completed remote builds in B234708: Diff 525873.May 25 2023, 4:49 PM

reverting

aartbik accepted this revision.May 25 2023, 4:57 PM

aartbik added inline comments.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1842	maybe use another TRANSPOSE mode in the example, since we won't print the default anymore
1878	same
1916	same, or perhaps even one as default, one different?
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
745	nah, I was just observing the overlap in TODOs now, but no biggie

This revision is now accepted and ready to land.May 25 2023, 4:57 PM

Harbormaster completed remote builds in B234713: Diff 525879.May 25 2023, 5:04 PM

attempting to add optional argument

Harbormaster completed remote builds in B234720: Diff 525890.May 25 2023, 5:38 PM

attempting to do default value

attempting to make default value work

attempt

wrengr added inline comments.May 25 2023, 6:08 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1836–1838	I wonder if you could move this documentation to the definition of `GPU_TransposeModeAttr` itself, to avoid needing to repeat it for every op. (Just a nit, not a blocker)
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
462	Fwiw, I think this was better when explicitly passing `modeA`, rather than passing `std::nullptr`. But the thing I had in mind was to have something more like:
    def GPU_SpMVBufferSizeOp : ... {
      let builders = [
        OpBuilder<(ins Variadic<GPU_AsyncToken>:$asyncDependencies,
                       GPU_SparseEnvHandle:$env,
                       GPU_SparseSpMatHandle:$spmatA,
                       GPU_SparseDnVecHandle:$dnX,
                       GPU_SparseDnVecHandle:$dnY), [{
          auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
          return build($_builder, $_state, asyncDependencies, env, modeA, spmatA, dnX, dnY);
        }]>
      ];
    }
...or whatever massaging of that is necessary to get it to compile. If you're having trouble getting this to work, just send me something on chat and we can try to work out the wrinkles :)
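
The convenience-builder pattern described in this comment, seen from the C++ side, amounts to an overload that fills in the default mode and forwards to the full builder. The sketch below is a standalone approximation under stated assumptions: `OpState` is a hypothetical stand-in for the real operation state, operands are plain strings instead of SSA values, and the async dependencies are left out for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the generated enum.
enum class TransposeMode { NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE };

// Hypothetical, heavily simplified stand-in for the state a builder fills in.
struct OpState {
  TransposeMode modeA = TransposeMode::NON_TRANSPOSE;
  std::vector<std::string> operands;
};

// Full builder: the caller states the transpose mode explicitly.
inline void build(OpState &state, const std::string &env, TransposeMode modeA,
                  const std::string &spmatA, const std::string &dnX,
                  const std::string &dnY) {
  assert(!spmatA.empty() && "sparse matrix operand must be named");
  state.modeA = modeA;
  state.operands = {env, spmatA, dnX, dnY};
}

// Convenience builder: same operands, mode defaults to NON_TRANSPOSE and the
// call is forwarded to the full builder.
inline void build(OpState &state, const std::string &env,
                  const std::string &spmatA, const std::string &dnX,
                  const std::string &dnY) {
  build(state, env, TransposeMode::NON_TRANSPOSE, spmatA, dnX, dnY);
}
```

Call sites that do not care about transposition use the shorter overload, while a call site such as line 462 can still pass a mode explicitly when it needs one.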

wrengr accepted this revision.May 25 2023, 6:08 PM

Harbormaster completed remote builds in B234726: Diff 525897.May 25 2023, 6:19 PM

using the builder Wren suggested

Harbormaster completed remote builds in B234729: Diff 525900.May 25 2023, 6:31 PM

fixing format and compile error in tablegen

Harbormaster completed remote builds in B234733: Diff 525904.May 25 2023, 7:14 PM

fixing unsupported type in tablegen

Harbormaster completed remote builds in B234741: Diff 525912.May 25 2023, 7:36 PM

fixed compile errors

K-Wu marked 4 inline comments as done.May 25 2023, 8:24 PM

K-Wu added inline comments.

mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
462	I made it work! Let me know if this looks good to you.

Harbormaster completed remote builds in B234758: Diff 525930.May 25 2023, 8:58 PM

Closed by commit rG235fbe792b4c: [mlir] [sparse] [gpu] adding transpose support to spmm spmv (authored by K-Wu). May 26 2023, 10:09 AM

This revision was automatically updated to reflect the committed changes.

K-Wu added a commit: rG235fbe792b4c: [mlir] [sparse] [gpu] adding transpose support to spmm spmv.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td	51 lines
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp	50 lines
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp	16 lines
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp	53 lines
mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir	8 lines
mlir/test/Dialect/GPU/ops.mlir	8 lines
mlir/test/Dialect/SparseTensor/GPU/gpu_matmul_lib.mlir	4 lines
mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir	4 lines

Diff 524967

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
Arg<GPU_SparseHandle>:$spmat);		Arg<GPU_SparseHandle>:$spmat);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies) $spmat attr-dict		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies) $spmat attr-dict
}];		}];
}		}

		def GPU_MatTransposeOp : I32EnumAttr<"MatTransposeOp",
		"transpose mode of sparse matrix supported by sparse tensor ops",
		[
		I32EnumAttrCase<"NON_TRANSPOSE", 0>,
		I32EnumAttrCase<"TRANSPOSE", 1>,
		I32EnumAttrCase<"CONJUGATE_TRANSPOSE", 2>,
		]> {
		let genSpecializedAttr = 0;
		let cppNamespace = GPU_Dialect.cppNamespace;
		}

		def GPU_MatTransposeOpAttr : EnumAttr<GPU_Dialect, GPU_MatTransposeOp,
		"mat_transpose_mode">;
		def Default_GPU_MatTransposeOpAttr : DefaultValuedAttr<GPU_MatTransposeOpAttr,
		"MatTransposeOp::NON_TRANSPOSE">;

def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SpMV operation";		let summary = "Precompute buffersize for SpMV operation";
let description = [{		let description = [{
The `gpu.spmv_buffer_size` operation returns the buffer size required		The `gpu.spmv_buffer_size` operation returns the buffer size required
to perform the SpMV operation on the given sparse matrix and dense vectors.		to perform the SpMV operation on the given sparse matrix and dense vectors.
The operation expects handles returned by previous sparse operations		The operation expects handles returned by previous sparse operations
to construct an environment and the operands for SpMV.		to construct an environment and the operands for SpMV.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE.

Example:		Example:

```mlir		```mlir
%buffersz, %token = gpu.spmv_buffersize async [%dep] %env, %spmatA, %dnX, %dnY		%buffersz, %token = gpu.spmv_buffersize async [%dep] %env, %spmatA{NON_TRANSPOSE}, %dnX, %dnY
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseHandle:$env,		GPU_SparseHandle:$env,
		Default_GPU_MatTransposeOpAttr:$tOpA,
GPU_SparseHandle:$spmatA,		GPU_SparseHandle:$spmatA,
GPU_SparseHandle:$dnX,		GPU_SparseHandle:$dnX,
GPU_SparseHandle:$dnY);		GPU_SparseHandle:$dnY);
let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA `,` $dnX `,` $dnY attr-dict		$env `,` $spmatA `{` $tOpA `}` `,` $dnX `,` $dnY attr-dict
		wrengr: You should make this whole chunk into an optional-group. I believe "(`{` $tOpA^ `}`)?" should be the correct syntax for that, though you may need to play around with it since sometimes the optional-group stuff doesn't parse in the way it seems like it should. For more information, cf: https://mlir.llvm.org/docs/DefiningDialects/AttributesAndTypes/#optional-and-default-valued-parameters
		K-Wu (Author): Noted with great thanks! I really appreciate your including links and examples in all of your comments. They are a great aid to me.
}];		}];
}		}

def GPU_SpMVOp : GPU_Op<"spmv", [GPU_AsyncOpInterface]> {		def GPU_SpMVOp : GPU_Op<"spmv", [GPU_AsyncOpInterface]> {
let summary = "SpMV operation";		let summary = "SpMV operation";
let description = [{		let description = [{
The `gpu.spmv` operation performs the SpMV operation on the given sparse matrix,		The `gpu.spmv` operation performs the SpMV operation on the given sparse matrix,
dense vectors, and buffer. The operation expects handles returned by previous		dense vectors, and buffer. The operation expects handles returned by previous
sparse operations to construct an environment and the operands for SpMV. The		sparse operations to construct an environment and the operands for SpMV. The
buffer must have been allocated on the device.		buffer must have been allocated on the device.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE.
		aartbik: same on default

Example:		Example:

```mlir		```mlir
%token = gpu.spmv async [%dep] %env, %spmatA, %dnX, %dnY : memref<?xf64>		%token = gpu.spmv async [%dep] %env, %spmatA{NON_TRANSPOSE}, %dnX, %dnY : memref<?xf64>
		aartbik: same
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseHandle:$env,		GPU_SparseHandle:$env,
		Default_GPU_MatTransposeOpAttr:$tOpA,
GPU_SparseHandle:$spmatA,		GPU_SparseHandle:$spmatA,
GPU_SparseHandle:$dnX,		GPU_SparseHandle:$dnX,
GPU_SparseHandle:$dnY,		GPU_SparseHandle:$dnY,
AnyMemRef:$buffer);		AnyMemRef:$buffer);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA `,` $dnX `,` $dnY `,` $buffer attr-dict `:` type($buffer)		$env `,` $spmatA `{` $tOpA `}` `,` $dnX `,` $dnY `,` $buffer attr-dict `:` type($buffer)
}];		}];
}		}

def GPU_SpMMBufferSizeOp : GPU_Op<"spmm_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SpMMBufferSizeOp : GPU_Op<"spmm_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SpMM operation";		let summary = "Precompute buffersize for SpMM operation";
let description = [{		let description = [{
The `gpu.spmm_buffer_size` operation returns the buffer size required		The `gpu.spmm_buffer_size` operation returns the buffer size required
to perform the SpMM operation on the given sparse and dense matrix.		to perform the SpMM operation on the given sparse and dense matrix.
The operation expects handles returned by previous sparse operations		The operation expects handles returned by previous sparse operations
to construct an environment and the operands for SpMM.		to construct an environment and the operands for SpMM.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE.
		aartbik: default


Example:		Example:

```mlir		```mlir
%buffersz, %token = gpu.spmm_buffersize async [%dep] %env, %spmatA, %spmatB, %spmatC		%buffersz, %token = gpu.spmm_buffersize async [%dep] %env, %spmatA{NON_TRANSPOSE}, %dnmatB{NON_TRANSPOSE}, %dnmatC
		aartbik: same, or perhaps even one as default, one different?
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseHandle:$env,		GPU_SparseHandle:$env,
		Default_GPU_MatTransposeOpAttr:$tOpA,
		Default_GPU_MatTransposeOpAttr:$tOpB,
GPU_SparseHandle:$spmatA,		GPU_SparseHandle:$spmatA,
GPU_SparseHandle:$dnmatB,		GPU_SparseHandle:$dnmatB,
GPU_SparseHandle:$dnmatC);		GPU_SparseHandle:$dnmatC);
let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA `,` $dnmatB `,` $dnmatC attr-dict		$env `,` $spmatA `{` $tOpA `}` `,` $dnmatB `{` $tOpB `}` `,` $dnmatC attr-dict
		wrengr: Since the `GPU_MatTransposeModeAttr` arguments seem to be attached to particular `GPU_SparseHandle` arguments, I think it'd be better to use a syntax that helps make that more clear. For example, you could use something like `gpu.spmm_buffersize async [%dep] %env, %spmatA{%opA}, %dnmatB{%opB}, %dnmatC` (where I chose to use curly braces since that's what's typically used for attributes). And ditto for the other ops.
		wrengr: Also, you may want to make these attributes optional, where they default to `NON_TRANSPOSE` when not provided, since doing so will help avoid needing to make most of the changes to Conversion/GPUCommon/GPUToLLVMConversion.cpp and the various mlir test files. I think the cleanest way to do that would be to modify the definition of `GPU_MatTransposeModeAttr` to specify the default (so it can be shared by all the ops). Though you'll have to make sure to update the assemblyFormat of each op to allow dropping the curly braces when the mode isn't provided. And you'll probably need to add some additional builders for each op, so that they can be defaulted on the C++ side. (Ideally the standard builders should be able to handle this automatically, but I don't recall whether they do or not for this particular case.)
		K-Wu (Author): This is somewhat addressed by the use of Default_GPU_MatTransposeOpAttr in the last commit.
}];		}];
}		}
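The defaulting behavior requested in the comments above (an omitted `{...}` suffix meaning `NON_TRANSPOSE`) can be sketched in standalone C++. This is only an illustration and does not use MLIR: `TransposeMode` and `resolveMode` are hypothetical names, and the enumerator values are an assumption chosen to mirror the three modes listed in the op descriptions.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the transpose-mode enum under review.
// The numeric values are assumptions made for this sketch.
enum class TransposeMode : std::uint32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2,
};

// Resolve the mode attached to one matrix operand. `parsed` is empty when
// the textual form carried no `{...}` suffix, in which case the shared
// default (NON_TRANSPOSE) is used.
constexpr TransposeMode resolveMode(std::optional<TransposeMode> parsed) {
  return parsed.value_or(TransposeMode::NON_TRANSPOSE);
}
```

Keeping the default in one place is the point of the suggestion: every op that takes a mode then agrees on what an omitted mode means.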

def GPU_SpMMOp : GPU_Op<"spmm", [GPU_AsyncOpInterface]> {		def GPU_SpMMOp : GPU_Op<"spmm", [GPU_AsyncOpInterface]> {
let summary = "SpMM operation";		let summary = "SpMM operation";
let description = [{		let description = [{
The `gpu.spmm` operation performs the SpMM operation on the given sparse and		The `gpu.spmm` operation performs the SpMM operation on the given sparse and
dense matrix, and buffer. The operation expects handles returned by previous		dense matrix, and buffer. The operation expects handles returned by previous
sparse operations to construct an environment and the operands for SpMM. The		sparse operations to construct an environment and the operands for SpMM. The
buffer must have been allocated on the device.		buffer must have been allocated on the device.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE.
		aartbik: default

Example:		Example:

```mlir		```mlir
%token = gpu.spmm async [%dep] %env, %spmatA, %spmatB, %spmatC, %buffer		%token = gpu.spmm async [%dep] %env, %spmatA{NON_TRANSPOSE}, %dnmatB{NON_TRANSPOSE}, %dnmatC, %buffer
```		```
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseHandle:$env,		GPU_SparseHandle:$env,
		Default_GPU_MatTransposeOpAttr:$tOpA,
		Default_GPU_MatTransposeOpAttr:$tOpB,
GPU_SparseHandle:$spmatA,		GPU_SparseHandle:$spmatA,
GPU_SparseHandle:$dnmatB,		GPU_SparseHandle:$dnmatB,
GPU_SparseHandle:$dnmatC,		GPU_SparseHandle:$dnmatC,
AnyMemRef:$buffer);		AnyMemRef:$buffer);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA `,` $dnmatB `,` $dnmatC `,` $buffer attr-dict `:` type($buffer)		$env `,` $spmatA `{` $tOpA `}` `,` $dnmatB `{` $tOpB `}` `,` $dnmatC `,` $buffer attr-dict `:` type($buffer)
}];		}];
}		}

#endif // GPU_OPS		#endif // GPU_OPS

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

[...]	FunctionCallBuilder createCsrCallBuilder = {
llvmInt32Type, llvmPointerType /* void *stream */}};		llvmInt32Type, llvmPointerType /* void *stream */}};
FunctionCallBuilder destroySpMatCallBuilder = {		FunctionCallBuilder destroySpMatCallBuilder = {
"mgpuDestroySpMat",		"mgpuDestroySpMat",
llvmVoidType,		llvmVoidType,
{llvmPointerType, llvmPointerType /* void *stream */}};		{llvmPointerType, llvmPointerType /* void *stream */}};
FunctionCallBuilder spMVBufferSizeCallBuilder = {		FunctionCallBuilder spMVBufferSizeCallBuilder = {
"mgpuSpMVBufferSize",		"mgpuSpMVBufferSize",
llvmIntPtrType,		llvmIntPtrType,
{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,		{llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType,
llvmInt32Type, llvmPointerType /* void *stream */}};		llvmPointerType, llvmInt32Type, llvmPointerType /* void *stream */}};
FunctionCallBuilder spMVCallBuilder = {		FunctionCallBuilder spMVCallBuilder = {
"mgpuSpMV",		"mgpuSpMV",
llvmVoidType,		llvmVoidType,
{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,		{llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType,
llvmInt32Type, llvmPointerType, llvmPointerType /* void *stream */}};		llvmPointerType, llvmInt32Type, llvmPointerType,
		llvmPointerType /* void *stream */}};
FunctionCallBuilder spMMBufferSizeCallBuilder = {		FunctionCallBuilder spMMBufferSizeCallBuilder = {
"mgpuSpMMBufferSize",		"mgpuSpMMBufferSize",
llvmIntPtrType,		llvmIntPtrType,
{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,		{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,
llvmInt32Type, llvmPointerType /* void *stream */}};		llvmPointerType, llvmPointerType, llvmInt32Type,
		llvmPointerType /* void *stream */}};
FunctionCallBuilder spMMCallBuilder = {		FunctionCallBuilder spMMCallBuilder = {
"mgpuSpMM",		"mgpuSpMM",
llvmVoidType,		llvmVoidType,
{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,		{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,
llvmInt32Type, llvmPointerType, llvmPointerType /* void *stream */}};		llvmPointerType, llvmPointerType, llvmInt32Type, llvmPointerType,
		llvmPointerType /* void *stream */}};
};		};

/// A rewrite pattern to convert gpu.host_register operations into a GPU runtime		/// A rewrite pattern to convert gpu.host_register operations into a GPU runtime
/// call. Currently it supports CUDA and ROCm (HIP).		/// call. Currently it supports CUDA and ROCm (HIP).
class ConvertHostRegisterOpToGpuRuntimeCallPattern		class ConvertHostRegisterOpToGpuRuntimeCallPattern
: public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> {		: public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> {
public:		public:
ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)		ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)
[...]
static Type getSpMatElemType(Value spMat) {		static Type getSpMatElemType(Value spMat) {
if (auto op = spMat.getDefiningOp<gpu::CreateCooOp>())		if (auto op = spMat.getDefiningOp<gpu::CreateCooOp>())
return op.getValues().getType().cast<MemRefType>().getElementType();		return op.getValues().getType().cast<MemRefType>().getElementType();
if (auto op = spMat.getDefiningOp<gpu::CreateCsrOp>())		if (auto op = spMat.getDefiningOp<gpu::CreateCsrOp>())
return op.getValues().getType().cast<MemRefType>().getElementType();		return op.getValues().getType().cast<MemRefType>().getElementType();
llvm_unreachable("cannot find spmat def");		llvm_unreachable("cannot find spmat def");
}		}

		static LLVM::ConstantOp
		wrengr: If this works, that's cool; but if the callsites don't require knowing that this is the specific operation being returned, then it would probably be better to return `Value` (both to reduce coupling, and because there are a few places where the use of templates gets a bit squirrely about implicitly converting operations to their result values).
		genConstFromMatTransposeOp(ConversionPatternRewriter &rewriter, Location loc,
		wrengr: I think it'd read nicer to use the prefix "constant" rather than "genConstFrom". And this function is a prime example of why the enum must not be named with "Op": since when reading this I got very confused about why the argument was already an operation rather than the enum/attr, even despite the fact that I'd just re-read through the code above!
		wrengr: This function doesn't need the specifics of `ConversionPatternRewriter`, so you should change this parameter to `OpBuilder &builder` instead.
		Type int32Type, gpu::MatTransposeOp tOp) {
		wrengr: Why not just have this function use `llvmInt32Type` directly? (Since that's the type that's used everywhere, and since this function doesn't do any validation that the parameter is indeed a 32-bit integer type, and since no other type would really make much sense.)
		wrengr: I'm guessing this isn't going to pass the clang-format checks, so be sure to re-run `git clang-format HEAD^`.
		return rewriter.create<LLVM::ConstantOp>(loc, int32Type,
		static_cast<int32_t>(tOp));
		wrengr: This doesn't look right to me. The `getContext` method already returns `MLIRContext*`, so you don't need the address-of operator there.
		}
		wrengr: You should just use `rewriter.getIntegerType(32)` instead.
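The core of the helper discussed above is an explicit enum-to-integer conversion. Below is a minimal standalone sketch of that step, assuming a scoped enum with a 32-bit underlying type. It deliberately leaves out the MLIR parts (the builder, the location, and the creation of the constant op); `TransposeMode` and `toRuntimeInt` are hypothetical names, and the enumerator values are assumptions.

```cpp
#include <cstdint>

// Hypothetical stand-in for the transpose-mode enum under review.
// The numeric values are assumptions made for this sketch.
enum class TransposeMode : std::uint32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2,
};

// Convert the mode to the 32-bit integer that would be passed on to the
// runtime wrapper. A scoped enum does not convert implicitly, so the
// static_cast spells out the conversion instead of using a C-style cast.
constexpr std::int32_t toRuntimeInt(TransposeMode mode) {
  return static_cast<std::int32_t>(mode);
}
```

Wrapping the cast in one named function also addresses the request to avoid repeating the same idiom in every conversion pattern.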

LogicalResult ConvertCreateSparseEnvOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertCreateSparseEnvOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::CreateSparseEnvOp op, OpAdaptor adaptor,		gpu::CreateSparseEnvOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
[...]

LogicalResult ConvertSpMVBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertSpMVBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::SpMVBufferSizeOp op, OpAdaptor adaptor,		gpu::SpMVBufferSizeOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
		auto tOpA =
		genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type, op.getTOpA());
		wrengr: Style-guide says to use `static_cast` here, instead of the C-style cast.
		wrengr: To avoid repeating this idiom all over, you should define a helper function (cf., SparseTensor/Transforms/CodegenUtils for some examples of us doing this elsewhere). I'd say ditto for the `dType` idiom, though those aren't touched by this patch; so feel free to leave this cleanup for a separate CL.
Type dType = getSpMatElemType(op.getSpmatA());		Type dType = getSpMatElemType(op.getSpmatA());
auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,		auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,
dType.getIntOrFloatBitWidth());		dType.getIntOrFloatBitWidth());
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
auto bufferSize =		auto bufferSize =
spMVBufferSizeCallBuilder		spMVBufferSizeCallBuilder
.create(loc, rewriter,		.create(loc, rewriter,
{adaptor.getEnv(), adaptor.getSpmatA(), adaptor.getDnX(),		{adaptor.getEnv(), tOpA, adaptor.getSpmatA(),
adaptor.getDnY(), dw, stream})		adaptor.getDnX(), adaptor.getDnY(), dw, stream})
.getResult();		.getResult();
rewriter.replaceOp(op, {bufferSize, stream});		rewriter.replaceOp(op, {bufferSize, stream});
return success();		return success();
}		}

LogicalResult ConvertSpMVOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertSpMVOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::SpMVOp op, OpAdaptor adaptor,		gpu::SpMVOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
Type dType = getSpMatElemType(op.getSpmatA());		Type dType = getSpMatElemType(op.getSpmatA());
		auto tOpA = genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type,
		adaptor.getTOpA());
auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,		auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,
dType.getIntOrFloatBitWidth());		dType.getIntOrFloatBitWidth());
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
Value pBuf =		Value pBuf =
MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);		MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);
if (!getTypeConverter()->useOpaquePointers())		if (!getTypeConverter()->useOpaquePointers())
pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);		pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);
spMVCallBuilder.create(loc, rewriter,		spMVCallBuilder.create(loc, rewriter,
{adaptor.getEnv(), adaptor.getSpmatA(),		{adaptor.getEnv(), tOpA, adaptor.getSpmatA(),
adaptor.getDnX(), adaptor.getDnY(), dw, pBuf,		adaptor.getDnX(), adaptor.getDnY(), dw, pBuf,
stream});		stream});
rewriter.replaceOp(op, {stream});		rewriter.replaceOp(op, {stream});
return success();		return success();
}		}

LogicalResult ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::SpMMBufferSizeOp op, OpAdaptor adaptor,		gpu::SpMMBufferSizeOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
Type dType = getSpMatElemType(op.getSpmatA());		Type dType = getSpMatElemType(op.getSpmatA());
		auto tOpA = genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type,
		adaptor.getTOpA());
		auto tOpB = genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type,
		adaptor.getTOpB());
auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,		auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,
dType.getIntOrFloatBitWidth());		dType.getIntOrFloatBitWidth());
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
auto bufferSize =		auto bufferSize =
spMMBufferSizeCallBuilder		spMMBufferSizeCallBuilder
.create(loc, rewriter,		.create(loc, rewriter,
{adaptor.getEnv(), adaptor.getSpmatA(), adaptor.getDnmatB(),		{adaptor.getEnv(), tOpA, tOpB, adaptor.getSpmatA(),
adaptor.getDnmatC(), dw, stream})		adaptor.getDnmatB(), adaptor.getDnmatC(), dw, stream})
.getResult();		.getResult();
rewriter.replaceOp(op, {bufferSize, stream});		rewriter.replaceOp(op, {bufferSize, stream});
return success();		return success();
}		}

LogicalResult ConvertSpMMOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertSpMMOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::SpMMOp op, OpAdaptor adaptor,		gpu::SpMMOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
Type dType = getSpMatElemType(op.getSpmatA());		Type dType = getSpMatElemType(op.getSpmatA());
auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,		auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,
dType.getIntOrFloatBitWidth());		dType.getIntOrFloatBitWidth());
		auto tOpA = genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type,
		adaptor.getTOpA());
		auto tOpB = genConstFromMatTransposeOp(rewriter, loc, llvmInt32Type,
		adaptor.getTOpB());
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
Value pBuf =		Value pBuf =
MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);		MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);
if (!getTypeConverter()->useOpaquePointers())		if (!getTypeConverter()->useOpaquePointers())
pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);		pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);
spMMCallBuilder.create(loc, rewriter,		spMMCallBuilder.create(loc, rewriter,
		wrengr: What does this todo actually mean? Cuz it looks like you are passing the transpose modes...
		K-Wu (Author): Good catch! I forgot to remove this after finishing this item.
{adaptor.getEnv(), adaptor.getSpmatA(),		{adaptor.getEnv(), tOpA, tOpB, adaptor.getSpmatA(),
adaptor.getDnmatB(), adaptor.getDnmatC(), dw, pBuf,		adaptor.getDnmatB(), adaptor.getDnmatC(), dw, pBuf,
stream});		stream});
rewriter.replaceOp(op, {stream});		rewriter.replaceOp(op, {stream});
return success();		return success();
}		}

void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,		void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
RewritePatternSet &patterns,		RewritePatternSet &patterns,
[...]

mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp

[...]	static LogicalResult rewriteSpMV(PatternRewriter &rewriter,
// Create sparse environment and sparse matrix/dense vector handles.		// Create sparse environment and sparse matrix/dense vector handles.
Type indexTp = rewriter.getIndexType();		Type indexTp = rewriter.getIndexType();
Type handleTp = rewriter.getType<gpu::SparseHandleType>();		Type handleTp = rewriter.getType<gpu::SparseHandleType>();
Type tokenTp = rewriter.getType<gpu::AsyncTokenType>();		Type tokenTp = rewriter.getType<gpu::AsyncTokenType>();
Value token = genFirstWait(rewriter, loc);		Value token = genFirstWait(rewriter, loc);
auto env =		auto env =
rewriter.create<gpu::CreateSparseEnvOp>(loc, handleTp, tokenTp, token);		rewriter.create<gpu::CreateSparseEnvOp>(loc, handleTp, tokenTp, token);
Value handle = env.getResult(0);		Value handle = env.getResult(0);
		auto tOpA = gpu::MatTransposeOp::NON_TRANSPOSE;
		wrengr: MLIR-style says variables should start with lowercase.
token = env.getAsyncToken();		token = env.getAsyncToken();
Operation *spGenA = genSpMat(rewriter, loc, handleTp, tokenTp, token, szY,		Operation *spGenA = genSpMat(rewriter, loc, handleTp, tokenTp, token, szY,
szX, nseA, rowA, colA, valA, isCOO, enableRT);		szX, nseA, rowA, colA, valA, isCOO, enableRT);
Value spMatA = spGenA->getResult(0);		Value spMatA = spGenA->getResult(0);
token = spGenA->getResult(1);		token = spGenA->getResult(1);
auto dvecX = rewriter.create<gpu::CreateDnVecOp>(loc, handleTp, tokenTp,		auto dvecX = rewriter.create<gpu::CreateDnVecOp>(loc, handleTp, tokenTp,
token, vecX, szX);		token, vecX, szX);
Value dnX = dvecX.getResult(0);		Value dnX = dvecX.getResult(0);
token = dvecX.getAsyncToken();		token = dvecX.getAsyncToken();
auto dvecY = rewriter.create<gpu::CreateDnVecOp>(loc, handleTp, tokenTp,		auto dvecY = rewriter.create<gpu::CreateDnVecOp>(loc, handleTp, tokenTp,
token, vecY, szY);		token, vecY, szY);
Value dnY = dvecY.getResult(0);		Value dnY = dvecY.getResult(0);
token = dvecY.getAsyncToken();		token = dvecY.getAsyncToken();

// Precompute buffersize for SpMV.		// Precompute buffersize for SpMV.
auto bufferComp = rewriter.create<gpu::SpMVBufferSizeOp>(		auto bufferComp = rewriter.create<gpu::SpMVBufferSizeOp>(
loc, indexTp, tokenTp, token, handle, spMatA, dnX, dnY);		loc, indexTp, tokenTp, token, handle, tOpA, spMatA, dnX, dnY);
		wrengr: Do you actually need to pass this argument since it's the default value? (i.e., do the automatically generated builders allow skipping it) If so, then you should add a new builder definition which uses the default in lieu of requiring an argument here. (And ditto for all the other ops, of course.)
		wrengr: Fwiw, I think this was better when explicitly passing `modeA`, rather than passing `std::nullptr`. But the thing I had in mind was to have something more like: `def GPU_SpMVBufferSizeOp : ... { let builders = [ OpBuilder<(ins Variadic<GPU_AsyncToken>:$asyncDependencies, GPU_SparseEnvHandle:$env, GPU_SparseSpMatHandle:$spmatA, GPU_SparseDnVecHandle:$dnX, GPU_SparseDnVecHandle:$dnY), [{ auto modeA = gpu::TransposeMode::NON_TRANSPOSE; return build($_builder, $_state, asyncDependencies, env, modeA, spmatA, dnX, dnY); }]; }` ...or whatever massaging of that is necessary to get it to compile. If you're having trouble getting this to work, just send me something on chat and we can try to work out the wrinkles :)
		K-Wu (Author): I made it work! Let me know if this looks good to you.
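The builder idea in the thread above amounts to a convenience overload that fills in the default mode and forwards to the full builder. Here is a minimal standalone sketch of that pattern in plain C++, without MLIR: handles are modeled as plain integers, and `SpMVBufferSizeState`, `build`, and `TransposeMode` are hypothetical stand-ins rather than the generated MLIR API.

```cpp
#include <cstdint>

// Hypothetical stand-in for the transpose-mode enum under review.
enum class TransposeMode : std::uint32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2,
};

// Simplified stand-in for the state of one op; the real op holds SSA
// values and attributes, which are modeled here as plain integers.
struct SpMVBufferSizeState {
  int env;
  TransposeMode modeA;
  int spmatA;
  int dnX;
  int dnY;
};

// Full builder: the caller passes the transpose mode explicitly.
constexpr SpMVBufferSizeState build(int env, TransposeMode modeA, int spmatA,
                                    int dnX, int dnY) {
  return SpMVBufferSizeState{env, modeA, spmatA, dnX, dnY};
}

// Convenience builder: omits the mode and forwards to the full builder
// with NON_TRANSPOSE, so call sites using the default stay unchanged.
constexpr SpMVBufferSizeState build(int env, int spmatA, int dnX, int dnY) {
  return build(env, TransposeMode::NON_TRANSPOSE, spmatA, dnX, dnY);
}
```

With both overloads present, a call site such as `build(env, spMatA, dnX, dnY)` keeps compiling, while code that needs a transposed operand opts in by passing the mode.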
Value bufferSz = bufferComp.getResult(0);		Value bufferSz = bufferComp.getResult(0);
token = bufferComp.getAsyncToken();		token = bufferComp.getAsyncToken();
auto buf = genAllocBuffer(rewriter, loc, bufferSz, token);		auto buf = genAllocBuffer(rewriter, loc, bufferSz, token);
Value buffer = buf.getResult(0);		Value buffer = buf.getResult(0);
token = buf.getAsyncToken();		token = buf.getAsyncToken();

// Perform the SpMV.		// Perform the SpMV.
auto spmvComp = rewriter.create<gpu::SpMVOp>(loc, tokenTp, token, handle,		auto spmvComp = rewriter.create<gpu::SpMVOp>(loc, tokenTp, token, handle,
spMatA, dnX, dnY, buffer);		tOpA, spMatA, dnX, dnY, buffer);
token = spmvComp.getAsyncToken();		token = spmvComp.getAsyncToken();

// Copy data back to host and free all the resoures.		// Copy data back to host and free all the resoures.
token = rewriter.create<gpu::DestroySpMatOp>(loc, tokenTp, token, spMatA)		token = rewriter.create<gpu::DestroySpMatOp>(loc, tokenTp, token, spMatA)
.getAsyncToken();		.getAsyncToken();
token = rewriter.create<gpu::DestroyDnVecOp>(loc, tokenTp, token, dnX)		token = rewriter.create<gpu::DestroyDnVecOp>(loc, tokenTp, token, dnX)
.getAsyncToken();		.getAsyncToken();
token = rewriter.create<gpu::DestroyDnVecOp>(loc, tokenTp, token, dnY)		token = rewriter.create<gpu::DestroyDnVecOp>(loc, tokenTp, token, dnY)
(63 lines not shown)	static LogicalResult rewriteSpMM(PatternRewriter &rewriter,
Type indexTp = rewriter.getIndexType();		Type indexTp = rewriter.getIndexType();
Type handleTp = rewriter.getType<gpu::SparseHandleType>();		Type handleTp = rewriter.getType<gpu::SparseHandleType>();
Type tokenTp = rewriter.getType<gpu::AsyncTokenType>();		Type tokenTp = rewriter.getType<gpu::AsyncTokenType>();
Value token = genFirstWait(rewriter, loc);		Value token = genFirstWait(rewriter, loc);
auto env =		auto env =
rewriter.create<gpu::CreateSparseEnvOp>(loc, handleTp, tokenTp, token);		rewriter.create<gpu::CreateSparseEnvOp>(loc, handleTp, tokenTp, token);
Value handle = env.getResult(0);		Value handle = env.getResult(0);
token = env.getAsyncToken();		token = env.getAsyncToken();
		auto tOpA = gpu::MatTransposeOp::NON_TRANSPOSE;
		auto tOpB = gpu::MatTransposeOp::NON_TRANSPOSE;
Operation *spGenA = genSpMat(rewriter, loc, handleTp, tokenTp, token, szm,		Operation *spGenA = genSpMat(rewriter, loc, handleTp, tokenTp, token, szm,
szk, nseA, rowA, colA, valA, isCOO, enableRT);		szk, nseA, rowA, colA, valA, isCOO, enableRT);
Value spMatA = spGenA->getResult(0);		Value spMatA = spGenA->getResult(0);
token = spGenA->getResult(1);		token = spGenA->getResult(1);
auto dmatB = rewriter.create<gpu::CreateDnMatOp>(loc, handleTp, tokenTp,		auto dmatB = rewriter.create<gpu::CreateDnMatOp>(loc, handleTp, tokenTp,
token, szk, szn, matB);		token, szk, szn, matB);
Value dnB = dmatB.getResult(0);		Value dnB = dmatB.getResult(0);
token = dmatB.getAsyncToken();		token = dmatB.getAsyncToken();
auto dmatC = rewriter.create<gpu::CreateDnMatOp>(loc, handleTp, tokenTp,		auto dmatC = rewriter.create<gpu::CreateDnMatOp>(loc, handleTp, tokenTp,
token, szm, szn, matC);		token, szm, szn, matC);
Value dnC = dmatC.getResult(0);		Value dnC = dmatC.getResult(0);
token = dmatC.getAsyncToken();		token = dmatC.getAsyncToken();

// Precompute buffersize for SpMM.		// Precompute buffersize for SpMM.
auto bufferComp = rewriter.create<gpu::SpMMBufferSizeOp>(		auto bufferComp = rewriter.create<gpu::SpMMBufferSizeOp>(
loc, indexTp, tokenTp, token, handle, spMatA, dnB, dnC);		loc, indexTp, tokenTp, token, handle, tOpA, tOpB, spMatA, dnB, dnC);
Value bufferSz = bufferComp.getResult(0);		Value bufferSz = bufferComp.getResult(0);
token = bufferComp.getAsyncToken();		token = bufferComp.getAsyncToken();
auto buf = genAllocBuffer(rewriter, loc, bufferSz, token);		auto buf = genAllocBuffer(rewriter, loc, bufferSz, token);
Value buffer = buf.getResult(0);		Value buffer = buf.getResult(0);
token = buf.getAsyncToken();		token = buf.getAsyncToken();

// Perform the SpMM.		// Perform the SpMM.
auto spmmComp = rewriter.create<gpu::SpMMOp>(loc, tokenTp, token, handle,		auto spmmComp = rewriter.create<gpu::SpMMOp>(
spMatA, dnB, dnC, buffer);		loc, tokenTp, token, handle, tOpA, tOpB, spMatA, dnB, dnC, buffer);
token = spmmComp.getAsyncToken();		token = spmmComp.getAsyncToken();

// Copy data back to host and free all the resources.		// Copy data back to host and free all the resources.
token = rewriter.create<gpu::DestroySpMatOp>(loc, tokenTp, token, spMatA)		token = rewriter.create<gpu::DestroySpMatOp>(loc, tokenTp, token, spMatA)
.getAsyncToken();		.getAsyncToken();
token = rewriter.create<gpu::DestroyDnMatOp>(loc, tokenTp, token, dnB)		token = rewriter.create<gpu::DestroyDnMatOp>(loc, tokenTp, token, dnB)
.getAsyncToken();		.getAsyncToken();
token = rewriter.create<gpu::DestroyDnMatOp>(loc, tokenTp, token, dnC)		token = rewriter.create<gpu::DestroyDnMatOp>(loc, tokenTp, token, dnC)
(151 lines not shown)	LogicalResult matchAndRewrite(linalg::GenericOp op,
bindDims(getContext(), i, j, k);		bindDims(getContext(), i, j, k);

// TODO: more robust patterns, transposed versions, more kernels...		// TODO: more robust patterns, transposed versions, more kernels...

// Recognize a SpMV kernel.		// Recognize a SpMV kernel.
if (numLoops == 2 && numTensors == 3 &&		if (numLoops == 2 && numTensors == 3 &&
linalg::isParallelIterator(iteratorTypes[0]) &&		linalg::isParallelIterator(iteratorTypes[0]) &&
linalg::isReductionIterator(iteratorTypes[1]) &&		linalg::isReductionIterator(iteratorTypes[1]) &&
		// TODO: add transposed {i, j}
		aartbik (Not Done): You can remove the transpose form TODO from L740 ;-) since you have this now.
		K-Wu (Author, Done): This diff introduces the transpose attribute, but I haven't added rewrite rules to recognize these patterns in the loop nest and set the matrix as transposed. Shall I keep this comment and address it in new diffs?
		aartbik (Done): Nah, I was just observing the overlap in TODOs now, but no biggie.
maps == infer({{i, j}, {j}, {i}}) && matchSumOfMultOfArgs(op)) {		maps == infer({{i, j}, {j}, {i}}) && matchSumOfMultOfArgs(op)) {
return rewriteSpMV(rewriter, op, enableRT);		return rewriteSpMV(rewriter, op, enableRT);
}		}

// Recognize a SpMM kernel.		// Recognize a SpMM kernel.
if (numLoops == 3 && numTensors == 3 &&		if (numLoops == 3 && numTensors == 3 &&
linalg::isParallelIterator(iteratorTypes[0]) &&		linalg::isParallelIterator(iteratorTypes[0]) &&
linalg::isParallelIterator(iteratorTypes[1]) &&		linalg::isParallelIterator(iteratorTypes[1]) &&
linalg::isReductionIterator(iteratorTypes[2]) &&		linalg::isReductionIterator(iteratorTypes[2]) &&
		// TODO: add transposed {i, k}, {k, j}
		// TODO: maybe add transposed {i, j} in future
maps == infer({{i, k}, {k, j}, {i, j}}) && matchSumOfMultOfArgs(op)) {		maps == infer({{i, k}, {k, j}, {i, j}}) && matchSumOfMultOfArgs(op)) {
return rewriteSpMM(rewriter, op, enableRT);		return rewriteSpMM(rewriter, op, enableRT);
}		}

return failure();		return failure();
}		}

private:		private:
(24 lines not shown)
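
The TODOs above leave recognition of transposed kernels to a later diff. As a rough sketch of what such a check could look like, the following models indexing maps as lists of loop indices and maps the SpMV access pattern to a transpose mode. Everything here is hypothetical (`Map`, `Maps`, `classifySpMV`, and the two-value `TransposeMode`); it does not use `infer` or any MLIR API, and it is not what the patch implements.

```cpp
#include <optional>
#include <vector>

// Hypothetical two-value mode; the real attribute may have more cases.
enum class TransposeMode { NON_TRANSPOSE, TRANSPOSE };

using Map = std::vector<int>;   // loop indices used by one operand
using Maps = std::vector<Map>;  // one map per operand: A, x, y

// Classify y(i) += A(?, ?) * x(j) by how A is indexed.
// Loop i is 0 (parallel) and loop j is 1 (reduction).
std::optional<TransposeMode> classifySpMV(const Maps &maps) {
  const int i = 0, j = 1;
  const Maps plain = {{i, j}, {j}, {i}};       // y(i) += A(i, j) * x(j)
  const Maps transposed = {{j, i}, {j}, {i}};  // y(i) += A(j, i) * x(j)
  if (maps == plain)
    return TransposeMode::NON_TRANSPOSE;
  if (maps == transposed)
    return TransposeMode::TRANSPOSE;
  return std::nullopt;  // not an SpMV shape this sketch recognizes
}
```

A rewrite rule built on a check like this could then pass the returned mode to the op instead of hard-coding `NON_TRANSPOSE`.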

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp

	(316 lines not shown)
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
	mgpuDestroySpMat(void *m, CUstream /*stream*/) {			mgpuDestroySpMat(void *m, CUstream /*stream*/) {
	cusparseSpMatDescr_t mat = reinterpret_cast<cusparseSpMatDescr_t>(m);			cusparseSpMatDescr_t mat = reinterpret_cast<cusparseSpMatDescr_t>(m);
	CUSPARSE_REPORT_IF_ERROR(cusparseDestroySpMat(mat))			CUSPARSE_REPORT_IF_ERROR(cusparseDestroySpMat(mat))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t mgpuSpMVBufferSize(			extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
	void *h, void *a, void *x, void *y, int32_t dw, CUstream /*stream*/) {			mgpuSpMVBufferSize(void *h, int32_t oa, void *a, void *x, void *y, int32_t dw,
				CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
				cusparseOperation_t opA = static_cast<cusparseOperation_t>(oa);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);			cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);
	cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);			cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dtp = dataTp(dw);
	double alpha = 1.0;			double alpha = 1.0;
	double beta = 1.0;			double beta = 1.0;
	size_t bufferSize = 0;			size_t bufferSize = 0;
	CUSPARSE_REPORT_IF_ERROR(cusparseSpMV_bufferSize(			CUSPARSE_REPORT_IF_ERROR(
	handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, vecX, &beta, vecY,			cusparseSpMV_bufferSize(handle, opA, &alpha, matA, vecX, &beta, vecY, dtp,
	dtp, CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize))			CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize))
	return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc			return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMV(void *h, void *a, void *x,			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMV(void *h, int32_t oa, void *a,
	void *y, int32_t dw,			void *x, void *y, int32_t dw,
	void *buf,			void *buf,
	CUstream /*stream*/) {			CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
				cusparseOperation_t opA = static_cast<cusparseOperation_t>(oa);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);			cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);
	cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);			cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dtp = dataTp(dw);
	double alpha = 1.0;			double alpha = 1.0;
	double beta = 1.0;			double beta = 1.0;
	CUSPARSE_REPORT_IF_ERROR(			CUSPARSE_REPORT_IF_ERROR(cusparseSpMV(handle, opA, &alpha, matA, vecX, &beta,
	cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, vecX,			vecY, dtp, CUSPARSE_SPMV_ALG_DEFAULT,
	&beta, vecY, dtp, CUSPARSE_SPMV_ALG_DEFAULT, buf))			buf))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t mgpuSpMMBufferSize(			extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
	void *h, void *a, void *b, void *c, int32_t dw, CUstream /*stream*/) {			mgpuSpMMBufferSize(void *h, int32_t oa, int32_t ob, void *a, void *b, void *c,
				int32_t dw, CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
				cusparseOperation_t opA = static_cast<cusparseOperation_t>(oa);
				cusparseOperation_t opB = static_cast<cusparseOperation_t>(ob);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);			cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);
	cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);			cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dtp = dataTp(dw);
	double alpha = 1.0;			double alpha = 1.0;
	double beta = 1.0;			double beta = 1.0;
	size_t bufferSize = 0;			size_t bufferSize = 0;
	CUSPARSE_REPORT_IF_ERROR(cusparseSpMM_bufferSize(			CUSPARSE_REPORT_IF_ERROR(
	handle, CUSPARSE_OPERATION_NON_TRANSPOSE,			cusparseSpMM_bufferSize(handle, opA, opB, &alpha, matA, matB, &beta, matC,
	CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB, &beta, matC, dtp,			dtp, CUSPARSE_SPMM_ALG_DEFAULT, &bufferSize))
	CUSPARSE_SPMM_ALG_DEFAULT, &bufferSize))
	return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc			return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMM(void *h, void *a, void *b,			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
	void *c, int32_t dw,			mgpuSpMM(void *h, int32_t oa, int32_t ob, void *a, void *b, void *c, int32_t dw,
	void *buf,			void *buf, CUstream /*stream*/) {
	CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
				cusparseOperation_t opA = static_cast<cusparseOperation_t>(oa);
				cusparseOperation_t opB = static_cast<cusparseOperation_t>(ob);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);			cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);
	cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);			cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dtp = dataTp(dw);
	double alpha = 1.0;			double alpha = 1.0;
	double beta = 1.0;			double beta = 1.0;
	CUSPARSE_REPORT_IF_ERROR(			CUSPARSE_REPORT_IF_ERROR(cusparseSpMM(handle, opA, opB, &alpha, matA, matB,
	cusparseSpMM(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,			&beta, matC, dtp,
	CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha, matA, matB, &beta,			CUSPARSE_SPMM_ALG_DEFAULT, buf))
	matC, dtp, CUSPARSE_SPMM_ALG_DEFAULT, buf))
	}			}
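
The wrappers above receive the transpose mode as an `int32_t` and `static_cast` it straight to `cusparseOperation_t`. A minimal sketch of a checked conversion follows, assuming the three enumerator values 0, 1, and 2 that `cusparse.h` assigns to the non-transpose, transpose, and conjugate-transpose operations. `Operation` and `toOperation` are hypothetical stand-ins so the snippet compiles without CUDA; the patch itself does not validate the value.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for cusparseOperation_t (values assumed to match cusparse.h).
enum Operation : int32_t {
  NON_TRANSPOSE = 0,
  TRANSPOSE = 1,
  CONJUGATE_TRANSPOSE = 2
};

// Convert a raw integer from the lowered call into an operation,
// rejecting anything outside the known enumerators.
std::optional<Operation> toOperation(int32_t raw) {
  switch (raw) {
  case NON_TRANSPOSE:
  case TRANSPOSE:
  case CONJUGATE_TRANSPOSE:
    return static_cast<Operation>(raw);
  default:
    return std::nullopt;
  }
}
```

Whether a check like this is worth its cost depends on how much the lowering is trusted; an unchecked cast is cheaper but forwards any out-of-range integer to the library.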

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir

(17 lines not shown)	module attributes {gpu.container_module} {
// CHECK: llvm.call @mgpuStreamDestroy		// CHECK: llvm.call @mgpuStreamDestroy
func.func @matvec(%arg0: index) {		func.func @matvec(%arg0: index) {
%token0 = gpu.wait async		%token0 = gpu.wait async
%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>		%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>
%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>		%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>
%env, %token3 = gpu.create_sparse_env async [%token2]		%env, %token3 = gpu.create_sparse_env async [%token2]
%spmat, %token4 = gpu.create_coo async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>		%spmat, %token4 = gpu.create_coo async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
%dnvec, %token5 = gpu.create_dn_vec async [%token4] %mem2, %arg0 : memref<?xf64>		%dnvec, %token5 = gpu.create_dn_vec async [%token4] %mem2, %arg0 : memref<?xf64>
%bufferSz, %token6 = gpu.spmv_buffer_size async [%token5] %env, %spmat, %dnvec, %dnvec		%bufferSz, %token6 = gpu.spmv_buffer_size async [%token5] %env, %spmat{NON_TRANSPOSE}, %dnvec, %dnvec
%token7 = gpu.spmv async [%token6] %env, %spmat, %dnvec, %dnvec, %mem2 : memref<?xf64>		%token7 = gpu.spmv async [%token6] %env, %spmat{NON_TRANSPOSE}, %dnvec, %dnvec, %mem2 : memref<?xf64>
%token8 = gpu.destroy_sp_mat async [%token7] %spmat		%token8 = gpu.destroy_sp_mat async [%token7] %spmat
%token9 = gpu.destroy_dn_vec async [%token8] %dnvec		%token9 = gpu.destroy_dn_vec async [%token8] %dnvec
%token10 = gpu.destroy_sparse_env async [%token9] %env		%token10 = gpu.destroy_sparse_env async [%token9] %env
gpu.wait [%token10]		gpu.wait [%token10]
return		return
}		}

// CHECK-LABEL: func @matmul		// CHECK-LABEL: func @matmul
(12 lines not shown)	module attributes {gpu.container_module} {
// CHECK: llvm.call @mgpuStreamDestroy		// CHECK: llvm.call @mgpuStreamDestroy
func.func @matmul(%arg0: index) {		func.func @matmul(%arg0: index) {
%token0 = gpu.wait async		%token0 = gpu.wait async
%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>		%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>
%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>		%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>
%env, %token3 = gpu.create_sparse_env async [%token2]		%env, %token3 = gpu.create_sparse_env async [%token2]
%spmat, %token4 = gpu.create_csr async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>		%spmat, %token4 = gpu.create_csr async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
%dnmat, %token5 = gpu.create_dn_mat async [%token4] %arg0, %arg0, %mem2 : memref<?xf64>		%dnmat, %token5 = gpu.create_dn_mat async [%token4] %arg0, %arg0, %mem2 : memref<?xf64>
%bufferSz, %token6 = gpu.spmm_buffer_size async [%token5] %env, %spmat, %dnmat, %dnmat		%bufferSz, %token6 = gpu.spmm_buffer_size async [%token5] %env, %spmat{NON_TRANSPOSE}, %dnmat{NON_TRANSPOSE}, %dnmat
%token7 = gpu.spmm async [%token6] %env, %spmat, %dnmat, %dnmat, %mem2 : memref<?xf64>		%token7 = gpu.spmm async [%token6] %env, %spmat{NON_TRANSPOSE}, %dnmat{NON_TRANSPOSE}, %dnmat, %mem2 : memref<?xf64>
%token8 = gpu.destroy_sp_mat async [%token7] %spmat		%token8 = gpu.destroy_sp_mat async [%token7] %spmat
%token9 = gpu.destroy_dn_mat async [%token8] %dnmat		%token9 = gpu.destroy_dn_mat async [%token8] %dnmat
%token10 = gpu.destroy_sparse_env async [%token9] %env		%token10 = gpu.destroy_sparse_env async [%token9] %env
gpu.wait [%token10]		gpu.wait [%token10]
return		return
}		}

}		}

mlir/test/Dialect/GPU/ops.mlir

(329 lines not shown)	func.func @sparse_ops(%arg0: index) {
%env, %token3 = gpu.create_sparse_env async [%token2]		%env, %token3 = gpu.create_sparse_env async [%token2]
// CHECK: gpu.create_coo async		// CHECK: gpu.create_coo async
%spmat, %token4 = gpu.create_coo async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>		%spmat, %token4 = gpu.create_coo async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
// CHECK: gpu.create_csr async		// CHECK: gpu.create_csr async
%spmat2, %token5 = gpu.create_csr async [%token4] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>		%spmat2, %token5 = gpu.create_csr async [%token4] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
// CHECK: gpu.create_dn_vec async		// CHECK: gpu.create_dn_vec async
%dnvec, %token6 = gpu.create_dn_vec async [%token5] %mem2, %arg0 : memref<?xf64>		%dnvec, %token6 = gpu.create_dn_vec async [%token5] %mem2, %arg0 : memref<?xf64>
// CHECK: gpu.spmv_buffer_size async		// CHECK: gpu.spmv_buffer_size async
%bufferSz, %token7 = gpu.spmv_buffer_size async [%token6] %env, %spmat, %dnvec, %dnvec		%bufferSz, %token7 = gpu.spmv_buffer_size async [%token6] %env, %spmat{NON_TRANSPOSE}, %dnvec, %dnvec
		wrengr (Not Done): If you adjust the assemblyFormat to make the transpose-mode optional like I showed, then you should be able to revert all these changes to the test files (since the generated printers ought to avoid printing the attribute whenever it's the default value).
// CHECK: gpu.spmv async		// CHECK: gpu.spmv async
%token8 = gpu.spmv async [%token7] %env, %spmat, %dnvec, %dnvec, %mem2 : memref<?xf64>		%token8 = gpu.spmv async [%token7] %env, %spmat{NON_TRANSPOSE}, %dnvec, %dnvec, %mem2 : memref<?xf64>
// CHECK: gpu.create_dn_mat async		// CHECK: gpu.create_dn_mat async
%dnmat, %token9 = gpu.create_dn_mat async [%token8] %arg0, %arg0, %mem2 : memref<?xf64>		%dnmat, %token9 = gpu.create_dn_mat async [%token8] %arg0, %arg0, %mem2 : memref<?xf64>
// CHECK: gpu.spmm_buffer_size async		// CHECK: gpu.spmm_buffer_size async
%bufferSz2, %token10 = gpu.spmm_buffer_size async [%token9] %env, %spmat, %dnmat, %dnmat		%bufferSz2, %token10 = gpu.spmm_buffer_size async [%token9] %env, %spmat{NON_TRANSPOSE}, %dnmat{NON_TRANSPOSE}, %dnmat
// CHECK: gpu.spmm async		// CHECK: gpu.spmm async
%token11 = gpu.spmm async [%token10] %env, %spmat, %dnmat, %dnmat, %mem2 : memref<?xf64>		%token11 = gpu.spmm async [%token10] %env, %spmat{NON_TRANSPOSE}, %dnmat{NON_TRANSPOSE}, %dnmat, %mem2 : memref<?xf64>
// CHECK: gpu.destroy_dn_mat async		// CHECK: gpu.destroy_dn_mat async
%token12 = gpu.destroy_dn_mat async [%token11] %dnmat		%token12 = gpu.destroy_dn_mat async [%token11] %dnmat
// CHECK: gpu.destroy_sp_mat async		// CHECK: gpu.destroy_sp_mat async
%token13 = gpu.destroy_sp_mat async [%token12] %spmat		%token13 = gpu.destroy_sp_mat async [%token12] %spmat
// CHECK: gpu.destroy_dn_vec async		// CHECK: gpu.destroy_dn_vec async
%token14 = gpu.destroy_dn_vec async [%token13] %dnvec		%token14 = gpu.destroy_dn_vec async [%token13] %dnvec
// CHECK: gpu.destroy_sparse_env async		// CHECK: gpu.destroy_sparse_env async
%token15 = gpu.destroy_sparse_env async [%token14] %env		%token15 = gpu.destroy_sparse_env async [%token14] %env
(12 lines not shown)
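
The review comment in this file suggests that a printer should omit the transpose mode when it equals the default. A small sketch of that elision rule, using the `%operand{MODE}` syntax from the tests above, might look like this. `TransposeMode`, `stringify`, and `printOperand` are hypothetical helpers, not the printer MLIR generates.

```cpp
#include <string>

// Hypothetical stand-in for the transpose-mode enum.
enum class TransposeMode { NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE };

// Spell a mode the way it appears in the textual IR.
std::string stringify(TransposeMode mode) {
  switch (mode) {
  case TransposeMode::NON_TRANSPOSE:
    return "NON_TRANSPOSE";
  case TransposeMode::TRANSPOSE:
    return "TRANSPOSE";
  case TransposeMode::CONJUGATE_TRANSPOSE:
    return "CONJUGATE_TRANSPOSE";
  }
  return "";  // unreachable for valid enumerators
}

// Print an operand, attaching the mode in braces only when it is not
// the default, so unchanged tests keep their original text.
std::string printOperand(const std::string &name, TransposeMode mode) {
  if (mode == TransposeMode::NON_TRANSPOSE)
    return name;
  return name + "{" + stringify(mode) + "}";
}
```

With this rule, `%spmat` and `%spmat{NON_TRANSPOSE}` denote the same op, and only the shorter form is ever printed.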

mlir/test/Dialect/SparseTensor/GPU/gpu_matmul_lib.mlir

	(43 lines not shown)
	// CHECK: %[[VAL_38:.*]], %[[VAL_39:.*]] = gpu.alloc async {{\[}}%[[VAL_35]]] (%[[VAL_36]], %[[VAL_37]]) : memref<?x?xf64>			// CHECK: %[[VAL_38:.*]], %[[VAL_39:.*]] = gpu.alloc async {{\[}}%[[VAL_35]]] (%[[VAL_36]], %[[VAL_37]]) : memref<?x?xf64>
	// CHECK: %[[VAL_40:.*]] = gpu.memcpy async {{\[}}%[[VAL_39]]] %[[VAL_38]], %[[VAL_34]] : memref<?x?xf64>, memref<?x?xf64>			// CHECK: %[[VAL_40:.*]] = gpu.memcpy async {{\[}}%[[VAL_39]]] %[[VAL_38]], %[[VAL_34]] : memref<?x?xf64>, memref<?x?xf64>
	// CHECK: gpu.wait {{\[}}%[[VAL_16]], %[[VAL_21]], %[[VAL_26]], %[[VAL_33]], %[[VAL_40]]]			// CHECK: gpu.wait {{\[}}%[[VAL_16]], %[[VAL_21]], %[[VAL_26]], %[[VAL_33]], %[[VAL_40]]]
	// CHECK: %[[VAL_41:.*]] = gpu.wait async			// CHECK: %[[VAL_41:.*]] = gpu.wait async
	// CHECK: %[[VAL_42:.*]], %[[VAL_43:.*]] = gpu.create_sparse_env async {{\[}}%[[VAL_41]]]			// CHECK: %[[VAL_42:.*]], %[[VAL_43:.*]] = gpu.create_sparse_env async {{\[}}%[[VAL_41]]]
	// CHECK: %[[VAL_44:.*]], %[[VAL_45:.*]] = gpu.create_csr async {{\[}}%[[VAL_43]]] %[[VAL_6]], %[[VAL_8]], %[[VAL_5]], %[[VAL_14]], %[[VAL_19]], %[[VAL_24]] : memref<?xindex>, memref<?xindex>, memref<?xf64>			// CHECK: %[[VAL_44:.*]], %[[VAL_45:.*]] = gpu.create_csr async {{\[}}%[[VAL_43]]] %[[VAL_6]], %[[VAL_8]], %[[VAL_5]], %[[VAL_14]], %[[VAL_19]], %[[VAL_24]] : memref<?xindex>, memref<?xindex>, memref<?xf64>
	// CHECK: %[[VAL_46:.*]], %[[VAL_47:.*]] = gpu.create_dn_mat async {{\[}}%[[VAL_45]]] %[[VAL_8]], %[[VAL_7]], %[[VAL_31]] : memref<?x?xf64>			// CHECK: %[[VAL_46:.*]], %[[VAL_47:.*]] = gpu.create_dn_mat async {{\[}}%[[VAL_45]]] %[[VAL_8]], %[[VAL_7]], %[[VAL_31]] : memref<?x?xf64>
	// CHECK: %[[VAL_48:.*]], %[[VAL_49:.*]] = gpu.create_dn_mat async {{\[}}%[[VAL_47]]] %[[VAL_6]], %[[VAL_7]], %[[VAL_38]] : memref<?x?xf64>			// CHECK: %[[VAL_48:.*]], %[[VAL_49:.*]] = gpu.create_dn_mat async {{\[}}%[[VAL_47]]] %[[VAL_6]], %[[VAL_7]], %[[VAL_38]] : memref<?x?xf64>
	// CHECK: %[[VAL_50:.*]], %[[VAL_51:.*]] = gpu.spmm_buffer_size async {{\[}}%[[VAL_49]]] %[[VAL_42]], %[[VAL_44]], %[[VAL_46]], %[[VAL_48]]			// CHECK: %[[VAL_50:.*]], %[[VAL_51:.*]] = gpu.spmm_buffer_size async {{\[}}%[[VAL_49]]] %[[VAL_42]], %[[VAL_44]]{ NON_TRANSPOSE}, %[[VAL_46]]{ NON_TRANSPOSE}, %[[VAL_48]]
	// CHECK: %[[VAL_52:.*]], %[[VAL_53:.*]] = gpu.alloc async {{\[}}%[[VAL_51]]] (%[[VAL_50]]) : memref<?xi8>			// CHECK: %[[VAL_52:.*]], %[[VAL_53:.*]] = gpu.alloc async {{\[}}%[[VAL_51]]] (%[[VAL_50]]) : memref<?xi8>
	// CHECK: %[[VAL_54:.*]] = gpu.spmm async {{\[}}%[[VAL_53]]] %[[VAL_42]], %[[VAL_44]], %[[VAL_46]], %[[VAL_48]], %[[VAL_52]] : memref<?xi8>			// CHECK: %[[VAL_54:.*]] = gpu.spmm async {{\[}}%[[VAL_53]]] %[[VAL_42]], %[[VAL_44]]{ NON_TRANSPOSE}, %[[VAL_46]]{ NON_TRANSPOSE}, %[[VAL_48]], %[[VAL_52]] : memref<?xi8>
	// CHECK: %[[VAL_55:.*]] = gpu.destroy_sp_mat async {{\[}}%[[VAL_54]]] %[[VAL_44]]			// CHECK: %[[VAL_55:.*]] = gpu.destroy_sp_mat async {{\[}}%[[VAL_54]]] %[[VAL_44]]
	// CHECK: %[[VAL_56:.*]] = gpu.destroy_dn_mat async {{\[}}%[[VAL_55]]] %[[VAL_46]]			// CHECK: %[[VAL_56:.*]] = gpu.destroy_dn_mat async {{\[}}%[[VAL_55]]] %[[VAL_46]]
	// CHECK: %[[VAL_57:.*]] = gpu.destroy_dn_mat async {{\[}}%[[VAL_56]]] %[[VAL_48]]			// CHECK: %[[VAL_57:.*]] = gpu.destroy_dn_mat async {{\[}}%[[VAL_56]]] %[[VAL_48]]
	// CHECK: %[[VAL_58:.*]] = gpu.destroy_sparse_env async {{\[}}%[[VAL_57]]] %[[VAL_42]]			// CHECK: %[[VAL_58:.*]] = gpu.destroy_sparse_env async {{\[}}%[[VAL_57]]] %[[VAL_42]]
	// CHECK: gpu.wait {{\[}}%[[VAL_58]]]			// CHECK: gpu.wait {{\[}}%[[VAL_58]]]
	// CHECK: %[[VAL_59:.*]] = gpu.wait async			// CHECK: %[[VAL_59:.*]] = gpu.wait async
	// CHECK: %[[VAL_60:.*]] = gpu.memcpy async {{\[}}%[[VAL_59]]] %[[VAL_34]], %[[VAL_38]] : memref<?x?xf64>, memref<?x?xf64>			// CHECK: %[[VAL_60:.*]] = gpu.memcpy async {{\[}}%[[VAL_59]]] %[[VAL_34]], %[[VAL_38]] : memref<?x?xf64>, memref<?x?xf64>
	// CHECK: %[[VAL_61:.*]] = gpu.dealloc async {{\[}}%[[VAL_60]]] %[[VAL_14]] : memref<?xindex>			// CHECK: %[[VAL_61:.*]] = gpu.dealloc async {{\[}}%[[VAL_60]]] %[[VAL_14]] : memref<?xindex>
	(14 lines not shown)

mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir

	(41 lines not shown)
	// CHECK: %[[VAL_35:.*]], %[[VAL_36:.*]] = gpu.alloc async {{\[}}%[[VAL_33]]] (%[[VAL_34]]) : memref<?xf64>			// CHECK: %[[VAL_35:.*]], %[[VAL_36:.*]] = gpu.alloc async {{\[}}%[[VAL_33]]] (%[[VAL_34]]) : memref<?xf64>
	// CHECK: %[[VAL_37:.*]] = gpu.memcpy async {{\[}}%[[VAL_36]]] %[[VAL_35]], %[[VAL_32]] : memref<?xf64>, memref<?xf64>			// CHECK: %[[VAL_37:.*]] = gpu.memcpy async {{\[}}%[[VAL_36]]] %[[VAL_35]], %[[VAL_32]] : memref<?xf64>, memref<?xf64>
	// CHECK: gpu.wait {{\[}}%[[VAL_15]], %[[VAL_20]], %[[VAL_25]], %[[VAL_31]], %[[VAL_37]]]			// CHECK: gpu.wait {{\[}}%[[VAL_15]], %[[VAL_20]], %[[VAL_25]], %[[VAL_31]], %[[VAL_37]]]
	// CHECK: %[[VAL_38:.*]] = gpu.wait async			// CHECK: %[[VAL_38:.*]] = gpu.wait async
	// CHECK: %[[VAL_39:.*]], %[[VAL_40:.*]] = gpu.create_sparse_env async {{\[}}%[[VAL_38]]]			// CHECK: %[[VAL_39:.*]], %[[VAL_40:.*]] = gpu.create_sparse_env async {{\[}}%[[VAL_38]]]
	// CHECK: %[[VAL_41:.*]], %[[VAL_42:.*]] = gpu.create_coo async {{\[}}%[[VAL_40]]] %[[VAL_6]], %[[VAL_7]], %[[VAL_5]], %[[VAL_13]], %[[VAL_18]], %[[VAL_23]] : memref<?xindex>, memref<?xindex>, memref<?xf64>			// CHECK: %[[VAL_41:.*]], %[[VAL_42:.*]] = gpu.create_coo async {{\[}}%[[VAL_40]]] %[[VAL_6]], %[[VAL_7]], %[[VAL_5]], %[[VAL_13]], %[[VAL_18]], %[[VAL_23]] : memref<?xindex>, memref<?xindex>, memref<?xf64>
	// CHECK: %[[VAL_43:.*]], %[[VAL_44:.*]] = gpu.create_dn_vec async {{\[}}%[[VAL_42]]] %[[VAL_29]], %[[VAL_7]] : memref<?xf64>			// CHECK: %[[VAL_43:.*]], %[[VAL_44:.*]] = gpu.create_dn_vec async {{\[}}%[[VAL_42]]] %[[VAL_29]], %[[VAL_7]] : memref<?xf64>
	// CHECK: %[[VAL_45:.*]], %[[VAL_46:.*]] = gpu.create_dn_vec async {{\[}}%[[VAL_44]]] %[[VAL_35]], %[[VAL_6]] : memref<?xf64>			// CHECK: %[[VAL_45:.*]], %[[VAL_46:.*]] = gpu.create_dn_vec async {{\[}}%[[VAL_44]]] %[[VAL_35]], %[[VAL_6]] : memref<?xf64>
	// CHECK: %[[VAL_47:.*]], %[[VAL_48:.*]] = gpu.spmv_buffer_size async {{\[}}%[[VAL_46]]] %[[VAL_39]], %[[VAL_41]], %[[VAL_43]], %[[VAL_45]]			// CHECK: %[[VAL_47:.*]], %[[VAL_48:.*]] = gpu.spmv_buffer_size async {{\[}}%[[VAL_46]]] %[[VAL_39]], %[[VAL_41]]{ NON_TRANSPOSE}, %[[VAL_43]], %[[VAL_45]]
	// CHECK: %[[VAL_49:.*]], %[[VAL_50:.*]] = gpu.alloc async {{\[}}%[[VAL_48]]] (%[[VAL_47]]) : memref<?xi8>			// CHECK: %[[VAL_49:.*]], %[[VAL_50:.*]] = gpu.alloc async {{\[}}%[[VAL_48]]] (%[[VAL_47]]) : memref<?xi8>
	// CHECK: %[[VAL_51:.*]] = gpu.spmv async {{\[}}%[[VAL_50]]] %[[VAL_39]], %[[VAL_41]], %[[VAL_43]], %[[VAL_45]], %[[VAL_49]] : memref<?xi8>			// CHECK: %[[VAL_51:.*]] = gpu.spmv async {{\[}}%[[VAL_50]]] %[[VAL_39]], %[[VAL_41]]{ NON_TRANSPOSE}, %[[VAL_43]], %[[VAL_45]], %[[VAL_49]] : memref<?xi8>
	// CHECK: %[[VAL_52:.*]] = gpu.destroy_sp_mat async {{\[}}%[[VAL_51]]] %[[VAL_41]]			// CHECK: %[[VAL_52:.*]] = gpu.destroy_sp_mat async {{\[}}%[[VAL_51]]] %[[VAL_41]]
	// CHECK: %[[VAL_53:.*]] = gpu.destroy_dn_vec async {{\[}}%[[VAL_52]]] %[[VAL_43]]			// CHECK: %[[VAL_53:.*]] = gpu.destroy_dn_vec async {{\[}}%[[VAL_52]]] %[[VAL_43]]
	// CHECK: %[[VAL_54:.*]] = gpu.destroy_dn_vec async {{\[}}%[[VAL_53]]] %[[VAL_45]]			// CHECK: %[[VAL_54:.*]] = gpu.destroy_dn_vec async {{\[}}%[[VAL_53]]] %[[VAL_45]]
	// CHECK: %[[VAL_55:.*]] = gpu.destroy_sparse_env async {{\[}}%[[VAL_54]]] %[[VAL_39]]			// CHECK: %[[VAL_55:.*]] = gpu.destroy_sparse_env async {{\[}}%[[VAL_54]]] %[[VAL_39]]
	// CHECK: gpu.wait {{\[}}%[[VAL_55]]]			// CHECK: gpu.wait {{\[}}%[[VAL_55]]]
	// CHECK: %[[VAL_56:.*]] = gpu.wait async			// CHECK: %[[VAL_56:.*]] = gpu.wait async
	// CHECK: %[[VAL_57:.*]] = gpu.memcpy async {{\[}}%[[VAL_56]]] %[[VAL_32]], %[[VAL_35]] : memref<?xf64>, memref<?xf64>			// CHECK: %[[VAL_57:.*]] = gpu.memcpy async {{\[}}%[[VAL_56]]] %[[VAL_32]], %[[VAL_35]] : memref<?xf64>, memref<?xf64>
	// CHECK: %[[VAL_58:.*]] = gpu.dealloc async {{\[}}%[[VAL_57]]] %[[VAL_13]] : memref<?xindex>			// CHECK: %[[VAL_58:.*]] = gpu.dealloc async {{\[}}%[[VAL_57]]] %[[VAL_13]] : memref<?xindex>
	(18 lines not shown)

[mlir] [sparse] [gpu] adding transpose support to spmm spmv (Closed, Public)

Revision Contents

Diff 524967

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir

mlir/test/Dialect/GPU/ops.mlir

mlir/test/Dialect/SparseTensor/GPU/gpu_matmul_lib.mlir

mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir