This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129	This code has been repeated a bunch of times now. You should define a new tblgen-class so that you can then simply say `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">;` and analogously for `GPU_SparseSpMatHandle` and `GPU_SparseDnTensorHandle`. Even better, it'd be nice to hook that `GPU_SparseHandle` tblgen-class in with the `SparseHandleKind` enum (also moving that enum from C++ to tblgen), so that you can further simplify that to `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>;` where the `"SpGEMM operation"` string is given as the description/summary of the `SpGEMMOp` member of the tblgen-enum.
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807	does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity

wrengr added inline comments.Jul 21 2023, 3:20 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2133	This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops

I feel like some parts are still missing here. Also, do we need the full fine-grained GPU dialect op <-> cusparse lib call correspondence?
Can we start by merging a few of the set-up lib calls into one gpu dialect op for now (similar to what we did for cusparselt 2:4)?

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
125–129	+1 on at least the first improvement (at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-)
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807	Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here? typedef enum { CUSPARSE_SPGEMM_DEFAULT = 0, CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC = 1, CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2, CUSPARSE_SPGEMM_ALG1 = 3, CUSPARSE_SPGEMM_ALG2 = 4, CUSPARSE_SPGEMM_ALG3 = 5 } cusparseSpGEMMAlg_t;
1815	why? doc says Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.
2133	+1
2154	seems like you did not finish the description?
2159	missing argument, no return? gpu.spgemm_destroy_descr %descriptor
2229	missing?
2234	seems not right?
2347	this is the copy op?
2361	not complete?
2404	not copy?
2406	get size?
2440	empty line for space
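
For reference, the numbering question on GPUOps.td lines 1805–1807 could also be settled in the lowering instead of in the enum itself. Below is a minimal C++ sketch, assuming the dialect keeps ALG1..ALG3 numbered 0..2 and that the cuSPARSE values are the ones quoted in the comment above. The local `SpGEMMAlg` enum and `toCusparseAlgValue` are hypothetical names for illustration only, not code from this revision:

```cpp
#include <cstdint>

// Hypothetical stand-in for the dialect-side enum, numbered as in the first
// version of the diff (ALG1 = 0, ALG2 = 1, ALG3 = 2).
enum class SpGEMMAlg : std::int32_t { ALG1 = 0, ALG2 = 1, ALG3 = 2 };

// Maps the dialect-side case to the numeric value quoted in the review for
// cusparseSpGEMMAlg_t (ALG1 = 3, ALG2 = 4, ALG3 = 5). The cuSPARSE header is
// deliberately not included; the constants are copied from the comment.
inline std::int32_t toCusparseAlgValue(SpGEMMAlg alg) {
  switch (alg) {
  case SpGEMMAlg::ALG1:
    return 3;
  case SpGEMMAlg::ALG2:
    return 4;
  case SpGEMMAlg::ALG3:
    return 5;
  }
  return 0; // Assumption: fall back to CUSPARSE_SPGEMM_DEFAULT (0).
}
```

Keeping 3, 4, 5 directly in the tblgen enum, as suggested above, would make such a translation step unnecessary.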

addressing comments

addressing some comments

Harbormaster completed remote builds in B249584: Diff 546193.Aug 1 2023, 2:27 PM

addressing comments

better document

Harbormaster completed remote builds in B249620: Diff 546254.Aug 1 2023, 5:13 PM

aartbik added inline comments.Aug 1 2023, 5:19 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
130	"SpGEMM operation" -> "SpGEMM operation handle type"
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1815	doesn't this fit on one line now? or is this forced by lint?
2133	can we say a bit more? the doc now pretty much says what the name already implies. perhaps say how such a descriptor is used subsequently
2184	could we combine this with spgemm_estimate_memory to reduce the number of ops (i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)?
2197	can we break this over multiple lines to stay in 80-col
2245	how does this perform the actual computation?
2304	wait, this is no longer used to determine buffer size, right? you just use the buffer that was allocated
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
798–808	yeah, I am also thinking that keeping the definition and declaration together would already go a long way toward reducing boilerplate. But if we use a macro for the code here, that would also be more readable, and we can keep the code below
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
481	we repeat this TODO everywhere, shall we just mention it once in the alpha/beta macro
mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
123	break long line
128	and here
mlir/test/Dialect/GPU/sparse-roundtrip.mlir
95	break
100	break
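
The macro idea raised for GPUToLLVMConversion.cpp lines 798–808 can be sketched roughly as follows. This is a simplified, self-contained illustration, not the code in this revision: `PatternBase`, `DECLARE_RUNTIME_CALL_PATTERN`, and the generated class names are all hypothetical stand-ins for the real conversion-pattern classes.

```cpp
#include <string>

// Hypothetical stand-in for the conversion-pattern base class.
struct PatternBase {
  virtual ~PatternBase() = default;
  virtual std::string opName() const = 0;
};

// One macro invocation per op replaces a hand-written class declaration,
// so the repeated boilerplate lives in a single place.
#define DECLARE_RUNTIME_CALL_PATTERN(OpName)                                   \
  struct Convert##OpName##Pattern : PatternBase {                              \
    std::string opName() const override { return #OpName; }                    \
  };

DECLARE_RUNTIME_CALL_PATTERN(SpGEMMCompute)
DECLARE_RUNTIME_CALL_PATTERN(SpGEMMCopy)
```

Each invocation expands to a distinct class (here `ConvertSpGEMMComputePattern` and `ConvertSpGEMMCopyPattern`), which is the boilerplate reduction the comment is after.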

wrengr added inline comments.Aug 2 2023, 12:42 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
126	Fwiw, The thing I had in mind for this class would be to use `description#" handle type"` here, since that string suffix is repeated by all the instances of this class.
130	cf my comment on line 116

wrengr added inline comments.Aug 2 2023, 1:10 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1805–1807	Is there any reason not to also include `DEFAULT`? Especially if we do indeed make it the `GPU_SpGEMMAlgAttr.defaultValue`... (I'd suggest also adding `CSR_ALG_{,NON}DETERMINITIC`, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for `Alg{1,2,3}`...)
2184	And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument `enum Estimate { Work, Memory, Both }`. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering. Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317	I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given as they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo
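
To make the `enum Estimate { Work, Memory, Both }` suggestion for line 2184 concrete, here is a minimal C++ sketch of how one combined op might select which library calls its lowering emits. The function `stepsFor` is a hypothetical helper written for illustration. Only the two cuSPARSE function names come from the discussion in this review, and they are used here as plain strings, not actual calls.

```cpp
#include <string>
#include <vector>

// The selector proposed in the review comment.
enum class Estimate { Work, Memory, Both };

// Returns, in order, the names of the library-level steps that a single
// combined estimation op would lower to for the given selector.
inline std::vector<std::string> stepsFor(Estimate kind) {
  std::vector<std::string> steps;
  if (kind == Estimate::Work || kind == Estimate::Both)
    steps.push_back("cusparseSpGEMM_workEstimation");
  if (kind == Estimate::Memory || kind == Estimate::Both)
    steps.push_back("cusparseSpGEMM_estimateMemory");
  return steps;
}
```

With `Estimate::Both` the two steps appear in sequence, which matches the case where the ops always go together. The other two values leave room to split them up later without adding a new op.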

aartbik added inline comments.Aug 3 2023, 9:25 AM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
130	yeah, that would work (the full description would then only be visible in doc page, and not inline in code, but I like that this automated way forces consistency)
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2184	+1 on the compiler vs 1:1 lib argument, and the proposal for the future

addressing comments

K-Wu added inline comments.Aug 3 2023, 11:09 AM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2304	cusparse spGEMM APIs are very weird. I am merely restating what I learnt from its documentation, but I am not sure if spgemm_compute only does the computation. The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.

wrengr added inline comments.Aug 3 2023, 12:37 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1814	Nit: should have a space between the "`> {`"
2137	Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops
2143–2144	Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
343–347	Should drop the `void*` part of the comments here, for consistency

Harbormaster completed remote builds in B250127: Diff 546943.Aug 3 2023, 1:10 PM

introducing macros

addressing comments

addressing some comments

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2137	Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to apply backticks to other tablegen entries and apply the declaration macro to other ops.

addressing some comments

Harbormaster completed remote builds in B250452: Diff 547381.Aug 4 2023, 5:17 PM

reducing op num

K-Wu marked 2 inline comments as done.Aug 7 2023, 12:18 PM

addressing formatting

K-Wu marked an inline comment as done.Aug 7 2023, 12:36 PM

aartbik added inline comments.Aug 7 2023, 12:48 PM

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td
129	sparse matrix -> dense matrix
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2132	I would drop the "supported by sparse tensor", just "selected algorithm for SpGEMM",

addressing comments

aartbik accepted this revision.Aug 7 2023, 4:16 PM

aartbik added inline comments.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317	I agree we should find an automated way for this, both the types and names. As for the names, now, it is inconsistent with the other parts, since we only documented stream for some reason ;-) But for the long lists, it actually does help to find the right ones. We still have the problem that we document the type for stream (as in void*stream), but not the others, but dropping the stream type would be inconsistent with the rest. So, okay to leave what you have now, but we probably need to revisit this one day.
716	we can use this for ALL rules now, right? I can see you don't want to do that in this CL, but add a TODO for follow up

This revision is now accepted and ready to land.Aug 7 2023, 4:16 PM

wrengr added inline comments.Aug 7 2023, 4:22 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
2157–2158	nit: should reformat this to avoid overly long lines
2339	nit: reformat to avoid overly long lines

addressing comments

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
311–317	noted :D

This revision was landed with ongoing or failed builds.Aug 7 2023, 5:36 PM

Closed by commit rGdfe294290948: [mlir][sparse][gpu] add spgemm operator (authored by K-Wu).

This revision was automatically updated to reflect the committed changes.

K-Wu added a commit: rGdfe294290948: [mlir][sparse][gpu] add spgemm operator.

Harbormaster completed remote builds in B250954: Diff 548005.Aug 7 2023, 8:28 PM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/GPUBase.td	6 lines
mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h	3 lines
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td	330 lines
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp	355 lines
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp	8 lines
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp	115 lines
mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir	61 lines
mlir/test/Dialect/GPU/sparse-roundtrip.mlir	57 lines

Diff 542602

mlir/include/mlir/Dialect/GPU/IR/GPUBase.td

[... 116 lines elided ...]	def GPU_SparseDnTensorHandle :
BuildableType<"mlir::gpu::SparseDnTensorHandleType::get($_builder.getContext())">;		BuildableType<"mlir::gpu::SparseDnTensorHandleType::get($_builder.getContext())">;

def GPU_SparseSpMatHandle :		def GPU_SparseSpMatHandle :
DialectType<GPU_Dialect,		DialectType<GPU_Dialect,
CPred<"llvm::isa<::mlir::gpu::SparseSpMatHandleType>($_self)">,		CPred<"llvm::isa<::mlir::gpu::SparseSpMatHandleType>($_self)">,
"sparse matrix handle type">,		"sparse matrix handle type">,
BuildableType<"mlir::gpu::SparseSpMatHandleType::get($_builder.getContext())">;		BuildableType<"mlir::gpu::SparseSpMatHandleType::get($_builder.getContext())">;

		def GPU_SparseSpGEMMOpHandle :
		DialectType<GPU_Dialect,
		wrengr: Fwiw, The thing I had in mind for this class would be to use `description#" handle type"` here, since that string suffix is repeated by all the instances of this class.
		CPred<"llvm::isa<::mlir::gpu::SparseSpGEMMOpHandleType>($_self)">,
		"SpGEMM operation handle type">,
		BuildableType<"mlir::gpu::SparseSpGEMMOpHandleType::get($_builder.getContext())">;
		wrengr: This code has been repeated a bunch of times now. You should define a new tblgen-class so that you can then simply say `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SparseSpGEMMOpHandleType, "SpGEMM operation">;` and analogously for `GPU_SparseSpMatHandle` and `GPU_SparseDnTensorHandle`. Even better, it'd be nice to hook that `GPU_SparseHandle` tblgen-class in with the `SparseHandleKind` enum (also moving that enum from C++ to tblgen), so that you can further simplify that to `def GPU_SparseSpGEMMOpHandle : GPU_SparseHandle<SpGEMMOp>;` where the `"SpGEMM operation"` string is given as the description/summary of the `SpGEMMOp` member of the tblgen-enum.
		aartbik: +1 on at least the first improvement (at first glance I was a bit surprised to see a handle for an operation and not the data, but after reading GEMM in detail, I get it ;-)
		aartbik: sparse matrix -> dense matrix

		aartbik: "SpGEMM operation" -> "SpGEMM operation handle type"
		wrengr: cf my comment on line 116
		aartbik: yeah, that would work (the full description would then only be visible in doc page, and not inline in code, but I like that this automated way forces consistency)
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// GPU Interfaces.		// GPU Interfaces.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def GPU_AsyncOpInterface : OpInterface<"AsyncOpInterface"> {		def GPU_AsyncOpInterface : OpInterface<"AsyncOpInterface"> {
let description = [{		let description = [{
Interface for GPU operations that execute asynchronously on the device.		Interface for GPU operations that execute asynchronously on the device.

[... 49 lines elided ...]

mlir/include/mlir/Dialect/GPU/IR/GPUDialect.h

[... 159 lines elided ...]	public:
/// held by this type. String returned can be one of"AOp", "BOp" and "COp".		/// held by this type. String returned can be one of"AOp", "BOp" and "COp".
StringRef getOperand() const;		StringRef getOperand() const;
};		};

// Adds a `gpu.async.token` to the front of the argument list.		// Adds a `gpu.async.token` to the front of the argument list.
void addAsyncDependency(Operation *op, Value token);		void addAsyncDependency(Operation *op, Value token);

// Handle types for sparse.		// Handle types for sparse.
enum class SparseHandleKind { SpMat, DnTensor };		enum class SparseHandleKind { SpMat, DnTensor, SpGEMMOp };

template <SparseHandleKind K>		template <SparseHandleKind K>
class SparseHandleType		class SparseHandleType
: public Type::TypeBase<SparseHandleType<K>, Type, TypeStorage> {		: public Type::TypeBase<SparseHandleType<K>, Type, TypeStorage> {
public:		public:
using Base =		using Base =
typename Type::TypeBase<SparseHandleType<K>, Type, TypeStorage>::Base;		typename Type::TypeBase<SparseHandleType<K>, Type, TypeStorage>::Base;
using Base::Base;		using Base::Base;
};		};

using SparseDnTensorHandleType = SparseHandleType<SparseHandleKind::DnTensor>;		using SparseDnTensorHandleType = SparseHandleType<SparseHandleKind::DnTensor>;
using SparseSpMatHandleType = SparseHandleType<SparseHandleKind::SpMat>;		using SparseSpMatHandleType = SparseHandleType<SparseHandleKind::SpMat>;
		using SparseSpGEMMOpHandleType = SparseHandleType<SparseHandleKind::SpGEMMOp>;

} // namespace gpu		} // namespace gpu
} // namespace mlir		} // namespace mlir

#include "mlir/Dialect/GPU/IR/GPUOpsEnums.h.inc"		#include "mlir/Dialect/GPU/IR/GPUOpsEnums.h.inc"

#include "mlir/Dialect/GPU/IR/GPUOpsDialect.h.inc"		#include "mlir/Dialect/GPU/IR/GPUOpsDialect.h.inc"

[... 11 lines elided ...]

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

[... 1,793 lines elided ...]	]> {
let cppNamespace = GPU_Dialect.cppNamespace;		let cppNamespace = GPU_Dialect.cppNamespace;
}		}

def GPU_TransposeModeAttr : EnumAttr<GPU_Dialect, GPU_TransposeMode,		def GPU_TransposeModeAttr : EnumAttr<GPU_Dialect, GPU_TransposeMode,
"mat_transpose_mode">{		"mat_transpose_mode">{
let defaultValue = "TransposeMode::NON_TRANSPOSE";		let defaultValue = "TransposeMode::NON_TRANSPOSE";
}		}

		def GPU_SpGEMMAlg : I32EnumAttr<"SpGEMMAlg",
		"selected algorithm for sparse matrix SpGEMM ops supported by sparse tensor",
		[
		I32EnumAttrCase<"ALG1", 0>,
		I32EnumAttrCase<"ALG2", 1>,
		I32EnumAttrCase<"ALG3", 2>,
		wrengr: does CUDA specify this discrepancy between the numeric value and the algorithm name? If it doesn't, then you may want to make the numeric values match the algorithm name for better clarity
		aartbik: Yeah, it even seems stranger. But perhaps we can keep the 3,4,5 directly here? typedef enum { CUSPARSE_SPGEMM_DEFAULT = 0, CUSPARSE_SPGEMM_CSR_ALG_DETERMINITIC = 1, CUSPARSE_SPGEMM_CSR_ALG_NONDETERMINITIC = 2, CUSPARSE_SPGEMM_ALG1 = 3, CUSPARSE_SPGEMM_ALG2 = 4, CUSPARSE_SPGEMM_ALG3 = 5 } cusparseSpGEMMAlg_t;
		wrengr: Is there any reason not to also include `DEFAULT`? Especially if we do indeed make it the `GPU_SpGEMMAlgAttr.defaultValue`... (I'd suggest also adding `CSR_ALG_{,NON}DETERMINITIC`, though that'd require some extra validation to make sure that the arguments are CSR whenever those algos are requested. Though I'd imagine that we already have similar requirements for `Alg{1,2,3}`...)
		]> {
		let genSpecializedAttr = 0;
		let cppNamespace = GPU_Dialect.cppNamespace;
		}

		def GPU_SpGEMMAlgAttr : EnumAttr<GPU_Dialect, GPU_SpGEMMAlg,
		"spgemm_alg">{
		wrengr: Nit: should have a space between the "`> {`"
		let defaultValue = "SpGEMMAlg::ALG2";
		aartbik: why? doc says Default algorithm. Currently, it is CUSPARSE_SPGEMM_ALG1.
		aartbik: doesn't this fit on one line now? or is this forced by lint?
		}

def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SpMVBufferSizeOp : GPU_Op<"spmv_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SpMV operation";		let summary = "Precompute buffersize for SpMV operation";
let description = [{		let description = [{
The `gpu.spmv_buffer_size` operation returns the buffer size required		The `gpu.spmv_buffer_size` operation returns the buffer size required
to perform the SpMV operation on the given sparse matrix and dense vectors.		to perform the SpMV operation on the given sparse matrix and dense vectors.
The operation expects handles returned by previous sparse operations		The operation expects handles returned by previous sparse operations
to construct an environment and the operands for SpMV.		to construct an environment and the operands for SpMV.

[... 296 lines elided ...]	def GPU_SDDMMOp : GPU_Op<"sddmm", [GPU_AsyncOpInterface]> {
];		];

let assemblyFormat = [{		let assemblyFormat = [{
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer) `into` $computeType		$dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer) `into` $computeType
}];		}];
}		}

		// TODO: cusparseSpMatGetSize
		K-Wu (author): TODO: remove


		aartbik: I would drop the "supported by sparse tensor", just "selected algorithm for SpGEMM",
		def GPU_SpGEMMCreateDescrOp : GPU_Op<"spgemm_create_descr", [GPU_AsyncOpInterface]> {
		wrengr: This description needs to be updated to actually say something about what the op is/does. Ditto for all the other new ops
		aartbik: +1
		aartbik: can we say a bit more? the doc now pretty much says what the name already implies. perhaps say how such a descriptor is used subsequently
		let summary = "SpGEMM Create Descr operation";
		let description = [{
		The `gpu.spgemm_create_descr`

		wrengr: Should wrap this in backticks so that it's formatted/rendered properly. Ditto for analogous documentation on other ops
		K-Wu (author): Got it. I will apply backticks in all the new operators in this revision, and create a dedicated revision to apply backticks to other tablegen entries and apply the declaration macro to other ops.
		Example:

		```mlir
		%descriptor = gpu.spgemm_create_descr
		```

		}];
		wrengr: Shouldn't there only be one of these triple-backtick lines? Ditto for the other ops
		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies);
		let results = (outs GPU_SparseSpGEMMOpHandle:$desc,
		Optional<GPU_AsyncToken>:$asyncToken);
		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		attr-dict
		}];
		}

		def GPU_SpGEMMDestroyDescrOp : GPU_Op<"spgemm_destroy_descr", [GPU_AsyncOpInterface]> {
		aartbik: seems like you did not finish the description?
		let summary = "SpGEMM Destroy Descr operation";
		let description = [{
		The `gpu.spgemm_destroy_descr`

		wrengr: nit: should reformat this to avoid overly long lines
		Example:
		aartbik: missing argument, no return? gpu.spgemm_destroy_descr %descriptor

		```mlir
		%descriptor = gpu.spgemm_destroy_descr
		```

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpGEMMOpHandle:$desc);
		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);
		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		$desc attr-dict
		}];
		}


		def GPU_SpGEMMWorkEstimationOp : GPU_Op<"spgemm_work_estimation", [GPU_AsyncOpInterface]> {
		let summary = "SpGEMM work estimation operation";
		let description = [{
		The `gpu.spgemm_work_estimation`

		Example:

		```mlir
		aartbik: could we combine this with spgemm_estimate_memory to reduce the number of ops (i.e. if they always appear in sequence, we could keep the operation footprint a bit less compared to actual lib calls)?
		wrengr: And if you're worried that "even though they always go together now, perhaps we'll need to split them up later on", then that could always be done via giving the op a new argument `enum Estimate { Work, Memory, Both }`. ...That is, assuming the extra buffer arguments to the function calls can be easily hidden away as part of the op lowering. Since our goal is to build a compiler rather than to provide MLIR bindings to the library per se, I think it's better to try to keep the ops as high-level as we actually want/need for the compiler, rather than going for 1-to-1 correspondence with the library function calls. Keeping things at the granularity of the compiler often helps both to improve codegen and to avoid compatibility issues with different library versions. Granted there's always a balancing act for this sort of thing, but nevertheless
		aartbik: +1 on the compiler vs 1:1 lib argument, and the proposal for the future
		%descriptor = gpu.spgemm_work_estimation
		```

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpGEMMOpHandle:$desc,
		GPU_TransposeModeAttr:$modeA,
		GPU_TransposeModeAttr:$modeB,
		GPU_SparseSpMatHandle:$spmatA,
		GPU_SparseSpMatHandle:$spmatB,
		GPU_SparseSpMatHandle:$spmatC,
		TypeAttr:$computeType,
		aartbik: can we break this over multiple lines to stay in 80-col
		Index:$bufferSz,
		GPU_SpGEMMAlgAttr:$alg,
		AnyMemRef:$buffer);
		let results = (outs Res<Index>:$bufferSzNew,
		Optional<GPU_AsyncToken>:$asyncToken);

		let builders = [OpBuilder<(ins
		"Type":$bufferSzNew,
		"Type":$asyncToken,
		"ValueRange":$asyncDependencies,
		"Value":$desc,
		"Value":$spmatA,
		"Value":$spmatB,
		"Value":$spmatC,
		"Type":$computeType,
		"Value":$bufferSz,
		"Value":$buffer), [{
		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
		auto alg = gpu::SpGEMMAlg::ALG2;
		return build($_builder, $_state, bufferSzNew, asyncToken, asyncDependencies, desc,
		modeA, modeB, spmatA, spmatB, spmatC, computeType, bufferSz, alg, buffer);}]>
		];

		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		$spmatA (`{` $modeA^ `}`)? `,` $spmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $alg `,` $desc `,` $bufferSz `,` $buffer attr-dict `:` $computeType `into` type($buffer)
		}];
		}


		def GPU_SpGEMMEstimateMemoryOp : GPU_Op<"spgemm_estimate_memory", [GPU_AsyncOpInterface]> {
		aartbik: missing?
		let summary = "SpGEMM estimate memory operation";
		let description = [{
		The `gpu.spgemm_estimate_memory`

		Example:
		aartbik: seems not right?

		```mlir
		%descriptor = gpu.spgemm_estimate_memory
		```

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpGEMMOpHandle:$desc,
		GPU_TransposeModeAttr:$modeA,
		GPU_TransposeModeAttr:$modeB,
		aartbik (Not Done): how does this perform the actual computation?
		GPU_SparseSpMatHandle:$spmatA,
		GPU_SparseSpMatHandle:$spmatB,
		GPU_SparseSpMatHandle:$spmatC,
		TypeAttr:$computeType,
		GPU_SpGEMMAlgAttr:$alg,
		Index:$bufferSz3,
		AnyMemRef:$buffer3,
		Index:$bufferSz2);
		let results = (outs Index:$bufferSz3New,
		Index:$bufferSz2New,
		Optional<GPU_AsyncToken>:$asyncToken);

		let builders = [OpBuilder<(ins
		"Type":$bufferSz3New,
		"Type":$bufferSz2New,
		"Type":$asyncToken,
		"ValueRange":$asyncDependencies,
		"Value":$desc,
		"Value":$spmatA,
		"Value":$spmatB,
		"Value":$spmatC,
		"Type":$computeType,
		"Value":$bufferSz3,
		"Value":$buffer3,
		"Value":$bufferSz2), [{
		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
		auto alg = gpu::SpGEMMAlg::ALG2;
		return build($_builder, $_state, bufferSz3New, bufferSz2New, asyncToken, asyncDependencies, desc,
		modeA, modeB, spmatA, spmatB, spmatC, computeType, alg, bufferSz3, buffer3, bufferSz2);}]>
		];

		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		$spmatA (`{` $modeA^ `}`)? `,` $spmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $alg `,` $desc `,` $bufferSz3 `,` $bufferSz2 `,` $buffer3 attr-dict `:` $computeType `into` type($buffer3)
		}];
		}

		def GPU_SpGEMMComputeOp : GPU_Op<"spgemm_compute", [GPU_AsyncOpInterface]> {
		let summary = "SpGEMM compute operation";
		let description = [{
		The `gpu.spgemm` operation performs the SpGEMM operation on the given sparse
		matrices, and buffer. The operation expects handles returned by previous
		sparse operations to construct an environment and the operands for SpGEMM. The
		buffer must have been allocated on the device.

		C' = alpha * op(A) * op(B) + beta * C

		If the `async` keyword is present, the op is executed asynchronously (i.e.
		it does not block until the execution has finished on the device). In
		that case, it returns a !gpu.async.token in addition to the environment.

		Example:

		```mlir
		%descriptor = gpu.spgemm_compute
		```

		The matrix arguments can also be associated with one of the following
		aartbik (Not Done): wait, this is no longer used to determine buffer size, right? you just use the buffer that was allocated
		K-Wu (author): cusparse spGEMM APIs are very weird. I am merely restating what I learnt from its documentation, but I am not sure if spgemm_compute only does the computation. The functions cusparseSpGEMM_workEstimation(), cusparseSpGEMM_estimateMemory(), and cusparseSpGEMM_compute() are used for both determining the buffer size and performing the actual computation.
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
		is NON_TRANSPOSE.

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpGEMMOpHandle:$desc,
		GPU_TransposeModeAttr:$modeA,
		GPU_TransposeModeAttr:$modeB,
		GPU_SparseSpMatHandle:$spmatA,
		GPU_SparseSpMatHandle:$spmatB,
		GPU_SparseSpMatHandle:$spmatC,
		TypeAttr:$computeType,
		GPU_SpGEMMAlgAttr:$alg,
		Index:$bufferSz2,
		AnyMemRef:$buffer2);
		let results = (outs Index:$bufferSz2New,
		Optional<GPU_AsyncToken>:$asyncToken);

		let builders = [OpBuilder<(ins
		"Type":$asyncToken,
		"ValueRange":$asyncDependencies,
		"Value":$desc,
		"Value":$spmatA,
		"Value":$spmatB,
		"Value":$spmatC,
		"Type":$computeType,
		"Value":$bufferSz2,
		"Value":$buffer2), [{
		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
		auto alg = gpu::SpGEMMAlg::ALG2;
		return build($_builder, $_state, asyncToken, asyncDependencies, desc,
		modeA, modeB, spmatA, spmatB, spmatC, computeType, alg, bufferSz2, buffer2);}]>
		];
		wrengr: nit: reformat to avoid overly long lines

		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		$spmatA (`{` $modeA^ `}`)? `,` $spmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $alg `,` $desc `,` $bufferSz2 `,` $buffer2 attr-dict `:` $computeType `into` type($buffer2)
		}];
		}
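
The exchange above turns on the fact that the same call both reports a buffer size and does work once a buffer is supplied, which is why the op takes `$bufferSz2` and returns `$bufferSz2New`. A minimal sketch of that calling convention follows, assuming a null buffer means "query only". `mockWorkEstimation`, `runTwoPhase`, and the 256-byte requirement are hypothetical stand-ins for illustration; nothing here calls cuSPARSE or the runtime wrappers from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a wrapper such as mgpuSpGEMMWorkEstimation.
// It receives the size of the buffer the caller holds and returns the size
// it needs. The 256-byte requirement is made up for illustration.
inline std::size_t mockWorkEstimation(std::size_t bufferSize, void *buffer) {
  constexpr std::size_t kRequired = 256;
  if (buffer == nullptr || bufferSize < kRequired)
    return kRequired; // query phase: only report the required size
  // Work phase: the buffer is large enough, so the step can run.
  static_cast<unsigned char *>(buffer)[0] = 1; // mark that work happened
  return bufferSize;
}

// Drives the two calls: query the size, allocate, then call again.
inline std::vector<unsigned char> runTwoPhase() {
  std::size_t needed = mockWorkEstimation(0, nullptr);
  std::vector<unsigned char> buffer(needed, 0);
  std::size_t reported = mockWorkEstimation(buffer.size(), buffer.data());
  (void)reported; // equals buffer.size() once the buffer is large enough
  return buffer;
}
```

Calling `runTwoPhase()` returns a 256-byte buffer whose first byte was set by the second call. In the dialect, the same shape would show up as two uses of one op: the first with a zero size, the second with the size returned by the first.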

		def GPU_SpGEMMCopyOp : GPU_Op<"spgemm_copy", [GPU_AsyncOpInterface]> {
		[Done] aartbik: this is the copy op?
		let summary = "SpGEMM copy operation";
		let description = [{
		The `gpu.spgemm` operation performs the SpGEMM operation on the given sparse
		matrices, and buffer. The operation expects handles returned by previous
		sparse operations to construct an environment and the operands for SpGEMM. The
		buffer must have been allocated on the device.

		C' = alpha * op(A) * op(B) + beta * C

		If the `async` keyword is present, the op is executed asynchronously (i.e.
		it does not block until the execution has finished on the device). In
		that case, it returns a !gpu.async.token in addition to the environment.

		Example:
		[Done] aartbik: not complete?

		```mlir
		%descriptor = gpu.spgemm_copy
		```

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
		is NON_TRANSPOSE.

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpGEMMOpHandle:$desc,
		GPU_TransposeModeAttr:$modeA,
		GPU_TransposeModeAttr:$modeB,
		GPU_SparseSpMatHandle:$spmatA,
		GPU_SparseSpMatHandle:$spmatB,
		GPU_SparseSpMatHandle:$spmatC,
		TypeAttr:$computeType,
		GPU_SpGEMMAlgAttr:$alg);
		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

		let builders = [OpBuilder<(ins
		"Type":$asyncToken,
		"ValueRange":$asyncDependencies,
		"Value":$desc,
		"Value":$spmatA,
		"Value":$spmatB,
		"Value":$spmatC,
		"Type":$computeType), [{
		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
		auto alg = gpu::SpGEMMAlg::ALG2;
		return build($_builder, $_state, asyncToken, asyncDependencies, desc,
		modeA, modeB, spmatA, spmatB, spmatC, computeType, alg);}]>
		];

		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		$spmatA (`{` $modeA^ `}`)? `,` $spmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $alg `,` $desc attr-dict `:` $computeType
		}];
		}

		[Done] aartbik: not copy?

		def GPU_SpGEMMGetSizeOp : GPU_Op<"spgemm_get_size", [GPU_AsyncOpInterface]> {
		[Done] aartbik: get size?
		let summary = "SpGEMM copy operation";
		let description = [{
		The `gpu.spgemm` operation performs the SpGEMM operation on the given sparse
		matrices, and buffer. The operation expects handles returned by previous
		sparse operations to construct an environment and the operands for SpGEMM. The
		buffer must have been allocated on the device.

		C' = alpha * op(A) * op(B) + beta * C

		If the `async` keyword is present, the op is executed asynchronously (i.e.
		it does not block until the execution has finished on the device). In
		that case, it returns a !gpu.async.token in addition to the environment.

		Example:

		```mlir
		%descriptor = gpu.spgemm_get_size
		```

		The matrix arguments can also be associated with one of the following
		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
		is NON_TRANSPOSE.

		}];

		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
		GPU_SparseSpMatHandle:$spmat);
		let results = (outs Index:$rows,
		Index:$cols,
		Index:$nnz,
		Optional<GPU_AsyncToken>:$asyncToken);

		let assemblyFormat = [{
		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
		[Done] aartbik: empty line for space
		$spmat attr-dict
		}];
		}
#endif // GPU_OPS		#endif // GPU_OPS

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

[94 lines not shown]	protected:
LLVM::LLVMPointerType llvmPointerType =		LLVM::LLVMPointerType llvmPointerType =
this->getTypeConverter()->getPointerType(IntegerType::get(context, 8));		this->getTypeConverter()->getPointerType(IntegerType::get(context, 8));
Type llvmPointerPointerType =		Type llvmPointerPointerType =
this->getTypeConverter()->getPointerType(llvmPointerType);		this->getTypeConverter()->getPointerType(llvmPointerType);
Type llvmInt8Type = IntegerType::get(context, 8);		Type llvmInt8Type = IntegerType::get(context, 8);
Type llvmInt16Type = IntegerType::get(context, 16);		Type llvmInt16Type = IntegerType::get(context, 16);
Type llvmInt32Type = IntegerType::get(context, 32);		Type llvmInt32Type = IntegerType::get(context, 32);
Type llvmInt64Type = IntegerType::get(context, 64);		Type llvmInt64Type = IntegerType::get(context, 64);
		Type llvmFloat32Type = Float32Type::get(context);
Type llvmInt8PointerType =		Type llvmInt8PointerType =
this->getTypeConverter()->getPointerType(llvmInt8Type);		this->getTypeConverter()->getPointerType(llvmInt8Type);
Type llvmInt64PointerType =		Type llvmInt64PointerType =
this->getTypeConverter()->getPointerType(llvmInt64Type);		this->getTypeConverter()->getPointerType(llvmInt64Type);
Type llvmIntPtrType = IntegerType::get(		Type llvmIntPtrType = IntegerType::get(
context, this->getTypeConverter()->getPointerBitwidth(0));		context, this->getTypeConverter()->getPointerBitwidth(0));

FunctionCallBuilder moduleLoadCallBuilder = {		FunctionCallBuilder moduleLoadCallBuilder = {
[189 lines not shown]	FunctionCallBuilder createCuSparseLtSpMMBufferSizeBuilder = {
{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,		{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,
llvmPointerType, llvmPointerType, llvmInt32Type,		llvmPointerType, llvmPointerType, llvmInt32Type,
llvmPointerType /*void *stream*/}};		llvmPointerType /*void *stream*/}};
FunctionCallBuilder createCuSparseLtSpMMBuilder = {		FunctionCallBuilder createCuSparseLtSpMMBuilder = {
"mgpuCuSparseLtSpMM",		"mgpuCuSparseLtSpMM",
llvmVoidType,		llvmVoidType,
{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,		{llvmPointerType, llvmPointerType, llvmPointerType, llvmPointerType,
llvmPointerType, llvmPointerType, llvmPointerType /*void *stream*/}};		llvmPointerType, llvmPointerType, llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMWorkEstimationBuilder = {
		"mgpuSpGEMMWorkEstimation",
		llvmIntPtrType,
		{llvmPointerType /*void *s*/, llvmInt32Type /*int32_t ma*/,
		llvmInt32Type /*int32_t mb*/, llvmPointerType /*void *a*/,
		llvmPointerType /*void *b*/, llvmPointerType /*void *c*/,
		llvmInt32Type /*int32_t ctp*/, llvmIntPtrType /*intptr_t bs*/,
		llvmPointerType /*void *buf*/, llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMEstimateMemoryBuilder = {
		[Done] wrengr: I'm not sure what style is used in the rest of the file/dialect, but is it really necessary/helpful to provide the C/C++ types for each of the arguments, given that they have an easy 1-to-1 mapping to the LLVM types? Giving the names of the arguments makes sense, but the types less so imo
		[Not Done] aartbik: I agree we should find an automated way for this, both the types and names. As for the names, now, it is inconsistent with the other parts, since we only documented stream for some reason ;-) But for the long lists, it actually does help to find the right ones. We still have the problem that we document the type for stream (as in void *stream), but not the others, but dropping the stream type would be inconsistent with the rest. So, okay to leave what you have now, but we probably need to revisit this one day.
		[Done] K-Wu (author): noted :D
		"mgpuSpGEMMEstimateMemory",
		llvmVoidType,
		{llvmPointerType /*void *nbs3*/, llvmPointerType /*void *nbs2*/,
		llvmPointerType /*void *s*/, llvmInt32Type /*int32_t ma*/,
		llvmInt32Type /*int32_t mb*/, llvmPointerType /*void *a*/,
		llvmPointerType /*void *b*/, llvmPointerType /*void *c*/,
		llvmInt32Type /*int32_t ctp*/, llvmInt32Type /*int32_t alg*/,
		llvmFloat32Type /*chunk_fraction*/, llvmIntPtrType /*intptr_t bs3*/,
		llvmPointerType /*void *buf3*/, llvmIntPtrType /*intptr_t bs2*/,
		llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMComputeBuilder = {
		"mgpuSpGEMMCompute",
		llvmIntPtrType,
		{llvmPointerType /*void *s*/, llvmInt32Type /*int32_t ma*/,
		llvmInt32Type /*int32_t mb*/, llvmPointerType /*void *a*/,
		llvmPointerType /*void *b*/, llvmPointerType /*void *c*/,
		llvmInt32Type /*int32_t ctp*/, llvmPointerType /*void *buf*/,
		llvmIntPtrType /*intptr_t bs*/, llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMCopyBuilder = {
		"mgpuSpGEMMCopy",
		llvmVoidType,
		{llvmPointerType /*void *s*/, llvmInt32Type /*int32_t ma*/,
		llvmInt32Type /*int32_t mb*/, llvmPointerType /*void *a*/,
		llvmPointerType /*void *b*/, llvmPointerType /*void *c*/,
		llvmInt32Type /*int32_t ctp*/, llvmInt32Type /*int32_t alg*/,
		llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMCreateDescrBuilder = {
		"mgpuSpGEMMCreateDescr",
		llvmPointerType,
		{llvmPointerType /*void *stream*/}};
		[Done] wrengr: Should drop the `void*` part of the comments here, for consistency
		FunctionCallBuilder createSpGEMMDestroyDescrBuilder = {
		"mgpuSpGEMMDestoryDescr",
		llvmVoidType,
		{llvmPointerType /*void *s*/, llvmPointerType /*void *stream*/}};
		FunctionCallBuilder createSpGEMMGetSizeBuilder = {
		"mgpuSpGEMMGetSize",
		llvmVoidType,
		{llvmPointerType /*void *mc*/, llvmPointerType /*void *rc*/,
		llvmPointerType /*void *cc*/, llvmPointerType /*void *nc*/,
		llvmPointerType /*void *stream*/}};
};		};

/// A rewrite pattern to convert gpu.host_register operations into a GPU runtime		/// A rewrite pattern to convert gpu.host_register operations into a GPU runtime
/// call. Currently it supports CUDA and ROCm (HIP).		/// call. Currently it supports CUDA and ROCm (HIP).
class ConvertHostRegisterOpToGpuRuntimeCallPattern		class ConvertHostRegisterOpToGpuRuntimeCallPattern
: public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> {		: public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> {
public:		public:
ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)		ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)
[342 lines not shown]	ConvertSDDMMOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)
: ConvertOpToGpuRuntimeCallPattern<gpu::SDDMMOp>(typeConverter) {}		: ConvertOpToGpuRuntimeCallPattern<gpu::SDDMMOp>(typeConverter) {}

private:		private:
LogicalResult		LogicalResult
matchAndRewrite(gpu::SDDMMOp op, OpAdaptor adaptor,		matchAndRewrite(gpu::SDDMMOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override;		ConversionPatternRewriter &rewriter) const override;
};		};

		class ConvertSpGEMMCreateDescrOpToGpuRuntimeCallPattern
		[Done] aartbik: we can use this for ALL rules now, right? I can see you don't want to do that in this CL, but add a TODO for follow up
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMCreateDescrOp> {
		public:
		ConvertSpGEMMCreateDescrOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMCreateDescrOp>(
		typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMCreateDescrOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMDestroyDescrOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMDestroyDescrOp> {
		public:
		ConvertSpGEMMDestroyDescrOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMDestroyDescrOp>(
		typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMDestroyDescrOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMWorkEstimationOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMWorkEstimationOp> {
		public:
		ConvertSpGEMMWorkEstimationOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMWorkEstimationOp>(
		typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMWorkEstimationOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMEstimateMemoryOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMEstimateMemoryOp> {
		public:
		ConvertSpGEMMEstimateMemoryOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMEstimateMemoryOp>(
		typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMEstimateMemoryOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMComputeOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMComputeOp> {
		public:
		ConvertSpGEMMComputeOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMComputeOp>(typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMComputeOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMGetSizeOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMGetSizeOp> {
		public:
		ConvertSpGEMMGetSizeOpToGpuRuntimeCallPattern(
		LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMGetSizeOp>(typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMGetSizeOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};

		class ConvertSpGEMMCopyOpToGpuRuntimeCallPattern
		: public ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMCopyOp> {
		public:
		ConvertSpGEMMCopyOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)
		: ConvertOpToGpuRuntimeCallPattern<gpu::SpGEMMCopyOp>(typeConverter) {}

		private:
		LogicalResult
		matchAndRewrite(gpu::SpGEMMCopyOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const override;
		};
		[Done] Peiming: I somehow feel that we have repeated this code so many times, do you think you can make it a macro instead?
		[Done] K-Wu (author): Good point. I will think about it.
		[Done] aartbik: yeah, I am also thinking that keeping the definition and declaration together would already go a long way toward reducing boilerplate, but if we use a macro for the code here, then that would also be more readable and we can keep the code below
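
The macro idea raised in this thread could look roughly like the sketch below. It is only an illustration of the technique: `DECLARE_CONVERT_OP_PATTERN`, `MockConvertOpPattern`, `MockTypeConverter`, and the `Mock...Op` types are hypothetical stand-ins so that the snippet compiles without MLIR headers; in the patch the base class would be `ConvertOpToGpuRuntimeCallPattern<gpu::...>` and `matchAndRewrite` would keep its real signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for LLVMTypeConverter and two gpu ops.
struct MockTypeConverter {};
struct MockCreateDescrOp {
  static const char *name() { return "gpu.spgemm_create_descr"; }
};
struct MockCopyOp {
  static const char *name() { return "gpu.spgemm_copy"; }
};

// Hypothetical stand-in for ConvertOpToGpuRuntimeCallPattern<OpTy>.
template <typename OpTy>
class MockConvertOpPattern {
public:
  explicit MockConvertOpPattern(MockTypeConverter &converter)
      : converter(converter) {}
  virtual ~MockConvertOpPattern() = default;
  std::string opName() const { return OpTy::name(); }

protected:
  MockTypeConverter &converter;
};

// One macro invocation replaces each hand-written class declaration; only
// the out-of-line matchAndRewrite body remains to be written per op.
#define DECLARE_CONVERT_OP_PATTERN(op_name)                                   \
  class Convert##op_name##ToGpuRuntimeCallPattern                             \
      : public MockConvertOpPattern<Mock##op_name> {                          \
  public:                                                                     \
    explicit Convert##op_name##ToGpuRuntimeCallPattern(                       \
        MockTypeConverter &converter)                                         \
        : MockConvertOpPattern<Mock##op_name>(converter) {}                   \
    bool matchAndRewrite() const;                                             \
  }

DECLARE_CONVERT_OP_PATTERN(CreateDescrOp);
DECLARE_CONVERT_OP_PATTERN(CopyOp);

// Definitions stay separate, as in the file under review.
bool ConvertCreateDescrOpToGpuRuntimeCallPattern::matchAndRewrite() const {
  return true;
}
bool ConvertCopyOpToGpuRuntimeCallPattern::matchAndRewrite() const {
  return true;
}
```

With this shape, adding a pattern costs one declaration line plus the rewrite body, which is the reduction in boilerplate the reviewers ask for. Whether the declarations and definitions should also be merged is left open, as in the discussion.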

} // namespace		} // namespace

void GpuToLLVMConversionPass::runOnOperation() {		void GpuToLLVMConversionPass::runOnOperation() {
LowerToLLVMOptions options(&getContext());		LowerToLLVMOptions options(&getContext());
options.useOpaquePointers = useOpaquePointers;		options.useOpaquePointers = useOpaquePointers;
options.useBarePtrCallConv = hostBarePtrCallConv;		options.useBarePtrCallConv = hostBarePtrCallConv;

LLVMTypeConverter converter(&getContext(), options);		LLVMTypeConverter converter(&getContext(), options);
[666 lines not shown]

template <typename T>		template <typename T>
static Value genConstInt32From(OpBuilder &builder, Location loc, T TValue) {		static Value genConstInt32From(OpBuilder &builder, Location loc, T TValue) {
Type llvmInt32Type = builder.getIntegerType(32);		Type llvmInt32Type = builder.getIntegerType(32);
return builder.create<LLVM::ConstantOp>(loc, llvmInt32Type,		return builder.create<LLVM::ConstantOp>(loc, llvmInt32Type,
static_cast<int32_t>(TValue));		static_cast<int32_t>(TValue));
}		}

		template <typename T>
		static Value genConstFloat32From(OpBuilder &builder, Location loc, T TValue) {
		Type llvmFloat32Type = builder.getF32Type();
		return builder.create<LLVM::ConstantOp>(
		loc, llvmFloat32Type,
		builder.getF32FloatAttr(static_cast<float>(TValue)));
		}

LogicalResult ConvertCreateDnTensorOpToGpuRuntimeCallPattern::matchAndRewrite(		LogicalResult ConvertCreateDnTensorOpToGpuRuntimeCallPattern::matchAndRewrite(
gpu::CreateDnTensorOp op, OpAdaptor adaptor,		gpu::CreateDnTensorOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const {		ConversionPatternRewriter &rewriter) const {
if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
failed(isAsyncWithOneDependency(rewriter, op)))		failed(isAsyncWithOneDependency(rewriter, op)))
return failure();		return failure();
Location loc = op.getLoc();		Location loc = op.getLoc();
auto stream = adaptor.getAsyncDependencies().front();		auto stream = adaptor.getAsyncDependencies().front();
[416 lines not shown]	LogicalResult ConvertSDDMMOpToGpuRuntimeCallPattern::matchAndRewrite(
createSDDMMCallBuilder.create(loc, rewriter,		createSDDMMCallBuilder.create(loc, rewriter,
{modeA, modeB, adaptor.getDnmatA(),		{modeA, modeB, adaptor.getDnmatA(),
adaptor.getDnmatB(), adaptor.getSpmatC(),		adaptor.getDnmatB(), adaptor.getSpmatC(),
computeType, pBuf, stream});		computeType, pBuf, stream});
rewriter.replaceOp(op, {stream});		rewriter.replaceOp(op, {stream});
return success();		return success();
}		}

		LogicalResult
		ConvertSpGEMMCreateDescrOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMCreateDescrOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto stream = adaptor.getAsyncDependencies().front();
		Value descr = createSpGEMMCreateDescrBuilder.create(loc, rewriter, {stream})
		.getResult();
		rewriter.replaceOp(op, {descr, stream});
		return success();
		}

		LogicalResult
		ConvertSpGEMMDestroyDescrOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMDestroyDescrOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {

		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto stream = adaptor.getAsyncDependencies().front();
		createSpGEMMDestroyDescrBuilder.create(loc, rewriter, {adaptor.getDesc(), stream});
		rewriter.replaceOp(op, {stream});
		return success();
		}

		LogicalResult
		ConvertSpGEMMWorkEstimationOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMWorkEstimationOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto computeType = genConstInt32From(
		rewriter, loc, getCuSparseDataTypeFrom(adaptor.getComputeType()));
		auto modeA = genConstInt32From(rewriter, loc, adaptor.getModeA());
		auto modeB = genConstInt32From(rewriter, loc, adaptor.getModeB());
		auto stream = adaptor.getAsyncDependencies().front();
		// TODO: support other chunk fraction
		Value pBuf =
		MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);
		if (!getTypeConverter()->useOpaquePointers())
		pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);

		auto bufferSizeNew =
		createSpGEMMWorkEstimationBuilder
		.create(loc, rewriter,
		{adaptor.getDesc(), modeA, modeB, adaptor.getSpmatA(),
		adaptor.getSpmatB(), adaptor.getSpmatC(), computeType,
		adaptor.getBufferSz(), pBuf, stream})
		.getResult();

		rewriter.replaceOp(op, {bufferSizeNew, stream});
		return success();
		}

		LogicalResult
		ConvertSpGEMMEstimateMemoryOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMEstimateMemoryOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto computeType = genConstInt32From(
		rewriter, loc, getCuSparseDataTypeFrom(adaptor.getComputeType()));
		auto alg = genConstInt32From(rewriter, loc, adaptor.getAlg());
		auto modeA = genConstInt32From(rewriter, loc, adaptor.getModeA());
		auto modeB = genConstInt32From(rewriter, loc, adaptor.getModeB());
		auto stream = adaptor.getAsyncDependencies().front();
		// TODO: support other chunk fraction
		Value chunkFraction = genConstFloat32From(rewriter, loc, 1.0);
		Value pBuf3 =
		MemRefDescriptor(adaptor.getBuffer3()).allocatedPtr(rewriter, loc);
		if (!getTypeConverter()->useOpaquePointers())
		pBuf3 = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf3);

		auto two = rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(2));
		auto bufferSize = rewriter.create<LLVM::AllocaOp>(
		loc, llvmInt64PointerType, llvmInt64Type, two, /*alignment=*/16);

		auto bufferSizePtr2 = rewriter.create<LLVM::GEPOp>(
		loc, llvmInt64PointerType, llvmInt64PointerType, bufferSize,
		ValueRange{rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(0))});
		auto bufferSizePtr3 = rewriter.create<LLVM::GEPOp>(
		loc, llvmInt64PointerType, llvmInt64PointerType, bufferSize,
		ValueRange{rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(1))});

		createSpGEMMEstimateMemoryBuilder.create(
		loc, rewriter,
		{bufferSizePtr3, bufferSizePtr2, adaptor.getDesc(), modeA, modeB,
		adaptor.getSpmatA(), adaptor.getSpmatB(), adaptor.getSpmatC(),
		computeType, alg, chunkFraction, adaptor.getBufferSz3(), pBuf3,
		adaptor.getBufferSz2(), stream});
		auto bufferSize2 =
		rewriter.create<LLVM::LoadOp>(loc, llvmInt64Type, bufferSizePtr2);
		auto bufferSize3 =
		rewriter.create<LLVM::LoadOp>(loc, llvmInt64Type, bufferSizePtr3);

		rewriter.replaceOp(op, {bufferSize3, bufferSize2, stream});
		return success();
		}
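
The lowering above returns two sizes from a `void` runtime call by reserving two stack slots, passing a pointer to each, and loading them back afterwards. The same shape in plain C++ is sketched below. `mockEstimateMemory`, `Sizes`, and `estimate` are hypothetical names, and the arithmetic inside the mock is made up; only the slot, pointer, and load structure mirrors the generated IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for mgpuSpGEMMEstimateMemory: it reports two new
// buffer sizes through out-pointers. The added constants are illustrative.
inline void mockEstimateMemory(int64_t *newSize3, int64_t *newSize2,
                               int64_t size3, int64_t size2) {
  *newSize3 = size3 + 64;
  *newSize2 = size2 + 128;
}

struct Sizes {
  int64_t size3;
  int64_t size2;
};

inline Sizes estimate(int64_t size3, int64_t size2) {
  alignas(16) int64_t slots[2] = {0, 0}; // mirrors the two-element alloca
  int64_t *ptr2 = &slots[0];             // mirrors the GEP with index 0
  int64_t *ptr3 = &slots[1];             // mirrors the GEP with index 1
  mockEstimateMemory(ptr3, ptr2, size3, size2);
  return Sizes{*ptr3, *ptr2};            // mirrors the two loads after the call
}
```

For example, `estimate(0, 0)` yields `size3 == 64` and `size2 == 128` with this mock. The get-size lowering further down follows the same pattern with three slots for rows, columns, and non-zeros.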

		LogicalResult ConvertSpGEMMComputeOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMComputeOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto computeType = genConstInt32From(
		rewriter, loc, getCuSparseDataTypeFrom(adaptor.getComputeType()));
		auto modeA = genConstInt32From(rewriter, loc, adaptor.getModeA());
		auto modeB = genConstInt32From(rewriter, loc, adaptor.getModeB());
		auto stream = adaptor.getAsyncDependencies().front();
		Value pBuf2 =
		MemRefDescriptor(adaptor.getBuffer2()).allocatedPtr(rewriter, loc);
		if (!getTypeConverter()->useOpaquePointers())
		pBuf2 = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf2);
		auto bufferSz2New =
		createSpGEMMComputeBuilder
		.create(loc, rewriter,
		{adaptor.getDesc(), modeA, modeB, adaptor.getSpmatA(),
		adaptor.getSpmatB(), adaptor.getSpmatC(), computeType, pBuf2,
		adaptor.getBufferSz2(), stream})
		.getResult();
		rewriter.replaceOp(op, {bufferSz2New, stream});
		return success();
		}

		LogicalResult ConvertSpGEMMCopyOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMCopyOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto computeType = genConstInt32From(
		rewriter, loc, getCuSparseDataTypeFrom(adaptor.getComputeType()));
		auto modeA = genConstInt32From(rewriter, loc, adaptor.getModeA());
		auto modeB = genConstInt32From(rewriter, loc, adaptor.getModeB());
		auto alg = genConstInt32From(rewriter, loc, adaptor.getAlg());
		auto stream = adaptor.getAsyncDependencies().front();
		createSpGEMMCopyBuilder.create(
		loc, rewriter,
		{adaptor.getDesc(), modeA, modeB, adaptor.getSpmatA(),
		adaptor.getSpmatB(), adaptor.getSpmatC(), computeType, alg, stream});
		rewriter.replaceOp(op, {stream});
		return success();
		}

		LogicalResult ConvertSpGEMMGetSizeOpToGpuRuntimeCallPattern::matchAndRewrite(
		gpu::SpGEMMGetSizeOp op, OpAdaptor adaptor,
		ConversionPatternRewriter &rewriter) const {
		if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||
		failed(isAsyncWithOneDependency(rewriter, op)))
		return failure();
		Location loc = op.getLoc();
		auto stream = adaptor.getAsyncDependencies().front();

		auto three = rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(3));
		auto buffer = rewriter.create<LLVM::AllocaOp>(
		loc, llvmInt64PointerType, llvmInt64Type, three, /*alignment=*/16);

		auto rowsPtr = rewriter.create<LLVM::GEPOp>(
		loc, llvmInt64PointerType, llvmInt64PointerType, buffer,
		ValueRange{rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(0))});
		auto colsPtr = rewriter.create<LLVM::GEPOp>(
		loc, llvmInt64PointerType, llvmInt64PointerType, buffer,
		ValueRange{rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(1))});
		auto nnzsPtr = rewriter.create<LLVM::GEPOp>(
		loc, llvmInt64PointerType, llvmInt64PointerType, buffer,
		ValueRange{rewriter.create<LLVM::ConstantOp>(loc, getIndexType(),
		rewriter.getIndexAttr(2))});
		createSpGEMMGetSizeBuilder.create(
		loc, rewriter, {adaptor.getSpmat(), rowsPtr, colsPtr, nnzsPtr, stream});
		auto rows = rewriter.create<LLVM::LoadOp>(loc, llvmInt64Type, rowsPtr);
		auto cols = rewriter.create<LLVM::LoadOp>(loc, llvmInt64Type, colsPtr);
		auto nnzs = rewriter.create<LLVM::LoadOp>(loc, llvmInt64Type, nnzsPtr);

		rewriter.replaceOp(op, {rows, cols, nnzs, stream});
		return success();
		}

void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,		void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,
RewritePatternSet &patterns,		RewritePatternSet &patterns,
StringRef gpuBinaryAnnotation,		StringRef gpuBinaryAnnotation,
bool kernelBarePtrCallConv) {		bool kernelBarePtrCallConv) {
addOpaquePointerConversion<gpu::AsyncTokenType>(converter);		addOpaquePointerConversion<gpu::AsyncTokenType>(converter);
addOpaquePointerConversion<gpu::SparseDnTensorHandleType>(converter);		addOpaquePointerConversion<gpu::SparseDnTensorHandleType>(converter);
addOpaquePointerConversion<gpu::SparseSpMatHandleType>(converter);		addOpaquePointerConversion<gpu::SparseSpMatHandleType>(converter);
		addOpaquePointerConversion<gpu::SparseSpGEMMOpHandleType>(converter);

patterns.add<ConvertAllocOpToGpuRuntimeCallPattern,		patterns.add<ConvertAllocOpToGpuRuntimeCallPattern,
ConvertDeallocOpToGpuRuntimeCallPattern,		ConvertDeallocOpToGpuRuntimeCallPattern,
ConvertHostRegisterOpToGpuRuntimeCallPattern,		ConvertHostRegisterOpToGpuRuntimeCallPattern,
ConvertHostUnregisterOpToGpuRuntimeCallPattern,		ConvertHostUnregisterOpToGpuRuntimeCallPattern,
ConvertMemcpyOpToGpuRuntimeCallPattern,		ConvertMemcpyOpToGpuRuntimeCallPattern,
ConvertMemsetOpToGpuRuntimeCallPattern,		ConvertMemsetOpToGpuRuntimeCallPattern,
ConvertSetDefaultDeviceOpToGpuRuntimeCallPattern,		ConvertSetDefaultDeviceOpToGpuRuntimeCallPattern,
ConvertWaitAsyncOpToGpuRuntimeCallPattern,		ConvertWaitAsyncOpToGpuRuntimeCallPattern,
ConvertWaitOpToGpuRuntimeCallPattern,		ConvertWaitOpToGpuRuntimeCallPattern,
ConvertAsyncYieldToGpuRuntimeCallPattern,		ConvertAsyncYieldToGpuRuntimeCallPattern,
ConvertCreateDnTensorOpToGpuRuntimeCallPattern,		ConvertCreateDnTensorOpToGpuRuntimeCallPattern,
ConvertDestroyDnTensorOpToGpuRuntimeCallPattern,		ConvertDestroyDnTensorOpToGpuRuntimeCallPattern,
ConvertCreateCooOpToGpuRuntimeCallPattern,		ConvertCreateCooOpToGpuRuntimeCallPattern,
ConvertCreateCooAoSOpToGpuRuntimeCallPattern,		ConvertCreateCooAoSOpToGpuRuntimeCallPattern,
ConvertCreateCsrOpToGpuRuntimeCallPattern,		ConvertCreateCsrOpToGpuRuntimeCallPattern,
ConvertCreate2To4SpMatOpToGpuRuntimeCallPattern,		ConvertCreate2To4SpMatOpToGpuRuntimeCallPattern,
ConvertDestroySpMatOpToGpuRuntimeCallPattern,		ConvertDestroySpMatOpToGpuRuntimeCallPattern,
		ConvertSpGEMMCreateDescrOpToGpuRuntimeCallPattern,
		ConvertSpGEMMDestroyDescrOpToGpuRuntimeCallPattern,
		ConvertSpGEMMWorkEstimationOpToGpuRuntimeCallPattern,
		ConvertSpGEMMEstimateMemoryOpToGpuRuntimeCallPattern,
		ConvertSpGEMMComputeOpToGpuRuntimeCallPattern,
		ConvertSpGEMMGetSizeOpToGpuRuntimeCallPattern,
		ConvertSpGEMMCopyOpToGpuRuntimeCallPattern,
ConvertSpMVBufferSizeOpToGpuRuntimeCallPattern,		ConvertSpMVBufferSizeOpToGpuRuntimeCallPattern,
ConvertSpMVOpToGpuRuntimeCallPattern,		ConvertSpMVOpToGpuRuntimeCallPattern,
ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern,		ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern,
ConvertSpMMOpToGpuRuntimeCallPattern,		ConvertSpMMOpToGpuRuntimeCallPattern,
ConvertSDDMMBufferSizeOpToGpuRuntimeCallPattern,		ConvertSDDMMBufferSizeOpToGpuRuntimeCallPattern,
ConvertSDDMMOpToGpuRuntimeCallPattern>(converter);		ConvertSDDMMOpToGpuRuntimeCallPattern>(converter);
patterns.add<ConvertLaunchFuncOpToGpuRuntimeCallPattern>(		patterns.add<ConvertLaunchFuncOpToGpuRuntimeCallPattern>(
converter, gpuBinaryAnnotation, kernelBarePtrCallConv);		converter, gpuBinaryAnnotation, kernelBarePtrCallConv);
patterns.add<EraseGpuModuleOpPattern>(&converter.getContext());		patterns.add<EraseGpuModuleOpPattern>(&converter.getContext());
}		}

mlir/lib/Dialect/GPU/IR/GPUDialect.cpp

[142 lines not shown]
};		};
} // namespace		} // namespace

void GPUDialect::initialize() {		void GPUDialect::initialize() {
addTypes<AsyncTokenType>();		addTypes<AsyncTokenType>();
addTypes<MMAMatrixType>();		addTypes<MMAMatrixType>();
addTypes<SparseDnTensorHandleType>();		addTypes<SparseDnTensorHandleType>();
addTypes<SparseSpMatHandleType>();		addTypes<SparseSpMatHandleType>();
		addTypes<SparseSpGEMMOpHandleType>();
addOperations<		addOperations<
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"		#include "mlir/Dialect/GPU/IR/GPUOps.cpp.inc"
>();		>();
addAttributes<		addAttributes<
#define GET_ATTRDEF_LIST		#define GET_ATTRDEF_LIST
#include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"		#include "mlir/Dialect/GPU/IR/GPUOpsAttributes.cpp.inc"
>();		>();
addInterfaces<GPUInlinerInterface>();		addInterfaces<GPUInlinerInterface>();
}		}

static std::string getSparseHandleKeyword(SparseHandleKind kind) {		static std::string getSparseHandleKeyword(SparseHandleKind kind) {
switch (kind) {		switch (kind) {
case SparseHandleKind::DnTensor:		case SparseHandleKind::DnTensor:
return "sparse.dntensor_handle";		return "sparse.dntensor_handle";
case SparseHandleKind::SpMat:		case SparseHandleKind::SpMat:
return "sparse.spmat_handle";		return "sparse.spmat_handle";
		case SparseHandleKind::SpGEMMOp:
		return "sparse.spgemmop_handle";
}		}
llvm_unreachable("unknown sparse handle kind");		llvm_unreachable("unknown sparse handle kind");
return "";		return "";
}		}
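
Because the parser and the printer both go through `getSparseHandleKeyword`, each new handle kind needs one keyword and then round-trips. A standalone sketch of that round trip is below. The enum and keyword function copy the names in the diff; `parseSparseHandleKind` is a hypothetical helper that condenses the chain of `if (keyword == ...)` checks in `parseType`, and none of this uses MLIR types.

```cpp
#include <cassert>
#include <string>

enum class SparseHandleKind { DnTensor, SpMat, SpGEMMOp };

// Same mapping as in the diff, with std::string as the only dependency.
inline std::string getSparseHandleKeyword(SparseHandleKind kind) {
  switch (kind) {
  case SparseHandleKind::DnTensor:
    return "sparse.dntensor_handle";
  case SparseHandleKind::SpMat:
    return "sparse.spmat_handle";
  case SparseHandleKind::SpGEMMOp:
    return "sparse.spgemmop_handle";
  }
  return "";
}

// Hypothetical helper: finds the kind whose keyword matches, if any.
inline bool parseSparseHandleKind(const std::string &keyword,
                                  SparseHandleKind &kind) {
  const SparseHandleKind all[] = {SparseHandleKind::DnTensor,
                                  SparseHandleKind::SpMat,
                                  SparseHandleKind::SpGEMMOp};
  for (SparseHandleKind candidate : all) {
    if (keyword == getSparseHandleKeyword(candidate)) {
      kind = candidate;
      return true;
    }
  }
  return false; // unknown keyword; the real parser emits an error here
}
```

Keeping a single keyword table is what makes the printer and parser agree by construction, which is also the motivation behind the earlier suggestion to move the enum into TableGen.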

Type GPUDialect::parseType(DialectAsmParser &parser) const {		Type GPUDialect::parseType(DialectAsmParser &parser) const {
// Parse the main keyword for the type.		// Parse the main keyword for the type.
StringRef keyword;		StringRef keyword;
[...]
		return MMAMatrixType::getChecked(mlir::detail::getDefaultDiagnosticEmitFn(
parser.getEncodedSourceLoc(beginLoc)),		parser.getEncodedSourceLoc(beginLoc)),
shape, elementType, operand);		shape, elementType, operand);
}		}

if (keyword == getSparseHandleKeyword(SparseHandleKind::DnTensor))		if (keyword == getSparseHandleKeyword(SparseHandleKind::DnTensor))
return SparseDnTensorHandleType::get(context);		return SparseDnTensorHandleType::get(context);
if (keyword == getSparseHandleKeyword(SparseHandleKind::SpMat))		if (keyword == getSparseHandleKeyword(SparseHandleKind::SpMat))
return SparseSpMatHandleType::get(context);		return SparseSpMatHandleType::get(context);
		if (keyword == getSparseHandleKeyword(SparseHandleKind::SpGEMMOp))
		return SparseSpGEMMOpHandleType::get(context);

parser.emitError(parser.getNameLoc(), "unknown gpu type: " + keyword);		parser.emitError(parser.getNameLoc(), "unknown gpu type: " + keyword);
return Type();		return Type();
}		}
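
The keyword helper and the parser branches above have to stay in sync for every handle kind. A minimal standalone sketch of that round trip, assuming a local copy of the enum and a hypothetical `parseSparseHandleKeyword` helper (neither is the MLIR definition; the keyword strings are the ones in this diff):

```cpp
#include <initializer_list>
#include <optional>
#include <string>

// Local stand-in for the dialect's SparseHandleKind (assumption, not MLIR's).
enum class SparseHandleKind { DnTensor, SpMat, SpGEMMOp };

// Same keyword strings as getSparseHandleKeyword in the diff.
static std::string getSparseHandleKeyword(SparseHandleKind kind) {
  switch (kind) {
  case SparseHandleKind::DnTensor:
    return "sparse.dntensor_handle";
  case SparseHandleKind::SpMat:
    return "sparse.spmat_handle";
  case SparseHandleKind::SpGEMMOp:
    return "sparse.spgemmop_handle";
  }
  return "";
}

// Hypothetical inverse: looks the keyword up by reusing the printer-side
// helper, so a new kind only needs one new case plus one list entry.
static std::optional<SparseHandleKind>
parseSparseHandleKeyword(const std::string &keyword) {
  for (SparseHandleKind kind :
       {SparseHandleKind::DnTensor, SparseHandleKind::SpMat,
        SparseHandleKind::SpGEMMOp})
    if (keyword == getSparseHandleKeyword(kind))
      return kind;
  return std::nullopt; // corresponds to the "unknown gpu type" error path
}
```

Deriving both directions from one helper is the property the diff relies on: the printer writes whatever the parser compares against.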
// TODO: print refined type here. Notice that should be corresponding to the		// TODO: print refined type here. Notice that should be corresponding to the
// parser		// parser
void GPUDialect::printType(Type type, DialectAsmPrinter &os) const {		void GPUDialect::printType(Type type, DialectAsmPrinter &os) const {
TypeSwitch<Type>(type)		TypeSwitch<Type>(type)
.Case<AsyncTokenType>([&](Type) { os << "async.token"; })		.Case<AsyncTokenType>([&](Type) { os << "async.token"; })
.Case<SparseDnTensorHandleType>([&](Type) {		.Case<SparseDnTensorHandleType>([&](Type) {
os << getSparseHandleKeyword(SparseHandleKind::DnTensor);		os << getSparseHandleKeyword(SparseHandleKind::DnTensor);
})		})
.Case<SparseSpMatHandleType>(		.Case<SparseSpMatHandleType>(
[&](Type) { os << getSparseHandleKeyword(SparseHandleKind::SpMat); })		[&](Type) { os << getSparseHandleKeyword(SparseHandleKind::SpMat); })
		.Case<SparseSpGEMMOpHandleType>([&](Type) {
		os << getSparseHandleKeyword(SparseHandleKind::SpGEMMOp);
		})
.Case<MMAMatrixType>([&](MMAMatrixType fragTy) {		.Case<MMAMatrixType>([&](MMAMatrixType fragTy) {
os << "mma_matrix<";		os << "mma_matrix<";
auto shape = fragTy.getShape();		auto shape = fragTy.getShape();
for (auto dim = shape.begin(), e = shape.end() - 1; dim != e; ++dim)		for (auto dim = shape.begin(), e = shape.end() - 1; dim != e; ++dim)
os << *dim << 'x';		os << *dim << 'x';
os << shape.back() << 'x' << fragTy.getElementType();		os << shape.back() << 'x' << fragTy.getElementType();
os << ", \"" << fragTy.getOperand() << "\"" << '>';		os << ", \"" << fragTy.getOperand() << "\"" << '>';
})		})

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSDDMM(int32_t ma, int32_t mb,
[...]
cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);		cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);
auto cTp = static_cast<cudaDataType_t>(ctp);		auto cTp = static_cast<cudaDataType_t>(ctp);
ALPHABETA(cTp, alpha, beta)		ALPHABETA(cTp, alpha, beta)
CUSPARSE_REPORT_IF_ERROR(cusparseSDDMM(cusparse_env, modeA, modeB, alphap,		CUSPARSE_REPORT_IF_ERROR(cusparseSDDMM(cusparse_env, modeA, modeB, alphap,
matA, matB, betap, matC, cTp,		matA, matB, betap, matC, cTp,
CUSPARSE_SDDMM_ALG_DEFAULT, buf))		CUSPARSE_SDDMM_ALG_DEFAULT, buf))
}		}

		// TODO: add support to passing alpha and beta as arguments
		aartbik: We repeat this TODO everywhere; shall we just mention it once in the alpha/beta macro?
		extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t mgpuSpGEMMWorkEstimation(
		void *s, int32_t ma, int32_t mb, void *a, void *b, void *c, int32_t ctp,
		intptr_t bs, void *buf, CUstream /*stream*/) {
		cusparseSpGEMMDescr_t spgemmDesc = reinterpret_cast<cusparseSpGEMMDescr_t>(s);
		cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
		cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
		cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
		cusparseSpMatDescr_t matB = reinterpret_cast<cusparseSpMatDescr_t>(b);
		cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);
		auto cTp = static_cast<cudaDataType_t>(ctp);
		ALPHABETA(cTp, alpha, beta)
		size_t newBufferSize = bs;

		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_workEstimation(
		cusparse_env, modeA, modeB, alphap, matA, matB, betap, matC, cTp,
		CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &newBufferSize, buf))
		return newBufferSize == 0 ? 1 : newBufferSize; // avoid zero-alloc
		}

		// TODO: add support to passing alpha and beta as arguments
		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
		mgpuSpGEMMEstimateMemory(void *nbs3, void *nbs2, void *s, int32_t ma,
		int32_t mb, void *a, void *b, void *c, int32_t ctp,
		int32_t alg, float chunk_fraction, intptr_t bs3,
		void *buf3, intptr_t bs2, CUstream /*stream*/) {
		cusparseSpGEMMDescr_t spgemmDesc = reinterpret_cast<cusparseSpGEMMDescr_t>(s);
		cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
		cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
		cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
		cusparseSpMatDescr_t matB = reinterpret_cast<cusparseSpMatDescr_t>(b);
		cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);
		auto cTp = static_cast<cudaDataType_t>(ctp);
		ALPHABETA(cTp, alpha, beta)
		size_t *newBufferSize2 = reinterpret_cast<size_t *>(nbs2);
		size_t *newBufferSize3 = reinterpret_cast<size_t *>(nbs3);
		*newBufferSize2 = bs2;
		*newBufferSize3 = bs3;
		auto algorithm = static_cast<cusparseSpGEMMAlg_t>(alg);

		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_estimateMemory(
		cusparse_env, modeA, modeB, alphap, matA, matB, betap, matC, cTp,
		algorithm, spgemmDesc, chunk_fraction, newBufferSize3, buf3,
		newBufferSize2))
		// avoid zero-alloc
		if (*newBufferSize2 == 0) {
		*newBufferSize2 = 1;
		}
		if (*newBufferSize3 == 0) {
		*newBufferSize3 = 1;
		}
		return;
		}
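
Note that the wrapper forwards `alg` with a bare `static_cast`, so the integer emitted by the lowering has to equal the cuSPARSE numeric value. A small sketch of that constraint, using the values quoted from `cusparseSpGEMMAlg_t` earlier in this review; `SpGEMMAlgSketch`, `toAlgorithm`, and `isKnownAlgorithm` are hypothetical local names, not part of CUDA or MLIR:

```cpp
#include <cstdint>

// Local mirror of the numeric values quoted in the review thread
// (assumption: illustrative only, not the CUDA header).
enum SpGEMMAlgSketch : int32_t {
  kDefault = 0,
  kCsrAlgDeterministic = 1,
  kCsrAlgNondeterministic = 2,
  kAlg1 = 3,
  kAlg2 = 4,
  kAlg3 = 5
};

// Same shape as the cast in the wrapper: no translation table, so the
// caller's integer must already be the library's value.
static SpGEMMAlgSketch toAlgorithm(int32_t alg) {
  return static_cast<SpGEMMAlgSketch>(alg);
}

// Hypothetical guard a caller could apply before the cast.
static bool isKnownAlgorithm(int32_t alg) {
  return alg >= kDefault && alg <= kAlg3;
}
```

Under these values, `ALG2` in the tests has to reach the wrapper as 4, not 2, which is the discrepancy raised in the inline comment on GPUOps.td.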

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
		mgpuSpGEMMCompute(void *s, int32_t ma, int32_t mb, void *a, void *b, void *c,
		int32_t ctp, intptr_t bsz2, void *buf2, CUstream /*stream*/) {
		cusparseSpGEMMDescr_t spgemmDesc = reinterpret_cast<cusparseSpGEMMDescr_t>(s);
		cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
		cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
		cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
		cusparseSpMatDescr_t matB = reinterpret_cast<cusparseSpMatDescr_t>(b);
		cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);
		auto cTp = static_cast<cudaDataType_t>(ctp);
		ALPHABETA(cTp, alpha, beta)
		size_t newBufferSize2 = bsz2;
		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_compute(
		cusparse_env, modeA, modeB, alphap, matA, matB, betap, matC, cTp,
		CUSPARSE_SPGEMM_DEFAULT, spgemmDesc, &newBufferSize2, buf2))
		return newBufferSize2 == 0 ? 1 : newBufferSize2; // avoid zero-alloc
		}

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
		mgpuSpGEMMCopy(void *s, int32_t ma, int32_t mb, void *a, void *b, void *c,
		int32_t ctp, int32_t alg, CUstream /*stream*/) {
		cusparseSpGEMMDescr_t spgemmDesc = reinterpret_cast<cusparseSpGEMMDescr_t>(s);
		cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
		cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
		cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
		cusparseSpMatDescr_t matB = reinterpret_cast<cusparseSpMatDescr_t>(b);
		cusparseSpMatDescr_t matC = reinterpret_cast<cusparseSpMatDescr_t>(c);
		auto cTp = static_cast<cudaDataType_t>(ctp);
		auto algorithm = static_cast<cusparseSpGEMMAlg_t>(alg);
		ALPHABETA(cTp, alpha, beta)

		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_copy(cusparse_env, modeA, modeB,
		alphap, matA, matB, betap, matC,
		cTp, algorithm, spgemmDesc))
		}

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void *
		mgpuSpGEMMCreateDescr(CUstream /*stream*/) {
		// cusparseSpGEMMDescr_t is a pointer type
		cusparseSpGEMMDescr_t spgemmDesc = nullptr;
		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_createDescr(&spgemmDesc))
		return reinterpret_cast<void *>(spgemmDesc);
		}

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
		mgpuSpGEMMDestroyDescr(void *s, CUstream /*stream*/) {
		// cusparseSpGEMMDescr_t is a pointer type
		cusparseSpGEMMDescr_t spgemmDesc = reinterpret_cast<cusparseSpGEMMDescr_t>(s);
		CUSPARSE_REPORT_IF_ERROR(cusparseSpGEMM_destroyDescr(spgemmDesc))
		}

		extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
		mgpuSpGEMMGetSize(void *m, void *r, void *c, void *n, CUstream /*stream*/) {
		cusparseConstSpMatDescr_t matDescr =
		reinterpret_cast<cusparseConstSpMatDescr_t>(m);
		int64_t *rows = reinterpret_cast<int64_t *>(r);
		int64_t *cols = reinterpret_cast<int64_t *>(c);
		int64_t *nnz = reinterpret_cast<int64_t *>(n);
		CUSPARSE_REPORT_IF_ERROR(cusparseSpMatGetSize(matDescr, rows, cols, nnz));
		}
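
The size-returning wrappers above are meant to be called twice: once with an empty buffer to learn the required size, and once with a buffer of that size. A minimal host-only sketch of that pattern, including the "avoid zero-alloc" clamp; `mockWorkEstimation`, `mockRequiredBytes`, `workEstimationWrapper`, and `runTwoPass` are hypothetical stand-ins and do not call cuSPARSE:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Made-up requirement reported by the mock library call (assumption).
static size_t mockRequiredBytes = 256;

// Hypothetical stand-in for a "query, then run" library entry point:
// with a null buffer it only reports the required size.
static void mockWorkEstimation(size_t *bufferSize, void *buffer) {
  if (buffer == nullptr) {
    *bufferSize = mockRequiredBytes;
    return;
  }
  // Second pass: the real call would use the buffer here; size is unchanged.
}

// Same shape as the wrappers in the diff: take the current size, return the
// size reported by the library, clamped to at least 1 to avoid a zero-alloc.
static intptr_t workEstimationWrapper(intptr_t bs, void *buf) {
  size_t newBufferSize = static_cast<size_t>(bs);
  mockWorkEstimation(&newBufferSize, buf);
  return newBufferSize == 0 ? 1 : static_cast<intptr_t>(newBufferSize);
}

// Two-pass driver: query with no buffer, allocate, then call again.
static intptr_t runTwoPass() {
  intptr_t size = workEstimationWrapper(0, nullptr);
  std::vector<char> buffer(static_cast<size_t>(size));
  return workEstimationWrapper(size, buffer.data());
}
```

This is the same order the lowering test below checks for: a work-estimation call, an allocation, then a second work-estimation call with the allocated buffer.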

#ifdef MLIR_ENABLE_CUDA_CUSPARSELT		#ifdef MLIR_ENABLE_CUDA_CUSPARSELT

///		///
/// Wrapper methods for the cuSparseLt library.		/// Wrapper methods for the cuSparseLt library.
///		///

struct cusparseLtSpMatHandleAndData {		struct cusparseLtSpMatHandleAndData {
cusparseLtMatDescriptor_t mat;		cusparseLtMatDescriptor_t mat;

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir

		func.func @sddmm(%arg0: index) {
[...]
%bufferSz, %token6 = gpu.sddmm_buffer_size async [%token5] %dnmat, %dnmat, %spmat into f64		%bufferSz, %token6 = gpu.sddmm_buffer_size async [%token5] %dnmat, %dnmat, %spmat into f64
%token7 = gpu.sddmm async [%token6] %dnmat, %dnmat, %spmat, %mem2 : memref<?xf64> into f64		%token7 = gpu.sddmm async [%token6] %dnmat, %dnmat, %spmat, %mem2 : memref<?xf64> into f64
%token8 = gpu.destroy_sp_mat async [%token7] %spmat		%token8 = gpu.destroy_sp_mat async [%token7] %spmat
%token9 = gpu.destroy_dn_tensor async [%token8] %dnmat		%token9 = gpu.destroy_dn_tensor async [%token8] %dnmat
gpu.wait [%token9]		gpu.wait [%token9]
return		return
}		}


		// CHECK-LABEL: func @spgemm
		// CHECK: llvm.call @mgpuStreamCreate
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuCreateCsr
		// CHECK: llvm.call @mgpuCreateCsr
		// CHECK: llvm.call @mgpuCreateCsr
		// CHECK: llvm.call @mgpuSpGEMMCreateDescr
		// CHECK: llvm.call @malloc
		// CHECK: llvm.call @mgpuSpGEMMWorkEstimation
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuSpGEMMWorkEstimation
		// CHECK: llvm.call @mgpuSpGEMMEstimateMemory
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuSpGEMMEstimateMemory
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuSpGEMMCompute
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuMemAlloc
		// CHECK: llvm.call @mgpuStreamSynchronize
		// CHECK: llvm.call @mgpuStreamDestroy
		// CHECK: llvm.call @mgpuStreamCreate
		// CHECK: llvm.call @mgpuSpGEMMCopy
		// CHECK: llvm.call @mgpuDestroySpMat
		// CHECK: llvm.call @mgpuDestroySpMat
		// CHECK: llvm.call @mgpuDestroySpMat
		// CHECK: llvm.call @mgpuStreamSynchronize
		// CHECK: llvm.call @mgpuStreamDestroy
		func.func @spgemm(%arg0: index) {
		%token0 = gpu.wait async
		%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>
		%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>
		%spmatA, %token3 = gpu.create_csr async [%token2] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spmatB, %token4 = gpu.create_csr async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spmatC, %token5 = gpu.create_csr async [%token4] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spgemmDesc, %token6 = gpu.spgemm_create_descr async [%token5]
		// Used as nullptr
		%alloc = memref.alloc() : memref<0xi8>
		%c0 = arith.constant 0 : index
		%bufferSz1, %token7 = gpu.spgemm_work_estimation async [%token6] %spmatA{NON_TRANSPOSE}, %spmatB{NON_TRANSPOSE}, %spmatC, ALG2, %spgemmDesc, %c0, %alloc: f32 into memref<0xi8>
		aartbik: break long line
		%buf1, %token8 = gpu.alloc async [%token7] (%bufferSz1) : memref<?xi8>
		%bufferSz1_1, %token9 = gpu.spgemm_work_estimation async [%token8] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz1, %buf1: f32 into memref<?xi8>
		%bufferSz3, %dummy, %token10 = gpu.spgemm_estimate_memory async [%token9] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %c0, %c0, %alloc: f32 into memref<0xi8>
		%buf3, %token11 = gpu.alloc async [%token10] (%bufferSz3) : memref<?xi8>
		%bufferSz3_2, %bufferSz2, %token12 = gpu.spgemm_estimate_memory async [%token11] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz3, %c0, %buf3: f32 into memref<?xi8>
		aartbik: and here
		%buf2, %token13 = gpu.alloc async [%token12] (%bufferSz2) : memref<?xi8>
		%bufferSz2_2, %token14 = gpu.spgemm_compute async [%token13] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz2, %buf2: f32 into memref<?xi8>
		%rows, %cols, %nnz, %token15 = gpu.spgemm_get_size async [%token14] %spmatC
		%mem_columns, %token16 = gpu.alloc async [%token15] (%cols) : memref<?xi32>
		%mem_values, %token17 = gpu.alloc async [%token16] (%nnz) : memref<?xf32>
		gpu.wait [%token17]
		%token18 = gpu.wait async
		%token19 = gpu.spgemm_copy async [%token18] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc: f32
		%token20 = gpu.destroy_sp_mat async [%token19] %spmatA
		%token21 = gpu.destroy_sp_mat async [%token20] %spmatB
		%token22 = gpu.destroy_sp_mat async [%token21] %spmatC
		gpu.wait [%token22]
		return
		}

}		}

mlir/test/Dialect/GPU/sparse-roundtrip.mlir

		func.func @matmul(%arg0: index) {
[...]
%bufferSz, %token6 = gpu.spmm_buffer_size async [%token5] %spmat, %dnmat, %dnmat : index into f64		%bufferSz, %token6 = gpu.spmm_buffer_size async [%token5] %spmat, %dnmat, %dnmat : index into f64
%token7 = gpu.spmm async [%token6] %spmat, %dnmat, %dnmat, %mem2 : memref<?xf64> into f64		%token7 = gpu.spmm async [%token6] %spmat, %dnmat, %dnmat, %mem2 : memref<?xf64> into f64
%token8 = gpu.destroy_sp_mat async [%token7] %spmat		%token8 = gpu.destroy_sp_mat async [%token7] %spmat
%token9 = gpu.destroy_dn_tensor async [%token8] %dnmat		%token9 = gpu.destroy_dn_tensor async [%token8] %dnmat
gpu.wait [%token9]		gpu.wait [%token9]
return		return
}		}

		// CHECK-LABEL: func @spgemm
		// CHECK-SAME: %{{.*}} = gpu.wait async
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%{{.*}}) : memref<?xindex>
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_3]]) : memref<?xf64>
		// CHECK: %{{.*}}, %{{.*}} = gpu.create_csr async [%{{.*}}] %[[VAL_3]], %[[VAL_3]], %[[VAL_3]], %[[VAL_1]], %[[VAL_1]], %[[VAL_4]] : memref<?xindex>, memref<?xindex>, memref<?xf64>
		// CHECK: %{{.*}}, %{{.*}} = gpu.create_csr async [%{{.*}}] %[[VAL_3]], %[[VAL_3]], %[[VAL_3]], %[[VAL_1]], %[[VAL_1]], %[[VAL_4]] : memref<?xindex>, memref<?xindex>, memref<?xf64>
		// CHECK: %{{.*}}, %{{.*}} = gpu.create_csr async [%{{.*}}] %[[VAL_3]], %[[VAL_3]], %[[VAL_3]], %[[VAL_1]], %[[VAL_1]], %[[VAL_4]] : memref<?xindex>, memref<?xindex>, memref<?xf64>
		// CHECK: %{{.*}}, %{{.*}} = gpu.spgemm_create_descr async [%{{.*}}]
		// CHECK: %{{.*}} = memref.alloc() : memref<0xi8>
		// CHECK: %{{.*}} = arith.constant 0 : index
		// CHECK: %{{.*}}, %{{.*}} = gpu.spgemm_work_estimation async [%{{.*}}] %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]], %[[VAL_15]], %[[VAL_14]] : f32 into memref<0xi8>
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_16]]) : memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}} = gpu.spgemm_work_estimation async [%{{.*}}] %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]], %[[VAL_16]], %[[VAL_18]] : f32 into memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}}, %{{.*}} = gpu.spgemm_estimate_memory async [%{{.*}}] %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]], %[[VAL_15]], %[[VAL_15]], %[[VAL_14]] : f32 into memref<0xi8>
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_22]]) : memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}}, %{{.*}} = gpu.spgemm_estimate_memory async [%{{.*}}] %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]], %[[VAL_22]], %[[VAL_15]], %[[VAL_25]] : f32 into memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_28]]) : memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}} = gpu.spgemm_compute async [%{{.*}}] %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]], %[[VAL_28]], %[[VAL_30]] : f32 into memref<?xi8>
		// CHECK: %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} = gpu.spgemm_get_size async [%{{.*}}] %[[VAL_10]]
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_35]]) : memref<?xi32>
		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%[[VAL_36]]) : memref<?xf32>
		// CHECK: gpu.wait [%{{.*}}]
		// CHECK: gpu.spgemm_copy %[[VAL_6]], %[[VAL_8]], %[[VAL_10]], ALG2, %[[VAL_12]] : f32
		// CHECK: gpu.destroy_sp_mat %[[VAL_6]]
		// CHECK: gpu.destroy_sp_mat %[[VAL_8]]
		// CHECK: gpu.destroy_sp_mat %[[VAL_10]]
		// CHECK: return
		func.func @spgemm(%arg0: index) {
		%token0 = gpu.wait async
		%mem1, %token1 = gpu.alloc async [%token0] (%arg0) : memref<?xindex>
		%mem2, %token2 = gpu.alloc async [%token1] (%arg0) : memref<?xf64>
		%spmatA, %token3 = gpu.create_csr async [%token2] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spmatB, %token4 = gpu.create_csr async [%token3] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spmatC, %token5 = gpu.create_csr async [%token4] %arg0, %arg0, %arg0, %mem1, %mem1, %mem2 : memref<?xindex>, memref<?xindex>, memref<?xf64>
		%spgemmDesc, %token6 = gpu.spgemm_create_descr async [%token5]
		// Used as nullptr
		%alloc = memref.alloc() : memref<0xi8>
		%c0 = arith.constant 0 : index
		%bufferSz1, %token7 = gpu.spgemm_work_estimation async [%token6] %spmatA{NON_TRANSPOSE}, %spmatB{NON_TRANSPOSE}, %spmatC, ALG2, %spgemmDesc, %c0, %alloc: f32 into memref<0xi8>
		aartbik: break
		%buf1, %token8 = gpu.alloc async [%token7] (%bufferSz1) : memref<?xi8>
		%bufferSz1_1, %token9 = gpu.spgemm_work_estimation async [%token8] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz1, %buf1: f32 into memref<?xi8>
		%bufferSz3, %dummy, %token10 = gpu.spgemm_estimate_memory async [%token9] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %c0, %c0, %alloc: f32 into memref<0xi8>
		%buf3, %token11 = gpu.alloc async [%token10] (%bufferSz3) : memref<?xi8>
		%bufferSz3_2, %bufferSz2, %token12 = gpu.spgemm_estimate_memory async [%token11] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz3, %c0, %buf3: f32 into memref<?xi8>
		aartbik: break
		%buf2, %token13 = gpu.alloc async [%token12] (%bufferSz2) : memref<?xi8>
		%bufferSz2_2, %token14 = gpu.spgemm_compute async [%token13] %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc, %bufferSz2, %buf2: f32 into memref<?xi8>
		%rows, %cols, %nnz, %token15 = gpu.spgemm_get_size async [%token14] %spmatC
		%mem_columns, %token16 = gpu.alloc async [%token15] (%cols) : memref<?xi32>
		%mem_values, %token17 = gpu.alloc async [%token16] (%nnz) : memref<?xf32>
		gpu.wait [%token17]
		gpu.spgemm_copy %spmatA, %spmatB, %spmatC, ALG2, %spgemmDesc: f32
		gpu.destroy_sp_mat %spmatA
		gpu.destroy_sp_mat %spmatB
		gpu.destroy_sp_mat %spmatC
		return
		}

// CHECK-LABEL: func @sddmm		// CHECK-LABEL: func @sddmm
// CHECK: %{{.*}} = gpu.wait async		// CHECK: %{{.*}} = gpu.wait async
// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%{{.*}}) : memref<?xindex>		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%{{.*}}) : memref<?xindex>
// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%{{.*}}) : memref<?xf64>		// CHECK: %{{.*}}, %{{.*}} = gpu.alloc async [%{{.*}}] (%{{.*}}) : memref<?xf64>
// CHECK: %{{.*}}, %{{.*}} = gpu.create_csr async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} : memref<?xindex>, memref<?xindex>, memref<?xf64>		// CHECK: %{{.*}}, %{{.*}} = gpu.create_csr async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} : memref<?xindex>, memref<?xindex>, memref<?xf64>
// CHECK: %{{.*}}, %{{.*}} = gpu.create_dn_tensor async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}} : index, index into memref<?xf64>		// CHECK: %{{.*}}, %{{.*}} = gpu.create_dn_tensor async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}} : index, index into memref<?xf64>
// CHECK: %{{.*}}, %{{.*}} = gpu.sddmm_buffer_size async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}} into f64		// CHECK: %{{.*}}, %{{.*}} = gpu.sddmm_buffer_size async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}} into f64
// CHECK: %{{.*}} = gpu.sddmm async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} : memref<?xf64> into f64		// CHECK: %{{.*}} = gpu.sddmm async [%{{.*}}] %{{.*}}, %{{.*}}, %{{.*}}, %{{.*}} : memref<?xf64> into f64

[mlir][sparse][gpu] add spgemm operator (Closed, Public)

Diff 542602
