This is an archive of the discontinued LLVM Phabricator instance.

K-Wu retitled this revision from [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path to [mlir][sparse][gpu] add result type to spmv, spmm and sddmm gpu libgen path.May 27 2023, 1:35 PM

fixed compile errors

aartbik added inline comments.May 27 2023, 1:53 PM

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
231	this can go now, right?
240	shall we do the same for index to be consistent?

Harbormaster completed remote builds in B235037: Diff 526292.May 27 2023, 2:01 PM

should be minor

Harbormaster completed remote builds in B235039: Diff 526294.May 27 2023, 4:13 PM

K-Wu added inline comments.May 30 2023, 12:31 PM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1261	Seeking ways for easier-to-read literal comparison

using the API Wren suggested

rebase origin/main

Peiming added inline comments.May 30 2023, 1:00 PM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1261	You should be able to perform the compare without type conversion. It is essentially a pointer comparison and they will share the same underlying storage if they are equal.
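The pointer-comparison point can be illustrated with a toy model of uniquing. This is only a sketch: `Context`, `IntAttrStorage`, and `IntAttr` are hypothetical stand-ins written for this note, not MLIR's actual classes. It shows why two handles built from equal parameters compare equal without any conversion once the storage is interned.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for uniqued attribute storage (not an MLIR class).
struct IntAttrStorage {
  unsigned width;
  int64_t value;
};

// Hypothetical context that interns one storage object per (width, value).
class Context {
public:
  const IntAttrStorage *getIntAttr(unsigned width, int64_t value) {
    auto key = std::make_pair(width, value);
    auto it = pool.find(key);
    if (it == pool.end())
      it = pool.emplace(key, std::make_unique<IntAttrStorage>(
                                 IntAttrStorage{width, value}))
               .first;
    return it->second.get();
  }

private:
  std::map<std::pair<unsigned, int64_t>, std::unique_ptr<IntAttrStorage>> pool;
};

// Value-semantic handle: equality is pointer equality on the shared storage.
struct IntAttr {
  const IntAttrStorage *impl;
  bool operator==(IntAttr other) const { return impl == other.impl; }
};
```

In this model, asking the context twice for the same (width, value) pair yields the same pointer, so `operator==` never has to look inside the storage.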

wrengr added inline comments.May 30 2023, 1:13 PM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1259–1264	It would be cleaner to just call `getConstantIntValue` once and save the results. I've attached an edit to show what I mean. (Though really, there's not a whole lot of benefit to the `dw_` variable, so you might as well just use `*dw` directly wherever needed.) Though really this whole pattern should be floated out to a helper function, including the parts that check `*dw` and return the corresponding `FloatType`, since the only difference between the version here and the version for `mgpuCreateCOO` is which `op.getOperand` is used.
1288	Since `gpu::CreateDnMatOp::getMemref` returns a `TypedValue<MemRefType>`, the `TypedValue<MemRefType>::getType` method will already return a `MemRefType`, therefore you don't need the `llvm::cast`
1294–1303	This is yet another place that uses the same function for converting a `Value` (which has `LLVM::ConstantOp` defining op) into the appropriate `FloatType`...
1338–1339	You can replace this with just `computeTypeOptional.value_or(defaultType)` https://en.cppreference.com/w/cpp/utility/optional/value_or
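As a small illustration of the `value_or` suggestion, the sketch below models a type as a plain string. `TypeName` and `resolveComputeType` are hypothetical names used only for this example; the names `computeTypeOptional` and `defaultType` are taken from the comment above.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in: a type is modelled by its name.
using TypeName = std::string;

// Returns the explicit compute type when one was given, otherwise the default.
// This replaces an explicit has_value() check followed by value().
TypeName resolveComputeType(const std::optional<TypeName> &computeTypeOptional,
                            const TypeName &defaultType) {
  return computeTypeOptional.value_or(defaultType);
}
```

Note that `value_or` evaluates its argument eagerly, so it fits best when the default is cheap to compute, as it is here.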

can you please add some examples to the ops.mlir for proper roundtripping?

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1863	ah, good catch on this typo!
1891	I think it is more common in MLIR to put the result type at the end with a : as in ... : f64

Peiming added inline comments.May 30 2023, 1:19 PM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1256–1257	I don't think you need it (same below for getDnxxx)

Harbormaster completed remote builds in B235393: Diff 526754.May 30 2023, 1:27 PM

simpler get element type func

K-Wu marked an inline comment as done.May 30 2023, 2:19 PM

K-Wu added inline comments.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1261	I wish I could have known it earlier. Thank you!

K-Wu marked an inline comment as done.May 30 2023, 2:20 PM

K-Wu marked 2 inline comments as done.

K-Wu marked 2 inline comments as done.May 30 2023, 2:28 PM

K-Wu marked an inline comment as done.

rm unnecessary MemRefType cast

Harbormaster completed remote builds in B235416: Diff 526797.May 30 2023, 2:46 PM

clean up the code a bit

rm dw

fix to pass tests

K-Wu marked 2 inline comments as done.May 30 2023, 4:49 PM

Harbormaster completed remote builds in B235442: Diff 526846.May 30 2023, 6:16 PM

upd format; add roundtrip test

Herald added a subscriber: anlunx.May 30 2023, 6:23 PM

K-Wu added inline comments.May 30 2023, 6:25 PM

mlir/test/Dialect/GPU/sparse-roundtrip.mlir
42 ↗	(On Diff #526864)	Hi @aartbik, I added a round-trip test where the new syntax `: computeType` is tested. Let me know what you think. Thanks!

Harbormaster completed remote builds in B235455: Diff 526864.May 30 2023, 6:41 PM

refactor index type as well

K-Wu marked an inline comment as done.May 30 2023, 10:05 PM

Harbormaster completed remote builds in B235478: Diff 526899.May 30 2023, 10:22 PM

Peiming added inline comments.May 31 2023, 9:00 AM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1327–1328	No longer needed?
1331	nit: maybe renaming to `getConstInt32From`?
1335	nit: `getConstInt32xxx`

addressing comments

rebase origin/main

K-Wu marked 2 inline comments as done.May 31 2023, 9:44 AM

K-Wu added inline comments.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
1327–1328	Good catch! Addressed in new commit

Harbormaster completed remote builds in B235604: Diff 527094.May 31 2023, 10:40 AM

Build 364523 failed because of an upload failure; the failure is not caused by the code or the tests.

aartbik added inline comments.May 31 2023, 2:57 PM

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1913	make this Examples and add an op below that has a compute type set
2000	given the mix of return types and other types, an "into" feels a bit nicer for compute type. also, do we perhaps simply want to make compute type non-optional and always show it?
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
692	Start with capital C and end with period
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
308	this TODO is done, right?

K-Wu marked 3 inline comments as done.May 31 2023, 3:19 PM

K-Wu added inline comments.

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
308	Good catch. Thanks!

K-Wu marked an inline comment as done.May 31 2023, 3:20 PM

addressing comments

K-Wu marked an inline comment as done.May 31 2023, 3:22 PM

Harbormaster completed remote builds in B235679: Diff 527210.May 31 2023, 3:57 PM

fix compile errors

rebase origin/main

Harbormaster completed remote builds in B235695: Diff 527226.May 31 2023, 5:08 PM

Peiming added inline comments.Jun 1 2023, 9:19 AM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
695	Here and below: Do you think it is better to copy the CUSPARSE macro to the file (but probably adding some prefix) so that it can be updated more easily later? Or does the current way already follow the same convention in the file?

K-Wu added inline comments.Jun 1 2023, 9:31 AM

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
695	It is kind of the reverse. Here we map types to integers, but in the CUDA header files these enum class definitions are essentially mapping types to an integer, i.e., assigning an integer to each enum item. You raise a good point that we need to have a better way to keep in sync, though the CUDA headers seem to keep the mapping of already existing types the same when adding new types, probably as a way to keep backward compatibility. I will need to think about how to do that in the future.
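One possible shape for the "copy the enumerators with a prefix" idea is sketched below. It is not what the revision does: `MlirCudaDataType`, `ElemKind`, and `getDataTypeEnumFrom` are hypothetical names, and `ElemKind` stands in for the MLIR element types. The integer values are the ones returned by `getCUSparseDataTypeEnumFrom` in this diff, with enumerator names modelled on the `CUDA_R_*` entries of `cudaDataType_t`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical local mirror of a few cudaDataType_t enumerators. The values
// are the integers hard-coded in this revision; the type name is assumed.
enum class MlirCudaDataType : int32_t {
  R_32F = 0,   // f32
  R_64F = 1,   // f64
  R_16F = 2,   // f16
  R_8I = 3,    // i8
  R_32I = 10,  // i32
  R_16BF = 14, // bf16
};

// Hypothetical stand-in for the element types checked in the conversion.
enum class ElemKind { BF16, F16, F32, F64, I8, I32 };

// Maps an element kind to the integer passed to the runtime wrapper.
int32_t getDataTypeEnumFrom(ElemKind kind) {
  switch (kind) {
  case ElemKind::BF16:
    return static_cast<int32_t>(MlirCudaDataType::R_16BF);
  case ElemKind::F16:
    return static_cast<int32_t>(MlirCudaDataType::R_16F);
  case ElemKind::F32:
    return static_cast<int32_t>(MlirCudaDataType::R_32F);
  case ElemKind::F64:
    return static_cast<int32_t>(MlirCudaDataType::R_64F);
  case ElemKind::I8:
    return static_cast<int32_t>(MlirCudaDataType::R_8I);
  case ElemKind::I32:
    return static_cast<int32_t>(MlirCudaDataType::R_32I);
  }
  throw std::invalid_argument("unsupported element type");
}
```

Named constants keep each magic number in one place, which would make a later sync with the CUDA headers a one-line change per type.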

aartbik accepted this revision.Jun 1 2023, 10:03 AM

This revision is now accepted and ready to land.Jun 1 2023, 10:03 AM

rebase origin/main

This revision was landed with ongoing or failed builds.Jun 1 2023, 10:17 AM

Closed by commit rGcc402de0b13b: [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path (authored by K-Wu).

This revision was automatically updated to reflect the committed changes.

K-Wu added a commit: rGcc402de0b13b: [mlir][sparse][gpu] add result type to spmv and spmm gpu libgen path.

Harbormaster completed remote builds in B235886: Diff 527476.Jun 1 2023, 11:29 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td	91 lines
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp	173 lines
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp	24 lines

Diff 526752

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

(1,854 lines hidden)	let description = [{

The matrix arguments can also be associated with one of the following		The matrix arguments can also be associated with one of the following
operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
is NON_TRANSPOSE.		is NON_TRANSPOSE.

Example:		Example:

```mlir		```mlir
%buffersz, %token = gpu.spmv_buffersize async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY		%buffersz, %token = gpu.spmv_buffer_size async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY
```		```
}];		}];
let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_SparseSpMatHandle:$spmatA,		GPU_SparseSpMatHandle:$spmatA,
GPU_SparseDnVecHandle:$dnX,		GPU_SparseDnVecHandle:$dnX,
GPU_SparseDnVecHandle:$dnY);		GPU_SparseDnVecHandle:$dnY,
		OptionalAttr<TypeAttr>:$computeType);
let results = (outs Res<Index>:$bufferSz,		let results = (outs Res<Index>:$bufferSz,
Optional<GPU_AsyncToken>:$asyncToken);		Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$bufferSz,		"Type":$bufferSz,
"::mlir::Type":$asyncToken,		"Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"Value":$env,
"::mlir::Value":$spmatA,		"Value":$spmatA,
"::mlir::Value":$dnX,		"Value":$dnX,
"::mlir::Value":$dnY), [{		"Value":$dnY)
		, [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies, env,		return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies,
modeA, spmatA, dnX, dnY);}]>		env, modeA, spmatA, dnX, dnY, {});}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnX `,` $dnY attr-dict		$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnX `,` $dnY attr-dict
}];		}];
}		}

def GPU_SpMVOp : GPU_Op<"spmv", [GPU_AsyncOpInterface]> {		def GPU_SpMVOp : GPU_Op<"spmv", [GPU_AsyncOpInterface]> {
let summary = "SpMV operation";		let summary = "SpMV operation";
let description = [{		let description = [{
The `gpu.spmv` operation performs the SpMV operation on the given sparse matrix,		The `gpu.spmv` operation performs the SpMV operation on the given sparse matrix,
dense vectors, and buffer. The operation expects handles returned by previous		dense vectors, and buffer. The operation expects handles returned by previous
sparse operations to construct an environment and the operands for SpMV. The		sparse operations to construct an environment and the operands for SpMV. The
buffer must have been allocated on the device.		buffer must have been allocated on the device.

If the `async` keyword is present, the op is executed asynchronously (i.e.		If the `async` keyword is present, the op is executed asynchronously (i.e.
it does not block until the execution has finished on the device). In		it does not block until the execution has finished on the device). In
that case, it returns a !gpu.async.token in addition to the environment.		that case, it returns a !gpu.async.token in addition to the environment.

The matrix arguments can also be associated with one of the following		The matrix arguments can also be associated with one of the following
operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value		operators: NON_TRANSPOSE, TRANSPOSE, CONJUGATE_TRANSPOSE. The default value
is NON_TRANSPOSE.		is NON_TRANSPOSE.

Example:		Example:

```mlir		```mlir
%token = gpu.spmv async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY : memref<?xf64>		%token = gpu.spmv async [%dep] %env, %spmatA{TRANSPOSE}, %dnX, %dnY : memref<?xf64>
```		```
}];		}];
let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_SparseSpMatHandle:$spmatA,		GPU_SparseSpMatHandle:$spmatA,
GPU_SparseDnVecHandle:$dnX,		GPU_SparseDnVecHandle:$dnX,
GPU_SparseDnVecHandle:$dnY,		GPU_SparseDnVecHandle:$dnY,
		OptionalAttr<TypeAttr>:$computeType,
AnyMemRef:$buffer);		AnyMemRef:$buffer);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$asyncToken,		"Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"Value":$env,
"::mlir::Value":$spmatA,		"Value":$spmatA,
"::mlir::Value":$dnX,		"Value":$dnX,
"::mlir::Value":$dnY,		"Value":$dnY,
"::mlir::Value":$buffer), [{		"Value":$buffer), [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,		return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,
spmatA, dnX, dnY, buffer);}]>		spmatA, dnX, dnY, {}, buffer);}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnX `,` $dnY `,` $buffer attr-dict `:` type($buffer)		$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnX `,` $dnY `,` $buffer attr-dict `:` type($buffer)
}];		}];
}		}

def GPU_SpMMBufferSizeOp : GPU_Op<"spmm_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SpMMBufferSizeOp : GPU_Op<"spmm_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SpMM operation";		let summary = "Precompute buffersize for SpMM operation";
let description = [{		let description = [{
(18 lines hidden)
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_TransposeModeAttr:$modeB,		GPU_TransposeModeAttr:$modeB,
GPU_SparseSpMatHandle:$spmatA,		GPU_SparseSpMatHandle:$spmatA,
GPU_SparseDnMatHandle:$dnmatB,		GPU_SparseDnMatHandle:$dnmatB,
GPU_SparseDnMatHandle:$dnmatC);		GPU_SparseDnMatHandle:$dnmatC,
		OptionalAttr<TypeAttr>:$computeType);
let results = (outs Res<Index>:$bufferSz,		let results = (outs Res<Index>:$bufferSz,
Optional<GPU_AsyncToken>:$asyncToken);		Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$bufferSz,		"Type":$bufferSz,
"::mlir::Type":$asyncToken,		"Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"Value":$env,
"::mlir::Value":$spmatA,		"Value":$spmatA,
"::mlir::Value":$dnmatB,		"Value":$dnmatB,
"::mlir::Value":$dnmatC), [{		"Value":$dnmatC), [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
auto modeB = gpu::TransposeMode::NON_TRANSPOSE;		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies,		return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies,
env, modeA, modeB, spmatA, dnmatB, dnmatC);}]>		env, modeA, modeB, spmatA, dnmatB, dnmatC, {});}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $dnmatC attr-dict		$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $dnmatC attr-dict
}];		}];
}		}

def GPU_SpMMOp : GPU_Op<"spmm", [GPU_AsyncOpInterface]> {		def GPU_SpMMOp : GPU_Op<"spmm", [GPU_AsyncOpInterface]> {
let summary = "SpMM operation";		let summary = "SpMM operation";
let description = [{		let description = [{
The `gpu.spmm` operation performs the SpMM operation on the given sparse and		The `gpu.spmm` operation performs the SpMM operation on the given sparse and
dense matrix, and buffer. The operation expects handles returned by previous		dense matrix, and buffer. The operation expects handles returned by previous
(17 lines hidden)

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_TransposeModeAttr:$modeB,		GPU_TransposeModeAttr:$modeB,
GPU_SparseSpMatHandle:$spmatA,		GPU_SparseSpMatHandle:$spmatA,
GPU_SparseDnMatHandle:$dnmatB,		GPU_SparseDnMatHandle:$dnmatB,
GPU_SparseDnMatHandle:$dnmatC,		GPU_SparseDnMatHandle:$dnmatC,
		OptionalAttr<TypeAttr>:$computeType,
AnyMemRef:$buffer);		AnyMemRef:$buffer);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$asyncToken,		"Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"Value":$env,
"::mlir::Value":$spmatA,		"Value":$spmatA,
"::mlir::Value":$dnmatB,		"Value":$dnmatB,
"::mlir::Value":$dnmatC,		"Value":$dnmatC,
"::mlir::Value":$buffer), [{		"Value":$buffer), [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
auto modeB = gpu::TransposeMode::NON_TRANSPOSE;		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,		return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,
modeB, spmatA, dnmatB, dnmatC, buffer);}]>		modeB, spmatA, dnmatB, dnmatC, {}, buffer);}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $dnmatC `,` $buffer attr-dict `:` type($buffer)		$env `,` $spmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $dnmatC `,` $buffer attr-dict `:` type($buffer)
}];		}];
}		}

def GPU_SDDMMBufferSizeOp : GPU_Op<"sddmm_buffer_size", [GPU_AsyncOpInterface]> {		def GPU_SDDMMBufferSizeOp : GPU_Op<"sddmm_buffer_size", [GPU_AsyncOpInterface]> {
let summary = "Precompute buffersize for SDDMM operation";		let summary = "Precompute buffersize for SDDMM operation";
let description = [{		let description = [{
(18 lines hidden)
}];		}];

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_TransposeModeAttr:$modeB,		GPU_TransposeModeAttr:$modeB,
GPU_SparseDnMatHandle:$dnmatA,		GPU_SparseDnMatHandle:$dnmatA,
GPU_SparseDnMatHandle:$dnmatB,		GPU_SparseDnMatHandle:$dnmatB,
GPU_SparseSpMatHandle:$spmatC);		GPU_SparseSpMatHandle:$spmatC,
		OptionalAttr<TypeAttr>:$computeType);
let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Res<Index>:$bufferSz, Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$bufferSz,		"::mlir::Type":$bufferSz,
"::mlir::Type":$asyncToken,		"::mlir::Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"::mlir::ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"::mlir::Value":$env,
"::mlir::Value":$dnmatA,		"::mlir::Value":$dnmatA,
"::mlir::Value":$dnmatB,		"::mlir::Value":$dnmatB,
"::mlir::Value":$spmatC), [{		"::mlir::Value":$spmatC), [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
auto modeB = gpu::TransposeMode::NON_TRANSPOSE;		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies,		return build($_builder, $_state, bufferSz, asyncToken, asyncDependencies,
env, modeA, modeB, dnmatA, dnmatB, spmatC);}]>		env, modeA, modeB, dnmatA, dnmatB, spmatC, {});}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC attr-dict		$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC attr-dict
}];		}];
}		}

def GPU_SDDMMOp : GPU_Op<"sddmm", [GPU_AsyncOpInterface]> {		def GPU_SDDMMOp : GPU_Op<"sddmm", [GPU_AsyncOpInterface]> {
let summary = "SDDMM operation";		let summary = "SDDMM operation";
let description = [{		let description = [{
(19 lines hidden)

let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,		let arguments = (ins Variadic<GPU_AsyncToken>:$asyncDependencies,
GPU_SparseEnvHandle:$env,		GPU_SparseEnvHandle:$env,
GPU_TransposeModeAttr:$modeA,		GPU_TransposeModeAttr:$modeA,
GPU_TransposeModeAttr:$modeB,		GPU_TransposeModeAttr:$modeB,
GPU_SparseDnMatHandle:$dnmatA,		GPU_SparseDnMatHandle:$dnmatA,
GPU_SparseDnMatHandle:$dnmatB,		GPU_SparseDnMatHandle:$dnmatB,
GPU_SparseSpMatHandle:$spmatC,		GPU_SparseSpMatHandle:$spmatC,
		OptionalAttr<TypeAttr>:$computeType,
AnyMemRef:$buffer);		AnyMemRef:$buffer);
let results = (outs Optional<GPU_AsyncToken>:$asyncToken);		let results = (outs Optional<GPU_AsyncToken>:$asyncToken);

let builders = [OpBuilder<(ins		let builders = [OpBuilder<(ins
"::mlir::Type":$asyncToken,		"::mlir::Type":$asyncToken,
"::mlir::ValueRange":$asyncDependencies,		"::mlir::ValueRange":$asyncDependencies,
"::mlir::Value":$env,		"::mlir::Value":$env,
"::mlir::Value":$dnmatA,		"::mlir::Value":$dnmatA,
"::mlir::Value":$dnmatB,		"::mlir::Value":$dnmatB,
"::mlir::Value":$spmatC,		"::mlir::Value":$spmatC,
"::mlir::Value":$buffer), [{		"::mlir::Value":$buffer), [{
auto modeA = gpu::TransposeMode::NON_TRANSPOSE;		auto modeA = gpu::TransposeMode::NON_TRANSPOSE;
auto modeB = gpu::TransposeMode::NON_TRANSPOSE;		auto modeB = gpu::TransposeMode::NON_TRANSPOSE;
return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,		return build($_builder, $_state, asyncToken, asyncDependencies, env, modeA,
modeB, dnmatA, dnmatB, spmatC, buffer);}]>		modeB, dnmatA, dnmatB, spmatC, {}, buffer);}]>
];		];

let assemblyFormat = [{		let assemblyFormat = [{
		(`{` $computeType^ `}`)?
custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)		custom<AsyncDependencies>(type($asyncToken), $asyncDependencies)
$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer)		$env `,` $dnmatA (`{` $modeA^ `}`)? `,` $dnmatB (`{` $modeB^ `}`)? `,` $spmatC `,` $buffer attr-dict `:` type($buffer)
}];		}];
}		}

#endif // GPU_OPS		#endif // GPU_OPS

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

(232 lines hidden) protected:

FunctionCallBuilder destroySpMatCallBuilder = { FunctionCallBuilder destroySpMatCallBuilder = {

"mgpuDestroySpMat", "mgpuDestroySpMat",

llvmVoidType, llvmVoidType,

{llvmPointerType, llvmPointerType /* void *stream */}}; {llvmPointerType, llvmPointerType /* void *stream */}};

FunctionCallBuilder spMVBufferSizeCallBuilder = { FunctionCallBuilder spMVBufferSizeCallBuilder = {

"mgpuSpMVBufferSize", "mgpuSpMVBufferSize",

llvmIntPtrType, llvmIntPtrType,

{llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType,

llvmPointerType, llvmInt32Type, llvmPointerType /* void *stream */}}; llvmPointerType, llvmInt32Type, llvmInt32Type,

llvmPointerType /* void *stream */}};

FunctionCallBuilder spMVCallBuilder = { FunctionCallBuilder spMVCallBuilder = {

"mgpuSpMV", "mgpuSpMV",

llvmVoidType, llvmVoidType,

{llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType,

llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,

llvmPointerType /* void *stream */}}; llvmPointerType /* void *stream */}};

FunctionCallBuilder spMMBufferSizeCallBuilder = { FunctionCallBuilder spMMBufferSizeCallBuilder = {

"mgpuSpMMBufferSize", "mgpuSpMMBufferSize",

llvmIntPtrType, llvmIntPtrType,

{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,

llvmPointerType, llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, llvmInt32Type, llvmInt32Type,

llvmPointerType /* void *stream */}}; llvmPointerType /* void *stream */}};

FunctionCallBuilder spMMCallBuilder = { FunctionCallBuilder spMMCallBuilder = {

"mgpuSpMM", "mgpuSpMM",

llvmVoidType, llvmVoidType,

{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,

llvmPointerType, llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, llvmPointerType, llvmInt32Type, llvmInt32Type,

llvmPointerType /* void *stream */}}; llvmPointerType, llvmPointerType /* void *stream */}};

FunctionCallBuilder SDDMMBufferSizeCallBuilder = { FunctionCallBuilder SDDMMBufferSizeCallBuilder = {

"mgpuSDDMMBufferSize", "mgpuSDDMMBufferSize",

llvmIntPtrType, llvmIntPtrType,

{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,

llvmPointerType, llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, llvmInt32Type, llvmInt32Type,

llvmPointerType /* void *stream */}}; llvmPointerType /* void *stream */}};

FunctionCallBuilder SDDMMCallBuilder = { FunctionCallBuilder SDDMMCallBuilder = {

"mgpuSDDMM", "mgpuSDDMM",

llvmVoidType, llvmVoidType,

{llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType, {llvmPointerType, llvmInt32Type, llvmInt32Type, llvmPointerType,

llvmPointerType, llvmPointerType, llvmInt32Type, llvmPointerType, llvmPointerType, llvmPointerType, llvmInt32Type, llvmInt32Type,

llvmPointerType /* void *stream */}}; llvmPointerType, llvmPointerType /* void *stream */}};

}; };

/// A rewrite pattern to convert gpu.host_register operations into a GPU runtime /// A rewrite pattern to convert gpu.host_register operations into a GPU runtime

/// call. Currently it supports CUDA and ROCm (HIP). /// call. Currently it supports CUDA and ROCm (HIP).

class ConvertHostRegisterOpToGpuRuntimeCallPattern class ConvertHostRegisterOpToGpuRuntimeCallPattern

: public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> { : public ConvertOpToGpuRuntimeCallPattern<gpu::HostRegisterOp> {

public: public:

ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter) ConvertHostRegisterOpToGpuRuntimeCallPattern(LLVMTypeConverter &typeConverter)

(403 lines hidden) auto function = [&] {

if (auto function = module.lookupSymbol<LLVM::LLVMFuncOp>(functionName)) if (auto function = module.lookupSymbol<LLVM::LLVMFuncOp>(functionName))

return function; return function;

return OpBuilder::atBlockEnd(module.getBody()) return OpBuilder::atBlockEnd(module.getBody())

.create<LLVM::LLVMFuncOp>(loc, functionName, functionType); .create<LLVM::LLVMFuncOp>(loc, functionName, functionType);

}(); }();

return builder.create<LLVM::CallOp>(loc, function, arguments); return builder.create<LLVM::CallOp>(loc, function, arguments);

} }

// corresponding to cudaDataType_t defined in library_types.h

// TODO: add support to complex types

static int32_t getCUSparseDataTypeEnumFrom(Type type) {

if (type.isBF16())

return 14;

if (type.isF16())

return 2;

if (type.isF32())

return 0;

if (type.isF64())

return 1;

if (type.isInteger(8))

return 3;

if (type.isInteger(32))

return 10;

llvm_unreachable("unsupported element type");

}

// Returns whether all operands are of LLVM type. // Returns whether all operands are of LLVM type.

static LogicalResult areAllLLVMTypes(Operation *op, ValueRange operands, static LogicalResult areAllLLVMTypes(Operation *op, ValueRange operands,

ConversionPatternRewriter &rewriter) { ConversionPatternRewriter &rewriter) {

if (!llvm::all_of(operands, [](Value value) { if (!llvm::all_of(operands, [](Value value) {

return LLVM::isCompatibleType(value.getType()); return LLVM::isCompatibleType(value.getType());

})) }))

return rewriter.notifyMatchFailure( return rewriter.notifyMatchFailure(

op, "Cannot convert if operands aren't of LLVM type."); op, "Cannot convert if operands aren't of LLVM type.");

(530 lines hidden)

// Returns the element type of the defining spmat op. // Returns the element type of the defining spmat op.

// TODO: safer and more flexible to store data type in actual op instead? // TODO: safer and more flexible to store data type in actual op instead?

static Type getSpMatElemType(Value spMat) { static Type getSpMatElemType(Value spMat) {

if (auto op = spMat.getDefiningOp<gpu::CreateCooOp>()) if (auto op = spMat.getDefiningOp<gpu::CreateCooOp>())

return llvm::cast<MemRefType>(op.getValues().getType()).getElementType(); return llvm::cast<MemRefType>(op.getValues().getType()).getElementType();

if (auto op = spMat.getDefiningOp<gpu::CreateCsrOp>()) if (auto op = spMat.getDefiningOp<gpu::CreateCsrOp>())

return llvm::cast<MemRefType>(op.getValues().getType()).getElementType(); return llvm::cast<MemRefType>(op.getValues().getType()).getElementType();

// the defining op may also be llvm.call after partial lowering

if (auto op = spMat.getDefiningOp<LLVM::CallOp>()) {

Peiming: I don't think you need it (same below for getDnxxx)

if (op.getCallee() == "mgpuCreateCsr") {

mlir::Attribute dw =

op.getOperand(8).getDefiningOp<LLVM::ConstantOp>().getValue();

if (!getConstantIntValue(dw).has_value()) {

K-Wu: Seeking ways for easier-to-read literal comparison

Peiming: You should be able to perform the compare without type conversion. It is essentially a pointer comparison and they will share the same underlying storage if they are equal.

K-Wu: I wish I could have known it earlier. Thank you!

llvm_unreachable("expecting dw to be a constant but not");

}

auto dw_ = getConstantIntValue(dw).value();

wrengr: It would be cleaner to just call `getConstantIntValue` once and save the results. I've attached an edit to show what I mean. (Though really, there's not a whole lot of benefit to the `dw_` variable, so you might as well just use `*dw` directly wherever needed)

Though really this whole pattern should be floated out to a helper function, including the parts that check `*dw` and return the corresponding `FloatType`, since the only difference between the version here and the version for `mgpuCreateCOO` is which `op.getOperand` is used.

Suggested edit:

if (op.getCallee() == "mgpuCreateCsr") {

- mlir::Attribute dw =

- op.getOperand(8).getDefiningOp<LLVM::ConstantOp>().getValue();

- if (!getConstantIntValue(dw).has_value()) {

+ const auto dw =

+ getConstantIntValue(op.getOperand(8).getDefiningOp<LLVM::ConstantOp>().getValue());

+ if (!dw)

llvm_unreachable("expecting dw to be a constant but not");

- }

- auto dw_ = getConstantIntValue(dw).value();

+ const auto dw_ = *dw;

if (dw_ == 32)

if (dw_ == 32)

return FloatType::getF32(spMat.getContext());

else if (dw_ == 64)

return FloatType::getF64(spMat.getContext());

} else if (op.getCallee() == "mgpuCreateCoo") {

mlir::Attribute dw =

op.getOperand(7).getDefiningOp<LLVM::ConstantOp>().getValue();

if (!getConstantIntValue(dw).has_value()) {

llvm_unreachable("expecting dw to be a constant but not");

}

auto dw_ = getConstantIntValue(dw).value();

if (dw_ == 32)

return FloatType::getF32(spMat.getContext());

else if (dw_ == 64)

return FloatType::getF64(spMat.getContext());

}

llvm_unreachable("cannot find spmat def"); llvm_unreachable("cannot find spmat def");

} }
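Peiming's remark above relies on types being uniqued, so that equal types share one storage object and `==` reduces to a pointer comparison. The following is a minimal standalone sketch of that idea, not MLIR's actual implementation; `FloatStorage`, `MiniContext`, and `MiniType` are hypothetical names introduced only for illustration.

```cpp
#include <map>
#include <memory>

// Hypothetical uniqued storage: one heap object per distinct bit width.
struct FloatStorage {
  unsigned width;
};

// Hypothetical context that owns and uniques the storage objects.
class MiniContext {
public:
  // Returns the single storage object for `width`, creating it on first use.
  const FloatStorage *getFloat(unsigned width) {
    std::unique_ptr<FloatStorage> &slot = uniqued[width];
    if (!slot)
      slot = std::make_unique<FloatStorage>(FloatStorage{width});
    return slot.get();
  }

private:
  std::map<unsigned, std::unique_ptr<FloatStorage>> uniqued;
};

// Value-semantic handle; equality compares storage pointers only.
struct MiniType {
  const FloatStorage *impl;
  bool operator==(const MiniType &other) const { return impl == other.impl; }
  bool operator!=(const MiniType &other) const { return impl != other.impl; }
};
```

Because every request for the same width yields the same pointer, two handles compare equal exactly when they denote the same type, with no conversion to an integer or string needed.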

// Returns the element type of the defining dnmat or dnvec op.

static Type getDnElemType(Value dn) {

if (auto op = dn.getDefiningOp<gpu::CreateDnMatOp>())

return llvm::cast<MemRefType>(op.getMemref().getType()).getElementType();

wrengr: Since `gpu::CreateDnMatOp::getMemref` returns a `TypedValue<MemRefType>`, the `TypedValue<MemRefType>::getType` method will already return a `MemRefType`, therefore you don't need the `llvm::cast`

if (auto op = dn.getDefiningOp<gpu::CreateDnVecOp>())

return llvm::cast<MemRefType>(op.getMemref().getType()).getElementType();

// the defining op may also be llvm.call after partial lowering

if (auto op = dn.getDefiningOp<LLVM::CallOp>()) {

if (op.getCallee() == "mgpuCreateDnVec") {

mlir::Attribute dw =

op.getOperand(2).getDefiningOp<LLVM::ConstantOp>().getValue();

if (!getConstantIntValue(dw).has_value()) {

llvm_unreachable("expecting dw to be a constant but not");

}

auto dw_ = getConstantIntValue(dw).value();

if (dw_ == 32)

return FloatType::getF32(dn.getContext());

else if (dw_ == 64)

return FloatType::getF64(dn.getContext());

wrengr: This is yet another place that uses the same function for converting a `Value` (which has `LLVM::ConstantOp` defining op) into the appropriate `FloatType`...

} else if (op.getCallee() == "mgpuCreateDnMat") {

mlir::Attribute dw =

op.getOperand(3).getDefiningOp<LLVM::ConstantOp>().getValue();

if (!getConstantIntValue(dw).has_value()) {

llvm_unreachable("expecting dw to be a constant but not");

}

auto dw_ = getConstantIntValue(dw).value();

if (dw_ == 32)

return FloatType::getF32(dn.getContext());

else if (dw_ == 64)

return FloatType::getF64(dn.getContext());

}

llvm_unreachable("cannot find dn def");

}
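wrengr's suggestion to float the repeated pattern out into a helper could look roughly like the following. This is a standalone sketch that avoids MLIR: `FloatKind`, `floatKindFromWidth`, and `floatKindFromOperand` are hypothetical names, a `std::optional<int64_t>` stands in for the result of `getConstantIntValue`, and exceptions stand in for `llvm_unreachable`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the FloatType values returned in the review code.
enum class FloatKind { F32, F64 };

// Shared logic: validate the data-width constant once and map it to a kind.
inline FloatKind floatKindFromWidth(std::optional<int64_t> dw) {
  if (!dw)
    throw std::invalid_argument("expecting dw to be a constant");
  if (*dw == 32)
    return FloatKind::F32;
  if (*dw == 64)
    return FloatKind::F64;
  throw std::invalid_argument("unsupported data width");
}

// Each entry models one call operand: a value if the operand is an integer
// constant, std::nullopt otherwise. Call sites differ only in `index`.
inline FloatKind
floatKindFromOperand(const std::vector<std::optional<int64_t>> &operands,
                     std::size_t index) {
  return floatKindFromWidth(operands.at(index));
}
```

With a helper of this shape, the `mgpuCreateCsr`, `mgpuCreateCoo`, `mgpuCreateDnVec`, and `mgpuCreateDnMat` branches would each reduce to a single call, with operand index 8, 7, 2, and 3 respectively.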

static Value genConstFrom(OpBuilder &builder, Location loc, static Value genConstFrom(OpBuilder &builder, Location loc,

gpu::TransposeMode mode) { gpu::TransposeMode mode) {

Type llvmInt32Type = builder.getIntegerType(32); Type llvmInt32Type = builder.getIntegerType(32);

return builder.create<LLVM::ConstantOp>(loc, llvmInt32Type, return builder.create<LLVM::ConstantOp>(loc, llvmInt32Type,

static_cast<int32_t>(mode)); static_cast<int32_t>(mode));

} }

static Value genConstFrom(OpBuilder &builder, Location loc,

int32_t computeTypeInt) {

Peiming: No longer needed?

K-Wu: Good catch! Addressed in new commit

Type llvmInt32Type = builder.getIntegerType(32);

return builder.create<LLVM::ConstantOp>(loc, llvmInt32Type, computeTypeInt);

}

Peiming: nit: maybe renaming to `getConstInt32From`?

static Value

genConstFromOptionalComputeMode(OpBuilder &builder, Location loc,

std::optional<Type> computeTypeOptional,

Peiming: nit: `getConstInt32xxx`

Type defaultType) {

auto computeTypeInt = getCUSparseDataTypeEnumFrom(

computeTypeOptional.has_value() ? computeTypeOptional.value()

: defaultType);

wrengr: You can replace this with just `computeTypeOptional.value_or(defaultType)` https://en.cppreference.com/w/cpp/utility/optional/value_or

auto computeType = genConstFrom(builder, loc, computeTypeInt);

return computeType;

}
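The `value_or` simplification wrengr mentions can be seen in isolation in the sketch below. The function names `pickVerbose` and `pickConcise` are hypothetical, and `std::string` stands in for the MLIR `Type` so that the example compiles on its own.

```cpp
#include <optional>
#include <string>

// Verbose form, mirroring the has_value()/value() ternary under review.
inline std::string pickVerbose(const std::optional<std::string> &chosen,
                               const std::string &fallback) {
  return chosen.has_value() ? chosen.value() : fallback;
}

// Equivalent form using std::optional::value_or.
inline std::string pickConcise(const std::optional<std::string> &chosen,
                               const std::string &fallback) {
  return chosen.value_or(fallback);
}
```

Both functions return the contained value when one is present and the fallback otherwise. Note that the argument to `value_or` is always evaluated, which makes no difference here because the default is already computed as a function argument.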

LogicalResult ConvertCreateSparseEnvOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertCreateSparseEnvOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::CreateSparseEnvOp op, OpAdaptor adaptor, gpu::CreateSparseEnvOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

[188 lines not shown]

LogicalResult ConvertSpMVBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

auto modeA = genConstFrom(rewriter, loc, op.getModeA()); auto modeA = genConstFrom(rewriter, loc, op.getModeA());

Type dType = getSpMatElemType(op.getSpmatA()); Type dType = getSpMatElemType(op.getSpmatA());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

// retrieve the compute type, notice that it may be optional

auto computeType = genConstFromOptionalComputeMode(

rewriter, loc, adaptor.getComputeType(), getDnElemType(adaptor.getDnY()));

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

auto bufferSize = auto bufferSize =

spMVBufferSizeCallBuilder spMVBufferSizeCallBuilder

.create(loc, rewriter, .create(loc, rewriter,

{adaptor.getEnv(), modeA, adaptor.getSpmatA(), {adaptor.getEnv(), modeA, adaptor.getSpmatA(),

adaptor.getDnX(), adaptor.getDnY(), dw, stream}) adaptor.getDnX(), adaptor.getDnY(), dw, computeType, stream})

.getResult(); .getResult();

rewriter.replaceOp(op, {bufferSize, stream}); rewriter.replaceOp(op, {bufferSize, stream});

return success(); return success();

} }

LogicalResult ConvertSpMVOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertSpMVOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::SpMVOp op, OpAdaptor adaptor, gpu::SpMVOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

Type dType = getSpMatElemType(op.getSpmatA()); Type dType = getSpMatElemType(op.getSpmatA());

auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA()); auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

// retrieve the compute type, notice that it may be optional

auto computeType = genConstFromOptionalComputeMode(

rewriter, loc, adaptor.getComputeType(), getDnElemType(adaptor.getDnY()));

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

Value pBuf = Value pBuf =

MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc); MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);

if (!getTypeConverter()->useOpaquePointers()) if (!getTypeConverter()->useOpaquePointers())

pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf); pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);

spMVCallBuilder.create(loc, rewriter, spMVCallBuilder.create(loc, rewriter,

{adaptor.getEnv(), modeA, adaptor.getSpmatA(), {adaptor.getEnv(), modeA, adaptor.getSpmatA(),

adaptor.getDnX(), adaptor.getDnY(), dw, pBuf, adaptor.getDnX(), adaptor.getDnY(), dw, computeType,

stream}); pBuf, stream});

rewriter.replaceOp(op, {stream}); rewriter.replaceOp(op, {stream});

return success(); return success();

} }

LogicalResult ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertSpMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::SpMMBufferSizeOp op, OpAdaptor adaptor, gpu::SpMMBufferSizeOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA()); auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA());

auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB()); auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB());

Type dType = getSpMatElemType(op.getSpmatA()); Type dType = getSpMatElemType(op.getSpmatA());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

- auto bufferSize =

- spMMBufferSizeCallBuilder

+ // retrieve the compute type, notice that it may be optional

+ auto computeType =

+ genConstFromOptionalComputeMode(rewriter, loc, adaptor.getComputeType(),

+ getDnElemType(adaptor.getDnmatC()));

+ auto bufferSize = spMMBufferSizeCallBuilder

.create(loc, rewriter,

- {adaptor.getEnv(), modeA, modeB, adaptor.getSpmatA(),

- adaptor.getDnmatB(), adaptor.getDnmatC(), dw, stream})

+ {adaptor.getEnv(), modeA, modeB,

+ adaptor.getSpmatA(), adaptor.getDnmatB(),

+ adaptor.getDnmatC(), dw, computeType, stream})

.getResult(); .getResult();

rewriter.replaceOp(op, {bufferSize, stream}); rewriter.replaceOp(op, {bufferSize, stream});

return success(); return success();

} }

LogicalResult ConvertSDDMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertSDDMMBufferSizeOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::SDDMMBufferSizeOp op, OpAdaptor adaptor, gpu::SDDMMBufferSizeOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA()); auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA());

auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB()); auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB());

Type dType = getSpMatElemType(op.getSpmatC()); Type dType = getSpMatElemType(op.getSpmatC());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

auto computeType =

genConstFromOptionalComputeMode(rewriter, loc, adaptor.getComputeType(),

getSpMatElemType(adaptor.getSpmatC()));

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

- auto bufferSize =

- SDDMMBufferSizeCallBuilder

+ auto bufferSize = SDDMMBufferSizeCallBuilder

.create(loc, rewriter,

- {adaptor.getEnv(), modeA, modeB, adaptor.getDnmatA(),

- adaptor.getDnmatB(), adaptor.getSpmatC(), dw, stream})

+ {adaptor.getEnv(), modeA, modeB,

+ adaptor.getDnmatA(), adaptor.getDnmatB(),

+ adaptor.getSpmatC(), dw, computeType, stream})

.getResult(); .getResult();

rewriter.replaceOp(op, {bufferSize, stream}); rewriter.replaceOp(op, {bufferSize, stream});

return success(); return success();

} }

LogicalResult ConvertSpMMOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertSpMMOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::SpMMOp op, OpAdaptor adaptor, gpu::SpMMOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA()); auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA());

auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB()); auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB());

Type dType = getSpMatElemType(op.getSpmatA()); Type dType = getSpMatElemType(op.getSpmatA());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

// retrieve the compute type, notice that it may be optional

auto computeType =

genConstFromOptionalComputeMode(rewriter, loc, adaptor.getComputeType(),

getDnElemType(adaptor.getDnmatC()));

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

Value pBuf = Value pBuf =

MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc); MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);

if (!getTypeConverter()->useOpaquePointers()) if (!getTypeConverter()->useOpaquePointers())

pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf); pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);

spMMCallBuilder.create(loc, rewriter, spMMCallBuilder.create(loc, rewriter,

{adaptor.getEnv(), modeA, modeB, adaptor.getSpmatA(), {adaptor.getEnv(), modeA, modeB, adaptor.getSpmatA(),

adaptor.getDnmatB(), adaptor.getDnmatC(), dw, pBuf, adaptor.getDnmatB(), adaptor.getDnmatC(), dw,

stream}); computeType, pBuf, stream});

rewriter.replaceOp(op, {stream}); rewriter.replaceOp(op, {stream});

return success(); return success();

} }

template <typename T> template <typename T>

static void addOpaquePointerConversion(LLVMTypeConverter &converter) { static void addOpaquePointerConversion(LLVMTypeConverter &converter) {

converter.addConversion([&converter](T) -> Type { converter.addConversion([&converter](T) -> Type {

return converter.getPointerType( return converter.getPointerType(

IntegerType::get(&converter.getContext(), 8)); IntegerType::get(&converter.getContext(), 8));

}); });

} }

LogicalResult ConvertSDDMMOpToGpuRuntimeCallPattern::matchAndRewrite( LogicalResult ConvertSDDMMOpToGpuRuntimeCallPattern::matchAndRewrite(

gpu::SDDMMOp op, OpAdaptor adaptor, gpu::SDDMMOp op, OpAdaptor adaptor,

ConversionPatternRewriter &rewriter) const { ConversionPatternRewriter &rewriter) const {

if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) || if (failed(areAllLLVMTypes(op, adaptor.getOperands(), rewriter)) ||

failed(isAsyncWithOneDependency(rewriter, op))) failed(isAsyncWithOneDependency(rewriter, op)))

return failure(); return failure();

Location loc = op.getLoc(); Location loc = op.getLoc();

Type dType = getSpMatElemType(op.getSpmatC()); Type dType = getSpMatElemType(op.getSpmatC());

auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type, auto dw = rewriter.create<LLVM::ConstantOp>(loc, llvmInt32Type,

dType.getIntOrFloatBitWidth()); dType.getIntOrFloatBitWidth());

auto computeType =

genConstFromOptionalComputeMode(rewriter, loc, adaptor.getComputeType(),

getSpMatElemType(adaptor.getSpmatC()));

auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA()); auto modeA = genConstFrom(rewriter, loc, adaptor.getModeA());

auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB()); auto modeB = genConstFrom(rewriter, loc, adaptor.getModeB());

auto stream = adaptor.getAsyncDependencies().front(); auto stream = adaptor.getAsyncDependencies().front();

Value pBuf = Value pBuf =

MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc); MemRefDescriptor(adaptor.getBuffer()).allocatedPtr(rewriter, loc);

if (!getTypeConverter()->useOpaquePointers()) if (!getTypeConverter()->useOpaquePointers())

pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf); pBuf = rewriter.create<LLVM::BitcastOp>(loc, llvmPointerType, pBuf);

SDDMMCallBuilder.create(loc, rewriter, SDDMMCallBuilder.create(loc, rewriter,

{adaptor.getEnv(), modeA, modeB, adaptor.getDnmatA(), {adaptor.getEnv(), modeA, modeB, adaptor.getDnmatA(),

adaptor.getDnmatB(), adaptor.getSpmatC(), dw, pBuf, adaptor.getDnmatB(), adaptor.getSpmatC(), dw,

stream}); computeType, pBuf, stream});

rewriter.replaceOp(op, {stream}); rewriter.replaceOp(op, {stream});

return success(); return success();

} }

void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter, void mlir::populateGpuToLLVMConversionPatterns(LLVMTypeConverter &converter,

RewritePatternSet &patterns, RewritePatternSet &patterns,

StringRef gpuBinaryAnnotation, StringRef gpuBinaryAnnotation,

bool kernelBarePtrCallConv) { bool kernelBarePtrCallConv) {

[35 lines not shown]

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp

	[222 lines not shown]
	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSetDefaultDevice(int32_t device) {			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSetDefaultDevice(int32_t device) {
	defaultDevice = device;			defaultDevice = device;
	}			}

	///			///
	/// Wrapper methods for the cuSparse library.			/// Wrapper methods for the cuSparse library.
	///			///

	static inline cudaDataType_t dataTp(int32_t width) {			static inline cudaDataType_t dataTp(int32_t width) {
				aartbik: this can go now, right?
	switch (width) {			switch (width) {
	case 32:			case 32:
	return CUDA_R_32F;			return CUDA_R_32F;
	default:			default:
	return CUDA_R_64F;			return CUDA_R_64F;
	}			}
	}			}

	static inline cusparseIndexType_t idxTp(int32_t width) {			static inline cusparseIndexType_t idxTp(int32_t width) {
				aartbik: shall we do the same for index to be consistent?
	switch (width) {			switch (width) {
	case 32:			case 32:
	return CUSPARSE_INDEX_32I;			return CUSPARSE_INDEX_32I;
	default:			default:
	return CUSPARSE_INDEX_64I;			return CUSPARSE_INDEX_64I;
	}			}
	}			}

	[51 lines not shown]
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
	mgpuDestroyDnMat(void *m, CUstream /*stream*/) {			mgpuDestroyDnMat(void *m, CUstream /*stream*/) {
	cusparseDnMatDescr_t mat = reinterpret_cast<cusparseDnMatDescr_t>(m);			cusparseDnMatDescr_t mat = reinterpret_cast<cusparseDnMatDescr_t>(m);
	CUSPARSE_REPORT_IF_ERROR(cusparseDestroyDnMat(mat))			CUSPARSE_REPORT_IF_ERROR(cusparseDestroyDnMat(mat))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void *			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void *
				aartbik: this TODO is done, right?
				K-Wu: Good catch. Thanks!
	mgpuCreateCoo(intptr_t rows, intptr_t cols, intptr_t nnz, void *rowIdxs,			mgpuCreateCoo(intptr_t rows, intptr_t cols, intptr_t nnz, void *rowIdxs,
	void *colIdxs, void *values, int32_t iw, int32_t dw,			void *colIdxs, void *values, int32_t iw, int32_t dw,
	CUstream /*stream*/) {			CUstream /*stream*/) {
	cusparseSpMatDescr_t mat = nullptr;			cusparseSpMatDescr_t mat = nullptr;
	cusparseIndexType_t itp = idxTp(iw);			cusparseIndexType_t itp = idxTp(iw);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dtp = dataTp(dw);
	CUSPARSE_REPORT_IF_ERROR(cusparseCreateCoo(&mat, rows, cols, nnz, rowIdxs,			CUSPARSE_REPORT_IF_ERROR(cusparseCreateCoo(&mat, rows, cols, nnz, rowIdxs,
	colIdxs, values, itp,			colIdxs, values, itp,
	[18 lines not shown]
	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
	mgpuDestroySpMat(void *m, CUstream /*stream*/) {			mgpuDestroySpMat(void *m, CUstream /*stream*/) {
	cusparseSpMatDescr_t mat = reinterpret_cast<cusparseSpMatDescr_t>(m);			cusparseSpMatDescr_t mat = reinterpret_cast<cusparseSpMatDescr_t>(m);
	CUSPARSE_REPORT_IF_ERROR(cusparseDestroySpMat(mat))			CUSPARSE_REPORT_IF_ERROR(cusparseDestroySpMat(mat))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t			extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
	mgpuSpMVBufferSize(void *h, int32_t ma, void *a, void *x, void *y, int32_t dw,			mgpuSpMVBufferSize(void *h, int32_t ma, void *a, void *x, void *y, int32_t dw,
	CUstream /*stream*/) {			int32_t dtp, CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
	cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);			cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);			cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);
	cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);			cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dTp = static_cast<cudaDataType_t>(dtp);
	ALPHABETA(dw, alpha, beta)			ALPHABETA(dw, alpha, beta)
	size_t bufferSize = 0;			size_t bufferSize = 0;
	CUSPARSE_REPORT_IF_ERROR(			CUSPARSE_REPORT_IF_ERROR(
	cusparseSpMV_bufferSize(handle, modeA, alphap, matA, vecX, betap, vecY,			cusparseSpMV_bufferSize(handle, modeA, alphap, matA, vecX, betap, vecY,
	dtp, CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize))			dTp, CUSPARSE_SPMV_ALG_DEFAULT, &bufferSize))
	return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc			return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMV(void *h, int32_t ma, void *a,			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMV(void *h, int32_t ma, void *a,
	void *x, void *y, int32_t dw,			void *x, void *y, int32_t dw,
	void *buf,			int32_t dtp, void *buf,
	CUstream /*stream*/) {			CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
	cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);			cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);			cusparseDnVecDescr_t vecX = reinterpret_cast<cusparseDnVecDescr_t>(x);
	cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);			cusparseDnVecDescr_t vecY = reinterpret_cast<cusparseDnVecDescr_t>(y);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dTp = static_cast<cudaDataType_t>(dtp);
	ALPHABETA(dw, alpha, beta)			ALPHABETA(dw, alpha, beta)
	CUSPARSE_REPORT_IF_ERROR(cusparseSpMV(handle, modeA, alphap, matA, vecX,			CUSPARSE_REPORT_IF_ERROR(cusparseSpMV(handle, modeA, alphap, matA, vecX,
	betap, vecY, dtp,			betap, vecY, dTp,
	CUSPARSE_SPMV_ALG_DEFAULT, buf))			CUSPARSE_SPMV_ALG_DEFAULT, buf))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t			extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
	mgpuSpMMBufferSize(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c,			mgpuSpMMBufferSize(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c,
	int32_t dw, CUstream /*stream*/) {			int32_t dw, int32_t dtp, CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
	cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);			cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
	cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);			cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);			cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);
	cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);			cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dTp = static_cast<cudaDataType_t>(dtp);
	ALPHABETA(dw, alpha, beta)			ALPHABETA(dw, alpha, beta)
	size_t bufferSize = 0;			size_t bufferSize = 0;
	CUSPARSE_REPORT_IF_ERROR(cusparseSpMM_bufferSize(			CUSPARSE_REPORT_IF_ERROR(cusparseSpMM_bufferSize(
	handle, modeA, modeB, alphap, matA, matB, betap, matC, dtp,			handle, modeA, modeB, alphap, matA, matB, betap, matC, dTp,
	CUSPARSE_SPMM_ALG_DEFAULT, &bufferSize))			CUSPARSE_SPMM_ALG_DEFAULT, &bufferSize))
	return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc			return bufferSize == 0 ? 1 : bufferSize; // avoid zero-alloc
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT void			extern "C" MLIR_CUDA_WRAPPERS_EXPORT void
	mgpuSpMM(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c, int32_t dw,			mgpuSpMM(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c, int32_t dw,
	void *buf, CUstream /*stream*/) {			int32_t dtp, void *buf, CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
	cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);			cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
	cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);			cusparseOperation_t modeB = static_cast<cusparseOperation_t>(mb);
	cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);			cusparseSpMatDescr_t matA = reinterpret_cast<cusparseSpMatDescr_t>(a);
	cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);			cusparseDnMatDescr_t matB = reinterpret_cast<cusparseDnMatDescr_t>(b);
	cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);			cusparseDnMatDescr_t matC = reinterpret_cast<cusparseDnMatDescr_t>(c);
	cudaDataType_t dtp = dataTp(dw);			cudaDataType_t dTp = static_cast<cudaDataType_t>(dtp);
	ALPHABETA(dw, alpha, beta)			ALPHABETA(dw, alpha, beta)
	CUSPARSE_REPORT_IF_ERROR(cusparseSpMM(handle, modeA, modeB, alphap, matA,			CUSPARSE_REPORT_IF_ERROR(cusparseSpMM(handle, modeA, modeB, alphap, matA,
	matB, betap, matC, dtp,			matB, betap, matC, dTp,
	CUSPARSE_SPMM_ALG_DEFAULT, buf))			CUSPARSE_SPMM_ALG_DEFAULT, buf))
	}			}

	extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t			extern "C" MLIR_CUDA_WRAPPERS_EXPORT intptr_t
	mgpuSDDMMBufferSize(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c,			mgpuSDDMMBufferSize(void *h, int32_t ma, int32_t mb, void *a, void *b, void *c,
	int32_t dw, CUstream /*stream*/) {			int32_t dw, CUstream /*stream*/) {
	cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);			cusparseHandle_t handle = reinterpret_cast<cusparseHandle_t>(h);
	cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);			cusparseOperation_t modeA = static_cast<cusparseOperation_t>(ma);
	[28 lines not shown]


[mlir][sparse][gpu] add result type to spmv, spmm and sddmm gpu libgen path (Closed, Public)


Diff 526752

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
