This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse][gpu] rework CUDA sparse libs environment handle
ClosedPublic

Authored by K-Wu on Jun 16 2023, 2:38 PM.

Details

Summary

As we confirm with the Nvidia people that it is fine to create the environment handle once in the program and use it in multiple streams, I create this revision to rework the environment initialization/destruction to mimic module load/unload mechanism without passing the environment handle around in the dialect.
This allows 1) handy reuse of CUDA environment handle in CUDARuntimeWrapper throughout the instance lifetime, and 2) removal of environment handle in the IR

Diff Detail

Event Timeline

K-Wu created this revision.Jun 16 2023, 2:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 2:38 PM
K-Wu updated this revision to Diff 532302.Jun 16 2023, 3:11 PM

working

K-Wu updated this revision to Diff 532303.Jun 16 2023, 3:11 PM

reformat

K-Wu updated this revision to Diff 532312.Jun 16 2023, 3:39 PM

working runtime

K-Wu updated this revision to Diff 535051.Jun 27 2023, 10:41 AM

rebase origin/main

K-Wu updated this revision to Diff 535115.Jun 27 2023, 1:31 PM

rm handles in the GPU dialect

K-Wu retitled this revision from [mlir][sparse][gpu] reuse CUDA environment handle throughout instance lifetime to [mlir][sparse][gpu] rework CUDA environment handle throughout instance lifetime.Jun 27 2023, 1:34 PM
K-Wu edited the summary of this revision. (Show Details)
K-Wu updated this revision to Diff 535118.Jun 27 2023, 1:35 PM

rm useless None flag

K-Wu edited the summary of this revision. (Show Details)Jun 27 2023, 1:37 PM
K-Wu added reviewers: Peiming, wrengr, bixia, yinying-lisa-li.
K-Wu published this revision for review.Jun 27 2023, 1:39 PM
K-Wu retitled this revision from [mlir][sparse][gpu] rework CUDA environment handle throughout instance lifetime to [mlir][sparse][gpu] rework CUDA sparse libs environment handle.
Peiming added inline comments.Jun 27 2023, 1:55 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1558

Then, this is no longer an Async operation right?

1587

ditto

aartbik added inline comments.Jun 27 2023, 4:37 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1558

I am even wondering if we can't just get rid of these two GPU ops altogether, and demand that the client (sparse compiler in this case) generates a proper cudaRTWrapper at the module start and end. That way, we can also document thread safety issue, i.e. single thread setup/breakdown

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
109

this feels very thread unsafe!

I would expect something like a static initializer to take care of this.
Right now, the setup is done on an executing thread, so having more than one gets in trouble

318–319

I think we should not have this create/destroy, but a simple startModule/endModule (with restrictions that they are called in certain ways)
and then use the handle below.

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
23

it does not make sense to get a token on this anymore, as Peiming said above
but more importantly, I think we need a module level setup

K-Wu added inline comments.Jun 27 2023, 4:50 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1558

These comments make sense. I will address them. Thank you both!

K-Wu updated this revision to Diff 535568.Jun 28 2023, 5:04 PM

rebase origin/main

K-Wu updated this revision to Diff 536344.Jun 30 2023, 11:46 AM

rm create/destroy sparse env from gpu dialect

K-Wu updated this revision to Diff 536345.Jun 30 2023, 11:47 AM

clean up

K-Wu marked 5 inline comments as done.Jun 30 2023, 11:49 AM
K-Wu added inline comments.
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1558

good catch! Now it is completely gone. And because we don't need to pass stream to the @mgpu call, we determine to use llvm.call to initialize and destroy environments, and completely remove the GPU dialect environment handle creation/destruction op

K-Wu updated this revision to Diff 536347.Jun 30 2023, 11:52 AM
K-Wu marked an inline comment as done.

addressing thread safety hopefully

K-Wu added inline comments.Jun 30 2023, 11:53 AM
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
109

I moved the initialization into the create env functions. How does it look?

K-Wu updated this revision to Diff 536350.Jun 30 2023, 11:56 AM

fix compile error

K-Wu updated this revision to Diff 536352.Jun 30 2023, 12:04 PM

fix test error

aartbik added inline comments.Jun 30 2023, 12:17 PM
mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
83

Remove all this scoped stuff and initializer bool all together in favor of

(1) a single static handle handle
(2) create/destroy methods

then you can simply have

Single handle shared between all cuSparse calls. The client is responsible
for calling mgpuCreateSparseEnv() and d mgpuDestroySparseEnv()
// on entering and exiting the module containing sparsified GPU code.
static cusparseHandle_t env = nullptr;

void mgpuCreateSparseEnv() }

assert(!handle);
CUSPARSE_REPORT_IF_ERROR(cusparseCreate(&handle)));

}

extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuDestroySparseEnv() {

assert(handle);
CUSPARSE_REPORT_IF_ERROR(cusparseDestroy(handle))
handle = nullptr;

}

and then all methods have

extern "C" MLIR_CUDA_WRAPPERS_EXPORT void mgpuSpMV(int32_t ma, void *a, void *x,

                                                void *y, int32_t ctp,
                                                void *buf,
                                                CUstream /*stream*/) {
assert(handle) && "client did not call mgpuCreateSparseEnv()";
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sm80-lt/sparse-matmul-2-4-lib.mlir
31–35

can we move this all the way up to main() just as illustration of what a sparse compiler should generate?

K-Wu updated this revision to Diff 536376.Jun 30 2023, 12:42 PM

addressing comments

K-Wu marked an inline comment as done.Jun 30 2023, 12:43 PM
K-Wu added inline comments.
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sm80-lt/sparse-matmul-2-4-lib.mlir
31–35

Good point! I am working on it.

K-Wu updated this revision to Diff 536377.Jun 30 2023, 12:46 PM

add runtime check

K-Wu updated this revision to Diff 536378.Jun 30 2023, 12:49 PM

add todo

K-Wu updated this revision to Diff 536380.Jun 30 2023, 12:51 PM
K-Wu marked an inline comment as done.

addressing comments

K-Wu added inline comments.Jun 30 2023, 12:53 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
503

I also noted TODO here

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
83

Addressed. Let me know you thoughts!

K-Wu updated this revision to Diff 536384.Jun 30 2023, 12:59 PM

add some doc

aartbik added inline comments.Jun 30 2023, 1:46 PM
mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
1635

note that we can now also remove the EnvHandle type completely from

google3/third_party/llvm/llvm-project/mlir/include/mlir/Dialect/GPU/IR/GPUBase.td

K-Wu updated this revision to Diff 536405.Jun 30 2023, 1:48 PM

removing unused sparse env handle type

K-Wu marked 2 inline comments as done.Jun 30 2023, 1:49 PM
K-Wu updated this revision to Diff 536406.Jun 30 2023, 1:50 PM

fixing error

K-Wu updated this revision to Diff 536408.Jun 30 2023, 1:52 PM

removing emitting llvm.calls in SparseGPUCodegen.cpp

aartbik added inline comments.Jun 30 2023, 1:53 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
63

I believe codegen already provides

createFuncCall(rewriter, loc, "foo", {}, {}, EmitCInterface::Off);

for this?

but, not needed per our convention

461

remove all this

we assume client will do this

611

yeah, none of the env stuff in this file

mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp
319

document that scoped context is for cinit

320

I would do

assert(!cusparse_env)

so we detect double calls in debug mode

539

assert

mlir/test/Dialect/SparseTensor/GPU/gpu_matvec_lib.mlir
48–50

don't

K-Wu updated this revision to Diff 536417.Jun 30 2023, 2:02 PM

fix test errors

K-Wu marked 7 inline comments as done.Jun 30 2023, 2:23 PM
K-Wu updated this revision to Diff 536428.Jun 30 2023, 2:23 PM

addressing comments

aartbik added inline comments.Jun 30 2023, 2:23 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
40

add empty line back

mlir/test/Conversion/GPUCommon/lower-2to4-sparse-to-gpu-runtime-calls.mlir
23–27

I would completely remove the calls here

mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir
31

same here, since we do not run this code, just leave out the create/destroy calls form the IR

mlir/test/Dialect/GPU/ops.mlir
330

omit

358

omit

K-Wu updated this revision to Diff 536429.Jun 30 2023, 2:24 PM

no init in destroy handle func now

K-Wu updated this revision to Diff 536441.Jun 30 2023, 2:40 PM
K-Wu marked 2 inline comments as done.

addressing comments

K-Wu marked 3 inline comments as done.Jun 30 2023, 2:40 PM
aartbik accepted this revision.Jun 30 2023, 2:50 PM

Thanks for addressing all my comments so patiently. Looks great to me!

mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
40

still there?

This revision is now accepted and ready to land.Jun 30 2023, 2:50 PM
This revision was landed with ongoing or failed builds.Jun 30 2023, 2:53 PM
This revision was automatically updated to reflect the committed changes.
aartbik added inline comments.Jul 5 2023, 10:07 PM
mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
658

This should not have been removed. Sending out a fix.