This is an archive of the discontinued LLVM Phabricator instance.

Fix CUDA runtime wrapper for GPU mem alloc/free to async
ClosedPublic

Authored by bondhugula on Apr 10 2022, 10:53 PM.

Details

Summary

Switch the CUDA runtime wrappers for GPU memory alloc/free to the async
versions. The semantics of the GPU dialect ops (gpu.alloc/dealloc) and the
wrappers they are lowered to (via gpu-to-llvm) call for the async versions --
however, these were being incorrectly mapped to cuMemAlloc/cuMemFree instead
of cuMemAllocAsync/cuMemFreeAsync.
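
For context, a minimal sketch of the async allocation path (the mgpuMemAlloc/mgpuMemFree wrapper names and the omitted error handling are illustrative, not quoted from the patch):

#include "cuda.h"
#include <cstdint>

// Sketch only: allocate/free memory ordered on the given stream using the
// CUDA 11.2+ stream-ordered allocator instead of cuMemAlloc/cuMemFree.
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, CUstream stream) {
  CUdeviceptr ptr = 0;
  // Enqueued on `stream`; does not synchronize the whole context.
  cuMemAllocAsync(&ptr, sizeBytes, stream);
  return reinterpret_cast<void *>(ptr);
}

extern "C" void mgpuMemFree(void *ptr, CUstream stream) {
  cuMemFreeAsync(reinterpret_cast<CUdeviceptr>(ptr), stream);
}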

Diff Detail

Event Timeline

bondhugula created this revision. Apr 10 2022, 10:53 PM
Herald added a project: Restricted Project. Apr 10 2022, 10:53 PM
bondhugula requested review of this revision. Apr 10 2022, 10:53 PM
csigg accepted this revision. Apr 10 2022, 11:48 PM

This API was introduced in CUDA 11.2 (December 2020).
Should we #if CUDA_VERSION >= 11020 around this?

This revision is now accepted and ready to land. Apr 10 2022, 11:48 PM
bondhugula added a comment (edited). Apr 11 2022, 5:12 AM

This API was introduced in CUDA 11.2 (December 2020).
Should we #if CUDA_VERSION >= 11020 around this?

Sure, but I can't tell what a sound approach is here. I found the snippet below in the clang headers, but I'm not sure whether it's a stable approach:

#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#endif

#if CUDA_VERSION >= 11020
 ...
#endif
csigg added a comment. Apr 11 2022, 5:16 AM

This should be sufficient:

#if CUDA_VERSION >= 11020
  cuMemAllocAsync();
#else
  cuMemAlloc();
  (void)stream;
#endif

Guard lowering depending on CUDA version.

Minor adjustment.

This should be sufficient:

#if CUDA_VERSION >= 11020
  cuMemAllocAsync()
#else
  cuMemAlloc();
  (void)stream;
#endif

Thanks, done. I added a check for CUDA_VERSION being defined. Let me know if that's not needed.
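
(For reference, a sketch of how the combined guard might look; the wrapper name and the omitted error handling are illustrative, not the exact committed code:)

#include "cuda.h"
#include <cstdint>

#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#endif

// Sketch: use the stream-ordered allocator on CUDA 11.2+ toolkits and fall
// back to the synchronous allocator on older ones.
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, CUstream stream) {
  CUdeviceptr ptr = 0;
#if CUDA_VERSION >= 11020
  cuMemAllocAsync(&ptr, sizeBytes, stream);
#else
  cuMemAlloc(&ptr, sizeBytes);
  (void)stream; // unused in the synchronous path
#endif
  return reinterpret_cast<void *>(ptr);
}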

This revision was landed with ongoing or failed builds. Apr 11 2022, 9:04 PM
This revision was automatically updated to reflect the committed changes.

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

Thanks for reverting this. I'll take a look.

bondhugula added a comment (edited). Apr 12 2022, 12:15 AM

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-cpu-runner: symbol lookup error: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so: undefined symbol: cuMemAllocAsync

The bot has an old version of CUDA that doesn't support async alloc. We already guarded this with >= 11.2 above, and I double-checked that an 11.2-or-newer version should have that method. Looking at the logs, the bot has CUDA_VERSION=10.2.89 in its environment (not sure what the preprocessor sees). Either the format is different, which is tripping the macro check, or the build is being compiled with newer headers but linked against older libraries. Someone with access to the bot will have to check what CUDA_VERSION the build's preprocessor actually sees.
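
(Side note: CUDA_VERSION in cuda.h is an integer macro of the form 1000 * major + 10 * minor, so the "10.2.89" release string from the environment is not what the preprocessor compares against; for a 10.2 toolkit the macro would look like:)

// In a CUDA 10.2 toolkit's cuda.h (approximately):
#define CUDA_VERSION 10020   // 1000 * major + 10 * minor

// so a 10.2 toolkit is correctly excluded by:
#if CUDA_VERSION >= 11020    // 11020 == CUDA 11.2
// ... async path ...
#endif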

@herhut or @csigg should be able to help, I think?

csigg added a subscriber: kuhnel. Apr 12 2022, 7:28 AM

The CUDA_VERSION comes from the cuda.h header file.

If the builder's CUDA SDK is newer than 11.2, it will link cuMemAllocAsync from stubs/libcuda.so.1, but fail at runtime if the driver is older than 450.80.02.

My theory is that the CUDA_VERSION=10.2.89 from the log refers to the driver.

But this is all speculation. @kuhnel do you have any idea what's going on here?
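
(Illustrative only, not part of this patch: if the suspicion is a headers-vs-driver mismatch, the driver's supported CUDA version can be queried at runtime with cuDriverGetVersion and compared against the CUDA_VERSION the build saw:)

#include "cuda.h"
#include <cstdio>

// Sketch: print the driver's supported CUDA version next to the toolkit
// version the binary was compiled against. cuMemAllocAsync needs a driver
// reporting at least 11020 (i.e. CUDA 11.2, driver >= 450.80.02).
int main() {
  int driverVersion = 0;
  cuDriverGetVersion(&driverVersion);
  std::printf("driver supports CUDA %d, compiled against CUDA %d\n",
              driverVersion, CUDA_VERSION);
  return 0;
}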