This is an archive of the discontinued LLVM Phabricator instance.

Fix CUDA runtime wrapper for GPU mem alloc/free to async
ClosedPublic

Authored by bondhugula on Apr 10 2022, 10:53 PM.

Details

Summary

Switch the CUDA runtime wrappers for GPU memory alloc/free to the async
versions. The semantics of the GPU dialect ops (gpu.alloc/dealloc) and the
wrappers they are lowered to (via gpu-to-llvm) call for the async versions --
however, these were being incorrectly mapped to cuMemAlloc/cuMemFree instead
of cuMemAllocAsync/cuMemFreeAsync.
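
For context, a minimal sketch of the async allocation path (the mgpuMemAlloc/mgpuMemFree wrapper names and the omitted error handling are illustrative, not quoted from the patch):

#include "cuda.h"
#include <cstdint>

// Sketch only: allocate/free memory ordered on the given stream using the
// CUDA 11.2+ stream-ordered allocator instead of cuMemAlloc/cuMemFree.
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, CUstream stream) {
  CUdeviceptr ptr = 0;
  // Enqueued on `stream`; does not synchronize the whole context.
  cuMemAllocAsync(&ptr, sizeBytes, stream);
  return reinterpret_cast<void *>(ptr);
}

extern "C" void mgpuMemFree(void *ptr, CUstream stream) {
  cuMemFreeAsync(reinterpret_cast<CUdeviceptr>(ptr), stream);
}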

Diff Detail

Event Timeline

bondhugula created this revision. Apr 10 2022, 10:53 PM
Herald added a project: Restricted Project. Apr 10 2022, 10:53 PM
bondhugula requested review of this revision. Apr 10 2022, 10:53 PM
csigg accepted this revision. Apr 10 2022, 11:48 PM

This API was introduced in CUDA 11.2 (December 2020).
Should we #if CUDA_VERSION >= 11020 around this?

This revision is now accepted and ready to land. Apr 10 2022, 11:48 PM
bondhugula added a comment (edited). Apr 11 2022, 5:12 AM

This API was introduced in CUDA 11.2 (December 2020).
Should we #if CUDA_VERSION >= 11020 around this?

Sure, but I can't tell what a sound approach is here. I found the snippet below in the clang headers, but I'm not sure whether it's a stable approach:

#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#endif

#if CUDA_VERSION >= 11020
 ...
#endif
csigg added a comment. Apr 11 2022, 5:16 AM

This should be sufficient:

#if CUDA_VERSION >= 11020
  cuMemAllocAsync();
#else
  cuMemAlloc();
  (void)stream;
#endif

Guard lowering depending on CUDA version.

Minor adjustment.

This should be sufficient:

#if CUDA_VERSION >= 11020
  cuMemAllocAsync()
#else
  cuMemAlloc();
  (void)stream;
#endif

Thanks, done. I added a check for CUDA_VERSION being defined. Let me know if that's not needed.
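
(For reference, a sketch of how the combined guard might look; the wrapper name and the omitted error handling are illustrative, not the exact committed code:)

#include "cuda.h"
#include <cstdint>

#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#endif

// Sketch: use the stream-ordered allocator on CUDA 11.2+ toolkits and fall
// back to the synchronous allocator on older ones.
extern "C" void *mgpuMemAlloc(uint64_t sizeBytes, CUstream stream) {
  CUdeviceptr ptr = 0;
#if CUDA_VERSION >= 11020
  cuMemAllocAsync(&ptr, sizeBytes, stream);
#else
  cuMemAlloc(&ptr, sizeBytes);
  (void)stream; // unused in the synchronous path
#endif
  return reinterpret_cast<void *>(ptr);
}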

This revision was landed with ongoing or failed builds. Apr 11 2022, 9:04 PM
This revision was automatically updated to reflect the committed changes.

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

Thanks for reverting this. I'll take a look.

bondhugula added a comment (edited). Apr 12 2022, 12:15 AM

You broke the bot apparently: https://lab.llvm.org/buildbot/#/builders/61/builds/24891 ; can you look into this?

/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-cpu-runner: symbol lookup error: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so: undefined symbol: cuMemAllocAsync

The bot has an old version of CUDA that doesn't support async alloc. We already guarded this with >= 11.2 above, and I double-checked that an 11.2-or-newer version should have that method. Looking at the logs, the bot has CUDA_VERSION=10.2.89 in its environment (not sure what the preprocessor sees). Either the format is different, which is tripping the macro check, or the build is being compiled with newer headers but linked against older libraries. Someone with access to the bot will have to check what CUDA_VERSION the build's preprocessor actually sees.
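
(Side note: CUDA_VERSION in cuda.h is an integer macro of the form 1000 * major + 10 * minor, so the "10.2.89" release string from the environment is not what the preprocessor compares against; for a 10.2 toolkit the macro would look like:)

// In a CUDA 10.2 toolkit's cuda.h (approximately):
#define CUDA_VERSION 10020   // 1000 * major + 10 * minor

// so a 10.2 toolkit is correctly excluded by:
#if CUDA_VERSION >= 11020    // 11020 == CUDA 11.2
// ... async path ...
#endif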

@herhut or @csigg should be able to help, I think?

csigg added a subscriber: kuhnel. Apr 12 2022, 7:28 AM

The CUDA_VERSION comes from the cuda.h header file.

If the builder's CUDA SDK is newer than 11.2, it will link cuMemAllocAsync from stubs/libcuda.so.1, but fail at runtime if the driver is older than 450.80.02.

My theory is that the CUDA_VERSION=10.2.89 from the log refers to the driver.

But this is all speculation. @kuhnel do you have any idea what's going on here?
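
(Illustrative only, not part of this patch: if the suspicion is a headers-vs-driver mismatch, the driver's supported CUDA version can be queried at runtime with cuDriverGetVersion and compared against the CUDA_VERSION the build saw:)

#include "cuda.h"
#include <cstdio>

// Sketch: print the driver's supported CUDA version next to the toolkit
// version the binary was compiled against. cuMemAllocAsync needs a driver
// reporting at least 11020 (i.e. CUDA 11.2, driver >= 450.80.02).
int main() {
  int driverVersion = 0;
  cuDriverGetVersion(&driverVersion);
  std::printf("driver supports CUDA %d, compiled against CUDA %d\n",
              driverVersion, CUDA_VERSION);
  return 0;
}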