Switch the CUDA runtime wrappers for GPU memory alloc/free to async. The
semantics of the GPU dialect ops (gpu.alloc/dealloc) and of the wrappers they
are lowered to (gpu-to-llvm) are those of the async versions. However, the
wrappers were incorrectly mapped to cuMemAlloc/cuMemFree instead of
cuMemAllocAsync/cuMemFreeAsync.
Repository: rG LLVM Github Monorepo
Event Timeline
This API was introduced in CUDA 11.2 (December 2020).
Should we add an #if CUDA_VERSION >= 11020 guard around this?
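For readers unfamiliar with the 11020 constant: CUDA encodes its version as major * 1000 + minor * 10, so 11020 stands for 11.2. The sketch below only illustrates that encoding and the proposed threshold. The names CudaVersion, decodeCudaVersion and hasAsyncAllocApi are hypothetical helpers, not part of CUDA or MLIR.

```cpp
// Hypothetical helpers illustrating how CUDA_VERSION-style integers decode.
struct CudaVersion {
  int major;
  int minor;
};

// CUDA encodes versions as major * 1000 + minor * 10 (e.g. 11020 for 11.2).
constexpr CudaVersion decodeCudaVersion(int encoded) {
  return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}

// True when headers at this version declare cuMemAllocAsync/cuMemFreeAsync,
// which were introduced in CUDA 11.2.
constexpr bool hasAsyncAllocApi(int encoded) { return encoded >= 11020; }

// Compile-time checks of the encoding.
static_assert(decodeCudaVersion(11020).major == 11, "11020 is CUDA 11.x");
static_assert(decodeCudaVersion(11020).minor == 2, "11020 is CUDA 11.2");
static_assert(!hasAsyncAllocApi(10020), "CUDA 10.2 predates the async API");
```

Under this encoding, a CUDA 10.2 toolkit reports 10020 and would take the synchronous fallback path of such a guard.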
Sure, but I can't tell what a sound approach is here. I found the snippet below in the clang headers, but I'm not sure whether it is a stable way to check:
#include "cuda.h"
#if !defined(CUDA_VERSION)
#error "cuda.h did not define CUDA_VERSION"
#endif
#if CUDA_VERSION >= 11020
...
#endif
This should be sufficient:
#if CUDA_VERSION >= 11020
  cuMemAllocAsync();
#else
  cuMemAlloc();
  (void)stream;
#endif
Thanks, done. I added a check for CUDA_VERSION being defined. Let me know if that's not needed.
This apparently broke the bot: https://lab.llvm.org/buildbot/#/builders/61/builds/24891. Can you look into this?
/vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/bin/mlir-cpu-runner: symbol lookup error: /vol/worker/mlir-nvidia/mlir-nvidia/llvm.obj/lib/libmlir_cuda_runtime.so: undefined symbol: cuMemAllocAsync
The bot has an old version of CUDA that doesn't support async alloc. We already guarded the call with a >= 11.2 check above, and I double-checked that version 11.2 or higher should provide that function. Looking at the logs, the bot has CUDA_VERSION=10.2.89 in its environment (I'm not sure what the preprocessor sees). Either the format is different, which trips the macro check, or the build is compiled with newer headers but linked against older libraries. Someone with access to the bot will have to check what CUDA_VERSION is set to for the build preprocessor.
The CUDA_VERSION comes from the cuda.h header file.
If the builder's CUDA SDK is 11.2 or newer, the build will link cuMemAllocAsync against stubs/libcuda.so.1 but fail at runtime if the driver is older than 450.80.02.
My theory is that the CUDA_VERSION=10.2.89 from the log refers to the driver.
But this is all speculation. @kuhnel do you have any idea what's going on here?
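One way to test the header/driver mismatch theory on the bot, or to guard against it at runtime, would be to check whether the installed driver library actually exports the async entry points. The sketch below is an assumption-laden illustration, not what the patch does: it uses POSIX dlopen/dlsym, assumes the driver is installed as libcuda.so.1, and driverExportsSymbol and driverSupportsAsyncAlloc are hypothetical helper names.

```cpp
#include <dlfcn.h>

// Hypothetical helper: returns true if `library` can be loaded and exports
// `symbol`. Returns false if either the library or the symbol is missing.
bool driverExportsSymbol(const char *symbol,
                         const char *library = "libcuda.so.1") {
  void *handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
  if (!handle)
    return false; // Driver library not installed or not loadable.
  bool found = dlsym(handle, symbol) != nullptr;
  dlclose(handle);
  return found;
}

// Hypothetical helper: true only if the driver found at runtime provides
// both async entry points that the wrappers would call.
bool driverSupportsAsyncAlloc() {
  return driverExportsSymbol("cuMemAllocAsync") &&
         driverExportsSymbol("cuMemFreeAsync");
}
```

If driverSupportsAsyncAlloc() returns false on a machine whose build passed the CUDA_VERSION >= 11020 guard, that would be consistent with newer headers being paired with an older driver. On older glibc versions this may need linking with -ldl.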