This is an archive of the discontinued LLVM Phabricator instance.

[nvptx-arch] Dynamically load the CUDA runtime if not found during the build
ClosedPublic

Authored by jhuber6 on Jan 16 2023, 9:21 AM.

Details

Summary

Much like the changes in D141859, this patch allows the nvptx-arch
tool to be built and provided with every distribution of LLVM / Clang.
This will make it more reliable for our toolchains to depend on. The
changes here configure a version that dynamically loads CUDA if it was
not found at build time.
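
For context, the dynamic path amounts to resolving the handful of CUDA driver entry points with dlopen/dlsym at runtime instead of linking -lcuda at build time. A minimal sketch of the idea on Linux follows; the variable names and error handling are illustrative assumptions, not the patch's exact code:

#include <cstdio>
#include <dlfcn.h>

// Mirror the tiny slice of the driver API we need; CUresult is an enum
// named cudaError_enum in cuda.h.
typedef enum cudaError_enum { CUDA_SUCCESS = 0 } CUresult;

int main() {
  // Load the driver at runtime rather than linking against it at build time.
  void *Handle = dlopen("libcuda.so.1", RTLD_NOW);
  if (!Handle) {
    std::fprintf(stderr, "failed to load CUDA driver: %s\n", dlerror());
    return 1;
  }

  // Resolve the entry points we actually call.
  auto *CuInit =
      reinterpret_cast<CUresult (*)(unsigned)>(dlsym(Handle, "cuInit"));
  auto *CuDeviceGetCount =
      reinterpret_cast<CUresult (*)(int *)>(dlsym(Handle, "cuDeviceGetCount"));
  if (!CuInit || !CuDeviceGetCount)
    return 1;

  int Count = 0;
  if (CuInit(0) != CUDA_SUCCESS || CuDeviceGetCount(&Count) != CUDA_SUCCESS)
    return 1;
  std::printf("found %d CUDA device(s)\n", Count);
  dlclose(Handle);
  return 0;
}

Because the symbol lookup happens at runtime, the same binary works whether or not the machine it eventually runs on has CUDA installed.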

Diff Detail

Event Timeline

jhuber6 created this revision. · Jan 16 2023, 9:21 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jan 16 2023, 9:21 AM
jhuber6 requested review of this revision. · Jan 16 2023, 9:21 AM
Herald added a project: Restricted Project. · View Herald Transcript · Jan 16 2023, 9:21 AM
tianshilei1992 accepted this revision. · Jan 16 2023, 10:00 AM

Yeah, otherwise I suppose there will be some errors when compiling an OpenMP program when there is no CUDA installed.

clang/tools/nvptx-arch/NVPTXArch.cpp
89–95

unrelated: I always prefer:

int main(int argc, char *argv[]) {
  return 0;
}
This revision is now accepted and ready to land. · Jan 16 2023, 10:00 AM
This revision was landed with ongoing or failed builds. · Jan 16 2023, 11:14 AM
This revision was automatically updated to reflect the committed changes.
srj added a subscriber: srj. · Edited · Jan 17 2023, 12:10 PM

This change appears to have broken the build when cross-compiling to x86-32 on a Linux x86-64 system; on the Halide buildbots, we now fail at link time with

FAILED: bin/nvptx-arch 
: && /usr/bin/g++-7  -m32 -Wno-psabi -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wno-misleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -O3 -DNDEBUG -Wl,-rpath-link,/home/halidenightly/build_bot/worker/llvm-16-x86-32-linux/llvm-build/./lib  -Wl,--gc-sections tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o -o bin/nvptx-arch  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMSupport.a  -lpthread  -lrt  -ldl  -lpthread  -lm  lib/libLLVMDemangle.a && :
/usr/bin/ld: tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in function `handleError(cudaError_enum)':
NVPTXArch.cpp:(.text._ZL11handleError14cudaError_enum+0x2b): undefined reference to `cuGetErrorString'
/usr/bin/ld: tools/clang/tools/nvptx-arch/CMakeFiles/nvptx-arch.dir/NVPTXArch.cpp.o: in function `main':
NVPTXArch.cpp:(.text.startup.main+0xcf): undefined reference to `cuInit'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0xf9): undefined reference to `cuDeviceGetCount'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x11e): undefined reference to `cuDeviceGet'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x131): undefined reference to `cuDeviceGetAttribute'
/usr/bin/ld: NVPTXArch.cpp:(.text.startup.main+0x146): undefined reference to `cuDeviceGetAttribute'
collect2: error: ld returned 1 exit status

I'm guessing that the problem here is that the build machine has CUDA installed (so the headers are found), but no 32-bit version of CUDA (so linking fails).

Probably easy to fix, but as of right now, our 32-bit testing is dead in the water; could someone please revert this pending a proper fix?

This change appears to have broken the build when cross-compiling to x86-32 on a Linux x86-64 system; on the Halide buildbots, we now fail at link time with [build log quoted above]

I'm guessing here it's finding the cuda.h header but not the libraries? Maybe we should determine this via CMake and add a compile definition instead of relying on __has_include.
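
Roughly something like this in the tool's CMakeLists.txt, as an untested sketch:

# Let configure-time detection pick the path, instead of __has_include(<cuda.h>).
find_package(CUDAToolkit QUIET)

if(CUDAToolkit_FOUND)
  # CUDA was found at configure time: link the driver API directly.
  target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
else()
  # No CUDA at configure time: build the dlopen fallback instead.
  target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
endif()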

This change appears to have broken the build when cross-compiling to x86-32 on a Linux x86-64 system; on the Halide buildbots, we now fail at link time with [build log quoted above]

Probably easy to fix, but as of right now, our 32-bit testing is dead in the water; could someone please revert this pending a proper fix?

Can you let me know if rG4ce454c654bd solves it? I'm guessing the problem arises when we find the libraries at configure time but not at build time, so we might need another check as well.

srj added a comment. · Jan 30 2023, 11:03 AM

Can you let me know if rG4ce454c654bd solves it? I'm guessing the problem arises when we find the libraries at configure time but not at build time, so we might need another check as well.

It looks like this change (but not rG4ce454c654bd) is in the 17 branch, as the latter is now failing in the same way for cross-compiles.

It looks like this change (but not rG4ce454c654bd) is in the 17 branch, as the latter is now failing in the same way for cross-compiles.

It looks like it's there for me, see https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20. What is the issue? I made a slight tweak a few days ago on the 17 branch that updated how we find the CUDA driver.

srj added a comment. · Jan 30 2023, 11:08 AM

It looks like this change (but not rG4ce454c654bd) is in the 17 branch, as the latter is now failing in the same way for cross-compiles.

It looks like it's there for me, see https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20. What is the issue? I made a slight tweak a few days ago on the 17 branch that updated how we find the CUDA driver.

We just started testing with the 17 branch this morning; can you point me at your tweak?

It looks like this change (but not rG4ce454c654bd) is in the 17 branch, as the latter is now failing in the same way for cross-compiles.

It looks like it's there for me, see https://github.com/llvm/llvm-project/blob/main/clang/tools/nvptx-arch/NVPTXArch.cpp#L20. What is the issue? I made a slight tweak a few days ago on the 17 branch that updated how we find the CUDA driver.

We just started testing with the 17 branch this morning; can you point me at your tweak?

https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515. Does it work if you revert this? I wouldn't have thought it would affect anything. That's the only change that happened after the 16 release as far as I'm aware.

srj added a comment. · Jan 30 2023, 1:43 PM

https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515. Does it work if you revert this? I wouldn't have thought it would affect anything. That's the only change that happened after the 16 release as far as I'm aware.

Reverting this (well, actually just monkey-patching in the old code) does indeed make things build correctly again -- a bit surprising to me too, but that's CMake for you. Reverting the change seems appropriate pending a tested fix.

https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515. Does it work if you revert this? I wouldn't have thought it would affect anything. That's the only change that happened after the 16 release as far as I'm aware.

Reverting this (well, actually just monkey-patching in the old code) does indeed make things build correctly again -- a bit surprising to me too, but that's CMake for you. Reverting the change seems appropriate pending a tested fix.

That's bizarre, that means it's finding CUDAToolkit but can't link against it?

srj added a comment. · Jan 30 2023, 1:56 PM

https://github.com/llvm/llvm-project/commit/759dec253695f38a101c74905c819ea47392e515. Does it work if you revert this? I wouldn't have thought it would affect anything. That's the only change that happened after the 16 release as far as I'm aware.

Reverting this (well, actually just monkey-patching in the old code) does indeed make things build correctly again -- a bit surprising to me too, but that's CMake for you. Reverting the change seems appropriate pending a tested fix.

That's bizarre, that means it's finding CUDAToolkit but can't link against it?

It's finding a 64-bit CUDAToolkit, which it can't link against because the rest of the build is 32-bit.

It's finding a 64-bit CUDAToolkit, which it can't link against because the rest of the build is 32-bit.

Wondering why it didn't find it before then. But that's definitely a weird configuration. Not sure what a good generic solution is. We could always make it dlopen all the time.

srj added a comment. · Jan 30 2023, 1:59 PM

It's finding a 64-bit CUDAToolkit, which it can't link against because the rest of the build is 32-bit.

Wondering why it didn't find it before then. But that's definitely a weird configuration. Not sure what a good generic solution is. We could always make it dlopen all the time.

Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly weird at all (especially on Windows), but apparently it is quite weird for LLVM at this point in time as we keep getting a lot of different things broken there :-)

Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly weird at all (especially on Windows), but apparently it is quite weird for LLVM at this point in time as we keep getting a lot of different things broken there :-)

I'm not very familiar with this type of build. Are there any variables we could pick up to just disable this if it's not building for the host system? Something like CMAKE_CROSSCOMPILING?

srj added a comment. · Jan 30 2023, 2:16 PM

Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly weird at all (especially on Windows), but apparently it is quite weird for LLVM at this point in time as we keep getting a lot of different things broken there :-)

I'm not very familiar with this type of build. Are there any variables we could pick up to just disable this if it's not building for the host system? Something like CMAKE_CROSSCOMPILING?

I'm not an expert on the LLVM build system, so I'm not entirely sure, but I'd start by examining the CMake setting LLVM_BUILD_32_BITS (which we set to ON in this case).

Crosscompiling to x86-32 on an x86-64 host doesn't strike me as particularly weird at all (especially on Windows), but apparently it is quite weird for LLVM at this point in time as we keep getting a lot of different things broken there :-)

I'm not very familiar with this type of build. Are there any variables we could pick up to just disable this if it's not building for the host system? Something like CMAKE_CROSSCOMPILING?

I'm not an expert on the LLVM build system, so I'm not entirely sure, but I'd start by examining the CMake setting LLVM_BUILD_32_BITS (which we set to ON in this case).

Can you let me know if adding this fixes it?

diff --git a/clang/tools/nvptx-arch/CMakeLists.txt b/clang/tools/nvptx-arch/CMakeLists.txt
index 95c25dc75847..ccdba5ed69a7 100644
--- a/clang/tools/nvptx-arch/CMakeLists.txt
+++ b/clang/tools/nvptx-arch/CMakeLists.txt
@@ -12,7 +12,7 @@ add_clang_tool(nvptx-arch NVPTXArch.cpp)
 find_package(CUDAToolkit QUIET)
 
 # If we found the CUDA library directly we just dynamically link against it.
-if (CUDAToolkit_FOUND)
+if (CUDAToolkit_FOUND AND NOT CMAKE_CROSSCOMPILING)
   target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
 else()
   target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
srj added a comment. · Edited · Jan 30 2023, 2:40 PM

Can you let me know if adding this fixes it?

Unfortunately, no. (That is, it does not fix it. CMAKE_CROSSCOMPILING is a subtle and flaky beast; see e.g. https://gitlab.kitware.com/cmake/cmake/-/issues/21744)

srj added a comment. · Jan 30 2023, 3:13 PM

Can you let me know if adding this fixes it?

Unfortunately, no. (That is, it does not fix it. CMAKE_CROSSCOMPILING is a subtle and flaky beast; see e.g. https://gitlab.kitware.com/cmake/cmake/-/issues/21744)

Update: I may have a way to make this work from my side; testing now.

srj added a comment. · Jan 30 2023, 3:46 PM

Update: I may have a way to make this work from my side; testing now.

Alas, that didn't work; still broken.

Update: I may have a way to make this work from my side; testing now.

Alas, that didn't work; still broken.

Interesting. It's definitely a bad problem that we find the 64-bit version and try to link with it for a cross-compile. I don't want to revert the code because the old FindCUDA module is officially deprecated. I figured there was surely a way to check whether LLVM was cross-compiling.

srj added a comment. · Jan 30 2023, 3:59 PM

Update: I may have a way to make this work from my side; testing now.

Alas, that didn't work; still broken.

Interesting. It's definitely a bad problem that we find the 64-bit version and try to link with it for a cross-compile. I don't want to revert the code because the old FindCUDA module is officially deprecated. I figured there was surely a way to check whether LLVM was cross-compiling.

I'll see if I can get our CMake expert to weigh in :-)

tra added a comment. · Jan 30 2023, 4:38 PM

For what it's worth, NVIDIA started deprecating 32-bit binaries long ago (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) and the process finally came to an end with the release of CUDA-12:

The CUDA-12 release notes say:

32-bit compilation native and cross-compilation is removed from CUDA 12.0 and later Toolkit.
Use the CUDA Toolkit from earlier releases for 32-bit compilation. CUDA Driver will continue
to support running existing 32-bit applications on existing GPUs except Hopper.
Hopper does not support 32-bit applications. Ada will be the last architecture with driver support for 32-bit applications.

srj added a comment. · Jan 31 2023, 9:37 AM

For what it's worth, NVIDIA started deprecating 32-bit binaries long ago (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) and the process finally came to an end with the release of CUDA-12:

Hmm... maybe the right answer then is to just always use the dynamic-loading path when doing any kind of 32-bit build.

For what it's worth, NVIDIA started deprecating 32-bit binaries long ago (https://forums.developer.nvidia.com/t/deprecation-plans-for-32-bit-linux-x86-cuda-toolkit-and-cuda-driver/31356) and the process finally came to an end with the release of CUDA-12:

Hmm... maybe the right answer then is to just always use the dynamic-loading path when doing any kind of 32-bit build.

That's probably the best option. I don't think we have much pretense of supporting 32-bit offloading right now. Would this just require checking LLVM_BUILD_32_BITS? Should be an easy change.
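
Roughly this on top of the current CMakeLists.txt, as an untested sketch (the guard condition is the assumption here):

# Never link the driver directly for 32-bit builds, since CUDA-12 no longer
# ships 32-bit libraries; take the dlopen fallback path there instead.
if (CUDAToolkit_FOUND AND NOT LLVM_BUILD_32_BITS)
  target_link_libraries(nvptx-arch PRIVATE CUDA::cuda_driver)
else()
  target_compile_definitions(nvptx-arch PRIVATE "DYNAMIC_CUDA")
endif()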

srj added a comment. · Jan 31 2023, 9:47 AM

Would this just require checking LLVM_BUILD_32_BITS? Should be an easy change.

I think so. (It might be tempting to check if (CMAKE_SIZEOF_VOID_P EQUAL 8) but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)

Would this just require checking LLVM_BUILD_32_BITS? Should be an easy change.

I think so. (It might be tempting to check if (CMAKE_SIZEOF_VOID_P EQUAL 8) but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)

SG, want me to push that real quick?

srj added a comment. · Jan 31 2023, 9:49 AM

Would this just require checking LLVM_BUILD_32_BITS? Should be an easy change.

I think so. (It might be tempting to check if (CMAKE_SIZEOF_VOID_P EQUAL 8) but LLVM_BUILD_32_BITS is likely to be a more explicit signal.)

SG, want me to push that real quick?

Yes please!

Yes please!

Let me know if this fixes anything: rG9f64fbb882dc.

srj added a comment. · Jan 31 2023, 9:58 AM

Yes please!

Let me know if this fixes anything: rG9f64fbb882dc.

Testing now

srj added a comment. · Jan 31 2023, 10:43 AM

Yes please!

Let me know if this fixes anything: rG9f64fbb882dc.

Testing now

So far, so good; let me just verify a few more things.

srj added a comment. · Jan 31 2023, 1:07 PM

Yes please!

Let me know if this fixes anything: rG9f64fbb882dc.

Testing now

So far, so good; let me just verify a few more things.

Looks good! Thanks for the fix!