We use the nvptx-arch and amdgpu-arch tools to identify the system's
installed GPUs. These are used across LLVM to handle setting up unit
tests and the --offload-arch=native option. This patch adds the
CLANG_FORCE_DLOPEN_LIBCUDA and CLANG_FORCE_DLOPEN_LIBHSA CMake
options to enable disabling these directly.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Does this address the case where we have HSA headers present on the system?
Since this only seems to modify link time behavior, I think building with this flag would still include "hsa/hsa.h" if they were present.
Those headers are guarded behind #ifndef DYNAMIC_CUDA. This is how we configure it to use the dlopen method, see the target_compile_definitions line. We used to use __has_include but it caused buildbots to fail for Windows cross compiling.
Ah yeah my bad 😅
Hmm somehow I keep getting confused by this. I'll check how other deps handle this, maybe there is a better way.
Fun fact I now know that libxml2 also struggles with this 😆
The following would roughly be how zlib and others handle it:
if (CLANG_ENABLE_HSA) if (CLANG_ENABLE_HSA STREQUAL FORCE_ON) find_package(hsa-runtime... REQUIRED) else() find_package (hsa-runtime...) endif() if (hsa-runtime_FOUND) # TODO: Some hsa sanity check, executed during configuration. # If check passes, `HAVE_HSA` is set. if (CLANG_ENABLE_HSA STREQUAL FORCE_ON AND NOT HAVE_HSA) message(FATAL_ERROR "Failed to configure HSA.") endif() endif() set(CLANG_ENABLE_HSA "${HAVE_HSA}") else() set(CLANG_ENABLE_HSA 0) endif()
Then we could remove __have_include and could directly check for CLANG_ENABLE_HSA during compile time. (We know that this has to work because of the TODO sanitiy check.)
#ifdef CLANG_HAVE_HSA #include "hsa/hsa.h" #endif
Since HSA is used by both OpenMP and the amdgpu-arch plugin, I think it would make sense to not call it CLANG_ENABLE_HSA but instead use LLVM_ENABLE_HSA and just put it llvm/cmake/config-ix.cmake (where I took the above code from).
@jhuber6 This doesn't consider the "DYNAMIC_..." macros. Do you think an approach like this, if worked out a bit more, could work?
This would also allow e.g. the ROCm packages to FORCE_ON hsa and enjoy the benefits of the LLVM configuration 😊
Like I said, the difference between these dependencies and something like libxml2 or libz is that the presence of the libraries doesn't guard any features. We can build the application both ways, the only difference is runtime cost if we don't link the library directly and a more annoying development environment. For the amdgpu-arch and nvptx-arch it might be easier to just always dlopen them to avoid this conversation entirely. So I'm just not sure if it's worth the extra effort to configure something that's fundamentally optional.
Actually, I might want to optionally link this stuff in the libc project in the future as well. That would be a dependency only required for tests but it might be compelling then.
FWIW, there are methods in CMake to check if features (such as if a struct contains specific data member, or if a library contains APIs). I think that would be the right way to fix the issue.
Ok so I went over https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations#testing-for-the-presence-of-a-header-__has_include (thanks, @aaron.ballman 😊) and through the commit history for the amdgpu-arch tool.
As I understand, in cases such as standard library headers, these __has_includes are useful (and are encouraged), (I assume especially for header-only libraries). But I'm not convinced that it's the right way to do things in the HSA case.
- If we don't detect this during configuration, we will fail during the build, not during configuration. this means that if the hsa CMake package is found but the headers are not actually present, we will fail late, potentially *very* late in the build.
- It also seems like the __has_includes have continuously been a source of confusion, and maybe I'm confused as well, but I think even the current implementation after https://reviews.llvm.org/rGbfe4514add5b7ab7e1f06248983a7162d734cffb has a bug (if DYNAMIC_HSA is not set and hsa is not found, we don't raise a "missing hsa.h" error).
@jhuber6 I didn't understand what you mean with linking it into libc. But if the performance regression is "ignorable" I'd be in favor of just always using dlopen.
I also agree with @tianshilei1992 that if this should remain we should have configuration for this.
IMO it is very desirable for every out-of-llvm-tree dependency to be checked during configuration. If someone without access to a compile server runs an LLVM build that seemingly passes configuration and then fails after 90 minutes because they had leftover cmake configs their package manager failed to clean up it unnecessarily inconveniences the user and we can prevent it.