This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Add options to disable direct linking of arch tools
Needs ReviewPublic

Authored by jhuber6 on Feb 10 2023, 10:28 AM.

Details

Summary

We use the nvptx-arch and amdgpu-arch tools to identify the system's
installed GPUs. These are used across LLVM to handle setting up unit
tests and the --offload-arch=native option. This patch adds the
CLANG_FORCE_DLOPEN_LIBCUDA and CLANG_FORCE_DLOPEN_LIBHSA CMake
options to enable disabling these directly.

Fixes: https://github.com/llvm/llvm-project/issues/60660

Diff Detail

Event Timeline

jhuber6 created this revision.Feb 10 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 10:28 AM
jhuber6 requested review of this revision.Feb 10 2023, 10:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 10:28 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Does this address the case where we have HSA headers present on the system?

Since this only seems to modify link time behavior, I think building with this flag would still include "hsa/hsa.h" if they were present.

Does this address the case where we have HSA headers present on the system?

Since this only seems to modify link time behavior, I think building with this flag would still include "hsa/hsa.h" if they were present.

Those headers are guarded behind #ifndef DYNAMIC_CUDA. This is how we configure it to use the dlopen method, see the target_compile_definitions line. We used to use __has_include but it caused buildbots to fail for Windows cross compiling.

Ah yeah my bad 😅

Hmm somehow I keep getting confused by this. I'll check how other deps handle this, maybe there is a better way.

aaronmondal added a comment.EditedFeb 10 2023, 11:51 AM

Fun fact I now know that libxml2 also struggles with this 😆

The following would roughly be how zlib and others handle it:

if (CLANG_ENABLE_HSA)

  if (CLANG_ENABLE_HSA STREQUAL FORCE_ON)
    find_package(hsa-runtime... REQUIRED)
  else()
    find_package (hsa-runtime...)
  endif()

  if (hsa-runtime_FOUND)
    # TODO: Some hsa sanity check, executed during configuration.
    # If check passes, `HAVE_HSA` is set.
    if (CLANG_ENABLE_HSA STREQUAL FORCE_ON AND NOT HAVE_HSA)
      message(FATAL_ERROR "Failed to configure HSA.")
    endif()
  endif()

  set(CLANG_ENABLE_HSA "${HAVE_HSA}")

else()
  set(CLANG_ENABLE_HSA 0)
endif()

Then we could remove __have_include and could directly check for CLANG_ENABLE_HSA during compile time. (We know that this has to work because of the TODO sanitiy check.)

#ifdef CLANG_HAVE_HSA
#include "hsa/hsa.h"
#endif

Since HSA is used by both OpenMP and the amdgpu-arch plugin, I think it would make sense to not call it CLANG_ENABLE_HSA but instead use LLVM_ENABLE_HSA and just put it llvm/cmake/config-ix.cmake (where I took the above code from).

@jhuber6 This doesn't consider the "DYNAMIC_..." macros. Do you think an approach like this, if worked out a bit more, could work?

This would also allow e.g. the ROCm packages to FORCE_ON hsa and enjoy the benefits of the LLVM configuration 😊

Like I said, the difference between these dependencies and something like libxml2 or libz is that the presence of the libraries doesn't guard any features. We can build the application both ways, the only difference is runtime cost if we don't link the library directly and a more annoying development environment. For the amdgpu-arch and nvptx-arch it might be easier to just always dlopen them to avoid this conversation entirely. So I'm just not sure if it's worth the extra effort to configure something that's fundamentally optional.

Actually, I might want to optionally link this stuff in the libc project in the future as well. That would be a dependency only required for tests but it might be compelling then.

FWIW, there are methods in CMake to check if features (such as if a struct contains specific data member, or if a library contains APIs). I think that would be the right way to fix the issue.

aaronmondal added a subscriber: aaron.ballman.EditedFeb 13 2023, 9:00 AM

Ok so I went over https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations#testing-for-the-presence-of-a-header-__has_include (thanks, @aaron.ballman 😊) and through the commit history for the amdgpu-arch tool.

As I understand, in cases such as standard library headers, these __has_includes are useful (and are encouraged), (I assume especially for header-only libraries). But I'm not convinced that it's the right way to do things in the HSA case.

  1. If we don't detect this during configuration, we will fail during the build, not during configuration. this means that if the hsa CMake package is found but the headers are not actually present, we will fail late, potentially *very* late in the build.
  2. It also seems like the __has_includes have continuously been a source of confusion, and maybe I'm confused as well, but I think even the current implementation after https://reviews.llvm.org/rGbfe4514add5b7ab7e1f06248983a7162d734cffb has a bug (if DYNAMIC_HSA is not set and hsa is not found, we don't raise a "missing hsa.h" error).

@jhuber6 I didn't understand what you mean with linking it into libc. But if the performance regression is "ignorable" I'd be in favor of just always using dlopen.

I also agree with @tianshilei1992 that if this should remain we should have configuration for this.

IMO it is very desirable for every out-of-llvm-tree dependency to be checked during configuration. If someone without access to a compile server runs an LLVM build that seemingly passes configuration and then fails after 90 minutes because they had leftover cmake configs their package manager failed to clean up it unnecessarily inconveniences the user and we can prevent it.