This avoids undefined behavior/issues resulting out of unconventional
or co-existing ROCm installations.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Dropping PATHS /opt/rocm from the openmp parts will break people using an existing rocm install with trunk llvm. I think it would be reasonable to look at whatever ROCM_PATH is set to instead of assuming /opt/rocm. Is that a variable we can pass to find_package?
The fallback opt/rocm is desired. If a module system needs to point to the specific rocm installation. Set CMAKE_PREFIX_PATH=<unconventional rocm> in the module file.
If you would like to honor ROCM_PATH, then do one find_library with explicit ROCM_PATH and another with all the defaults.
mlir/lib/Dialect/GPU/CMakeLists.txt | ||
---|---|---|
144 | Both ROCM_PATH HIP_PATH are used as hints without verification. if ROCM_PATH takes the precedence over everything else find_path( HIP_MODULE_FILE_DIR FindHIP.cmake HINTS ${ROCM_PATH} PATH_SUFFIXES hip/cmake REQUIRED NO_DEFAULT_PATH) else find_path( HIP_MODULE_FILE_DIR FindHIP.cmake HINTS $ENV{ROCM_PATH} /opt/rocm PATH_SUFFIXES hip/cmake REQUIRED) endif by doing this, you can verify that ROCM_PATH is correct if provided and the path the hip module file has been verified. then it is safe to do |
Added ROCM_PATH environment variable.
mlir/lib/Dialect/GPU/CMakeLists.txt | ||
---|---|---|
144 | The primary intention here is to avoid fallback to a default ROCM path to avoid unwanted issues related to multiple installation, however I agree the we should use paths with verification so I have accommodated the changes for HIP excluding the hint. Thanks for the code snippets. |
I don't know about the MLIR parts, but replacing
find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
with
find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS ${ROCM_PATH})
seems an improvement, where ROCM_PATH is ideally a CMake variable spelled the same way in other components, such that someone can build an LLVM with the components all hooked into a specific ROCm location
edit:
device libs and comgr (possibly an old version) are using find_package(ROCM PATHS "/opt/rocm"), rocr looks in find_package(hsakmt 1.0 REQUIRED HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
Suggest the rest of the rocm stack comes up with some preferred cmake variable for 'look at this rocm' and we then use that variable here. Not at all keen on configuring the build using environment variables.
Marking as request changes because:
if (DEFINED ENV{ROCM_PATH})
Looking at environment variables is strictly worse than looking at cmake variables. I think we should replace /opt/rocm with a cmake variable that points to somewhere to find rocr/roct, and have no strong feelings about what the cmake variable should be called
@estewart08 thoughts on a good CMAKE variable to allow users to define equivalent of /opt/rocm ? and not use environment variable inside the cmake file.
Diff was created without context unfortunately, scrolling through the cmake locally suggests variables have names of the form LIBOMPTARGET_AMDGPU_FOO though the namespacing isn't as consistent as it might be.
Something like:
if (not defined (LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH)) set(LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH "/opt/rocm") endif()
where I'm not confident of the cmake syntax for 'is this thing defined' or 'assign', looks like it might be if (NOT LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH) and set.
I would be ok with the following, without the check for ENV{ROCM_PATH}. The user has the option to set -DROCM_PATH=$ROCM_PATH or -DCMAKE_PREFIX_PATH.
find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATH ${ROCM_PATH})
Dropping the default install location from the default search hint is entirely unreasonable
That seems to be the feature request, though I haven't gone through the jira web to work out what's going on here.
We can either have opt-in to somewhere other than CMAKE_INSTALL_PREFIX, which I think Ethan's suggestion does (not clear whether PATH between the two variables is semantically important), or we could have opt-out for people who want to be sure they aren't picking up something in /opt/rocm.
edit: From the docs https://cmake.org/cmake/help/latest/command/find_package.html
Search paths specified by the HINTS option. These should be paths computed by system introspection, such as a hint provided by the location of another item already found. Hard-coded guesses should be specified with the PATHS option.
This isn't a feature, it's the introduction of a bug. It degrades the usability of the build. If you want an explicit version, that's what the search order process is for.
We can either have opt-in to somewhere other than CMAKE_INSTALL_PREFIX, which I think Ethan's suggestion does (not clear whether PATH between the two variables is semantically important), or we could have opt-out for people who want to be sure they aren't picking up something in /opt/rocm.
This isn't how search paths work. By default, the fallback should always find the default location
clang/tools/amdgpu-arch/CMakeLists.txt | ||
---|---|---|
13 | I also think CMAKE_INSTALL_PREFIX in the search hints is bad practice. I don't recall ever seeing a project do this. Depending on the install path for anything leads to flaky builds |
It would regress people depending on the implicit pickup of /opt/rocm, leaving them with a straightforward workaround of setting the cmake variable.
The inverse, where we look in /opt/rocm unless that's overridden (perhaps by the empty string), achieves much the same effect without breaking anyone using trunk with a rocm install.
I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.
clang/tools/amdgpu-arch/CMakeLists.txt | ||
---|---|---|
13 | This was copied from the amdgpu plugin. I can't remember where I copied that from. Alternatives welcome - what's the proper way to indicate 'this library? it's probably next to all the other llvm libraries' |
Exactly. Builds that don't find their dependencies in the default location without doing anything are bad builds. It adds extra work and discovery a casual builder doesn't actually care about.
The inverse, where we look in /opt/rocm unless that's overridden (perhaps by the empty string), achieves much the same effect without breaking anyone using trunk with a rocm install.
Exactly, this is how the search process is supposed to work. HINTS are at the bottom of the search hierarchy. The mechanisms for finding a specific version win out over the basic hint to the default
I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.
There are a number of cmake crimes going on in both of these, but this isn't one of them. LD_LIBRARY_PATH should not be used for any builds to find components.
The LD_LIBRARY_PATH hazard isn't at compiler build time, it's at application run time. E.g. someone points LD_LIBRARY_PATH at a rocm install to get libhsa and gets that plus some rocm-distributed llvm libraries which don't necessarily match trunk. That's a different hazard to this particular one, except that cmake going in search of /opt/rocm suggests that arbitrary combinations of rocm and trunk will work together when it's closer to zero combinations work.
clang/tools/amdgpu-arch/CMakeLists.txt | ||
---|---|---|
13 | CMAKE_INSTALL_PREFIX should be removed. The search line should be as simple as find_package(hsa-runtime64 QUIET 1.2.0 PATH /opt/rocm) /opt/rocm can be kept as the lowest priority searching guess. There is no good reason to use ENV{ROCM_PATH} nor ROCM_PATH. I have enough pain with XXX_ROOT, XXX_HOME, XXX_PATH environment variables. If you need ROCM_PATH because it comes from modules (environment modules, lmod), ask the maintainer to add proper CMAKE_PREFIX_PATH. If hsa is not pulled in via modules, it is user responsible to tell CMake where the hsa installation is. In addition, please ask the hsa maintainer to move hsa-runtime64 cmake files from "/opt/rocm/lib/cmake/hsa-runtime64" to the hsa library "/opt/rocm/hsa/lib/cmake/hsa-runtime64" and then softlink to /opt/rocm/lib not the other way round. In such way, the hsa library from /opt/rocm can be used as an independent library without pulling the whole /opt/rocm if needed. | |
mlir/lib/Dialect/GPU/CMakeLists.txt | ||
144 | Do you know by chance that whether MLIR really need the hip runtime library or it only needs the HSA as well? |
We should have versioned libraries and well defined ABI breaks. The build should know what runtime versions it requires and account when searching
If we're playing that game, ROCm should be strictly an extension of upstream LLVM, such that code which accidentally or intentionally picks up parts of ROCm while expecting parts of trunk runs correctly. I'd be happy with the easier objective of making it difficult to accidentally splice together parts of different toolchains at runtime.
ROCm has as much to do with llvm as the linux kernel does. The compiler and runtime version should not be so tightly linked / the runtime so unstable.
@ronlieb do you have any more information? The patch says remove /opt/rocm but the discussion seems to be going another way. What's the actual req here?
Our conclusion is that the /opt/rocm weak default needs to stay in place. Thus this patch should be abandoned in my opinion. MarkS and I are requesting the folks that requestedt his change within AMD, revisit it.
Sounds good to me.
CMAKE_INSTALL_PREFIX should be removed.
I think we have consensus on that ^, orthogonal to the /opt/rocm presence or absence. As long as there's some way to tell cmake to use a given HSA, and it seems there are several, deleting that string is an improvement.
I also think CMAKE_INSTALL_PREFIX in the search hints is bad practice. I don't recall ever seeing a project do this. Depending on the install path for anything leads to flaky builds