This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][[amdgpu-arch]][OpenMP] Remove direct dependency on /opt/rocm
AbandonedPublic

Authored by dbhaskaran on Sep 16 2021, 7:29 AM.

Details

Summary

This avoids undefined behavior/issues resulting out of unconventional
or co-existing ROCm installations.

Diff Detail

Event Timeline

dbhaskaran created this revision.Sep 16 2021, 7:29 AM
dbhaskaran requested review of this revision.Sep 16 2021, 7:29 AM
Herald added a reviewer: herhut. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript

Dropping PATHS /opt/rocm from the openmp parts will break people using an existing rocm install with trunk llvm. I think it would be reasonable to look at whatever ROCM_PATH is set to instead of assuming /opt/rocm. Is that a variable we can pass to find_package?

ye-luo requested changes to this revision.Sep 16 2021, 11:04 AM
ye-luo added a subscriber: ye-luo.

The fallback opt/rocm is desired. If a module system needs to point to the specific rocm installation. Set CMAKE_PREFIX_PATH=<unconventional rocm> in the module file.
If you would like to honor ROCM_PATH, then do one find_library with explicit ROCM_PATH and another with all the defaults.

mlir/lib/Dialect/GPU/CMakeLists.txt
144

Both ROCM_PATH HIP_PATH are used as hints without verification.
But they are used later for generating include paths. Overall logic is broken.

if ROCM_PATH takes the precedence over everything else
You can do this
if ROCM_PATH defined

find_path(
  HIP_MODULE_FILE_DIR FindHIP.cmake
  HINTS ${ROCM_PATH}
  PATH_SUFFIXES hip/cmake REQUIRED
  NO_DEFAULT_PATH)

else

find_path(
  HIP_MODULE_FILE_DIR FindHIP.cmake
  HINTS $ENV{ROCM_PATH} /opt/rocm
  PATH_SUFFIXES hip/cmake REQUIRED)

endif

by doing this, you can verify that ROCM_PATH is correct if provided and the path the hip module file has been verified. then it is safe to do
set(CMAKE_MODULE_PATH "${HIP_MODULE_FILE_DIR}" ${CMAKE_MODULE_PATH})
find_package(HIP)

This revision now requires changes to proceed.Sep 16 2021, 11:04 AM

Dropping PATHS /opt/rocm from the openmp parts will break people using an existing rocm install with trunk llvm. I think it would be reasonable to look at whatever ROCM_PATH is set to instead of assuming /opt/rocm. Is that a variable we can pass to find_package?

Added ROCM_PATH environment variable.

mlir/lib/Dialect/GPU/CMakeLists.txt
144

The primary intention here is to avoid fallback to a default ROCM path to avoid unwanted issues related to multiple installation, however I agree the we should use paths with verification so I have accommodated the changes for HIP excluding the hint. Thanks for the code snippets.

dbhaskaran updated this revision to Diff 385398.Nov 7 2021, 9:22 PM

Updated with review comments.

JonChesterfield added a comment.EditedNov 8 2021, 4:53 AM

I don't know about the MLIR parts, but replacing

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)

with

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS ${ROCM_PATH})

seems an improvement, where ROCM_PATH is ideally a CMake variable spelled the same way in other components, such that someone can build an LLVM with the components all hooked into a specific ROCm location

edit:

device libs and comgr (possibly an old version) are using find_package(ROCM PATHS "/opt/rocm"), rocr looks in find_package(hsakmt 1.0 REQUIRED HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)

Suggest the rest of the rocm stack comes up with some preferred cmake variable for 'look at this rocm' and we then use that variable here. Not at all keen on configuring the build using environment variables.

JonChesterfield requested changes to this revision.Nov 8 2021, 5:01 AM
JonChesterfield added a reviewer: ronl.

Marking as request changes because:
if (DEFINED ENV{ROCM_PATH})
Looking at environment variables is strictly worse than looking at cmake variables. I think we should replace /opt/rocm with a cmake variable that points to somewhere to find rocr/roct, and have no strong feelings about what the cmake variable should be called

This revision now requires changes to proceed.Nov 8 2021, 5:01 AM

@estewart08 thoughts on a good CMAKE variable to allow users to define equivalent of /opt/rocm ? and not use environment variable inside the cmake file.

Diff was created without context unfortunately, scrolling through the cmake locally suggests variables have names of the form LIBOMPTARGET_AMDGPU_FOO though the namespacing isn't as consistent as it might be.

Something like:

if (not defined (LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH))
set(LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH "/opt/rocm")
endif()

where I'm not confident of the cmake syntax for 'is this thing defined' or 'assign', looks like it might be if (NOT LIBOMPTARGET_AMDGPU_HSA_SEARCH_PATH) and set.

@estewart08 thoughts on a good CMAKE variable to allow users to define equivalent of /opt/rocm ? and not use environment variable inside the cmake file.

I would be ok with the following, without the check for ENV{ROCM_PATH}. The user has the option to set -DROCM_PATH=$ROCM_PATH or -DCMAKE_PREFIX_PATH.

find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATH ${ROCM_PATH})
arsenm requested changes to this revision.Dec 16 2021, 11:06 AM
arsenm added a subscriber: arsenm.

Dropping the default install location from the default search hint is entirely unreasonable

JonChesterfield added a comment.EditedDec 16 2021, 11:26 AM

Dropping the default install location from the default search hint is entirely unreasonable

That seems to be the feature request, though I haven't gone through the jira web to work out what's going on here.

We can either have opt-in to somewhere other than CMAKE_INSTALL_PREFIX, which I think Ethan's suggestion does (not clear whether PATH between the two variables is semantically important), or we could have opt-out for people who want to be sure they aren't picking up something in /opt/rocm.

edit: From the docs https://cmake.org/cmake/help/latest/command/find_package.html

Search paths specified by the HINTS option. These should be paths computed by system introspection, such as a hint provided by the location of another item already found. Hard-coded guesses should be specified with the PATHS option.

Dropping the default install location from the default search hint is entirely unreasonable

That seems to be the feature request, though I haven't gone through the jira web to work out what's going on here.

This isn't a feature, it's the introduction of a bug. It degrades the usability of the build. If you want an explicit version, that's what the search order process is for.

We can either have opt-in to somewhere other than CMAKE_INSTALL_PREFIX, which I think Ethan's suggestion does (not clear whether PATH between the two variables is semantically important), or we could have opt-out for people who want to be sure they aren't picking up something in /opt/rocm.

This isn't how search paths work. By default, the fallback should always find the default location

arsenm added inline comments.Dec 16 2021, 11:44 AM
clang/tools/amdgpu-arch/CMakeLists.txt
13

I also think CMAKE_INSTALL_PREFIX in the search hints is bad practice. I don't recall ever seeing a project do this. Depending on the install path for anything leads to flaky builds

This isn't a feature, it's the introduction of a bug.

It would regress people depending on the implicit pickup of /opt/rocm, leaving them with a straightforward workaround of setting the cmake variable.

The inverse, where we look in /opt/rocm unless that's overridden (perhaps by the empty string), achieves much the same effect without breaking anyone using trunk with a rocm install.

I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.

clang/tools/amdgpu-arch/CMakeLists.txt
13

This was copied from the amdgpu plugin. I can't remember where I copied that from. Alternatives welcome - what's the proper way to indicate 'this library? it's probably next to all the other llvm libraries'

This isn't a feature, it's the introduction of a bug.

It would regress people depending on the implicit pickup of /opt/rocm, leaving them with a straightforward workaround of setting the cmake variable.

Exactly. Builds that don't find their dependencies in the default location without doing anything are bad builds. It adds extra work and discovery a casual builder doesn't actually care about.

The inverse, where we look in /opt/rocm unless that's overridden (perhaps by the empty string), achieves much the same effect without breaking anyone using trunk with a rocm install.

Exactly, this is how the search process is supposed to work. HINTS are at the bottom of the search hierarchy. The mechanisms for finding a specific version win out over the basic hint to the default

I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.

There are a number of cmake crimes going on in both of these, but this isn't one of them. LD_LIBRARY_PATH should not be used for any builds to find components.

JonChesterfield added a comment.EditedDec 19 2021, 10:59 PM

I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.

There are a number of cmake crimes going on in both of these, but this isn't one of them. LD_LIBRARY_PATH should not be used for any builds to find components.

The LD_LIBRARY_PATH hazard isn't at compiler build time, it's at application run time. E.g. someone points LD_LIBRARY_PATH at a rocm install to get libhsa and gets that plus some rocm-distributed llvm libraries which don't necessarily match trunk. That's a different hazard to this particular one, except that cmake going in search of /opt/rocm suggests that arbitrary combinations of rocm and trunk will work together when it's closer to zero combinations work.

ye-luo added inline comments.Dec 20 2021, 12:01 AM
clang/tools/amdgpu-arch/CMakeLists.txt
13

CMAKE_INSTALL_PREFIX should be removed.
If that search location is intended, pass it via <PackageName>_ROOT, CMAKE_PREFIX_PATH or CMAKE_LIBRARY_PATH.
See searching order https://cmake.org/cmake/help/latest/command/find_library.html

The search line should be as simple as

find_package(hsa-runtime64 QUIET 1.2.0 PATH /opt/rocm)

/opt/rocm can be kept as the lowest priority searching guess.

There is no good reason to use ENV{ROCM_PATH} nor ROCM_PATH. I have enough pain with XXX_ROOT, XXX_HOME, XXX_PATH environment variables. If you need ROCM_PATH because it comes from modules (environment modules, lmod), ask the maintainer to add proper CMAKE_PREFIX_PATH. If hsa is not pulled in via modules, it is user responsible to tell CMake where the hsa installation is.

In addition, please ask the hsa maintainer to move hsa-runtime64 cmake files from "/opt/rocm/lib/cmake/hsa-runtime64" to the hsa library "/opt/rocm/hsa/lib/cmake/hsa-runtime64" and then softlink to /opt/rocm/lib not the other way round. In such way, the hsa library from /opt/rocm can be used as an independent library without pulling the whole /opt/rocm if needed.

mlir/lib/Dialect/GPU/CMakeLists.txt
144

Do you know by chance that whether MLIR really need the hip runtime library or it only needs the HSA as well?

I'm very mistrusting of mixing a rocm toolchain with a trunk toolchain as they're both quite keen on runpath and LD_LIBRARY_PATH to find internal components. That makes it very easy to accidentally mix the two together into something that doesn't work so personal preference is to rip out the /opt/rocm search path for HSA entirely and encourage people to build it from source.

There are a number of cmake crimes going on in both of these, but this isn't one of them. LD_LIBRARY_PATH should not be used for any builds to find components.

The LD_LIBRARY_PATH hazard isn't at compiler build time, it's at application run time. E.g. someone points LD_LIBRARY_PATH at a rocm install to get libhsa and gets that plus some rocm-distributed llvm libraries which don't necessarily match trunk. That's a different hazard to this particular one, except that cmake going in search of /opt/rocm suggests that arbitrary combinations of rocm and trunk will work together when it's closer to zero combinations work.

We should have versioned libraries and well defined ABI breaks. The build should know what runtime versions it requires and account when searching

We should have versioned libraries and well defined ABI breaks. The build should know what runtime versions it requires and account when searching

If we're playing that game, ROCm should be strictly an extension of upstream LLVM, such that code which accidentally or intentionally picks up parts of ROCm while expecting parts of trunk runs correctly. I'd be happy with the easier objective of making it difficult to accidentally splice together parts of different toolchains at runtime.

We should have versioned libraries and well defined ABI breaks. The build should know what runtime versions it requires and account when searching

If we're playing that game, ROCm should be strictly an extension of upstream LLVM, such that code which accidentally or intentionally picks up parts of ROCm while expecting parts of trunk runs correctly. I'd be happy with the easier objective of making it difficult to accidentally splice together parts of different toolchains at runtime.

ROCm has as much to do with llvm as the linux kernel does. The compiler and runtime version should not be so tightly linked / the runtime so unstable.

ronlieb requested changes to this revision.Dec 21 2021, 12:01 PM

This needs to be revisited.

@ronlieb do you have any more information? The patch says remove /opt/rocm but the discussion seems to be going another way. What's the actual req here?

Our conclusion is that the /opt/rocm weak default needs to stay in place. Thus this patch should be abandoned in my opinion. MarkS and I are requesting the folks that requestedt his change within AMD, revisit it.

Sounds good to me.

CMAKE_INSTALL_PREFIX should be removed.

I think we have consensus on that ^, orthogonal to the /opt/rocm presence or absence. As long as there's some way to tell cmake to use a given HSA, and it seems there are several, deleting that string is an improvement.

dbhaskaran abandoned this revision.Jan 2 2022, 9:25 PM