AMDGPU/clang: Search resource directory for device libraries
Needs Review | Public

Authored by arsenm on Jul 17 2020, 12:10 PM.

Details

Reviewers
yaxunl
tra
Summary

This should be the preferred way of locating the libraries, and it's a
packaging problem to ensure the libraries are symlinked into the
resource directory.

This takes precedence over searching for the ROCm installation, but an
explicit --hip-device-lib-path is still checked first.
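
A minimal sketch of the lookup order described above, written in Python rather than the driver's C++ for brevity. The function names, the `*.bc` validity check, the subdirectory names, and the choice to fall through when an explicit path holds no bitcode are all assumptions made for illustration; none of them are taken from the patch itself.

```python
from pathlib import Path
from typing import Iterable, Optional


def has_bitcode(directory: Path) -> bool:
    """Hypothetical validity check: the directory exists and holds a .bc file."""
    return directory.is_dir() and any(directory.glob("*.bc"))


def find_device_libs(
    explicit_paths: Iterable[Path],
    resource_dir: Path,
    rocm_roots: Iterable[Path],
    resource_subdir: str = "lib/amdgcn/bitcode",  # assumed layout, not from the patch
    rocm_subdir: str = "amdgcn/bitcode",          # assumed layout, not from the patch
) -> Optional[Path]:
    # 1. Paths given explicitly (the --hip-device-lib-path case) win.
    for path in explicit_paths:
        if has_bitcode(path):
            return path

    # 2. Next, look inside the compiler's own resource directory.
    candidate = resource_dir / resource_subdir
    if has_bitcode(candidate):
        return candidate

    # 3. Finally, fall back to any detected ROCm installation roots.
    for root in rocm_roots:
        candidate = root / rocm_subdir
        if has_bitcode(candidate):
            return candidate

    return None
```

With this ordering, a packaged symlink in the resource directory shadows whatever ROCm installation happens to be found on the system, while an explicit path still overrides both.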

Event Timeline

arsenm created this revision. Jul 17 2020, 12:10 PM
tra added a comment. Jul 17 2020, 1:13 PM

Could you walk me through how you see this working in practice?

IIUC, the idea is to have bitcode files located somewhere within the clang installation.
If that's the case, will we ship those bitcode libraries with clang, or do they come from ROCm packages?
If we ship them with clang, who/where/how builds them?
If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.
What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?

As long as explicitly specified --hip-device-lib-path can still point to the right path, it's probably OK, but it all adds some confusion about who controls which parts of the HIP compilation and how it all is supposed to work in cases that deviate from the default assumptions.
It would help if the requirements would be documented somewhere.

arsenm added a comment. Aug 8 2020, 7:28 AM
In D84068#2159391, @tra wrote:

Could you walk me through how you see this working in practice?

IIUC, the idea is to have bitcode files located somewhere within the clang installation.

Yes

If that's the case, will we ship those bitcode libraries with clang, or do they come from ROCm packages?

The ROCm package will install symlinks into the clang resource directory. The ROCm-provided clang package should depend on the bitcode package.

If we ship them with clang, who/where/how builds them?
If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.

Symlinks. I've been building the device libraries as part of LLVM_EXTERNAL_PROJECTS, and think this should be the preferred way to build and package the libraries. This is how compiler-rt is packaged on Linux distributions: the compiler-rt binaries are a separate package symlinked into the resource directory locations. I'm not sure exactly what you mean by "change routinely"; the libraries should be an implementation detail invisible to users, not something they should be directly relying on. Only clang actually knows how to use them correctly, and every other user is buggy.

What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?

These should be treated as an integral part of clang, and not something to mix and match. Each rocm version should have its own copy of the device libraries. It only happens to work most of the time if you mismatch these, and this isn't a guaranteed property.

As long as explicitly specified --hip-device-lib-path can still point to the right path, it's probably OK, but it all adds some confusion about who controls which parts of the HIP compilation and how it all is supposed to work in cases that deviate from the default assumptions.

Long term, I would rather get rid of --hip-device-lib-path and only use the standard -resource-dir flag.

It would help if the requirements would be documented somewhere.

Documentation would be good, but the problem I always have is deciding where "somewhere" is
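
The symlink-based packaging proposed in this comment could look roughly like the following post-install step. This is only a sketch under stated assumptions: the function name, the `lib/amdgcn/bitcode` subdirectory, and the policy of replacing stale links are hypothetical and do not come from the patch or from any ROCm package script.

```python
from pathlib import Path
from typing import List


def link_device_libs(
    package_dir: Path,
    resource_dir: Path,
    subdir: str = "lib/amdgcn/bitcode",  # assumed layout, not from the patch
) -> List[Path]:
    """Symlink every .bc file from a device-library package into the
    compiler's resource directory, replacing any stale entries."""
    target_dir = resource_dir / subdir
    target_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for bitcode in sorted(package_dir.glob("*.bc")):
        link = target_dir / bitcode.name
        # Remove an older link or file so that re-running the step is safe.
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(bitcode.resolve())
        created.append(link)
    return created
```

Because the step takes a single resource directory, it has to run once per clang installation, which is the 1:1 coupling between a clang and its device libraries that the comment argues for.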

tra added a subscriber: echristo. Aug 10 2020, 2:15 PM

If we ship them with clang, who/where/how builds them?
If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.

Symlinks. I've been building the device libraries as part of LLVM_EXTERNAL_PROJECTS, and think this should be the preferred way to build and package the libraries. This is how compiler-rt is packaged on Linux distributions: the compiler-rt binaries are a separate package symlinked into the resource directory locations. I'm not sure exactly what you mean by "change routinely"; the libraries should be an implementation detail invisible to users, not something they should be directly relying on. Only clang actually knows how to use them correctly, and every other user is buggy.

What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?

These should be treated as an integral part of clang, and not something to mix and match. Each rocm version should have its own copy of the device libraries. It only happens to work most of the time if you mismatch these, and this isn't a guaranteed property.

I'm still not sure how that's going to work. We have an M clang versions : N ROCm versions relationship here.
If I have one clang version, but want to do two different builds, one with ROCm-X and one with ROCm-Y, how would I do that? It sounds like I'll need to have multiple clang installation variants.

Similarly, if I have multiple clang versions installed, how would ROCm know which of those clang installations must be updated?

What if I install yet another clang version *after* ROCm has been installed? How will the ROCm package know that it needs to update yet another clang installation?

This will get rather unmanageable as soon as you get beyond the "I only have one clang and one ROCm version" scenario.

I think it would make much more sense for clang to treat ROCm's bits as an external dependency, similar to CUDA. By definition, clang/llvm does not control anything outside of its own packages. While ROCm is AMD's package, I'm willing to bet that eventually various Linux distros will start shuffling its bits around, the same way it happened to CUDA.

As long as explicitly specified --hip-device-lib-path can still point to the right path, it's probably OK, but it all adds some confusion about who controls which parts of the HIP compilation and how it all is supposed to work in cases that deviate from the default assumptions.

Long term, I would rather get rid of --hip-device-lib-path and only use the standard -resource-dir flag.

Please, please, please keep the explicit path option. There are real use cases where you cannot expect ROCm to be installed anywhere 'standard'. Or at all.
Imagine people embedding libclang into their GUI/tools. There's no resource directory. There may be no ROCm installation, or installing one may not be possible due to lack of privileges.

In short, I think that tightly coupling clang's expectations to a non-clang project is not a good idea.
Summoning @echristo for a second opinion.

It would help if the requirements would be documented somewhere.

Documentation would be good, but the problem I always have is deciding where "somewhere" is

clang/docs ?

In D84068#2208132, @tra wrote:

If we ship them with clang, who/where/how builds them?
If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.

Symlinks. I've been building the device libraries as part of LLVM_EXTERNAL_PROJECTS, and think this should be the preferred way to build and package the libraries. This is how compiler-rt is packaged on Linux distributions: the compiler-rt binaries are a separate package symlinked into the resource directory locations. I'm not sure exactly what you mean by "change routinely"; the libraries should be an implementation detail invisible to users, not something they should be directly relying on. Only clang actually knows how to use them correctly, and every other user is buggy.

What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?

These should be treated as an integral part of clang, and not something to mix and match. Each rocm version should have its own copy of the device libraries. It only happens to work most of the time if you mismatch these, and this isn't a guaranteed property.

I'm still not sure how that's going to work. We have an M clang versions : N ROCm versions relationship here.
If I have one clang version, but want to do two different builds, one with ROCm-X and one with ROCm-Y, how would I do that? It sounds like I'll need to have multiple clang installation variants.

Similarly, if I have multiple clang versions installed, how would ROCm know which of those clang installations must be updated?

What if I install yet another clang version *after* ROCm has been installed? How will the ROCm package know that it needs to update yet another clang installation?

This will get rather unmanageable as soon as you get beyond the "I only have one clang and one ROCm version" scenario.

What I'm aiming for is a 1 clang : 1 device library copy relationship. Every clang should have its own device library build. It's an implementation detail of clang, and not an independent component you can do anything with (correctly, at least). clang is also minimally useful without the libraries.

I think it would make much more sense for clang to treat ROCm's bits as an external dependency, similar to CUDA. By definition, clang/llvm does not control anything outside of its own packages. While ROCm is AMD's package, I'm willing to bet that eventually various Linux distros will start shuffling its bits around, the same way it happened to CUDA.

Long term, I'd rather aim for merging rocm-device-libs into libclc and making it an llvm project. They're largely forks of the same original sources from about 5 years ago, and it's an unfortunate split of effort. I also specifically do not want distros to be shuffling this around, and want it to behave exactly like compiler-rt. As far as I can tell cuda clang does not actually work with the Ubuntu packaged cuda, which arbitrarily moved the nvvm binaries, and I don't really want a repeat of this situation. I also do want the device libs build to support a non-rocm package to install to a standard distro clang package, which is different than the rocm libraries trying to support every clang in the universe. Ideally just a regular clang works without any formal rocm installation.

As long as explicitly specified --hip-device-lib-path can still point to the right path, it's probably OK, but it all adds some confusion about who controls which parts of the HIP compilation and how it all is supposed to work in cases that deviate from the default assumptions.

Long term, I would rather get rid of --hip-device-lib-path and only use the standard -resource-dir flag.

Please, please, please keep the explicit path option. There are real use cases where you cannot expect ROCm to be installed anywhere 'standard'. Or at all.
Imagine people embedding libclang into their GUI/tools. There's no resource directory. There may be no ROCm installation, or installing one may not be possible due to lack of privileges.

This isn't any different from compiler-rt; I would expect these cases to embed the libraries and mount them in a virtual filesystem.

In short, I think that tightly coupling clang's expectations to a non-clang project is not a good idea.
Summoning @echristo for a second opinion.

With the dance of searching for a ROCm or CUDA installation, it's still a coupling. It just looks weirder and assumes the existing poor, non-standard packaging practices. What I want is something that's indistinguishable from compiler-rt from a packaging perspective.

Long term, I'd rather aim for merging rocm-device-libs into libclc and making it an llvm project. They're largely forks of the same original sources from about 5 years ago, and it's an unfortunate split of effort. I also specifically do not want distros to be shuffling this around, and want it to behave exactly like compiler-rt. As far as I can tell cuda clang does not actually work with the Ubuntu packaged cuda, which arbitrarily moved the nvvm binaries, and I don't really want a repeat of this situation. I also do want the device libs build to support a non-rocm package to install to a standard distro clang package, which is different than the rocm libraries trying to support every clang in the universe. Ideally just a regular clang works without any formal rocm installation.

I like the idea of making rocm-device-libs part of llvm-project, but I think rocm-device-libs is not an OpenCL device library. That said, until rocm-device-libs becomes part of llvm-project, we need to support rocm-device-libs at other locations.

Hi All,

I'd really like to avoid depending on any program other than clang installing something into clang's working directory. I think this needs to be written from the perspective of specifying another directory to look for things.

-eric

Hi All,

I'd really like to avoid depending on any program other than clang installing something into clang's working directory. I think this needs to be written from the perspective of specifying another directory to look for things.

-eric

Do you mean "resource directory" by "working directory"? It's not copying to the working directory.

My point is these libraries *are* part of clang