This is an archive of the discontinued LLVM Phabricator instance.

Set rpath on openmp executables
ClosedPublic

Authored by JonChesterfield on Jan 28 2022, 10:03 AM.

Details

Summary

Openmp executables need to find libomp and libomptarget at runtime.
This currently requires LD_LIBRARY_PATH or the user to specify rpath. Change
that to set the expected location of the openmp libraries in the install tree.

Whether rpath means rpath or runpath is system dependent. The attached test
shows that the Wl,--disable-new-dtags control interacts correctly with this feature.

The implicit rpath field is appended to any user specified ones which is ideal.

Diff Detail

Event Timeline

JonChesterfield requested review of this revision.Jan 28 2022, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 10:03 AM
JonChesterfield added a comment.EditedJan 28 2022, 10:07 AM

Testing manually looks good, provided there's no command line argument involved. -rpath even composes sanely, so doing this plus passing -Wl,rpath= results in multiple embedded rpaths. At a loss as to why changing Options.td is working so poorly, though it may be relevant that ninja thinks there's no work to do when the file changes. Trying things like:

def fno_openmp_add_rpath: Flag<["-"], "fno-openmp-add-rpath">, Flags<[NoArgumentUnused]>,
  HelpText<"Do not add -rpath with architecture-specific resource directory to the linker flags">;

which is a direct copy of fno_rtlib_add_path, didn't even bother changing the help text, and it's still missing from the clang executable.

Can we test this somehow?

My plan is to feed executables to readelf -d. Commandline problems seem to be solvable by clean builds between changes, slow but feasible. Might upset the incremental buildbot

  • Now with commandline argument

Slightly stalled on testing - I'd like to emit the object and feed it to readelf, something like:
// RUN: %clang %s -o %t && llvm-readelf --dynamic-table %t | FileCheck %s --check-prefixes=CHECK

which errors with cannot find -lomp. I feel there should be a linker flag to the effect of "ignore libraries you can't find", which I have failed to identify. Candidate workarounds are putting empty files called libomp.a in Inputs, changing the control flow in addOpenMPRuntime or finding said linker flag.

Toolchain style testing (like Driver/amdgpu-openmp-toolchain) is sufficient to show rpath is inserted, e.g. "-lomp" "-lomptarget" "-rpath" "*/llvm/lib", but doesn't let one check interaction with user provided rpaths. I'll pick this up over the weekend.

Poking it manually works as one would wish so the functionality looks fine. Please speak up in the meantime if you'd like the option to be named or documented differently, or if there's any other change desired for the non-testing part.

  • set rpath after libomptarget, reads better than between the two -l flags
  • Test rpath and runpath setting
  • composes with user specified rpath flags
JonChesterfield retitled this revision from [WIP]Set rpath on openmp executables to Set rpath on openmp executables.Jan 31 2022, 6:10 AM
JonChesterfield edited the summary of this revision. (Show Details)
  • Disable implicit rpath for the tests so we can be certain the tests aren't picking up unrelated libs
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 6:13 AM
  • rebase and format
clang/test/OpenMP/implicit_rpath.c
28 ↗(On Diff #404499)

This ^ probably has path separator issues on windows, will try to find what the proper regex is for that

jhuber6 added inline comments.Jan 31 2022, 7:40 AM
clang/test/OpenMP/implicit_rpath.c
28 ↗(On Diff #404499)
jhuber6 accepted this revision.Jan 31 2022, 7:41 AM

LGTM, unless someone else has reservations.

This revision is now accepted and ready to land.Jan 31 2022, 7:41 AM

Yep. I'm not totally confident what windows builds will embed in the dynamic table - might be windows style \ or it might be linux style /. Or it might be \\, which can be matched as in D109057. I think I'm going with the very conservative llvm{{*.}}lib pattern, as provided rpath/runpath has been written in the binary it probably points to the llvm dir in platform-specific fashion.

  • More paranoid regex - not sure what windows uses as the path separator in the dynamic table
This revision was landed with ongoing or failed builds.Jan 31 2022, 8:01 AM
This revision was automatically updated to reflect the committed changes.

Apologies for the noise, turns out 'llvm/lib' and 'llvm/lib64' are not the only two options. Made it more permissive and reapplied. Also looks like ppc64le defaults to rpath as opposed to runpath, so that's worth knowing.

JonChesterfield added a subscriber: kparzysz.

This test doesn't work on hexagon because hexagon doesn't have a linker (??), emailed @kparzysz asking how to disable tests on hexagon as I can't find a likely looking UNSUPPORTED string using grep.

thakis added a subscriber: thakis.Jan 31 2022, 10:01 AM

looks like the mac linker doesn't like this test either: http://45.33.8.238/macm1/26834/step_7.txt

JonChesterfield added a comment.EditedJan 31 2022, 10:08 AM

looks like the mac linker doesn't like this test either: http://45.33.8.238/macm1/26834/step_7.txt

Error message is ld: file too small (length=8) file '/Users/thakis/src/llvm-project/clang/test/OpenMP/Inputs/libomp.a' for architecture arm64

On the face of it I'd say that's a bug in your linker. The archive file is the magic !<arch>\n string by itself which is well formed to the extent the archive format is documented. It's definitely possible to put more text in the stub file to work around though it's hard to tell how much text, or whether your linker will do other validation like seeing whether the symbol table is present or objects actually contain machine code.

I'm going to delete the test file in the meantime.
edit: or less aggressive, is there an UNSUPPORTED flag I can add that will drop it from your CI?

JonChesterfield reopened this revision.Jan 31 2022, 10:13 AM
This revision is now accepted and ready to land.Jan 31 2022, 10:13 AM

looks like the mac linker doesn't like this test either: http://45.33.8.238/macm1/26834/step_7.txt

Error message is ld: file too small (length=8) file '/Users/thakis/src/llvm-project/clang/test/OpenMP/Inputs/libomp.a' for architecture arm64

On the face of it I'd say that's a bug in your linker.

It's just the standard Xcode linker, I didn't write it :)

The archive file is the magic !<arch>\n string by itself which is well formed to the extent the archive format is documented. It's definitely possible to put more text in the stub file to work around though it's hard to tell how much text, or whether your linker will do other validation like seeing whether the symbol table is present or objects actually contain machine code.

I'm going to delete the test file in the meantime.
edit: or less aggressive, is there an UNSUPPORTED flag I can add that will drop it from your CI?

Maybe you can pass an explicit --target triple to clang, and just check -### instead of actually linking, like driver tests usually do?

JonChesterfield added a comment.EditedJan 31 2022, 10:41 AM

looks like the mac linker doesn't like this test either: http://45.33.8.238/macm1/26834/step_7.txt

Error message is ld: file too small (length=8) file '/Users/thakis/src/llvm-project/clang/test/OpenMP/Inputs/libomp.a' for architecture arm64

On the face of it I'd say that's a bug in your linker.

It's just the standard Xcode linker, I didn't write it :)

The archive file is the magic !<arch>\n string by itself which is well formed to the extent the archive format is documented. It's definitely possible to put more text in the stub file to work around though it's hard to tell how much text, or whether your linker will do other validation like seeing whether the symbol table is present or objects actually contain machine code.

I'm going to delete the test file in the meantime.
edit: or less aggressive, is there an UNSUPPORTED flag I can add that will drop it from your CI?

Maybe you can pass an explicit --target triple to clang, and just check -### instead of actually linking, like driver tests usually do?

The point of the test is checking that the weirdness of rpath vs runpath combined with user specified flags works out properly after going through the linker so that would lose most (possibly all?) of the coverage.

The test may be more bother than it's worth though. I'll drop it for now. d14897c7dad8c05c003bada019c0b573ace63a1a

The behavior change can still be observed by a -### test -- just check that -rpath gets passed to the linker.

Clang tests generally can't rely on running linkers. Linkers tend to be platform-specific. Windows link.exe can't produce elf binaries, macOS's ld64 can't produce elf binaries, etc. That's why driver tests are all -###-based. (That, and we try to do more unit testing and less integration testing.)

tstellar added a subscriber: tstellar.EditedMar 21 2022, 4:50 PM

Fedora recently banned RUNPATH, so packages built with clang and -fopenmp are rejected by the build system with this change. Can you elaborate more about the intended use case that this change will fix? Is it just meant for when libomp.so is installed outside of /usr/lib64 ?

Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 4:50 PM
JonChesterfield added a comment.EditedMar 22 2022, 1:10 AM

It lets applications run with the libomp and libomptarget built with the toolchain. For users, they don't have to be root. For compiler devs, we test our work instead of whatever the distro had installed.

There's no info at that link. What's 'broken rpath'? If they mean runpath, then it's really on fedora to have Wl,--rpath set rpath instead of runpath. Setting the default in the linker to the one your own build tools reject will break everything, not just llvm openmp.

Edit: https://fedoraproject.org/wiki/Changes/Broken_RPATH_will_fail_rpmbuild
Doesn't mean runpath. There's a list of strings Fedora (IBM?) have decided they don't like, named 'broken' or 'invalid', and declared shall not be present. Page doesn't go into the rpath/runpath distinction, maybe they mean for the string restrictions to apply to both.

Can't tell which rule we're violating (will need to work out what the install directory resolves to on fedora), though the one about executing untrusted code seems likely, where untrusted seems to be anything not owned by root. E.g. applications people have written themselves.

We can introduce a cmake flag that changes the default on this commandline argument, set when building on Fedora (and maybe redhat, centos-stream, others). That'll annoy anyone developing openmp on, or using a local llvm build on, Fedora but that might be zero people and they could override the cmake flag and disable the build checking scripts.

Or disable building openmp on Fedora, that seems reasonable too.

@estewart08 Fedora are rejecting some uses of rpath (and probably runpath). I can't find a corresponding page for redhat. This may become a problem for our aomp/rocm builds.

It lets applications run with the libomp and libomptarget built with the toolchain. For users, they don't have to be root. For compiler devs, we test our work instead of whatever the distro had installed.

There's no info at that link. What's 'broken rpath'? If they mean runpath, then it's really on fedora to have Wl,--rpath set rpath instead of runpath. Setting the default in the linker to the one your own build tools reject will break everything, not just llvm openmp.

Edit: https://fedoraproject.org/wiki/Changes/Broken_RPATH_will_fail_rpmbuild
Doesn't mean runpath. There's a list of strings Fedora (IBM?) have decided they don't like, named 'broken' or 'invalid', and declared shall not be present. Page doesn't go into the rpath/runpath distinction, maybe they mean for the string restrictions to apply to both.

I'm not sure I follow this, but there is a script run after every RPM built for fedora that checks for RUNPATH (among other things in the binary). I have been building packages for Fedora with clang14 and the script fails for packages built with clang + -fopenmp

Can't tell which rule we're violating (will need to work out what the install directory resolves to on fedora), though the one about executing untrusted code seems likely, where untrusted seems to be anything not owned by root. E.g. applications people have written themselves.

The rule that is broken is "standard RPATHs" Fedora installs libomp to /usr/lib64.

We can introduce a cmake flag that changes the default on this commandline argument, set when building on Fedora (and maybe redhat, centos-stream, others). That'll annoy anyone developing openmp on, or using a local llvm build on, Fedora but that might be zero people and they could override the cmake flag and disable the build checking scripts.

I don't really like adding CMake options for things like this. I think what we'll do in Fedora is just add -fno-openmp-implicit-rpath to the default CFLAGS so we can get the old behavior for official RPMs.

Or disable building openmp on Fedora, that seems reasonable too.

We want to be able to build packages that use OpenMP, so disabling is not really an option.

JonChesterfield added a comment.EditedMar 23 2022, 3:15 AM

The rule that is broken is "standard RPATHs" Fedora installs libomp to /usr/lib64.

...

I think what we'll do in Fedora is just add -fno-openmp-implicit-rpath to the default CFLAGS so we can get the old behavior for official RPMs.

Sounds good to me. I especially like the rapid unblocking.

Slightly longer term, we might want to adjust clang to notice when the rpath it is about to set coincides with a system directory. The purpose of this patch is to have clang -fopenmp -o a.out && ./a.out work without setting environment variables, but where the openmp libs in question are on the system search path the rpath is not necessary.

MaskRay added a subscriber: MaskRay.EditedJan 23 2023, 8:35 PM

Enabled-by-default -fopenmp-implicit-rpath is odd. For other shared objects (e.g. libclang_rt.*.so) we don't do this.
For libraries in a well-known system directory, we just rely on the default /etc/ld.so.conf*.
For libraries in other directories or when --sysroot= is specified, the user is responsible to set up it correctly.
An opt-out behavior causes follow-up workarounds like D142174.

(I am on a trip and don't check emails frequently.)

That works if you have one toolchain installed globally or you are happy to cobble together a working system using environment variables. If you have multiple toolchains, they can't all sit in the global directory. If you don't have root, you can't install them there.

Previously we required people to set LD_LIBRARY_PATH to use openmp at all. That's an inherently poor UX and interacted especially poorly with module setup systems found in HPC.

We could make this opt-in, at the cost of new users seeing that openmp just doesn't work out of the box and other ones having to modify build scripts. In exchange we get - consistency with other tools that fail to work without global installation or extra user burden.

In my opinion dynamically linking language runtimes is a bad default. It wins us an awful lot of failure modes. I've also argued against being able to swap out individual components via environment variables as way too error prone for a user facing interface. This is what the community consensus went for. The combination of dynamically substitutable and requiring LD_LIBRARY_PATH was especially sharp.

MaskRay added a comment.EditedJan 24 2023, 2:54 PM

That works if you have one toolchain installed globally or you are happy to cobble together a working system using environment variables. If you have multiple toolchains, they can't all sit in the global directory. If you don't have root, you can't install them there.

Previously we required people to set LD_LIBRARY_PATH to use openmp at all. That's an inherently poor UX and interacted especially poorly with module setup systems found in HPC.

We could make this opt-in, at the cost of new users seeing that openmp just doesn't work out of the box and other ones having to modify build scripts. In exchange we get - consistency with other tools that fail to work without global installation or extra user burden.

Well, if you have a toolchain at a non-standard location (I do this a lot myself), you can go an extra mile by specifying -Wl,-rpath=... in a Clang configuration file.
Gentoo has a nice summary of the feature https://blogs.gentoo.org/mgorny/2022/10/07/clang-in-gentoo-now-sets-default-runtimes-via-config-file/
libc++, libc++abi, libunwind, and compiler-rt don't add the extra DT_RUNPATH. Some build systems want to handle DT_RUNPATH themselves (e.g. CMAKE_INSTALL_RPATH).
I don't think it is a good idea for openmp to diverge and introduce an opt-out -fno-openmp-implicit-rpath.

In my opinion dynamically linking language runtimes is a bad default. It wins us an awful lot of failure modes. I've also argued against being able to swap out individual components via environment variables as way too error prone for a user facing interface. This is what the community consensus went for. The combination of dynamically substitutable and requiring LD_LIBRARY_PATH was especially sharp.

Well, some distributions have policies against DT_RUNPATH and this change is causing trouble. While a patch like D142174 can mitigate the problem, the hard-coded paths are ugly and may not work for another Linux distribution or another ELF-based operating system.

You don't necessarily use LD_LIBRARY_PATH. You can use -Wl,-rpath=...

nikic added a subscriber: nikic.Jan 26 2023, 7:48 AM

Reporting after another round of discussion. I don't have much support from the llvm openmp working group that we should maintain the current divergence from libc++ and the like so I'm backing down. Let's revert this and regress for all users who don't install globally.

The intent is to document in the FAQ things users can do themselves to replace the functionality here, where possibly one enumerated option is to pass --fopenmp-implicit-rpath (maybe renamed) though my preference would be to delete this code path from the compiler entirely if it's going to be a minimally-tested, non-default path (as it'll break and we won't notice).

@MaskRay please revert this patch at your leisure. I'm reluctantly consenting to the capability regression but not volunteering to fix any CI fallout from removing it.

@ronlieb I think we should keep this functionality in rocm so that rocm built binaries don't start going looking for llvm libs under /usr instead and fall over

Reporting after another round of discussion. I don't have much support from the llvm openmp working group that we should maintain the current divergence from libc++ and the like so I'm backing down. Let's revert this and regress for all users who don't install globally.

The intent is to document in the FAQ things users can do themselves to replace the functionality here, where possibly one enumerated option is to pass --fopenmp-implicit-rpath (maybe renamed) though my preference would be to delete this code path from the compiler entirely if it's going to be a minimally-tested, non-default path (as it'll break and we won't notice).

@MaskRay please revert this patch at your leisure. I'm reluctantly consenting to the capability regression but not volunteering to fix any CI fallout from removing it.

@ronlieb I think we should keep this functionality in rocm so that rocm built binaries don't start going looking for llvm libs under /usr instead and fall over

Thank you! I am preparing a commit to remove this then...

Duplicating a comment from the commit thread so it's easier for me to find later.

You've applied this to the release branches going back as far as 14. It's a user facing breaking change. As in people who have a working openmp toolchain and update to the point release which looks like a semantic versioning thing indicating minor bugfixes will experience immediate cessation of function. I consider that very poor user experience and do not agree with the scope of breakage.

Duplicating a comment from the commit thread so it's easier for me to find later.

You've applied this to the release branches going back as far as 14. It's a user facing breaking change. As in people who have a working openmp toolchain and update to the point release which looks like a semantic versioning thing indicating minor bugfixes will experience immediate cessation of function. I consider that very poor user experience and do not agree with the scope of breakage.

I don't see this revert in any of the official release branches. Am I missing something?

Duplicating a comment from the commit thread so it's easier for me to find later.

You've applied this to the release branches going back as far as 14. It's a user facing breaking change. As in people who have a working openmp toolchain and update to the point release which looks like a semantic versioning thing indicating minor bugfixes will experience immediate cessation of function. I consider that very poor user experience and do not agree with the scope of breakage.

555b572e3f407ac48b5f30fc06760cc4d0549977 is only in the main branch, not in release/14.x, release/15.x, or releast/16.x.

JonChesterfield added a comment.EditedMar 9 2023, 1:51 PM

That is great news, phabricator's list of branches at https://reviews.llvm.org/rG9b9d08111b618d74574ba03e5cc3d752ecc56f55 has mislead me. Thanks!