This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Also search FilePaths for compiler-rt before falling back
AcceptedPublic

Authored by jrtc27 on Mar 6 2021, 7:35 AM.

Details

Summary

LibraryPaths is rather underused in Clang, unlike FilePaths. The latter
notably includes the sysroot for bare-metal toolchains, so this allows
compiler-rt to be found alongside the other libraries shipped as part of
a sysroot rather than requiring it to be bundled with Clang, allowing
bare-metal sysroots to behave much more like real OS sysroots.

This is particularly useful for RISC-V where there is an abundance of
possible ISA and ABI choices, but the architecture suffix added to the
library name only captures the architectural word size. Putting them in
ISA and ABI-specific sysroots is the easiest way to allow multiple
configurations to coexist on the system given that such sysroots already
must exist with the system headers and precompiled libc.

Diff Detail

Event Timeline

jrtc27 created this revision.Mar 6 2021, 7:35 AM
jrtc27 requested review of this revision.Mar 6 2021, 7:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2021, 7:35 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jrtc27 updated this revision to Diff 328771.Mar 6 2021, 9:16 AM

Added missing file for test

jrtc27 updated this revision to Diff 328772.Mar 6 2021, 9:17 AM

Dropped unintended change

jrtc27 updated this revision to Diff 328773.Mar 6 2021, 9:18 AM

... and another accidental change

jrtc27 added a comment.Mar 6 2021, 9:25 AM

One thing that is currently rather ugly about BareMetal is that it ignores the AddArch argument. Once you have a sysroot, the architecture suffix is rather unnecessary, and given compiler-rt uses LLVM_ENABLE_PER_TARGET_RUNTIME_DIR to decide both whether to add a suffix and whether to add lib/${OS} to the install path, that means you can't nicely point compiler-rt at the sysroot as the install directory (well, unless you do something like make COMPILER_RT_OUTPUT_DIR point at the sysroot not the libdir and set COMPILER_RT_OS_DIR to . (or maybe the empty string also works)).

Ideally BareMetal would honour AddArch like the other toolchains, but you then have several issues:

  1. Are people relying on AddArch _not_ being honoured for things added to LibraryPaths (and therefore you need to check for both)?
  1. AddLinkRuntimeLib similarly hard-codes the fact that the architecture suffix is added. Changing it to use getCompilerRTArgString might work, but does subtly change things as then the path is provided to the linker, not a -l argument.
abidh added a comment.Mar 8 2021, 3:42 AM

One thing that is currently rather ugly about BareMetal is that it ignores the AddArch argument. Once you have a sysroot, the architecture suffix is rather unnecessary, and given compiler-rt uses LLVM_ENABLE_PER_TARGET_RUNTIME_DIR to decide both whether to add a suffix and whether to add lib/${OS} to the install path, that means you can't nicely point compiler-rt at the sysroot as the install directory (well, unless you do something like make COMPILER_RT_OUTPUT_DIR point at the sysroot not the libdir and set COMPILER_RT_OS_DIR to . (or maybe the empty string also works)).

Agree that setting the compiler-rt installation directory is not ideal. I remember I ended up doing some post installation steps to make sure that libraries were in right folder.

Regarding this patch, currently Baremetal toolchain looks for compiler-rt in a multilib specific directory. Does that not solve the problem of finding compiler-rt for any given arch/abi combination automatically?

jrtc27 added a comment.Mar 8 2021, 8:32 AM

One thing that is currently rather ugly about BareMetal is that it ignores the AddArch argument. Once you have a sysroot, the architecture suffix is rather unnecessary, and given compiler-rt uses LLVM_ENABLE_PER_TARGET_RUNTIME_DIR to decide both whether to add a suffix and whether to add lib/${OS} to the install path, that means you can't nicely point compiler-rt at the sysroot as the install directory (well, unless you do something like make COMPILER_RT_OUTPUT_DIR point at the sysroot not the libdir and set COMPILER_RT_OS_DIR to . (or maybe the empty string also works)).

Agree that setting the compiler-rt installation directory is not ideal. I remember I ended up doing some post installation steps to make sure that libraries were in right folder.

Regarding this patch, currently Baremetal toolchain looks for compiler-rt in a multilib specific directory. Does that not solve the problem of finding compiler-rt for any given arch/abi combination automatically?

So, technically yes for RISC-V I believe we can do that. However that support code only really exists for RISC-V, and we have other architectures too we need to extend. This approach is entirely general with no lists of arch-specific combinations needed, and far more flexible. I also believe that conceptually compiler-rt does belong in the sysroot, not with the compiler. The libgcc-with-gcc model kind of makes sense for GCC given you have to configure GCC for a specific architecture, but for Clang there is no such coupling. This allows you to, say, use a system-provided Clang with your own bare-metal sysroot (not that that's our use-case, but it's a sensible one). The other way to see that compiler-rt doesn't really belong with the compiler is to look at the bootstrapping process: compiler-rt depends on a libc, typically newlib, which then depends on your compiler. So you have to build Clang, then build newlib, then go back and build compiler-rt to add it to your Clang. That gets rather annoying to manage in CI jobs.

This makes sense to me but I'm not quite sure about the implications, especially when we consider compatibility. I think we need more eyes on this patch.

I just looked at this again and I don't have the full context in my mind right now but won't the test just exercise the BareMetal toolchain and not your changes?

I just looked at this again and I don't have the full context in my mind right now but won't the test just exercise the BareMetal toolchain and not your changes?

I've since lost my recollection of all the details, but BareMetal::AddLinkRuntimeLib is hard-coding -lclang_rt.builtins-$ARCH so I think we already do search in the sysroot for the library if it doesn't exist in the resources directory. What this tests is -print-libgcc-file-name (implemented in the common Driver.cpp) which is the thing that is currently "broken" for bare-metal toolchains as it will currently neglect to search the sysroot as the linker would, since it calls getCompilerRT directly (which BareMetal doesn't override).

This revision is now accepted and ready to land.Jul 11 2021, 2:14 PM

Why do you want to ship the builtins as part of your sysroot?

compiler-rt depends on a libc, typically newlib, which then depends on your compiler

The builtins should only depend on compiler-provided headers, and not on the rest of libc. Agreed re: the rest of compiler-rt.

compiler-rt depends on a libc, typically newlib, which then depends on your compiler

The builtins should only depend on compiler-provided headers, and not on the rest of libc. Agreed re: the rest of compiler-rt.

As of https://reviews.llvm.org/D103876 that should be true for baremetal targets, but clear_cache.c might still need libc/OS headers.

compiler-rt depends on a libc, typically newlib, which then depends on your compiler

The builtins should only depend on compiler-provided headers, and not on the rest of libc. Agreed re: the rest of compiler-rt.

Until recently builtins required libc headers, though we upstreamed a patch to remove that dependency.

Why do you want to ship the builtins as part of your sysroot?

Several reasons off the top of my head:

  1. There's no general way to have multiple ABIs/ISA variants co-exist for the same base ISA. We have aarch64, mips64, riscv32 and riscv64 as architectures we support, and each of those has three different variants we need (two with different ISA baselines, one with a whole different ABI).
  1. We build LLVM once and then use that same toolchain for multiple architectures. Whilst, with sufficient general multilib/multiarch support across all architectures, that could be supported, it means that either you need to provide a toolchain that has binaries built for all architectures (despite the fact that most of our users only want one or two), or you need to provide multiple copies of LLVM with different sets of libs bundled in the same directory. Moreover, your toolchain might be in a shared read-only directory (NFS or similar), installed by your sysadmin or who knows what else, with you unable to modify it. This allows you to use that toolchain to build whatever software you like, as you can point it at whatever sysroot you've created with whatever compiler-rt you want, rather than having to ask your sysadmin to install a specific version of compiler-rt in a global namespace.
  1. If people hard-code -lgcc in their Makefiles then distributing the run-time support library (be it a renamed compiler-rt or libgcc) in a sysroot works out of the box. If you want to then give libclang_rt.builtins its "proper" name, and switch the Makefile to use -print-libgcc-file-name rather than hard-coding -lgcc, this suddenly breaks, as -print-libgcc-file-name falls back on giving you the absolute path to where the builtins library would live in the resources directory. This incentivises people to do the wrong thing (hard-code -lgcc and rename their libclang_rt.builtins to libgcc) in order to make things work.
  1. If -lclang_rt.builtins-riscv64 works just fine, finding the library as normal in the library search path, why shouldn't -print-libgcc-file-name? Hard-coding -lclang_rt.builtins-ARCH in your Makefile is bad practice just like hard-coding -lgcc (maybe it's fine in your internal project, but not if it's general open-source software that needs to support GCC and Clang in a wide variety of configurations), yet also works, so again incentivises bad practices / penalises doing the right thing.
  1. When targeting FreeBSD, compiler-rt lives in the sysroot (FreeBSD's base system vendors it), this allows the same model to be used for bare-metal sysroots.

Overall, my view is this provides flexibility that aligns much more nicely with our build system and software distribution model with little added complexity to Clang and should not break any existing uses.