Page MenuHomePhabricator

[libcxx] Support per-target __config_site in per-target runtime build
ClosedPublic

Authored by phosek on Oct 7 2020, 3:18 PM.

Details

Reviewers
ldionne
MaskRay
Group Reviewers
Restricted Project
Restricted Project
Commits
rGea12d779bc23: [libc++] Support per-target __config_site in per-target runtime build
Summary

When using the per-target runtime build, it may be desirable to have
different __config_site headers for each target where all targets cannot
share a single configuration.

The layout used for libc++ headers after this change is:

include/
  c++/
    v1/
      <libc++ headers except for __config_site>
  <target1>/
    c++/
      v1/
        __config_site
  <target2>/
    c++/
      v1/
        __config_site
  <other targets>

This is the most optimal layout since it avoids duplication, the only
headers that's per-target is __config_site, all other headers are
shared across targets. This also means that we no need two
-isystem flags: one for the target-agnostic headers and one for
the target specific headers.

Diff Detail

Event Timeline

phosek created this revision.Oct 7 2020, 3:18 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 7 2020, 3:18 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
phosek added a comment.Oct 7 2020, 3:25 PM

This is the per-target __config (or more preferable __config_site) support I mentioned in D88843.

krisb added a subscriber: krisb.Oct 10 2020, 3:19 AM
phosek updated this revision to Diff 334289.Mar 30 2021, 3:50 PM
phosek retitled this revision from [libcxx] Support per-target __config in per-target runtime build to [libcxx] Support per-target __config_site in per-target runtime build.
phosek edited the summary of this revision. (Show Details)

@ldionne this is ready for review now that D97572 landed.

phosek updated this revision to Diff 334323.Mar 30 2021, 7:05 PM

Regarding

/usr/include/x86_64-unknown-linux-gnu/c++/v1
/usr/include/c++/v1

With GCC multiarch, some include search paths are preceded by machine-os-env specific suffix directories.
Note: 'vendor' is not there. So you may see .../include/x86_64-linux-gnu but there is no .../include/x86_64-{pc,unknown}-linux-gnu

/usr/local/include/x86_64-linux-gnu         # affected by sysroot, multiarch, usually nonexistent
/usr/local/include                          # affected by sysroot
/tmp/opt/gcc-debug/include
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include-fixed
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../x86_64-pc-linux-gnu/include
/usr/include/x86_64-linux-gnu               # affected by sysroot, multiarch
/usr/include                                # affected by sysroot

On Debian x86_64, /usr/include/x86_64-linux-gnu/ exists. Adding x86_64-pc-linux-gnu (an possible value of LLVM_DEFAULT_TARGET_TRIPLE) may add more confusion.

phosek updated this revision to Diff 334506.Mar 31 2021, 12:05 PM

Regarding

/usr/include/x86_64-unknown-linux-gnu/c++/v1
/usr/include/c++/v1

With GCC multiarch, some include search paths are preceded by machine-os-env specific suffix directories.
Note: 'vendor' is not there. So you may see .../include/x86_64-linux-gnu but there is no .../include/x86_64-{pc,unknown}-linux-gnu

/usr/local/include/x86_64-linux-gnu         # affected by sysroot, multiarch, usually nonexistent
/usr/local/include                          # affected by sysroot
/tmp/opt/gcc-debug/include
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include-fixed
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../x86_64-pc-linux-gnu/include
/usr/include/x86_64-linux-gnu               # affected by sysroot, multiarch
/usr/include                                # affected by sysroot

On Debian x86_64, /usr/include/x86_64-linux-gnu/ exists. Adding x86_64-pc-linux-gnu (an possible value of LLVM_DEFAULT_TARGET_TRIPLE) may add more confusion.

This was already discussed pretty extensively in the past, but I'll try to give a summary.

In the case of GCC, compiler can only compile binaries for a single target that's set at configuration time. For example if you configured GCC with --target=x86_64-acme-linux-gnu, the build links the binary as x86_64-acme-linux-gnu-gcc and that binary would look for headers in /usr/include/x86_64-acme-linux-gnu and libraries in /usr/lib/x86_64-acme-linux-gnu (among other paths) so there's little ambiguity.

In the case of Clang, the situation is a bit more difficult because a single compiler supports multiple targets via --target= option and there are different ways to spell the same target triple. For example, x86_64-linux-gnu and x86_64-unknown-linux-gnu are considered equivalent, the latter is a normalized version of the former (since you can omit emit components). The question then is where to look for headers and libraries. Currently, we always look in two locations: (1) we use --target=<triple as specified by the user and (2) normalized version of that target. For example if the user passes --target=x86_64-linux-gnu, we would first check <prefix>/lib/x86_64-linux-gnu and then <prefix>/lib/x86_64-unknown-linux-gnu.

In Fuchsia, we always prefer using normalized triples because that's less error prone. For example, in the past we would install libraries in <prefix>/lib/x86_64-fuchsia which works fine if you invoke the compiler with --target=x86_64-fuchsia, but if you instead use --target=x86_64-unknown-fuchsia (which some build systems would do), compiler would fail to find libraries because it doesn't know how to denormalize the triple. Using normalized triples addresses the issue because we can always normalize the triple to get a canonical spelling.

MaskRay added a comment.EditedMar 31 2021, 1:01 PM

Regarding

/usr/include/x86_64-unknown-linux-gnu/c++/v1
/usr/include/c++/v1

With GCC multiarch, some include search paths are preceded by machine-os-env specific suffix directories.
Note: 'vendor' is not there. So you may see .../include/x86_64-linux-gnu but there is no .../include/x86_64-{pc,unknown}-linux-gnu

/usr/local/include/x86_64-linux-gnu         # affected by sysroot, multiarch, usually nonexistent
/usr/local/include                          # affected by sysroot
/tmp/opt/gcc-debug/include
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include-fixed
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../x86_64-pc-linux-gnu/include
/usr/include/x86_64-linux-gnu               # affected by sysroot, multiarch
/usr/include                                # affected by sysroot

On Debian x86_64, /usr/include/x86_64-linux-gnu/ exists. Adding x86_64-pc-linux-gnu (an possible value of LLVM_DEFAULT_TARGET_TRIPLE) may add more confusion.

This was already discussed pretty extensively in the past, but I'll try to give a summary.

In the case of GCC, compiler can only compile binaries for a single target that's set at configuration time. For example if you configured GCC with --target=x86_64-acme-linux-gnu, the build links the binary as x86_64-acme-linux-gnu-gcc and that binary would look for headers in /usr/include/x86_64-acme-linux-gnu and libraries in /usr/lib/x86_64-acme-linux-gnu (among other paths) so there's little ambiguity.

I'd love if vendor is in the multiarch string, i.e. /usr/include/x86_64-acme-linux-gnu and /usr/lib/x86_64-acme-linux-gnu, but unfortunately that is not the case. See my previous dump with a GCC configured with x86_64-pc-linux-gnu. Note that some paths are derived from the GCC installation, which has vendor part, while others are derived from $multiarch, which has no vendor part.

The inconsistency between the proposed /usr/include/x86_64-unknown-linux-gnu/c++/v1 and /usr/include/x86_64-linux-gnu is something I want to avoid.

In the case of Clang, the situation is a bit more difficult because a single compiler supports multiple targets via --target= option and there are different ways to spell the same target triple. For example, x86_64-linux-gnu and x86_64-unknown-linux-gnu are considered equivalent, the latter is a normalized version of the former (since you can omit emit components). The question then is where to look for headers and libraries. Currently, we always look in two locations: (1) we use --target=<triple as specified by the user and (2) normalized version of that target. For example if the user passes --target=x86_64-linux-gnu, we would first check <prefix>/lib/x86_64-linux-gnu and then <prefix>/lib/x86_64-unknown-linux-gnu.

x86_64-linux-gnu and x86_64-unknown-linux-gnu are mostly equivalent, but they matter for the driver constructed paths from GCC installation (see the ../../.. path components in /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../lib64).

I think it makes sense to use un-normalized (--target=) for both lib/gcc/$triple/11.0.1 derived paths and /usr/include/$triple and /usr/lib/$triple and such a change will likely make our rules more reasonable (and more useful if people do want to leverage vendor for different directories).
We can then drop clang/lib/Driver/ToolChains/Linux.cpp Linux::getMultiarchTriple.
But as it stands, Linux::getMultiarchTriple constructed triples have no vendor part, so this patch /usr/include/c++/$triple/c++/v1 will introduce some inconsistency.

The change will look like breaking but it actually works on Debian/Ubuntu and their derivatives (--enable-multi-arch GCC). Because their GCC installation paths have no vendor part, e.g. lib/gcc/x86_64-linux-gnu.

Summary:

  • clang a.cc => auto detection still works and will find lib/gcc/x86_64-linux-gnu
  • clang --target=x86_64-linux-gnu a.cc => find lib/gcc/x86_64-linux-gnu
  • clang --target=x86_64-unknown-linux-gnu a.cc => found lib/gcc/x86_64-linux-gnu with the current behavior, but unsupported in the future. I don't think people do this today.

I believe this change will not need anything on @thakis' side but just wanted to give him a heads-up.

In Fuchsia, we always prefer using normalized triples because that's less error prone. For example, in the past we would install libraries in <prefix>/lib/x86_64-fuchsia which works fine if you invoke the compiler with --target=x86_64-fuchsia, but if you instead use --target=x86_64-unknown-fuchsia (which some build systems would do), compiler would fail to find libraries because it doesn't know how to denormalize the triple. Using normalized triples addresses the issue because we can always normalize the triple to get a canonical spelling.

ldionne requested changes to this revision.Mar 31 2021, 2:17 PM

I'm personally not so concerned with the Clang driver side of things, but primarily with adding more complexity to the libc++ build. Shouldn't we be driving things from the runtimes build and setting a different CMAKE_INSTALL_PREFIX to install stuff to the right location instead? Then you can run N builds of libc++ for N targets, each setting the appropriate install path.

clang/lib/Driver/ToolChains/Fuchsia.cpp
373

Doesn't that break #include_next if both directories exist? Generally speaking, I think you only ever want to have a single directory containing the libc++ headers on your header search path.

This revision now requires changes to proceed.Mar 31 2021, 2:17 PM
phosek updated this revision to Diff 334550.Mar 31 2021, 3:12 PM

I'm personally not so concerned with the Clang driver side of things, but primarily with adding more complexity to the libc++ build. Shouldn't we be driving things from the runtimes build and setting a different CMAKE_INSTALL_PREFIX to install stuff to the right location instead? Then you can run N builds of libc++ for N targets, each setting the appropriate install path.

We could do that but problem with the approach is that we would end up duplicating all headers across targets. Since all headers except __config_site should be the same, we only want to have per-target version of that header which is what this change is doing. The final directory layout you end up with:

include/
  c++/
    v1/
      <headers except for __config_site>
  x86_64-linux-gnu/
    c++/
      v1/
        __config_site
  <other targets>

This is the most optimal layout but I don't think it's something we can achieve purely with CMAKE_INSTALL_PREFIX without additional build support.

scw added a subscriber: scw.Mar 31 2021, 3:24 PM
phosek added inline comments.Mar 31 2021, 3:24 PM
clang/lib/Driver/ToolChains/Fuchsia.cpp
373

See my comment, the per-target include path will only contain __config_site whereas the generic one contains all the headers other than __config_site so this works fine. It would even work if you had another copy of __config_site in the generic one which may be useful on platforms that don't currently support the multiarch layout like Darwin.

phosek edited the summary of this revision. (Show Details)Apr 2 2021, 12:33 PM

@ldionne I have update the change description to explain the new layout.

MaskRay added a comment.EditedApr 2 2021, 12:41 PM

This layout matches the layout use by other compilers like GCC.

Clarification: this is similar to but not match GCC's layout, where the multiarch path component (e.g. /x86_64-linux-gnu) is appended to .../include/c++/$V

% clang++ --target=aarch64-pc-linux-gnu a.cc '-###' |& sed -E 's/ "?-[LiI]/\n&/g'
clang version 13.0.0
Target: aarch64-pc-linux-gnu
Thread model: posix
InstalledDir: /tmp/RelA/bin
 "/tmp/RelA/bin/clang-13" "-cc1" "-triple" "aarch64-pc-linux-gnu" "-emit-obj" "-mrelax-all" "--mrelax-relocations" "-disable-free" "-main-file-name" "a.cc" "-mrelocation-model" "static" "-mframe-pointer=non-leaf" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "generic" "-target-feature" "+neon" "-target-abi" "aapcs" "-fallow-half-arguments-and-returns" "-debugger-tuning=gdb" "-fcoverage-compilation-dir=/tmp/c" "-resource-dir" "/tmp/RelA/lib/clang/13.0.0"
 "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10"
 "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/aarch64-linux-gnu"
 "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/backward"
...
phosek added a comment.Apr 6 2021, 4:19 PM

@ldionne any more thoughts on this?

phosek updated this revision to Diff 337166.Apr 13 2021, 8:36 AM
phosek edited the summary of this revision. (Show Details)

I agree the design is clean from the "what we ship" perspective. I really like the idea of shipping the whole library in a generic spot and only having one file embody all the target specific configuration of the library. I think it's a very elegant way to handle the configuration of libc++, and it will have positive effects beyond your intended use case like giving us a place and a way to push platform-specific hacks to.

Where this really stinks IMO is in the way we build the library. I don't want us to have a separate LLVM_ENABLE_PER_TARGET_RUNTIME_DIR setting and a different mode for building libc++. This is at the heart of many discussions we've been having lately. Basically, the way I do things:

  1. Build libc++ for one target using the most straightforward possible CMake settings
  2. Then, post-process that by moving headers around, creating universal binaries, etc, all of it from a separate script.

That's how I set up the libc++ build at Apple because I didn't want to add too many platform specific things in the CMake. Otherwise we'd have e.g. a mode for building for many architectures (x86_64 and arm64) at once on Apple platforms and creating a universal dylib from that. That's not the way CMake is intended to be used unless I fundamentally missed something. Also note that it would be much better if CMake had proper support for multi-arch, but I'm not aware of anything like that.

On the contrary, your approach (please correct me if I'm mis-speaking) is to make the CMake work out of the box exactly as you want it, even if that means adding complexity. This has the nice benefit that everything is done in a single place, however it forces us to go against the way CMake was designed in a few places, for example when it comes to building the library for several platforms.

My only push back on this patch, and on several patches in the same vein, is about this fundamental difference in approach that we have.

@ldionne friendly ping.

I agree the design is clean from the "what we ship" perspective. I really like the idea of shipping the whole library in a generic spot and only having one file embody all the target specific configuration of the library. I think it's a very elegant way to handle the configuration of libc++, and it will have positive effects beyond your intended use case like giving us a place and a way to push platform-specific hacks to.

Where this really stinks IMO is in the way we build the library. I don't want us to have a separate LLVM_ENABLE_PER_TARGET_RUNTIME_DIR setting and a different mode for building libc++. This is at the heart of many discussions we've been having lately. Basically, the way I do things:

  1. Build libc++ for one target using the most straightforward possible CMake settings
  2. Then, post-process that by moving headers around, creating universal binaries, etc, all of it from a separate script.

That's how I set up the libc++ build at Apple because I didn't want to add too many platform specific things in the CMake. Otherwise we'd have e.g. a mode for building for many architectures (x86_64 and arm64) at once on Apple platforms and creating a universal dylib from that. That's not the way CMake is intended to be used unless I fundamentally missed something. Also note that it would be much better if CMake had proper support for multi-arch, but I'm not aware of anything like that.

On the contrary, your approach (please correct me if I'm mis-speaking) is to make the CMake work out of the box exactly as you want it, even if that means adding complexity. This has the nice benefit that everything is done in a single place, however it forces us to go against the way CMake was designed in a few places, for example when it comes to building the library for several platforms.

My only push back on this patch, and on several patches in the same vein, is about this fundamental difference in approach that we have.

I agree with you, I'm just not sure if what you're describing is possible, at least not yet. There is one additional requirement we have, which is that build layout matches the installation layout. This is what many of our tools and tests rely on. For example, when you build clang++ and then use it, it's going to look for C++ headers in ../include/c++/v1 and without needing to run ninja install. This is where a lot of the complexity comes from and changing this assumption would be non-trivial (and it's not even clear to me that the alternative would be better from usability perspective).

If you want to simplify the libc++ build, we could consider moving the path setup (that is the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR branch) to the runtimes build, but we would still need expose all of the variables to parametrize where different build artifacts are placed. If we decide to go that way, I'd prefer to do it in a follow up change since it's technically orthogonal.

I'd also point out that we support a lot of different ways to do things which is where a lot of the complexity comes from, one example is having multiple different ways to build runtimes which you brought up in your RFC, but there are others. LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is actually an example of that, I'd really prefer if that was the default and the only supported way, but we would need to ensure that all vendors have switched to it, just as I'd like for everyone to use the runtimes build.

Well, one way to see things is that this patch only makes yet another step in a direction that we're already rather far in (in terms of CMake complexity). And as I said, I think the separation of __config_site in its own platform-specific directory makes a lot of sense. So don't consider yourself blocked on libc++.

I do have some questions regarding the driver side of things though. What platforms is this change going to impact on the Driver side? Fuchsia is self-explanatory, but what does the Gnu driver impact? Also, have you thought about using this scheme instead? Why did you opt for the scheme you selected?

include/
  c++/
    v1/
      <libc++ headers except for __config_site>
      <target1>/
          __config_site
      <target2>/
          __config_site
ldionne accepted this revision.Apr 14 2021, 11:34 AM

As discussed offline, I think this is fine. This layout seems to be closest to what other compilers use.

@phosek said he would send out an RFC to make the multiarch layout become the default for all platforms. I'd be interested in using that on Darwin too (the only difference is that it would be apple-<platform> without the architecture, since we use universal binaries).

This revision is now accepted and ready to land.Apr 14 2021, 11:34 AM

Oh, and can you make sure the Runtimes build passes CI before shipping this?

phosek updated this revision to Diff 340743.Apr 27 2021, 12:57 AM
phosek updated this revision to Diff 341088.Apr 28 2021, 12:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2021, 12:55 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a subscriber: arichardson. · View Herald Transcript
This revision now requires review to proceed.Apr 28 2021, 12:55 AM
phosek updated this revision to Diff 341262.Apr 28 2021, 10:39 AM
MaskRay accepted this revision.Apr 28 2021, 11:04 AM

The path component x86_64-unknown-linux-gnu (not multiarch x86_64-linux-gnu) looks good to me

clang/test/Driver/linux-header-search.cpp
16

You may want to use the -SAME: {{^}} pattern in linux-cross.cpp to improve the checks.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 28 2021, 2:27 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.