This is an archive of the discontinued LLVM Phabricator instance.

[cmake] Add support for multiple distributions
ClosedPublic

Authored by smeenai on Oct 9 2020, 7:33 PM.

Details

Summary

LLVM's build system contains support for configuring a distribution, but
it can often be useful to be able to configure multiple distributions
(e.g. if you want separate distributions for the tools and the
libraries). Add this support to the build system, along with
documentation and usage examples.

Diff Detail

Event Timeline

smeenai created this revision.Oct 9 2020, 7:33 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
smeenai requested review of this revision.Oct 9 2020, 7:33 PM

I've not taken a thorough look at the changes, but I am definitely in favour of this change (conceptually).

We've already considered introducing a similar mechanism so thank you for working on this! There's one issue that I haven't figured out how to resolve and I'd be interested in your thoughts: building executables and libraries in the most optimal may require incompatible flags. For example, when building a shared library you have to use -fPIC but for executables that's suboptimal; similarly, when building executables you may want to use LTO but if you also want to create a distribution that contains static libraries, that's a problem because those will contain bitcode. So far our solution has been to simply do multiple builds with different flags (in GN this can be done in a single build by leveraging the "toolchain" concept that also allows sharing as much work as possible, but I haven't figured out how to do anything like that in CMake).

llvm/cmake/modules/LLVMDistributionSupport.cmake
113

Can we use EMPTY or something along those lines to make it more obvious?

195

Can we use EMPTY or something along those lines to make it more obvious?

We've already considered introducing a similar mechanism so thank you for working on this! There's one issue that I haven't figured out how to resolve and I'd be interested in your thoughts: building executables and libraries in the most optimal may require incompatible flags. For example, when building a shared library you have to use -fPIC but for executables that's suboptimal; similarly, when building executables you may want to use LTO but if you also want to create a distribution that contains static libraries, that's a problem because those will contain bitcode. So far our solution has been to simply do multiple builds with different flags (in GN this can be done in a single build by leveraging the "toolchain" concept that also allows sharing as much work as possible, but I haven't figured out how to do anything like that in CMake).

Good question. We need something similar as well, where some clients of the libraries want them built with RTTI (because their programs rely on RTTI and you can get link errors when linking them against non-RTTI LLVM libraries), but we don't wanna build the executables with RTTI because of the size increases.

I don't know of any way to do this in CMake other than just configuring multiple times with the different flags, like you said. Maybe we could implement some logic for that in the build system, but I'm not sure if it's worth it. (I'm wondering if the Ninja multi-config generator added in CMake 3.17 could be useful for this, although I haven't played around with that at all yet.) I do recognize it limits the usefulness of this approach quite a bit (since if you're doing separate configurations you can just do different LLVM_DISTRIBUTION_COMPONENT settings), but I'm hoping it can still be valuable for simpler scenarios.

Out of curiosity, with GN, how would you specify that you want binaries to be built with one toolchain (e.g. LTO and non-PIC) and libraries with another (e.g. non-LTO and PIC)?

llvm/cmake/modules/LLVMDistributionSupport.cmake
113

Sure, I'll change it.

Reading up on the Ninja multi-config generator a bit, you can define your own build types, so you should be able to create e.g. one build type for LTO and another for PIC. You'd still need some external logic to define that you use the LTO mode for executables and the PIC mode for libraries, but the multiple distribution support should work nicely in conjunction with that.

Unfortunately, I don't think that'll work for my RTTI case, since LLVM's flag additions for enabling or disabling (based on LLVM_ENABLE_RTTI) will override any flags you specify for your build type. (Maybe there's a reasonable way to tweak the build system to support that case though.)

We've already considered introducing a similar mechanism so thank you for working on this! There's one issue that I haven't figured out how to resolve and I'd be interested in your thoughts: building executables and libraries in the most optimal may require incompatible flags. For example, when building a shared library you have to use -fPIC but for executables that's suboptimal; similarly, when building executables you may want to use LTO but if you also want to create a distribution that contains static libraries, that's a problem because those will contain bitcode. So far our solution has been to simply do multiple builds with different flags (in GN this can be done in a single build by leveraging the "toolchain" concept that also allows sharing as much work as possible, but I haven't figured out how to do anything like that in CMake).

Good question. We need something similar as well, where some clients of the libraries want them built with RTTI (because their programs rely on RTTI and you can get link errors when linking them against non-RTTI LLVM libraries), but we don't wanna build the executables with RTTI because of the size increases.

We recently ran into this as well where we need RTTI for LLVM libraries but don't want to enable it for the toolchain.

I don't know of any way to do this in CMake other than just configuring multiple times with the different flags, like you said. Maybe we could implement some logic for that in the build system, but I'm not sure if it's worth it. (I'm wondering if the Ninja multi-config generator added in CMake 3.17 could be useful for this, although I haven't played around with that at all yet.) I do recognize it limits the usefulness of this approach quite a bit (since if you're doing separate configurations you can just do different LLVM_DISTRIBUTION_COMPONENT settings), but I'm hoping it can still be valuable for simpler scenarios.

One idea I had would be to eliminate the use of global variables like CMAKE_C_FLAGS and CMAKE_CXX_FLAGS and always set these via target_compile_options (which is something we should do in any case to move towards more modern CMake); since we always use functions like add_llvm_component_library to create targets in LLVM, we could then have these create multiple targets that would differ in flags like -fPIC or -frtti and set up dependencies among these appropriately. It means you'd have to build files multiple times (I don't think there's any way around that) but you'd save time on things like tablegen.

Out of curiosity, with GN, how would you specify that you want binaries to be built with one toolchain (e.g. LTO and non-PIC) and libraries with another (e.g. non-LTO and PIC)?

In GN there's no global concept of tools like CC or LD, instead these are scoped inside a toolchain which in GN is a first class concept, and dependencies can be parametrized by toolchains (for example you can have deps = [ "//path/to/dependency(//path/to/toolchain)" ]. These toolchains can be also parametrized and templated. A common use case is to have a variety of toolchains that have the same tools but differ in flags like -fPIC. In Ninja this is represented as subninjas where each child Ninja file defines its own set of rules.

You can see an example of this even in LLVM GN build where we define a set of toolchains in https://github.com/llvm/llvm-project/blob/1687a8d83b702332c4aae4f2a95d27c16688418d/llvm/utils/gn/build/toolchain/BUILD.gn#L179 and then use them to built runtimes in https://github.com/llvm/llvm-project/blob/1687a8d83b702332c4aae4f2a95d27c16688418d/llvm/utils/gn/secondary/compiler-rt/BUILD.gn which is an (more efficient) equivalent of the runtimes build.

smeenai updated this revision to Diff 298251.Oct 14 2020, 3:29 PM

Address review comments

We've already considered introducing a similar mechanism so thank you for working on this! There's one issue that I haven't figured out how to resolve and I'd be interested in your thoughts: building executables and libraries in the most optimal may require incompatible flags. For example, when building a shared library you have to use -fPIC but for executables that's suboptimal; similarly, when building executables you may want to use LTO but if you also want to create a distribution that contains static libraries, that's a problem because those will contain bitcode. So far our solution has been to simply do multiple builds with different flags (in GN this can be done in a single build by leveraging the "toolchain" concept that also allows sharing as much work as possible, but I haven't figured out how to do anything like that in CMake).

Good question. We need something similar as well, where some clients of the libraries want them built with RTTI (because their programs rely on RTTI and you can get link errors when linking them against non-RTTI LLVM libraries), but we don't wanna build the executables with RTTI because of the size increases.

We recently ran into this as well where we need RTTI for LLVM libraries but don't want to enable it for the toolchain.

I don't know of any way to do this in CMake other than just configuring multiple times with the different flags, like you said. Maybe we could implement some logic for that in the build system, but I'm not sure if it's worth it. (I'm wondering if the Ninja multi-config generator added in CMake 3.17 could be useful for this, although I haven't played around with that at all yet.) I do recognize it limits the usefulness of this approach quite a bit (since if you're doing separate configurations you can just do different LLVM_DISTRIBUTION_COMPONENT settings), but I'm hoping it can still be valuable for simpler scenarios.

One idea I had would be to eliminate the use of global variables like CMAKE_C_FLAGS and CMAKE_CXX_FLAGS and always set these via target_compile_options (which is something we should do in any case to move towards more modern CMake); since we always use functions like add_llvm_component_library to create targets in LLVM, we could then have these create multiple targets that would differ in flags like -fPIC or -frtti and set up dependencies among these appropriately. It means you'd have to build files multiple times (I don't think there's any way around that) but you'd save time on things like tablegen.

Yeah, we should definitely be moving away from global flag manipulation (and I know @ldionne has been doing a bunch of cleanups for that in libc++). Being able to have as much commonality as possible would be nice, although CMake seems to be more in the "different configurations/build types for different build settings" camp.

Reading up on the Ninja multi-config generator more, it seems pretty well suited to handling things like PIC vs. non-PIC and LTO vs. non-LTO (although I haven't thought a lot about how those configs would interact with the multi-stage bootstrapping setups). RTTI vs. non-RTTI is more complicated (see my previous comment), but should be achievable by tweaking how LLVM's build system manipulates those flags. (Strawman proposal: have LLVM_ENABLE_RTTI accept a default argument in addition to the regular booleans to denote that you want LLVM to respect your config's flag settings for RTTI instead of trying to enforce its own.)

Out of curiosity, with GN, how would you specify that you want binaries to be built with one toolchain (e.g. LTO and non-PIC) and libraries with another (e.g. non-LTO and PIC)?

In GN there's no global concept of tools like CC or LD, instead these are scoped inside a toolchain which in GN is a first class concept, and dependencies can be parametrized by toolchains (for example you can have deps = [ "//path/to/dependency(//path/to/toolchain)" ]. These toolchains can be also parametrized and templated. A common use case is to have a variety of toolchains that have the same tools but differ in flags like -fPIC. In Ninja this is represented as subninjas where each child Ninja file defines its own set of rules.

You can see an example of this even in LLVM GN build where we define a set of toolchains in https://github.com/llvm/llvm-project/blob/1687a8d83b702332c4aae4f2a95d27c16688418d/llvm/utils/gn/build/toolchain/BUILD.gn#L179 and then use them to built runtimes in https://github.com/llvm/llvm-project/blob/1687a8d83b702332c4aae4f2a95d27c16688418d/llvm/utils/gn/secondary/compiler-rt/BUILD.gn which is an (more efficient) equivalent of the runtimes build.

GN's toolchain system is really cool :) I was wondering how you'd tell GN that you want all your binaries to be built with one toolchain and all your libraries to be built with another one, but I found https://github.com/llvm/llvm-project/blob/master/llvm/utils/gn/build/BUILDCONFIG.gn, which I think answers that question.

smeenai marked 2 inline comments as done.Oct 14 2020, 3:41 PM

Are the runtimes expected to support this multi-distribution configuration? I don't think we'd want to move libc++ towards building multiple configurations at once in a single CMake invocation -- it's already too complicated to build just one configuration.

While I'm very sympathetic to the use case, it looks like this is working against the design of CMake. So, as long as the runtimes are not impacted by this change (which seems to be the case right now), I don't mind. But as I said, I don't think it would be wise to start supporting multiple distributions in a single CMake invocation for the runtime libraries. I think it would add too much complexity.

beanz added a comment.Oct 15 2020, 9:33 AM

Are the runtimes expected to support this multi-distribution configuration? I don't think we'd want to move libc++ towards building multiple configurations at once in a single CMake invocation -- it's already too complicated to build just one configuration.

I don’t see why the runtimes wouldn’t be included in this, especially when using the runtimes build which already supports building multiple configurations of libc++ from a single LLVM CMake invocation which in turn runs CMake for the runtimes multiple times. In fact, I think runtimes is basically the only place where we can currently cleanly handle building with different options.

While I'm very sympathetic to the use case, it looks like this is working against the design of CMake.

The way Compiler-RT builds for Darwin is certainly against the design of CMake, but I don’t think the same is true for how the runtimes build works.

I think the general idea of configure LLVM once, define multiple different “distributions” to get different groups of targets that you can install into different root directories and package for separate installation makes a lot of sense and is useful even if it doesn’t handle distributions that have different configurations.

Are the runtimes expected to support this multi-distribution configuration? I don't think we'd want to move libc++ towards building multiple configurations at once in a single CMake invocation -- it's already too complicated to build just one configuration.

I don’t see why the runtimes wouldn’t be included in this, especially when using the runtimes build which already supports building multiple configurations of libc++ from a single LLVM CMake invocation which in turn runs CMake for the runtimes multiple times. In fact, I think runtimes is basically the only place where we can currently cleanly handle building with different options.

While I'm very sympathetic to the use case, it looks like this is working against the design of CMake.

The way Compiler-RT builds for Darwin is certainly against the design of CMake, but I don’t think the same is true for how the runtimes build works.

I think the general idea of configure LLVM once, define multiple different “distributions” to get different groups of targets that you can install into different root directories and package for separate installation makes a lot of sense and is useful even if it doesn’t handle distributions that have different configurations.

That isn't what I meant. It's entirely okay for the runtimes to be driven via AddExternalProject like the runtimes build does, since that's akin to having a separate CMake invocation for each configuration. That's okay.

What I'm saying is that if the next logical step is to also add support for multiple distributions in libc++'s build itself (e.g. adding LIBCXX_<DISTRIBUTION>_ENABLE_SHARED & al), then I don't think that's a good idea.

That isn't what I meant. It's entirely okay for the runtimes to be driven via AddExternalProject like the runtimes build does, since that's akin to having a separate CMake invocation for each configuration. That's okay.

What I'm saying is that if the next logical step is to also add support for multiple distributions in libc++'s build itself (e.g. adding LIBCXX_<DISTRIBUTION>_ENABLE_SHARED & al), then I don't think that's a good idea.

Totally agree. That would be the path compiler-rt’s Darwin build goes, and it is a frequent problem.

That isn't what I meant. It's entirely okay for the runtimes to be driven via AddExternalProject like the runtimes build does, since that's akin to having a separate CMake invocation for each configuration. That's okay.

What I'm saying is that if the next logical step is to also add support for multiple distributions in libc++'s build itself (e.g. adding LIBCXX_<DISTRIBUTION>_ENABLE_SHARED & al), then I don't think that's a good idea.

Totally agree. That would be the path compiler-rt’s Darwin build goes, and it is a frequent problem.

Agreed as well, and I have no plans of going down that path. The multi-configuration stuff @phosek and I were discussing is related to the support CMake already has for changing flags based on your build configuration (e.g. the CMAKE_<LANG>_FLAGS_<CONFIG> variables), but something like LIBCXX_<DISTRIBUTION>_ENABLE_SHARED would be pretty different and is not a direction I plan on going in.

Ping. Assuming everyone is okay with the direction, any comments on the change itself?

phosek accepted this revision.Nov 6 2020, 2:57 PM

LGTM with a few nits.

clang/cmake/modules/AddClang.cmake
119

Should we make this uppercase and join it with _ to match the existing conventions? For example, you'd have ClangToolchainTargets and CLANG_TOOLCHAIN_HAS_EXPORTS. With the change as is, we would end up with CLANGToolchain_HAS_EXPORT which is a bit unusual.

lld/cmake/modules/LLDConfig.cmake.in
13

We use upper-case variables for the *_CONFIG_CODE so using lower case here is a bit unfortunate, could we use upper case for these variables?

This revision is now accepted and ready to land.Nov 6 2020, 2:57 PM
smeenai added inline comments.Nov 6 2020, 3:48 PM
clang/cmake/modules/AddClang.cmake
119

I'm playing a little trick with the lack of underscore. In case you're using the default unnamed distribution (as in you just specified LLVM_DISTRIBUTION_COMPONENTS), ${distribution} will be empty, so it'll expand to just CLANG_HAS_EXPORTS, which matches the existing property. If you wanted to add the underscore, you'd need to special-case the unnamed distribution, which seemed like a bunch of effort for a property name that should never be accessed directly. With that being said though, since all the Add*.cmake usages of get_llvm_distribution follow a similar pattern, I was considering just creating a higher-level function. Lemme make that change and share what it would look like and then we can decide which way we wanna go.

lld/cmake/modules/LLDConfig.cmake.in
13

I used the lowercase for consistency with the existing @llvm_config_include_buildtree_only_exports@ in the LLVM files, but I'm happy to change it.

smeenai updated this revision to Diff 343946.May 9 2021, 5:42 PM

Rebase (comments not addressed yet; uploading a pure rebase for easier version comparison)

smeenai marked an inline comment as done and 2 inline comments as not done.May 9 2021, 6:15 PM
smeenai updated this revision to Diff 343949.May 9 2021, 6:16 PM

Helper function to get install arg. TODO: Address other review comments

smeenai updated this revision to Diff 344299.May 10 2021, 11:58 PM

Uppercase variable names

smeenai marked 3 inline comments as done.May 11 2021, 12:03 AM
smeenai added inline comments.
llvm/cmake/modules/LLVMDistributionSupport.cmake
113

Thinking about this more, I prefer DEFAULT because it's marking the default (unnamed) distribution (and <DEFAULT> is just to make it stand out more and be less likely to clash with a user-specified distribution name). I can change it if you feel strongly though.

smeenai updated this revision to Diff 344666.May 11 2021, 10:35 PM

Add release notes

This revision was landed with ongoing or failed builds.May 12 2021, 11:13 AM
This revision was automatically updated to reflect the committed changes.