Page MenuHomePhabricator

[CMake] Allow setting the location of host tools with LLVM_NATIVE_TOOL_DIR
Needs ReviewPublic

Authored by mstorsjo on Aug 3 2022, 12:59 AM.

Details

Summary

This avoids having to specify the location of all individual tools.
In current builds, one may want to specify LLVM_TABLEGEN, CLANG_TABLEGEN,
LLDB_TABLEGEN, LLVM_CONFIG_PATH, CLANG_PSEUDO_GEN and
CLANG_TIDY_CONFUSABLE_CHARS_GEN; specifying just the base directory
containing all of them is much more convenient.

Note that HOST_EXECUTABLE_SUFFIX is different from CMAKE_EXECUTABLE_SUFFIX
(the suffix used for the cross compiled binaries). There's an upstream
request for CMake to provide such a variable, e.g.
CMAKE_HOST_EXECUTABLE_SUFFIX, in
https://gitlab.kitware.com/cmake/cmake/-/issues/17553, but it hasn't been
implemented yet.

This is a bit of an RFC, we could of course factorize setting of
HOST_EXECUTABLE_SUFFIX to a helper function/macro, and possibly
factorize the whole setup (including checking LLVM_USE_HOST_TOOLS and
calling build_native_tools) of clang-pseudo-gen and
clang-tidy-confusable-chars-gen, as it does end up as a fair bit of
boilerplate.

Diff Detail

Unit TestsFailed

TimeTest
60,120 msx64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/asan/TestCases/scariness_score_test.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Output/scariness_score_test.cpp.tmp

Event Timeline

mstorsjo created this revision.Aug 3 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 12:59 AM
mstorsjo requested review of this revision.Aug 3 2022, 12:59 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 3 2022, 12:59 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
inclyc added a subscriber: inclyc.Aug 14 2022, 4:38 AM

Ping - is there any interest in this - a single flag for pointing towards existing prebuilt native executables instead of having to name every one (llvm-tblgen, clang-tblgen, lldb-tblgen, clang-pseudo-gen, clang-tidy-confusable-chars-gen) individually?

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

Thanks for having a look and discussing the matter! FWIW, my current cross compilation script does something like this: https://github.com/mstorsjo/llvm-mingw/blob/master/build-llvm.sh#L176-L195 All of those lines could be changed into setting this one option.

When there's more native tools added, I want to keep this in sync to avoid needless recompilation of those tools - which is kinda burdensome in the current setup - while with this option, we'd only need to set one option and be done with it. (As long as everything else works, if there's a new tool that I haven't got hooked up, it "only" makes things slower to build.)

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

This is a very good point indeed.

Most of the boilerplate for setting up clang-pseudo-gen and clang-tidy-confusable-chars-gen is identical, so those could probably be merged into a macro (hiding all the cross/native complexity entirely from the business logic, just like for tablegen) - but the complexity is still there. (It would end up in at least three cases; tablegen, llvm-config and in a generic-buildtime-tool-macro.)

On the topic of "really is helping someone significantly" - if we wouldn't have the burden of existing users using the existing options (for setting the path to each tool individually), picking this new option is a nobrainer - doing the same thing but in a much less annoying way. But since we have existing users with the existing options (which would need to be kept for some time transitionally), it's much less clear cut whether it's a "significant" improvement.

TL;DR: Thanks for explanation, LG if we move the common logic into a macro.

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

Thanks for having a look and discussing the matter! FWIW, my current cross compilation script does something like this: https://github.com/mstorsjo/llvm-mingw/blob/master/build-llvm.sh#L176-L195 All of those lines could be changed into setting this one option.

When there's more native tools added, I want to keep this in sync to avoid needless recompilation of those tools - which is kinda burdensome in the current setup - while with this option, we'd only need to set one option and be done with it. (As long as everything else works, if there's a new tool that I haven't got hooked up, it "only" makes things slower to build.)

Thanks, if this option is useful for you then I think it's probably useful for others too.
(Just want to make sure we're not adding features for no-one)

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

This is a very good point indeed.

Most of the boilerplate for setting up clang-pseudo-gen and clang-tidy-confusable-chars-gen is identical, so those could probably be merged into a macro (hiding all the cross/native complexity entirely from the business logic, just like for tablegen) - but the complexity is still there. (It would end up in at least three cases; tablegen, llvm-config and in a generic-buildtime-tool-macro.)

I think the generic tool macro would be valuable even if it can't be used for tablegen (though it's not obvious at first glance why not).
In practice if we copy the logic in two places it's extremely likely to get copied in a third and/or modified by someone who doesn't really understand it at some point. A macro would make this less likely, the macro name itself is documentation, it increases the benefit/cost of having a comment.

On the topic of "really is helping someone significantly" - if we wouldn't have the burden of existing users using the existing options (for setting the path to each tool individually), picking this new option is a nobrainer - doing the same thing but in a much less annoying way. But since we have existing users with the existing options (which would need to be kept for some time transitionally), it's much less clear cut whether it's a "significant" improvement.

Agree this is probably a much nicer structure. (Only concern: is it likely that someone wants to use LLVM tools from out-of-tree but build clang tools in-tree?)
If it's plausible to remove the existing options, it probably makes sense to do it sooner rather than later - it'll break people the same amount either way.

Yeah, this makes sense to me if we factor out the commonalities. I agree that it's nicer to have a single option vs. having to specify every single native tool individually.