This is an archive of the discontinued LLVM Phabricator instance.

[CMake] Allow setting the location of host tools with LLVM_NATIVE_TOOL_DIR
ClosedPublic

Authored by mstorsjo on Aug 3 2022, 12:59 AM.

Details

Summary

This avoids having to specify the location of all individual tools.
In current builds, one may want to specify LLVM_TABLEGEN, CLANG_TABLEGEN,
LLDB_TABLEGEN, LLVM_CONFIG_PATH, CLANG_PSEUDO_GEN and
CLANG_TIDY_CONFUSABLE_CHARS_GEN; specifying just the base directory
containing all of them is much more convenient.

Note that HOST_EXECUTABLE_SUFFIX is different from CMAKE_EXECUTABLE_SUFFIX
(the suffix used for the cross compiled binaries). There's an upstream
request for CMake to provide such a variable, e.g.
CMAKE_HOST_EXECUTABLE_SUFFIX, in
https://gitlab.kitware.com/cmake/cmake/-/issues/17553, but it hasn't been
implemented yet.

This is a bit of an RFC, we could of course factorize setting of
HOST_EXECUTABLE_SUFFIX to a helper function/macro, and possibly
factorize the whole setup (including checking LLVM_USE_HOST_TOOLS and
calling build_native_tools) of clang-pseudo-gen and
clang-tidy-confusable-chars-gen, as it does end up as a fair bit of
boilerplate.

Diff Detail

Event Timeline

mstorsjo created this revision.Aug 3 2022, 12:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2022, 12:59 AM
mstorsjo requested review of this revision.Aug 3 2022, 12:59 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 3 2022, 12:59 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
inclyc added a subscriber: inclyc.Aug 14 2022, 4:38 AM

Ping - is there any interest in this - a single flag for pointing towards existing prebuilt native executables instead of having to name every one (llvm-tblgen, clang-tblgen, lldb-tblgen, clang-pseudo-gen, clang-tidy-confusable-chars-gen) individually?

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

Thanks for having a look and discussing the matter! FWIW, my current cross compilation script does something like this: https://github.com/mstorsjo/llvm-mingw/blob/master/build-llvm.sh#L176-L195 All of those lines could be changed into setting this one option.

When there's more native tools added, I want to keep this in sync to avoid needless recompilation of those tools - which is kinda burdensome in the current setup - while with this option, we'd only need to set one option and be done with it. (As long as everything else works, if there's a new tool that I haven't got hooked up, it "only" makes things slower to build.)

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

This is a very good point indeed.

Most of the boilerplate for setting up clang-pseudo-gen and clang-tidy-confusable-chars-gen is identical, so those could probably be merged into a macro (hiding all the cross/native complexity entirely from the business logic, just like for tablegen) - but the complexity is still there. (It would end up in at least three cases; tablegen, llvm-config and in a generic-buildtime-tool-macro.)

On the topic of "really is helping someone significantly" - if we wouldn't have the burden of existing users using the existing options (for setting the path to each tool individually), picking this new option is a nobrainer - doing the same thing but in a much less annoying way. But since we have existing users with the existing options (which would need to be kept for some time transitionally), it's much less clear cut whether it's a "significant" improvement.

TL;DR: Thanks for explanation, LG if we move the common logic into a macro.

FWIW I have no idea: in principle this makes sense, but I don't use such a configuration and don't have a clear idea of what people who do use it want.

Thanks for having a look and discussing the matter! FWIW, my current cross compilation script does something like this: https://github.com/mstorsjo/llvm-mingw/blob/master/build-llvm.sh#L176-L195 All of those lines could be changed into setting this one option.

When there's more native tools added, I want to keep this in sync to avoid needless recompilation of those tools - which is kinda burdensome in the current setup - while with this option, we'd only need to set one option and be done with it. (As long as everything else works, if there's a new tool that I haven't got hooked up, it "only" makes things slower to build.)

Thanks, if this option is useful for you then I think it's probably useful for others too.
(Just want to make sure we're not adding features for no-one)

It also adds significant CMake complexity: e.g. clang-pseudo-gen now has 30 lines just to support the non-default "native tools" configuration, and this is duplicated for each tool. Maybe this could be made cheaper by sharing more of this logic in a separate place, but we should probably only add it at all if this really is helping someone significantly.

(Lines of CMake logic are extremely expensive to maintain IME: tooling and diagnostics are poor, there are no tests, there are too many configurations to reason about, interacting components are not documented, the language is inherently confusing and many of us that maintain it have only a rudimentary understanding)

This is a very good point indeed.

Most of the boilerplate for setting up clang-pseudo-gen and clang-tidy-confusable-chars-gen is identical, so those could probably be merged into a macro (hiding all the cross/native complexity entirely from the business logic, just like for tablegen) - but the complexity is still there. (It would end up in at least three cases; tablegen, llvm-config and in a generic-buildtime-tool-macro.)

I think the generic tool macro would be valuable even if it can't be used for tablegen (though it's not obvious at first glance why not).
In practice if we copy the logic in two places it's extremely likely to get copied in a third and/or modified by someone who doesn't really understand it at some point. A macro would make this less likely, the macro name itself is documentation, it increases the benefit/cost of having a comment.

On the topic of "really is helping someone significantly" - if we wouldn't have the burden of existing users using the existing options (for setting the path to each tool individually), picking this new option is a nobrainer - doing the same thing but in a much less annoying way. But since we have existing users with the existing options (which would need to be kept for some time transitionally), it's much less clear cut whether it's a "significant" improvement.

Agree this is probably a much nicer structure. (Only concern: is it likely that someone wants to use LLVM tools from out-of-tree but build clang tools in-tree?)
If it's plausible to remove the existing options, it probably makes sense to do it sooner rather than later - it'll break people the same amount either way.

Yeah, this makes sense to me if we factor out the commonalities. I agree that it's nicer to have a single option vs. having to specify every single native tool individually.

mstorsjo updated this revision to Diff 488982.Jan 13 2023, 6:33 AM

Factorized setup of building a separate host version of code generation tools into a macro, shared across existing users.

There are some minor nuance differences to how tool generation happened before, but it should probably not matter (and this is probably an improvement):

  • Only look at the user setting variable (for pointing to a preexisting external tool executable) if LLVM_USE_HOST_TOOLS is set; if LLVM_USE_HOST_TOOLS isn't set, ignore the variable. (mlir-linalg-ods-gen already did it this way, the clang-tools-extra tools didn't.)
  • Add a dependency on the target version of the executable, when building a native one - this makes sure to regenereate the native one when there are changes to the sources for that tool (according to other cmake comments in the code; the clang-tools-extra tools didn't do this before).

If wanted, I could split this up into two patches; one for refactoring the call sites and one for adding the new variable LLVM_NATIVE_TOOL_DIR - but I'm posting them as one single patch here for now, to hopefully get it reviewed quicker.

beanz added inline comments.Jan 13 2023, 8:17 AM
llvm/cmake/modules/AddLLVM.cmake
2401 ↗(On Diff #488982)

Please make this a function instead of a macro. In general CMake macros expand in ways that are unintuitive which can be a maintenance burden for people coming in and modifying the code.

Also a bit nity, but maybe a name like find_host_program would be more consistent naming. I suspect you could even simplify this implementation by using CMake's find_program (https://cmake.org/cmake/help/latest/command/find_program.html)

mstorsjo added inline comments.Jan 13 2023, 8:58 AM
llvm/cmake/modules/AddLLVM.cmake
2401 ↗(On Diff #488982)

Re function vs macro - in mlir-linalg-ods-gen, we need to set the variable in the parent scope (i.e. the CMakeLists.txt of the surrounding directory); afaik with a function we can set the variable in the scope of the caller, but not in the caller's parent - or am I missing something?

Re naming, I think find_host_program as name feels a bit unintuitive for the regular (non-cross compiling) case; we've set up a target for building a tool and we want to execute it at build time - plus doing the boilerplate setup of building a separate host version of it when cross compiling. The fact that we may want to use a preexisting binary, if the user has pointed us to it, is the exception in the uncommon case here, so I wouldn't want to name the high level structure based on that.

beanz added a comment.Jan 13 2023, 9:20 AM

The convention that find_program uses is to cache the variables, which causes them to be defined at global scope. That also avoids needing to recompute filesystem lookups in incremental builds, which is desirable.

The convention that find_program uses is to cache the variables, which causes them to be defined at global scope.

Right, I guess that could work too. It would be less of a pure refactoring though, but still probably wouldn't need further changes in the call sites. I can try to make that change.

That also avoids needing to recompute filesystem lookups in incremental builds, which is desirable.

Yeah, that's always desireable.

(In this case, the filesystem lookups are in the setup of the <TOOLNAME>_DEFAULT variables though, so that wouldn't be cached here unless we make that variable a cache variable too - but that would only affect users who set the LLVM_NATIVE_TOOL_DIR variable.)

mstorsjo updated this revision to Diff 489098.Jan 13 2023, 12:31 PM

Switched the macro to a function and changed the variables to cache variables, to allow setting them in the grandparent scope without being a macro.

Switched the macro to a function and changed the variables to cache variables, to allow setting them in the grandparent scope without being a macro.

@beanz - Does this seems ok to you now?

beanz accepted this revision.Jan 18 2023, 9:22 AM

LGTM!

This revision is now accepted and ready to land.Jan 18 2023, 9:22 AM
This revision was landed with ongoing or failed builds.Jan 18 2023, 2:04 PM
This revision was automatically updated to reflect the committed changes.