This is an archive of the discontinued LLVM Phabricator instance.

cmake: Remove config.guess
AcceptedPublic

Authored by tstellar on Sep 15 2021, 9:40 AM.

Details

Summary

We are unable to update config.guess to a newer version since its
license is different than the license we use for llvm. Therefore, any
new host triple detection code needs to be done using CMake. Rather than have
some triple detection in CMake and some in config.guess, I think it
would be better to move all triple detection code into CMake.

This patch replaces config.guess with some logic that uses the host
compiler to determine the host's triple.

In the case where the CMake detection fails, the user can still manually
specify the host triple using -DLLVM_HOST_TRIPLE.

When using gcc or clang as the host compiler, this patch should make
host triple detection more accurate, since config.guess was returning
the wrong value on some systems.

Diff Detail

Event Timeline

tstellar requested review of this revision.Sep 15 2021, 9:40 AM
tstellar created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2021, 9:40 AM
keith added a subscriber: keith.Sep 15 2021, 9:56 AM
MaskRay added inline comments.Sep 15 2021, 10:02 AM
llvm/CMakeLists.txt
479 ↗(On Diff #372734)

Remove trailing . for the help message to be consistent with other options.

llvm/cmake/modules/GetHostTriple.cmake
71

Mention -DLLVM_HOST_TRIPLE=?

Seems that LLVM_HOST_TRIPLE is used to initialize LLVM_DEFAULT_TARGET_TRIPLE (llvm/CMakeLists.txt), so setting LLVM_HOST_TRIPLE is sufficient even in the absence of config.guess

hvdijk added a subscriber: hvdijk.Sep 15 2021, 11:00 AM

Thanks for suggesting this approach, I was not looking forward to having a massive CMake recreation of config.guess, this is a nice way to avoid duplicating all the detection.

llvm/cmake/modules/GetHostTriple.cmake
58

CMake if commands are special, see the warning at https://cmake.org/cmake/help/latest/command/if.html#variable-expansion.

59

When CMAKE_CXX_COMPILER is set using the CXX environment variable, and CXX contains multiple words, the remaining words are provided in CMAKE_CXX_COMPILER_ARG1 and should also be included. (Yes, despite the name, all words other than the first are stored in a single variable.) CMAKE_CXX_FLAGS may have been set from the CXXFLAGS environment variable.

Including these options can be useful when setting CXX="clang++ -m32" or CXX=clang++ CXXFLAGS=-m32 or such.

That said, this only helps when the host compiler is clang, as gcc -v's printed target is always the default target and does not get adjusted based on any -m options.

% gcc -dumpmachine
x86_64-linux-gnu
% clang -dumpmachine
x86_64-unknown-linux-gnu
llvm/cmake/modules/GetHostTriple.cmake
59
% gcc -dumpmachine
x86_64-linux-gnu
% gcc -dumpmachine -m32
x86_64-linux-gnu
% clang -dumpmachine
x86_64-unknown-linux-gnu
% clang -dumpmachine -m32
i386-unknown-linux-gnu

is probably better. If the user wants to control x86_64-linux-gnu, x86_64-unknown-linux-gnu, x86_64-pc-linux-gnu, the should specify LLVM_HOST_TRIPLE explicitly.

MaskRay added a comment.EditedSep 15 2021, 12:38 PM

I have thought again: we don't even need LLVM_USE_CONFIG_GUESS. The warning can just ask the user to run sh llvm/cmake/config.guess manually and re-run cmake with -DLLVM_HOST_TRIPLE=, with a caution that the config.guess value may not be correct.

tstellar updated this revision to Diff 372785.Sep 15 2021, 1:23 PM

Use -dumpmachine instead of -v.
Pass CXX_FLAGS to compiler along with -dumpmachine.

tstellar marked 4 inline comments as done.Sep 15 2021, 1:24 PM
tstellar added inline comments.
llvm/cmake/modules/GetHostTriple.cmake
71

Ok, I think this means I need to change this from a FATAL_ERROR to a WARNING, since get_host_triple gets called even if LLVM_HOST_TRIPLE is set.

tstellar marked an inline comment as done.Sep 15 2021, 1:26 PM

I have thought again: we don't even need LLVM_USE_CONFIG_GUESS. The warning can just ask the user to run sh llvm/cmake/config.guess manually and re-run cmake with -DLLVM_HOST_TRIPLE=, with a caution that the config.guess value may not be correct.

I like this approach, but what if we took it step further and asked them to download config.guess from https://raw.githubusercontent.com/gcc-mirror/gcc/master/config.guess so that we could remove it from the llvm tree entirely?

I have thought again: we don't even need LLVM_USE_CONFIG_GUESS. The warning can just ask the user to run sh llvm/cmake/config.guess manually and re-run cmake with -DLLVM_HOST_TRIPLE=, with a caution that the config.guess value may not be correct.

I like this approach, but what if we took it step further and asked them to download config.guess from https://raw.githubusercontent.com/gcc-mirror/gcc/master/config.guess so that we could remove it from the llvm tree entirely?

Sounds good!

phosek added a subscriber: phosek.Sep 16 2021, 10:23 AM

I have thought again: we don't even need LLVM_USE_CONFIG_GUESS. The warning can just ask the user to run sh llvm/cmake/config.guess manually and re-run cmake with -DLLVM_HOST_TRIPLE=, with a caution that the config.guess value may not be correct.

I like this approach, but what if we took it step further and asked them to download config.guess from https://raw.githubusercontent.com/gcc-mirror/gcc/master/config.guess so that we could remove it from the llvm tree entirely?

Sounds good!

We could make LLVM_USE_CONFIG_GUESS a string flag rather than a boolean option where the value would be the path to config.guess.

I like this approach, but what if we took it step further and asked them to download config.guess from https://raw.githubusercontent.com/gcc-mirror/gcc/master/config.guess so that we could remove it from the llvm tree entirely?

The idea sounds good to me, but the upstream link is https://git.savannah.gnu.org/cgit/config.git/plain/config.guess. GCC updates its copy when needed for GCC, not whenever a change is made, so it can get outdated.

tstellar updated this revision to Diff 373121.Sep 16 2021, 6:57 PM

Completely remove config.guess and add an error message with instructions
for how to download the latest config.guess and use it to generate the
default target.

I think I'd rather not tell people to download config.guess. We can still modify the one we have, have there been serious problems doing so?

I think I'd rather not tell people to download config.guess. We can still modify the one we have, have there been serious problems doing so?

We could modify the one we have, but personally, I would rather see people put the fixes into CMake rather than trying to keep using this very old version of config.guess.

Do you think we should keep it as a fallback or would you be OK with removing it if we also removed the part of the error message telling people to download it?

keith removed a subscriber: keith.Sep 16 2021, 7:26 PM
MaskRay added a comment.EditedSep 16 2021, 7:36 PM

GettingStarted mentions the supported compilers are:

Clang 3.5
Apple Clang 6.0
GCC 5.1
Visual Studio 2017

Note that the patch doesn't touch the MSVC code path. -dumpmachine works for GCC, Clang and Intel C++ Compiler, so really the downloading config.guess message will only apply to IBM XL C/C++.
Even for that, looking at powerpc64-ibm-aix above, the XL C/C++ code paths are covered.

So perhaps we can just replace elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" OR CMAKE_CXX_COMPILER_ID STREQUAL "GNU") with unconditional -dumpmachine call and ask the user to specify LLVM_HOST_TRIPLE explicitly if -dumpmachine is somehow not available.
User not covered by -dumpmachine are expected to be able to come up the triplet by themselves.

If mentioning config.guess URL is a problem, just omit it. I don't think there will be a large loss.


If we want to be conservative, mention that llvm/cmake/config.guess exists; as a follow-up, delete config.guess.

Otherwise, just delete config.guess in this patch.

tstellar updated this revision to Diff 373343.Sep 17 2021, 2:42 PM
  • Remove suggestion to download config.guess
  • Failur to detect triple is a warning not instead of an error.
tstellar retitled this revision from cmake: Deprecate config.guess to cmake: Remove config.guess.Sep 17 2021, 2:45 PM
tstellar edited the summary of this revision. (Show Details)
MaskRay accepted this revision.Sep 17 2021, 2:52 PM

LGTM.

This revision is now accepted and ready to land.Sep 17 2021, 2:52 PM
tstellar updated this revision to Diff 373363.Sep 17 2021, 4:30 PM
tstellar edited the summary of this revision. (Show Details)

Make failure to determine the host via -dumpmachine a fatal error.
Empty string is not a valid value for LLVM_HOST_TRIPLE.

Does anyone else have comments for this before I commit?

This systems seems somewhat more fragile than config.guess? I guess I'm just not convinced that doing this versus the straightforward changes to config.guess are worth our effort? Can we get an idea of what changes you don't want to make to the local copy of config.guess?

Thanks!

This systems seems somewhat more fragile than config.guess? I guess I'm just not convinced that doing this versus the straightforward changes to config.guess are worth our effort? Can we get an idea of what changes you don't want to make to the local copy of config.guess?

Thanks!

The main feature that is missing from config.guess is the vendor detection (e.g. RedHat, OpenSuse, Ubuntu etc.), so the changes needed in config.guess to support this would be similar to what we have in clang now: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/Distro.cpp#L97 There may be other features missing too, but this is just the one I'm interested in.

I also think that in general it would be nice to remove config.guess, because its license has caused some confusion with users and contributors.

Perhaps the description can mention the cost we'd incur if we took the "fixing config.guess" approach:

linux-musl, suse-linux, redhat-linux, and every Linux distro which customizes the 'vendor' part need to be fixed. The approach would involve poking around /etc/os-release and perhaps other random files.
As concrete examples: these are all incorrect x86_64-alpine-linux-musl, aarch64-suse-linux-gnu, powerpc64le-redhat-linux-gnu, i586-pc-haiku

The diff with the upstream version has 2000 lines FWIW :)

echristo accepted this revision.Sep 22 2021, 6:36 PM

This systems seems somewhat more fragile than config.guess? I guess I'm just not convinced that doing this versus the straightforward changes to config.guess are worth our effort? Can we get an idea of what changes you don't want to make to the local copy of config.guess?

Thanks!

The main feature that is missing from config.guess is the vendor detection (e.g. RedHat, OpenSuse, Ubuntu etc.), so the changes needed in config.guess to support this would be similar to what we have in clang now: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/Distro.cpp#L97 There may be other features missing too, but this is just the one I'm interested in.

I also think that in general it would be nice to remove config.guess, because its license has caused some confusion with users and contributors.

That seems reasonable (and Fangrui has given a lot of context behind the scenes as well). It was mostly the long term maintainability of this versus other methods I was concerned about. icc and gxlc have compatibility modes here, though we might want to take a look and make sure that it works in general.

Otherwise, LGTM and thanks :)

-eric

llvm/cmake/modules/GetHostTriple.cmake
54–56

For debugging purposes it might be good to give the entire command line that you're trying versus just that the compiler failed?

tstellar updated this revision to Diff 374435.Sep 22 2021, 9:07 PM

Put full -dumpmachine command in the error message.

Thanks! No need to wait on future approval with that change :)

-eric

I've been doing a lot of testing with this patch, and one problem I've run into is that config.guess would include -gnu in the environment component of the triple, where gcc -dumpmachine does not (at least for the redhat triples e.g. x86_64-redhat-linux). There are a few places in gcc and clang that behave differently when the gnu environment is part of the triple.

I wonder if we should normalize some of the gcc style triples to include the gnu environment by default?

I've been doing a lot of testing with this patch, and one problem I've run into is that config.guess would include -gnu in the environment component of the triple, where gcc -dumpmachine does not (at least for the redhat triples e.g. x86_64-redhat-linux). There are a few places in gcc and clang that behave differently when the gnu environment is part of the triple.

I wonder if we should normalize some of the gcc style triples to include the gnu environment by default?

I tested this and it doesn't work. If we normalize x86_64-redhat-linux to x86_64-redhat-linux-gnu, then we don't end up picking the gcc install at /usr/lib/gcc/x86_64-redhat-linux/.

I've been doing a lot of testing with this patch, and one problem I've run into is that config.guess would include -gnu in the environment component of the triple, where gcc -dumpmachine does not (at least for the redhat triples e.g. x86_64-redhat-linux). There are a few places in gcc and clang that behave differently when the gnu environment is part of the triple.

Same problem for SUSE. Maybe we could let Triple::Triple(const Twine &) "guess" the environment as it does for some MIPS triples? So it would take x86_64-redhat-linux but then based on the OS vendor would set Environment = Triple::GNU; at least in the case of RedHat and SUSE, and maybe some other OS vendors.

I've been doing a lot of testing with this patch, and one problem I've run into is that config.guess would include -gnu in the environment component of the triple, where gcc -dumpmachine does not (at least for the redhat triples e.g. x86_64-redhat-linux). There are a few places in gcc and clang that behave differently when the gnu environment is part of the triple.

Same problem for SUSE. Maybe we could let Triple::Triple(const Twine &) "guess" the environment as it does for some MIPS triples? So it would take x86_64-redhat-linux but then based on the OS vendor would set Environment = Triple::GNU; at least in the case of RedHat and SUSE, and maybe some other OS vendors.

Here is my proposed solution for this: D110900

Matt added a subscriber: Matt.Oct 5 2021, 12:53 PM
hvdijk added a comment.Oct 6 2021, 1:24 PM

I tested this and it doesn't work. If we normalize x86_64-redhat-linux to x86_64-redhat-linux-gnu, then we don't end up picking the gcc install at /usr/lib/gcc/x86_64-redhat-linux/.

We actually already don't. Red Hat have a RH-specific patch for this that lets it work, it's not related to the triple handling, it's that RH prefer GCC installations that contain libgcc_s, which RH's cross-compilers don't: https://src.fedoraproject.org/rpms/clang/blob/rawhide/f/0004-PATCH-clang-Prefer-gcc-toolchains-with-libgcc_s.so-w.patch And this is suppressed under -static -static-libgcc, so to confirm that this is what causes the GCC installation to be picked that you want: after installing gcc-x86_64-linux-gnu and clang on a Fedora system, clang -v prints "Selected GCC installation: /usr/bin/../lib/gcc/x86_64-redhat-linux/11", clang -static -static-libgcc -v prints "Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/11".

So, this patch, if we modify it to normalize x86_64-redhat-linux to x86_64-redhat-linux-gnu, does not make things any worse. I am not trying to dismiss your other patches, they should still be considered, just probably not as blockers for this one.