This is an archive of the discontinued LLVM Phabricator instance.

[docs] Prefer setting LLVM_HOST_TRIPLE instead of LLVM_DEFAULT_TARGET_TRIPLE and LLVM_TARGET_ARCH
ClosedPublic

Authored by mstorsjo on Jan 23 2023, 1:52 PM.

Details

Summary

Setting LLVM_HOST_TRIPLE propagates the information to a few more
places than if only setting LLVM_TARGET_ARCH and
LLVM_DEFAULT_TARGET_TRIPLE, while both of those settings get their
defaults implied from LLVM_HOST_TRIPLE if they're not overridden.

Event Timeline

mstorsjo created this revision. Jan 23 2023, 1:52 PM
mstorsjo requested review of this revision. Jan 23 2023, 1:52 PM
barannikov88 added a subscriber: barannikov88. Edited Jan 23 2023, 2:17 PM

I believe this is wrong? You're specifying the host triple, i.e. the platform on which the (built) compiler should run.

beanz added a comment. Jan 23 2023, 2:22 PM

Neither the current doc nor the proposed change is always right, and they do different things.

The point of the existing option is to tell you how to set up the default target (the target implied when no target is specified) to be the native architecture of your cross target rather than the host (which is what it defaults to).

In theory, LLVM_HOST_TRIPLE should be inferable from the build configuration environment so you should never need to specify it explicitly.

I believe this is wrong? You're specifying the host triple, i.e. the platform on which the (built) compiler should run.

Yes - but this whole article is about cross compiling LLVM so that the compiler itself will run on a different architecture. When doing that, AFAIK it's customary to tell the LLVM CMake build system what kind of triple it actually is running on, i.e. setting LLVM_HOST_TRIPLE is possibly relevant whenever cross compiling.

Then secondly, if you're on OS/arch X, and are cross compiling LLVM to run on OS/arch Y, then it's of course possible to give it a default target triple and for a third OS/arch Z - but as far as I understood this article, it's about a case where Y and Z are equal, i.e. running on whatever system, building LLVM to run on ARM, to generate code for ARM.

Plus, since LLVM_TARGET_ARCH is the target to use for JIT generation, it essentially needs to be the same architecture as the host on which LLVM is going to run, so it can't really be set to a wildly different arch anyway?

The point of the existing option is to tell you how to set up the default target (the target implied when no target is specified) to be the native architecture of your cross target rather than the host (which is what it defaults to).

If I cross compile an LLVM to run on Linux/AArch64 and configure it with LLVM_HOST_TRIPLE=aarch64-linux-gnu, then this also implicitly sets LLVM_DEFAULT_TARGET_TRIPLE to the same value, unless I have manually set another value for LLVM_DEFAULT_TARGET_TRIPLE - or do you disagree on this bit?

In theory, LLVM_HOST_TRIPLE should be inferable from the build configuration environment so you should never need to specify it explicitly.

LLVM_HOST_TRIPLE is generally inferable when _not_ cross compiling, but when cross compiling, AFAIK we don't quite infer it. If LLVM_HOST_TRIPLE isn't set, it defaults to LLVM_INFERRED_HOST_TRIPLE, which is set by get_host_triple: https://github.com/llvm/llvm-project/blob/35912ad39d8a0f244f36d24526ec70b8b028a6e0/llvm/cmake/config-ix.cmake#L441-L445

For some targets/OSes, get_host_triple does try to figure out the cross target host triple, but in the generic fallback case it is simply set to the build host by running the config.guess script: https://github.com/llvm/llvm-project/blob/35912ad39d8a0f244f36d24526ec70b8b028a6e0/llvm/cmake/modules/GetHostTriple.cmake#L48

So for e.g. cross compilation to Linux targets, as far as I can see, you do need to set LLVM_HOST_TRIPLE manually as it will otherwise default to that of the machine where you are doing the cross compilation.
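The fallback chain described above can be modeled in a few lines of shell. This is only a simplified sketch of the behaviour discussed in this thread, not the actual LLVM CMake code; resolve_triples is a hypothetical helper and the triples are example values.

```shell
#!/bin/sh
# Simplified model of the defaulting chain; NOT the real LLVM CMake logic.
# resolve_triples is a hypothetical helper used for illustration only.
#   $1: triple inferred for the build machine (stand-in for get_host_triple)
#   $2: explicitly set LLVM_HOST_TRIPLE, or empty if unset
#   $3: explicitly set LLVM_DEFAULT_TARGET_TRIPLE, or empty if unset
resolve_triples() {
  inferred=$1
  # LLVM_HOST_TRIPLE falls back to the inferred build-machine triple.
  host=${2:-$inferred}
  # LLVM_DEFAULT_TARGET_TRIPLE falls back to LLVM_HOST_TRIPLE.
  default_target=${3:-$host}
  echo "LLVM_HOST_TRIPLE=$host LLVM_DEFAULT_TARGET_TRIPLE=$default_target"
}

# Nothing set: both end up describing the build machine.
resolve_triples x86_64-unknown-linux-gnu "" ""
# Only LLVM_HOST_TRIPLE set: the default target follows it.
resolve_triples x86_64-unknown-linux-gnu aarch64-linux-gnu ""
```

In this model, the first call shows why a cross build for Linux that sets nothing ends up with the build machine's triple in both variables.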

beanz added a comment. Jan 23 2023, 2:32 PM

Plus, since LLVM_TARGET_ARCH is the target to use for JIT generation, it essentially needs to be the same architecture as the host on which LLVM is going to run, so it can't really be set to a wildly different arch anyway?

I think you're misunderstanding how some of this works (or maybe rather the implications of it). As a concrete example: if my host build environment is Ubuntu-x86, and I'm building LLVM to run on Android-AArch64, and I'm building a JIT to run on Android-AArch64, then I NEED LLVM_TARGET_ARCH to be AArch64; otherwise my JIT, when run on Android, will attempt to target x86.

I also _probably_ want the default target triple to be aarch64-linux-..., because I probably want the clang I build to infer AArch64-linux as its default architecture.

Setting LLVM_HOST_TRIPLE to AArch64 on my x86 machine is likely to cause lots of problems; instead, allowing it to be inferred from my build machine is appropriate. LLVM_HOST_TRIPLE should only be set explicitly in the odd case where my host machine's architecture and OS can't be identified by our build system.

I think you're misunderstanding how some of this works (or maybe rather the implications of it). As a concrete example: if my host build environment is Ubuntu-x86, and I'm building LLVM to run on Android-AArch64, and I'm building a JIT to run on Android-AArch64, then I NEED LLVM_TARGET_ARCH to be AArch64; otherwise my JIT, when run on Android, will attempt to target x86.

Yes, I agree

I also _probably_ want the default target triple to be aarch64-linux-..., because I probably want the clang I build to infer AArch64-linux as its default architecture.

I also agree

Setting LLVM_HOST_TRIPLE to AArch64 on my x86 machine is likely to cause lots of problems; instead, allowing it to be inferred from my build machine is appropriate. LLVM_HOST_TRIPLE should only be set explicitly in the odd case where my host machine's architecture and OS can't be identified by our build system.

No, here I disagree. LLVM_HOST_TRIPLE is documented as "Host on which LLVM binaries will run", not as the host where I'm currently compiling it. We can easily infer the details of the OS where we're doing the build, but usually much less so for the cross target, where the cross compiled LLVM will run.

Yes - but this whole article is about cross compiling LLVM so that the compiler itself will run on a different architecture. When doing that, AFAIK it's customary to tell the LLVM CMake build system what kind of triple it actually is running on, i.e. setting LLVM_HOST_TRIPLE is possibly relevant whenever cross compiling.

Ah, I get it! Sorry for the noise.

beanz added a comment. Jan 23 2023, 2:40 PM

No, here I disagree. LLVM_HOST_TRIPLE is documented as "Host on which LLVM binaries will run", not as the host where I'm currently compiling it. We can easily infer the details of the OS where we're doing the build, but usually much less so for the cross target, where the cross compiled LLVM will run.

Ooof... That is the most terribly named variable ever. You are right. I kinda hate the idea of documenting this because that variable name is unnecessarily confusing. In fact, the line directly above the line you changed uses the word host to mean something completely different.

No, here I disagree. LLVM_HOST_TRIPLE is documented as "Host on which LLVM binaries will run", not as the host where I'm currently compiling it. We can easily infer the details of the OS where we're doing the build, but usually much less so for the cross target, where the cross compiled LLVM will run.

Ooof... That is the most terribly named variable ever. You are right. I kinda hate the idea of documenting this because that variable name is unnecessarily confusing.

Yeah, it's not really great - but changing it would be kinda a lot of churn for all users who are cross compiling LLVM.

Anyway, my main point here is that whenever you're cross compiling, you more or less do need to set LLVM_HOST_TRIPLE - but you generally don't need to set LLVM_TARGET_ARCH and LLVM_DEFAULT_TARGET_TRIPLE unless you're doing a really, really exotic build. So the documentation should probably explain the most basic cross compilation case, not the most exotic one.
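The basic case described here could look roughly like the configure command below. It is a sketch under assumptions, not the documented recipe: the cross compiler names, the aarch64-linux-gnu triple, and the directory layout (run from the top of an llvm-project checkout) are illustrative choices, and configure_cmd is a hypothetical helper that only prints the command.

```shell
#!/bin/sh
# Sketch only: compiler names, triple, and directories are assumptions.
HOST_TRIPLE=aarch64-linux-gnu   # machine the built LLVM binaries will run on

# Hypothetical helper: prints the configure command instead of running it.
# Remove the leading `echo` to actually invoke cmake.
configure_cmd() {
  echo cmake -G Ninja -S llvm -B build-cross \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_SYSTEM_NAME=Linux \
    -DCMAKE_C_COMPILER="${HOST_TRIPLE}-gcc" \
    -DCMAKE_CXX_COMPILER="${HOST_TRIPLE}-g++" \
    -DLLVM_HOST_TRIPLE="${HOST_TRIPLE}"
}

configure_cmd
```

LLVM_TARGET_ARCH and LLVM_DEFAULT_TARGET_TRIPLE are deliberately left unset so that they take their defaults from LLVM_HOST_TRIPLE, as discussed above.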

In fact, the line directly above the line you changed uses the word host to mean something completely different.

Ouch, I hadn't noticed that detail. We probably should reword those bits too, to make it even clearer.

No, here I disagree. LLVM_HOST_TRIPLE is documented as "Host on which LLVM binaries will run", not as the host where I'm currently compiling it. We can easily infer the details of the OS where we're doing the build, but usually much less so for the cross target, where the cross compiled LLVM will run.

Ooof... That is the most terribly named variable ever. You are right. I kinda hate the idea of documenting this because that variable name is unnecessarily confusing.

Yeah, it's not really great - but changing it would be kinda a lot of churn for all users who are cross compiling LLVM.

I think that name comes from autoconf which uses build (machine where the software is being built), host (machine where the software is going to run) and target (machine we're going to generate code for).

Since we're already on this topic, in D137451 it was also brought up that having both LLVM_DEFAULT_TARGET_TRIPLE and LLVM_TARGET_TRIPLE is confusing and that perhaps we should only have one (presumably the latter).

In runtimes, we currently use LLVM_DEFAULT_TARGET_TRIPLE to construct the installation path, but that's an ongoing source of issues. Neither LLVM_HOST_TRIPLE nor LLVM_TARGET_TRIPLE seems like the right replacement, since those variables are exported in LLVMConfig.cmake, but in the runtimes build (which uses LLVMConfig.cmake) we need to set the triple based on the host we're compiling runtimes for, not based on the host we compiled LLVM for.

The solution I came up with in D137451 is introducing a new variable LLVM_RUNTIME_TRIPLE to avoid conflict with any of the existing variables. Do you have any other suggestions?

Since we're already on this topic, in D137451 it was also brought up that having both LLVM_DEFAULT_TARGET_TRIPLE and LLVM_TARGET_TRIPLE is confusing and that perhaps we should only have one (presumably the latter).

Hmm, I haven't quite followed exactly what LLVM_TARGET_TRIPLE is and which parts of the code it affects. I don't offhand know where it would be relevant, since an LLVM build supports multiple targets.

I agree that LLVM_DEFAULT_TARGET_TRIPLE is confusing and IMO incorrect for the runtimes. For LLVM/Clang level code generation, though, it's a totally valid option.

In runtimes, we currently use LLVM_DEFAULT_TARGET_TRIPLE to construct the installation path, but that's an ongoing source of issues. Neither LLVM_HOST_TRIPLE nor LLVM_TARGET_TRIPLE seems like the right replacement

IMO, if we followed the autoconf build/host/target nomenclature strictly, then LLVM_HOST_TRIPLE would be the correct name for it; within the context of the runtimes, that denotes what host the compiled code will be running on.

since those variables are exported in LLVMConfig.cmake but in the runtimes build (which uses LLVMConfig.cmake)

Ok, so the LLVMConfig.cmake from the surrounding LLVM build ends up included in the cmake builds of the individual cross built runtimes, contaminating these variables with values from the host? That's kinda non-ideal.

IMO, we ideally should avoid including that entirely, or at least filter out such settings which are incorrect here. Anything within LLVMConfig.cmake which is about the host of the LLVM build (arch/executable suffix/triples/etc) should be filtered out altogether. At most some parts that relate to the autoconf-labelled "build" environment can be reasonable to include, since the autoconf "build" environment is the same across both - I guess built tools like FileCheck are propagated this way?

we need to set the triple based on the host we're compiling runtimes for, not based on the host we compiled LLVM for.

The solution I came up with in D137451 is introducing a new variable LLVM_RUNTIME_TRIPLE to avoid conflict with any of the existing variables. Do you have any other suggestions?

I guess that sounds reasonable. LLVM_DEFAULT_TARGET_TRIPLE is at least kinda wrong. I haven't tried to track what LLVM_TARGET_TRIPLE actually does, but either that or an entirely new variable is probably fine. LLVM_HOST_TRIPLE would be the technically correct choice, but I guess it's messy, especially as long as compiler-rt is still expected to work in a somewhat-cross nature as a project within the main llvm build.

barannikov88 added a comment. Edited Jan 24 2023, 1:51 AM

I'm not very familiar with building runtimes, but in case it helps others:

I think that name comes from autoconf which uses build (machine where the software is being built), host (machine where the software is going to run) and target (machine we're going to generate code for).

There is further explanation here.

In the case of target libraries, the machine you’re building for is the machine you specified with --target. So, build is the machine you’re building on (no change there), host is the machine you’re building for (the target libraries are built for the target, so host is the target you specified), and target doesn’t apply (because you’re not building a compiler, you’re building libraries). The configure/make process will adjust these variables as needed.

I.e. if you used --build=A --host=B --target=C (building cross compiler on machine A that will run on machine B and generate code for machine C) for building gcc, it will use --build=A --host=C when building libraries (--target is not applicable).
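As a toy illustration of that remapping (not GCC's actual configure logic; library_config is a hypothetical helper), the compiler's target simply becomes the host of the library build:

```shell
#!/bin/sh
# Toy model of the remapping described above; NOT GCC's configure machinery.
# library_config is a hypothetical helper.
#   $1: --build of the compiler configuration
#   $2: --host of the compiler configuration (unused for the libraries)
#   $3: --target of the compiler configuration
library_config() {
  # Target libraries run on the compiler's target, so that becomes their host;
  # --target does not apply because no compiler is being built.
  echo "--build=$1 --host=$3"
}

library_config A B C   # prints: --build=A --host=C
```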

clang is multi-target and thus has no --target equivalent (LLVM_DEFAULT_TARGET_TRIPLE is just the default target). So, when we build the compiler and the runtimes at the same time, we do need some kind of LLVM_RUNTIME_TRIPLE.
For example,
-DLLVM_HOST_TRIPLE=A -DLLVM_DEFAULT_TARGET_TRIPLE=B -DLLVM_RUNTIME_TRIPLE=C
would build a [multi-target] clang that runs on machine A, is used to build runtimes for machine C, but by default generates code for machine B.

When building runtime libraries only, LLVM_DEFAULT_TARGET_TRIPLE is not applicable.

mstorsjo updated this revision to Diff 493513. Jan 31 2023, 1:13 AM

Added a paragraph explaining what LLVM_HOST_TRIPLE really signifies here. I'll take care of path-to-host-bin for LLVM_NATIVE_TOOL_DIR in a separate patch.

No objections. The intention is to cross-compile LLVM to run on another target and this change preserves that. I'm not well versed in the subtle differences between the CMake variables so I'm happy for others to take the lead on that part.

beanz accepted this revision. Feb 2 2023, 9:30 AM

LGTM

This revision is now accepted and ready to land. Feb 2 2023, 9:30 AM
This revision was landed with ongoing or failed builds. Feb 3 2023, 12:57 AM
This revision was automatically updated to reflect the committed changes.