
[Preprocessor] Rename __is_{target -> host}_* function-like builtin macros
Needs Revision, Public

Authored by Ericson2314 on Mar 21 2018, 1:00 PM.

Details

Summary

Per my belated [reply] to the mailing list, I believe the "target"
nomenclature is incorrect for cross compilation.

[reply]: http://lists.llvm.org/pipermail/cfe-dev/2018-March/057258.html

Diff Detail

Event Timeline

Ericson2314 created this revision. Mar 21 2018, 1:00 PM

Sorry that I missed your earlier comment about this. The confusion could only arise in the context of a tool (like a compiler) that is being used for cross-compilation. That is a small fraction of the audience for Clang, and we should design this in a way that makes the most sense for the majority of users. If there's a naming scheme that is better for both, then we should do that, but I don't think this is it.

When dealing with a cross compiler, there is a need to distinguish the "target" where the compiler will run (which as you point out is typically referred to as the "host") from the "target" code produced by that cross compiler. There are two points in time: (1) when compiling the cross compiler, and (2) when running the cross compiler. In step (1), the compiler will be invoked with a "-target" option that specifies the "host". The preprocessor checks are compile-time checks, so there is no way that one of these macros in the source code of the compiler itself could be referring to the target in step (2). The compiler option name will be "-target" regardless. Using "target" names in the macros is consistent with that compiler option name.

When dealing with anything other than a cross compiler (or similar cross-target development tool), the "host" terminology is not commonly used. The obvious connection between these macros and the value specified by the "-target" option would be lost. I really don't think this is a good alternative.
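For readers unfamiliar with the macros being discussed, here is a minimal sketch of how they are used. It assumes Clang 6 or later, where __is_target_os is available as a builtin function-like macro; the function name target_os_name and the fallback branch for other compilers are illustrative assumptions, not part of this patch.

```c
#include <stdio.h>

/* Returns a label for the OS this translation unit is being compiled for,
 * i.e. the platform selected by clang's -target option.
 * __is_target_os is a Clang extension; other compilers take the fallback. */
const char *target_os_name(void) {
#if defined(__is_target_os)
#  if __is_target_os(darwin)
    return "darwin";
#  elif __is_target_os(windows)
    return "windows";
#  else
    return "other";
#  endif
#else
    return "unknown (no __is_target_os)";
#endif
}

/* Illustrative helper: print the label chosen at compile time. */
void print_target_os(void) {
    printf("compiled for: %s\n", target_os_name());
}
```

The nested #if keeps the builtin out of sight of compilers that do not provide it, since directives inside a skipped group are not evaluated.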

Ericson2314 added a comment. Edited Mar 21 2018, 2:11 PM

Sorry that I missed your earlier comment about this. The confusion could only arise in the context of a tool (like a compiler) that is being used for cross-compilation. That is a small fraction of the audience for Clang, and we should design this in a way that makes the most sense for the majority of users. If there's a naming scheme that is better for both, then we should do that, but I don't think this is it.

I agree. But I believe mine is no worse for both, and significantly better for the compiling-a-compiler case.

When dealing with a cross compiler, there is a need to distinguish the "target" where the compiler will run (which as you point out is typically referred to as the "host") from the "target" code produced by that cross compiler.

I agree.

There are two points in time: (1) when compiling the cross compiler, and (2) when running the cross compiler. In step (1), the compiler will be invoked with a "-target" option that specifies the "host".

I prefer to think not in terms of points in time but in terms of different programs having different perspectives. The bootstrapping compiler was built on A, runs on B, and is passed -target for C. The new compiler was built on B, runs on C, and targets some set D... (which is not constrained). So the two compilers' frames of reference are shifted by 1, but the frame of reference per compiler is constant whether we are building it or running it.
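The shift by one can be sketched in a few lines of C. This is only an illustration of the bookkeeping described above; the struct platforms type and the built_by and describe functions are hypothetical names introduced for the example.

```c
#include <stdio.h>

/* The three platforms a compiler is described by, in Autoconf's terms. */
struct platforms {
    const char *build;   /* where the compiler was built */
    const char *host;    /* where the compiler runs */
    const char *target;  /* what the compiler emits code for */
};

/* Describe a compiler produced by `parent`: it was built where the parent
 * runs, and it runs on what the parent targets. Its own target is a free
 * choice, supplied by the caller. */
struct platforms built_by(struct platforms parent, const char *new_target) {
    struct platforms child;
    child.build = parent.host;
    child.host = parent.target;
    child.target = new_target;
    return child;
}

/* Print one compiler's frame of reference. */
void describe(const char *name, struct platforms p) {
    printf("%s: built on %s, runs on %s, targets %s\n",
           name, p.build, p.host, p.target);
}
```

Starting from a bootstrapping compiler described by {"A", "B", "C"}, calling built_by with "D" gives a new compiler described by {"B", "C", "D"}, which matches the description above.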

The compiler option name will be "-target" regardless. Using "target" names in the macros is consistent with that compiler option name.

The obvious connection between these macros and the value specified by the "-target" option would be lost.

So I do wonder if -target was the best name, but agreed that ship has long since sailed.

Furthermore, with the per-compiler, not per-time, way of thinking I described above, one can reconcile the -target flag with the Autoconf terminology by saying that it specifies the "target" of the compiler, not the "target" of the thing being built. Indeed, one might build the new compiler like

export CC=clang
export CFLAGS+=' -target foo-bar-baz'
./Configure --host foo-bar-baz --target alpha-beta-gamma
make

The preprocessor checks are compile-time checks, so there is no way that one of these macros in the source code of the compiler itself could be referring to the target in step (2).

Yes, agreed clang won't know what the target of the compiler being built is (for that would be clang's "post target"). The problem with the status quo is that the new compiler's build system will define its own macros, and those will clash with this in very confusing ways.

For example, check out this file of GHC's: https://github.com/ghc/ghc/blob/master/compiler/ghc.mk#L155-L192. The __is_target_* macros made by LLVM would correspond to the *_HOST_* macros produced by the build system and *not* the *_TARGET_* ones.
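A rough sketch of the clash, in C rather than GHC's own sources. The macros MYCC_HOST_OS and MYCC_TARGET_OS are hypothetical stand-ins for what a compiler's build system might generate (they are not GHC's actual names), and the values assume a compiler that will run on Darwin and emit code for Windows.

```c
/* Hypothetical macros a compiler's build system might generate. */
#define MYCC_HOST_OS   "darwin"   /* where the compiler being built will run */
#define MYCC_TARGET_OS "windows"  /* what the compiler being built will emit */

/* What Clang's builtin reports while compiling this file: the platform this
 * object code is for, which is the "host" in the build system's vocabulary.
 * Returns 1 or 0 under Clang, and -1 when the builtin is unavailable. */
int clang_target_is_darwin(void) {
#if defined(__is_target_os)
#  if __is_target_os(darwin)
    return 1;
#  else
    return 0;
#  endif
#else
    return -1;
#endif
}

const char *mycc_host_os(void)   { return MYCC_HOST_OS; }
const char *mycc_target_os(void) { return MYCC_TARGET_OS; }
```

In such a build, the Clang builtin spelled "target" would agree with the build system's HOST macro, while the build system's TARGET macro would name a different platform altogether.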

I disagree. I think "target" is the correct name, even for cross compiling. For something compiled with -target foo, we consistently call it "target" during compile time. It only becomes appropriate to call it "host" during the runtime of the executable. There is no such concept as "host" when you are doing cross compiling.

The builtin macros are compile time constant, so following the compile time naming is much better.

One thing that might make my position clearer is to substitute "build", "run", and "emit" for the names "build", "host", and "target". [A colleague of mine proposed these alternative names and I do think they are vastly more human friendly.]

Then if we write

#include <stdio.h>

int main(void) {

#if __is_run(windows)
  printf("Hello, Satya");
#elif __is_run(darwin)
  printf("Hello, Tim");
#else
  printf("Unclear who I am talking to.");
#endif

#if __is_emit(darwin)
  #error "What's a Mach-O?"
#else
  /* do something with binutils */
#endif

  return 0;
}

and run

clang -emit something main.c

it is clear the intention is *not* for -emit to control __is_emit.

To me, this makes clear that the problem isn't the name-shift I am proposing, but the inherent vagary of the terms "host" and "target" relative to their specific meaning in Autoconf's jargon.

@steven_wu but what you say is directly contradicted by the GHC example. I'm sure I could find a GCC example too where some macro with "target" in the name affects the target of the compiler being built. In the vast majority of programs, no more than one platform need affect preprocessing, but when multiple platforms affect preprocessing, they are *always* named from the perspective of the being-built tool being run.

It is not about matching the command line name to the builtin macro name. "target" is the platform we are compiling for, whether it is a host or a device or something else. It is a different concept when you are talking about cross-compiling, in which "target" is strictly not the host, and "build" or "host" doesn't matter to the compiler at all.

#if __is_run(windows)
  printf("Hello, Satya");
#elif __is_run(darwin)
  printf("Hello, Tim");
#else
  printf("Unclear who I am talking to.");
#endif

This example is bad because you do not know about runtime when you do compilation. Putting runtime environment onto #if is just wrong in many ways. If autoconf really has to name it to something else, you can always write a "#define" to rename __is_target.
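The renaming suggested here could look roughly like the following. This is a sketch under assumptions: IS_HOST_OS is a hypothetical project-local name, the greeting function is borrowed from the example above for illustration, and the fallback definition for compilers without __is_target_os is an addition not discussed in the thread.

```c
/* Project-local alias for Clang's builtin, with a fallback that makes every
 * check false on compilers that do not provide __is_target_os. */
#if defined(__is_target_os)
#  define IS_HOST_OS(os) __is_target_os(os)
#else
#  define IS_HOST_OS(os) 0
#endif

/* Picks a greeting at compile time based on the platform being compiled for. */
const char *greeting(void) {
#if IS_HOST_OS(windows)
    return "Hello, Satya";
#elif IS_HOST_OS(darwin)
    return "Hello, Tim";
#else
    return "Unclear who I am talking to.";
#endif
}
```

One caveat with a wrapper like this: its argument is macro-expanded before it reaches the builtin, so an OS name that is itself predefined as a macro (as linux is in GNU dialects of C) would need to be checked with the builtin directly.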

Ericson2314 added a comment. Edited Mar 21 2018, 3:09 PM

I'm sure I could find a GCC example

FWIW https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/aarch64.c vs https://github.com/gcc-mirror/gcc/blob/master/gcc/common/config/aarch64/aarch64-common.c

@steven_wu

It is not about matching the command line name to the builtin macro name.

Whew :)

"target" is the platform we are compiling for, whether it is host or device or something else.

I'm in total agreement, if we are saying that from the compiler's perspective.

It is a different concept when you are talking about cross-compiling, in which "target" is strictly not the host, and "build" or "host" doesn't matter to the compiler at all.

So to be clear, I don't think there is a legitimate reason *why* GHC and GCC care about the target platform at compile time. LLVM's approach of always being multi-target and only choosing at run-time is far superior. Part of the philosophy behind that approach is moving towards a world where everything just works whether cross compiling or not. Redefining terminology based on whether we are cross compiling is counter to that goal.

This example is bad because you do not know about runtime when you do compilation.

I'm intrigued you singled out my first #if and not my second. If the binary is compiled with clang targeting windows, then (absent wine or something) we can be sure it is running on windows. On the other hand it seems odd and gcc-like to decide at compile time whether the newly built binary is targeting Darwin. The "emit" platform of the newly-built binary is strictly more removed from clang's purview than the "run" platform.

I am not trying to discuss which English word is best here. My point is simply:

  1. macros are evaluated at compile time
  2. "host" means either the platform you compile on (at compile time) or the platform you run on (at runtime)
  3. __is_host_* is not a good name, because it is misleading: it either presents a "runtime" property as a compile-time constant, or indicates the wrong platform.

@steven_wu Maybe there is something outside of "build" "host" or "target" that won't suffer from these problems of vantage point? __will_be_built_for_* for a very lengthy example, but hopefully something shorter too?

I may be a bit biased but I agree with @bob.wilson and @steven_wu. The current names are better from the user’s perspective. GCC’s build is a very bad example as it has runtime components built as part of it (libgcc). When building any code, even in a Canadian cross-compile, the target will always be what you are running on. The preprocessor macros are part of the code that you are building for a given target. The association with the command line option makes it more obvious what it is going to use to determine the value. Having a pithy name should also be considered a design goal. Recreating new terminology only muddles the problem.

Even if you are compiling a compiler, there is nothing special. It is a standard user space program that will run on a specific target. Even if you treat it as a perspective of the program, if you bootstrap on Linux, the bootstrapping compiler’s Target will be Linux even if the final compiler has a target of Windows. The compiler is answering from the perspective of the program :)

Ericson2314 added a comment. Edited Mar 22 2018, 10:29 AM

OK, I'll happily admit that Autoconf's choice of names is terrible, and that, yes, the names can be defined from two differing perspectives. And I actually do believe the GCC build system is far inferior, too. But on other points I think we're all talking past each other.

I know __will_be_built_for_* is incredibly verbose and admitted as much; I was just trying to begin finding common ground by using different terminology with less baggage. Heck, __is_* works for me too, and nothing's more pithy than that. If you all wish to stick to preexisting terms and thus one of "build" "host" and "target", I'll switch back to arguing why I believe "target" is wrong.

I've claimed that target is never used in preprocessor macros to mean the platform the being-built code is compiled for / the platform where it will run. I've also linked various examples of code showing target used in the way I prescribe. In fact, here is another from LLVM itself: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Config/llvm-config.h.cmake#L23-L33. I quote:

/* Target triple LLVM will generate code for by default */
#cmakedefine LLVM_DEFAULT_TARGET_TRIPLE "${LLVM_DEFAULT_TARGET_TRIPLE}"
/* Host triple LLVM will be executed on */
#cmakedefine LLVM_HOST_TRIPLE "${LLVM_HOST_TRIPLE}"

This is a perfect use of these terms in preprocessor macros as far as I am concerned: from the perspective of the program being built (LLVM) and not the compiler building it.

Do you all have a counter-example of a CPP macro containing target with your meaning?

dexonsmith requested changes to this revision. Mar 22 2018, 8:47 PM

I agree with Saleem and Bob: __is_target_* is not confusing here and seems to be a straightforward spelling. It has also already shipped in LLVM 6.0.0: it would be awkward to stop supporting this syntax.

Regardless, it's not clear that this patch is the right direction (i.e., we're not discussing the patch at all right now). I suggest moving the discussion back to the wider audience on cfe-dev until we have consensus for a change.

This revision now requires changes to proceed. Mar 22 2018, 8:47 PM

Bummer, I didn't realize this had already shipped.

I'm happy to discuss on the mailing list. Indeed, I responded there first, but it would appear nobody saw my message. Should I reply to my own message with a summary and a link to this thread, or would one of you like to do that?