This is an archive of the discontinued LLVM Phabricator instance.

Enable configuration files in clang
ClosedPublic

Authored by sepavloff on Sep 26 2016, 10:51 AM.

Details

Summary

Clang is inherently a cross compiler and can generate code for any target
enabled during build. It however requires to specify many parameters in the
invocation, which could be hardcoded during configuration process in the
case of single-target compiler. The purpose of configuration files is to
make specifying clang arguments easier.

A configuration file is a collection of driver options, which are inserted
into command line before other options specified in the clang invocation.
It groups related options together and allows specifying them in simpler,
more flexible and less error prone way than just listing the options
somewhere in build scripts. Configuration file may be thought as a "macro"
that names an option set and is expanded when the driver is called.

Use of configuration files is described in UserManual.rst.

The implementation follows the discussion in mail list
(http://lists.llvm.org/pipermail/cfe-dev/2016-September/050776.html).

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ddunbar edited edge metadata.Sep 26 2016, 12:14 PM

I am too out of the loop on Clang development to be able to comment on the specific direction, but I will just say that I am highly in favor of adding new features in this direction. Thank you!

mgorny requested changes to this revision.Oct 2 2016, 9:30 AM
mgorny added a reviewer: mgorny.
mgorny added a subscriber: mgorny.

Few minor nits. However, I'd like to say that I like the idea in general and see forward to deploy it in Gentoo.

One use case which doesn't seem to have been mentioned yes is setting the default -rtlib= and -stdlib=. Currently we're only able to set the defaults at compile-time but it generally sucks to rebuild the compiler to change the default. With this, we'd be able to set them via the system config file ;-).

tools/driver/driver.cpp
334 ↗(On Diff #72528)
  1. I'm not sure if others would agree with me but I think it would be better to move those default paths straight to LLVM. If others tools are to adopt those configuration files, it would only be reasonable to use the same search paths consistently and not repeat them in every tool.
  1. I think the /etc prefix should be made configurable via CMake options. One case that would benefit from this is Gentoo Prefix where the Gentoo system root is located in a subdirectory of /.
336 ↗(On Diff #72528)
  1. You need to update this, following your update on D24926.
  2. I think you need to use clang.cfg here.
This revision now requires changes to proceed.Oct 2 2016, 9:30 AM
sepavloff updated this revision to Diff 73245.Oct 3 2016, 2:00 AM
sepavloff edited edge metadata.

Updated path

The patch is aligned with corresponding changes in llvm repository (D24926).
Also support of configuration selection for executables line foo-clang is
added.

sepavloff added inline comments.Oct 3 2016, 2:10 AM
tools/driver/driver.cpp
334 ↗(On Diff #72528)

It make sense, as it would establish uniform paths. However CommandLine is positioned as general purpose library, which can be used by any tool, not only llvm ones, so forcing particular paths seems to be too rigid. Besides, discussion in mail list demonstrated very diverse opinions, so I would prefer flexible solution.

Notion about configurable paths is definitely worth implementing.

336 ↗(On Diff #72528)

Updated, thank you.

mgorny added inline comments.Oct 4 2016, 12:24 AM
tools/driver/driver.cpp
314 ↗(On Diff #73245)

Please document what this function does, exactly. I see you've documented it in call site but a doc here would be helpful as well.

376 ↗(On Diff #73245)

Are you sure about the name? I would rather see TARGET-clang.cfg than a name that doesn't explicitly mention that the file is for clang.

sepavloff updated this revision to Diff 73482.Oct 4 2016, 8:05 AM
sepavloff edited edge metadata.

Updated comments.

sepavloff added inline comments.Oct 4 2016, 8:08 AM
tools/driver/driver.cpp
314 ↗(On Diff #73245)

The function is documented now.

376 ↗(On Diff #73245)

You are right. Changed the comment.

mgorny resigned from this revision.Oct 9 2016, 3:03 AM
mgorny removed a reviewer: mgorny.

I have no further comments. However, I'm quite a fresh contributor, so I'm not really in a position to accept it.

ddunbar resigned from this revision.Oct 9 2016, 5:04 PM
ddunbar removed a reviewer: ddunbar.
phosek added a subscriber: phosek.Oct 23 2016, 2:36 PM
rnk edited edge metadata.Oct 31 2016, 1:17 PM

Added some people who might have opinions on the command line interface. Sorry, I don't have a lot of time for this right now.

hans edited edge metadata.Nov 1 2016, 1:35 PM

I didn't follow the original thread too closely as I didn't really see the problem this was trying to address, but since I'm now cc'd I'll add my opinion.

I'm not in favour of this functionality. (But I guess I might be missing the main use case.)

The way I see it, what flags to pass to the compiler should be configured in a project's build system. If a project wants to disable some warnings by default, add it to the Makefile (or equivalent). I don't see the point in checking in another file next to it.

The idea of having the compiler search for config files in random places on the filesystem seems scary to me. If I want to override a project's compile flags, I'd like to pass them in when configuring the build. If I don't pass any flags, I want Clang to work the same everywhere.

Bundling up compiler flags in a file can be convenient, but we already have @file for that.

There is already a lot of magic in the driver to figure out how to invoke the tools. I fear that this would make things even more complicated.

dtzWill added a subscriber: dtzWill.Nov 1 2016, 1:58 PM

Summary of proposal and discussion in mail list.

Problems

The two main problems this feature addresses are:

  1. Ability to modify default compiler settings.

As an example, the warning -Wundefined-var-template can be helpful for people doing module-enabled builds (https://llvm.org/bugs/show_bug.cgi?id=24425), but makes headache for others, who even come with ideas to turn this warning off by default (http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20160808/167354.html). To cope with this problem clang need a mechanism that effectively turns off a warning as if it were disabled in compiler sources.

  1. Support of cross builds.

Doing cross builds require to specify lots of options that set target, backend options, include and library directories etc. There is interest in making this action more convenient.

Solution

Configuration file groups related options together and allow to specify them in more simple and less error prone way than just listing them somewhere in build scripts. Configuration may be thought as a "macro" names option set and is expanded when compiler is called. There are two use patterns of configuration files:

  • Default configuration file that is applied without need to specify anything in compiler invocation. It can be read from predefined location or be specified by environmental variable.
  • Named configuration file that is specified explicitly by a user, either by command line option --config, or by using special executable file names, such as armv7l-clang.

Only one config file is used, if it is specified explicitly, default config file is not searched. Config file may contain comments, references to other config files using syntax @file. Default config file is searched in the directory where clang executable resides.

Existing experience

Configuration file is not a new concept, it has been used in various form in many compilers including clang.

  1. Intel compiler supports default configuration files (https://software.intel.com/en-us/node/522780), the file is searched for in the directory where executable resides or can be specified by setting environmental variable. The feature is pretty convenient in practice.
  2. Clang allows to specify target by using special executable names, like armv7l-clang. Only target name can be specified by this way, which is not sufficient to tune compiler for new target. Obvious question arise - why not extend this mechanism for other target specific options too.
  3. Clang allows modifying options using environmental variable CCC_OVERRIDE_OPTIONS. This is undocumented way and is looks more like a hack for internal use. Its existence however indicates that such functionality is useful.
  4. Cross compilation environment ELLCC (http://ellcc.org), based on clang, supports advanced config files in YAML format. It has proven to be very handy to support a wide variety of targets.
  5. GCC has support of spec files (https://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html), which can be taken from default location or be specified by command line option --spec. Spec files are more powerful tool than just config files as they not only set options but also program driver logic. Using custom spec files allows GCC to be tuned to particular target without need to specify lots of options in command line.

Concerns

Somewhat random set of concerns extracted from discussions.

  1. "what flags to pass to the compiler should be configured in a project's build system"

A project may include dozens of components each come with own build systems, which are tuned differently. Retargeting such build system can be time consuming and error prone. On the other hand, setting compiler and target specific options is not a business of a component, ideally it may be unaware of using in cross compilation environment. Also, not all component build systems are error free.

  1. "If a project wants to disable some warnings by default, add it to the Makefile".

The problem is that some users do not want modify their build files, probably because it is not simple process as it appears to us. They would prefer customized compiler in which some options are on/off by default.

  1. "The idea of having the compiler search for config files in random places on the filesystem seems scary"

Most of elaborated environments allows search for libraries, packages, classes etc in some predefined places. Running simple command gcc file.c make compiler search set of various places, and the set may change depending on version and OS variant. Complex problems may require complicated tools for solving.

  1. "config files can make difficult to reproduce failure conditions"

Actual set of compiler options is provided in error dump files, in output of clang -v and clang -###, these invocations also report used config file.

  1. "Right now when you invoke clang without any additional options you know that you are in a well-known state"

"If I don't pass any flags, I want Clang to work the same everywhere"
In fact invocation of clang without additional argument does not provide a well known state. Actual state is represented by options passed from clang driver to clang compiler, but these options depend on used OS, CPU. In theory a system upgrade could cause compilation change. Hardly bare driver arguments can represent compilation options.

  1. "we already have @file for bundling up compiler flags"

Config file make this feature a bit more convenient by providing a way to choose proper option set and making files containing options more user friendly by allowing comments and nested @file.

  1. "There is already a lot of magic in the driver to figure out how to invoke the tools"

Orchestrating tools made by clang driver is already a complex task and IMO there is no hope that it can be solved by simple way. But it can be made more structured, config file may be a step in this direction making existing target selection in clang more powerful and useful.

hans added a comment.Nov 3 2016, 7:44 AM

Apologies for seeming so negative here. I can see that this would possibly be beneficial for some users. This would be a powerful feature, which is what worries me.

Here's a specific scenario I'm worried about.

For Chromium, our build system provides a specific Clang version close to ToT, and obviously what flags to use when invoking it. (Developers can of course override the flags when configuring if they want.) Now maybe a developer has a ~/clang.cfg because they want certain flags on by default when they build *other* projects, those flags would now be applied to all clangs, including the one we provide, and potentially wreak havoc.

As for compiler bugs getting harder to reproduce, we already get a lot of bug reports where only the driver invocation is provided, e.g. "clang -O2 foo.c". If the user has a config file that sets -fsome-other-flag, we wouldn't see it. Even worse, what if distros start trying to be helpful and provide default config files?

Other random thoughts:
Do we want to get the same config file for "clang", "clang++" and "clang-cl"? They accept different flags.
If we are going to move forward with this, I think the patch should also include an update to docs/UsersManual.rst, explaining exactly how it works (after that's been figured out of course).

I don't agree with the argumentation of @hans; however, I have my own concerns. I personally dislike the idea of reusing command-line option format for this. While I can see this is the simplest solution, it brings at least a few problems and questions. I'm afraid that this would result in a poor solution that gradually gets worse as people try to workaround its limitations.

Maybe it'd be better to take a step back and attempt to create a regular, .ini/.conf-style configuration file. One particular advantage of that is that we can precisely decide which options to support and how to support them. I think this could be better both for opponents of 'free configuration' (since it will be less powerful and more controlled), and for proponents of nice configuration files (such as me).

For example, the advantage of this is that -std= will no longer magically apply to all source types. If we decide to have a default -std= control via system configuration, we can add separate options for each language (which wouldn't be really useful as command-line options).

For Chromium, our build system provides a specific Clang version close to ToT, and obviously what flags to use when invoking it. (Developers can of course override the flags when configuring if they want.) Now maybe a developer has a ~/clang.cfg because they want certain flags on by default when they build *other* projects, those flags would now be applied to all clangs, including the one we provide, and potentially wreak havoc.

Default config file is searched for only in the directory where clang executable resides. It allows SDK suppliers to customize compiler, for instance by turning off unneeded warnings, by providing their own clang.cfg. On the other hand developers cannot by accident override default setting because clang.cfg is placed in system directory.

As for compiler bugs getting harder to reproduce, we already get a lot of bug reports where only the driver invocation is provided, e.g. "clang -O2 foo.c".

If foo.c includes other files, it can be a challenge now. clang -v can help but anyway, this task indeed becomes a bit more complex.

Do we want to get the same config file for "clang", "clang++" and "clang-cl"? They accept different flags.

The driver is still the same, so I think we can use the same config file in these cases. Using --driver-mode as a second dimension would make the scheme awkward, IMHO.

Maybe it'd be better to take a step back and attempt to create a regular, .ini/.conf-style configuration file. One particular advantage of that is that we can precisely decide which options to support and how to support them. I think this could be better both for opponents of 'free configuration' (since it will be less powerful and more controlled), and for proponents of nice configuration files (such as me).

Advanced config file format is a nice idea, but it already got objections in mail list:

because then we will have to document and maintain the semantics of things like multiple configuration files, in what order they override each other, and how they interact with flags. Our command line flag logic is already too complicated. We have one public interface, and it's flags, and I'm hesitant to add a new one.

I think many things can be simpler if we had enough powerful description language, as gcc spec. As for particular concern of making config files less powerful, maybe we could do simple validation of options contained there? For instance, only options started from -W, -I, etc are allowed, other would cause warning. This however complicates the driver and requires documenting intended behavior.

hans added a comment.Nov 9 2016, 9:12 AM

For Chromium, our build system provides a specific Clang version close to ToT, and obviously what flags to use when invoking it. (Developers can of course override the flags when configuring if they want.) Now maybe a developer has a ~/clang.cfg because they want certain flags on by default when they build *other* projects, those flags would now be applied to all clangs, including the one we provide, and potentially wreak havoc.

Default config file is searched for only in the directory where clang executable resides. It allows SDK suppliers to customize compiler, for instance by turning off unneeded warnings, by providing their own clang.cfg. On the other hand developers cannot by accident override default setting because clang.cfg is placed in system directory.

I thought one of the stated goals for this patch would be to allow users to set default flags without touching the project's build system.

What about this line?

static const ArrayRef<const char *> SearchDirs({ "~/.llvm", "/etc/llvm" });

That sounds like you'll be searching the home directory for a default config file.

And even if we ignore that, the CLANGCFG environment variable presents the same problem.

Let's say a developer uses that to point to a config file with their favourite compiler flags for compiling on their x86_64 machine.

But then they want to build something using the Android NDK, whose Clang will now pull in that config file and start passing flags which doesn't apply to the target, e.g. ARM. The user might not even be invoking the NDK directly; it might be through Android Studio which certainly doesn't expect other flags to be used than the ones it passes.

Default config file is searched for only in the directory where clang executable resides. It allows SDK suppliers to customize compiler, for instance by turning off unneeded warnings, by providing their own clang.cfg. On the other hand developers cannot by accident override default setting because clang.cfg is placed in system directory.

I thought one of the stated goals for this patch would be to allow users to set default flags without touching the project's build system.

Yes, that's true. Initially it was supposed that a user can override compiler defaults by putting clang.cfg into well-knows places, including ~/llvm, in particular. Then this idea was rejected as a result of feedback, and default config file now is allowed only in the directory where clang executable resides. It still allows to override default settings by SDK suppliers but limit this possibility to powerful users.

What about this line?

static const ArrayRef<const char *> SearchDirs({ "~/.llvm", "/etc/llvm" });

That sounds like you'll be searching the home directory for a default config file.

These are directories where config files specified by option --config are searched for. There is a big difference between default config files and those specified explicitly. Default file adds option silently, user does not specify anything in the command line. Due to this point it can represent a source of troubles, such as problems with bug reports. In contrast the named config files are specified explicitly in clang invocation. A user may set the same set of option by listing them in command line. Such config file does not represents any dander, it is only a convenience feature.

And even if we ignore that, the CLANGCFG environment variable presents the same problem.

Let's say a developer uses that to point to a config file with their favourite compiler flags for compiling on their x86_64 machine.

But then they want to build something using the Android NDK, whose Clang will now pull in that config file and start passing flags which doesn't apply to the target, e.g. ARM. The user might not even be invoking the NDK directly; it might be through Android Studio which certainly doesn't expect other flags to be used than the ones it passes.

If a user forget to update LIBRARY_PATH or C_INCLUDE_PATH the result would be the same. Config file set by environment variable is similar to default config file as user does not specify anything in command line. But the user must explicitly specify the file, it is not made automatically.

Ability to append compiler options via environment variable is important feature, I think, because:

  • This feature implemented by Intel compiles has proven its convenience.
  • Build systems provided by software components are not perfect. Some do not honor CFLAGS, some require setting options inside makefiles etc. Changing compiler options might be a challenge. Environment variable solves this problem easily and effectively.
hans added a comment.Nov 10 2016, 1:16 PM

Default config file is searched for only in the directory where clang executable resides. It allows SDK suppliers to customize compiler, for instance by turning off unneeded warnings, by providing their own clang.cfg. On the other hand developers cannot by accident override default setting because clang.cfg is placed in system directory.

I thought one of the stated goals for this patch would be to allow users to set default flags without touching the project's build system.

Yes, that's true. Initially it was supposed that a user can override compiler defaults by putting clang.cfg into well-knows places, including ~/llvm, in particular. Then this idea was rejected as a result of feedback, and default config file now is allowed only in the directory where clang executable resides. It still allows to override default settings by SDK suppliers but limit this possibility to powerful users.

I see. The SDK use case seems quite reasonable to me.

What about this line?

static const ArrayRef<const char *> SearchDirs({ "~/.llvm", "/etc/llvm" });

That sounds like you'll be searching the home directory for a default config file.

These are directories where config files specified by option --config are searched for. There is a big difference between default config files and those specified explicitly. Default file adds option silently, user does not specify anything in the command line. Due to this point it can represent a source of troubles, such as problems with bug reports. In contrast the named config files are specified explicitly in clang invocation. A user may set the same set of option by listing them in command line. Such config file does not represents any dander, it is only a convenience feature.

Oh, I misunderstood then. Thanks for clarifying.

And even if we ignore that, the CLANGCFG environment variable presents the same problem.

Let's say a developer uses that to point to a config file with their favourite compiler flags for compiling on their x86_64 machine.

But then they want to build something using the Android NDK, whose Clang will now pull in that config file and start passing flags which doesn't apply to the target, e.g. ARM. The user might not even be invoking the NDK directly; it might be through Android Studio which certainly doesn't expect other flags to be used than the ones it passes.

If a user forget to update LIBRARY_PATH or C_INCLUDE_PATH the result would be the same.

I don't like those either :-)

Config file set by environment variable is similar to default config file as user does not specify anything in command line. But the user must explicitly specify the file, it is not made automatically.

Sorry, I don't understand this part.

Ability to append compiler options via environment variable is important feature, I think, because:

  • This feature implemented by Intel compiles has proven its convenience.
  • Build systems provided by software components are not perfect. Some do not honor CFLAGS, some require setting options inside makefiles etc. Changing compiler options might be a challenge. Environment variable solves this problem easily and effectively.

But now you seem to be serving the goal about letting the user set flags without using the build system, which I thought wasn't a goal anymore?

Again, allowing SDKs to configure the compiler (such as setting the default target) without actually changing it in the source seems somewhat reasonable, but this other part worries me.

Config file set by environment variable is similar to default config file as user does not specify anything in command line. But the user must explicitly specify the file, it is not made automatically.

Sorry, I don't understand this part.

If a user specifies clang foo.c, the compiler driver "magically" gets additional options in both cases, whether default config file is used or the config is specified by environment variable. So both scenarios are evil to some extent. But the latter case is the lesser evil, as the user must previously define proper variable. Sorry for unclear wording.

Ability to append compiler options via environment variable is important feature, I think, because:
...

  • Build systems provided by software components are not perfect. Some do not honor CFLAGS, some require setting options inside makefiles etc. Changing compiler options might be a challenge. Environment variable solves this problem easily and effectively.

But now you seem to be serving the goal about letting the user set flags without using the build system, which I thought wasn't a goal anymore?

Strictly speaking, you are right. Setting config by environment variable has the same effect as using default config file, with the exception that compiler user deliberately sets the variable and can chose what options to set in a particular case. By allowing this "lesser evil" we provide the user with a way to work around complex cases.

@sepavloff, my suggestion would be to rework the patch to the minimal, acceptable feature (i.e. executable-relative config), and then possibly send more patches for the additional features that require additional discussion. This way, we'll at least get something done in a reasonable time ;-). I'm still looking forward for the opportunity to let users switch default stdlib/rtlib without rebuilding clang.

sepavloff updated this revision to Diff 78167.Nov 16 2016, 4:37 AM
sepavloff edited edge metadata.

Limit the patch by named configuration files only

Follow advice of @mgorny and removed all features from the patch except
named config files. These are files specified explicitly by option
--config or encoded in executable name, such as armv7l-clang. This
kind of configuration files did not cause objections except that "this
functionality is already available with @file". In contrast to @file
configuration files have advantages:

  • they is searched in well-known places rather than in current directory,
  • they may contain comments,
  • they may include other files by @file and these files are resolved relative to the including file.
  • they may be encoded in executable name; such encoding is already used by clang to specify targets.

Also added section about config files to the user manual.

sepavloff updated this object.Nov 16 2016, 4:39 AM
hans added a subscriber: srhines.Nov 16 2016, 11:44 AM

I'm still skeptical, but I think this is an improvement over the previous patch.

I think your best bet to get this landed is to find someone from the cfe-dev thread who is in favour of config files and have them review it.

I'm also cc'ing srhines who might be interested in this since he works on the Android toolchains and this patch has configuring cross-compilers as one of its motivations.

docs/UsersManual.rst
667 ↗(On Diff #78167)

The "add .cfg extension" magic seems a little awkward. It seems this is mixing the possibility of taking a filename and taking some other name.

For clang --config myfile.cfg, should it also search the current working directory?

I'm not keen on it searching in ~/.llvm.

694 ↗(On Diff #78167)

I guess you mean directive @os/linux.opts?

test/Driver/config-file.c
10 ↗(On Diff #78167)

This test will fail if I have a ~/.llvm/inexistent.cfg file on my machine.

13 ↗(On Diff #78167)

Where did /etc/llvm come from?

14 ↗(On Diff #78167)

There is no test checking the functionality that finds a config file based on the clang executable name.

rsmith added inline comments.Nov 16 2016, 2:44 PM
docs/UsersManual.rst
655–656 ↗(On Diff #78167)

Rather than inventing a new mechanism, could we extend our existing @file facility to support comments and nested inclusion of further @files?

667 ↗(On Diff #78167)

I'm also not keen on searching ~/.llvm. Whoever makes the blah-clang symlink should get to control what the default configuration for blah-clang is, I think. But then this seems to immediately lead to the conclusion that we don't need this implicit-config-file feature at all, since you can replace a blah-clang symlink with a shell script containing exec clang @blah.cfg "@$" -- and it's better to handle it that way, since you get more control over where the config file lives.

Thanks to Hans for notifying me about this. Cross-compilation flags are a big challenge in Android, but I have similar concerns about looking for magic files/dirs. Richard's suggestion about improving @file is really interesting, and would satisfy a large part of our needs in Android. Can you comment on whether you think that would work for your cases as well?

docs/UsersManual.rst
655–656 ↗(On Diff #78167)

This is a very interesting idea, and seems like something that would be very interesting for large systems that do a lot of cross-compilation. For instance, in Android, I would love to have a set of generic flags to add to all of our cross-compiles, and then a set of per-ABI flags for each of our major targets. If I could nest @file directives, that would get me most of this with minimal/no duplication of flags.

667 ↗(On Diff #78167)

Android is essentially taking the shell script wrapper approach today (although we are using python, and we don't do a lot with it just yet).

sepavloff updated this revision to Diff 78652.Nov 20 2016, 12:34 AM

Updated patch.

  • Restrict the search for config files specified by mangled clang name to binary directory only,
  • Allow the search for config files specified by --config in binary directory also,
  • Added tests for using mangled clang names and nested file inclusion.

What happens with unused arguments in the configuration files? This feature looks potentially useful for me, but only if it suppresses unused-argument warnings. For example, if I put

-L/usr/local/lib -stdlib=libc++

into a configuration file, I can't have these:

clang-4.0: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
clang-4.0: warning: argument unused during compilation: '-stdlib=libc++' [-Wunused-command-line-argument]

pop up when compiling all over the place.

sepavloff marked 5 inline comments as done.Nov 20 2016, 4:56 AM

Rather than inventing a new mechanism, could we extend our existing @file facility to support comments and nested inclusion of further @files?

Reusing @file is an attractive idea, but it cannot be implemented due to compatibility reason. The file in @file is resolved relative to current directory in both msvc and libiberty. Current directory may change during build, so use of @file to deliver set of flags to every invocation of clang becomes unrealistic. Config files instead are taken from well-known places and if a user set CFLAGS='--config abc.cfg' in call to make, it should work as intended.

For clang --config myfile.cfg, should it also search the current working directory?

I believe no, current directory is not reliable place for configurations, as it may change during build.

Whoever makes the blah-clang symlink should get to control what the default configuration for blah-clang is, I think.

The patch is changed so that config file for blah-clang is searched for *only* in the directory where blah-clang resides. It prevents a user from overwriting 'system' config files. The idea is that the files in the binary directory are prepared by SDK suppliers who adapt clang for specific needs.

I'm not keen on it searching in ~/.llvm.

The patch is changed so that ~/.llvm is searched only of config files specified explicitly by --config. A user can specify full path to the config file: clang --config ~/.llvm/abc.cfg. By making the search in ~/.llvm we make work with config files a bit more convenient.

you can replace a blah-clang symlink with a shell script containing exec clang @blah.cfg "@$"

Due to intermediate shell the environment variables which were not exported would be lost. This solution is OK for build system but from viewpoint of compiler, which must work in any build system, it is too fragile.

docs/UsersManual.rst
667 ↗(On Diff #78167)

Agree, removed this feature.

694 ↗(On Diff #78167)

Yes, thank you for the catch.

test/Driver/config-file.c
10 ↗(On Diff #78167)

Changed the name to more long and awkward to reduce chance of clash.

13 ↗(On Diff #78167)

Remnants of the previous variant, removed.

14 ↗(On Diff #78167)

Added tests (file config-file2.c).

Whoever makes the blah-clang symlink should get to control what the default configuration for blah-clang is, I think.

The patch is changed so that config file for blah-clang is searched for *only* in the directory where blah-clang resides. It prevents a user from overwriting 'system' config files. The idea is that the files in the binary directory are prepared by SDK suppliers who adapt clang for specific needs.

I think this killed my use case. Could this be extended to somehow allow providing configuration for the compiler installed in standard *nix layout? I.e. make it possible to make it search ../../etc/clang or something like that?

you can replace a blah-clang symlink with a shell script containing exec clang @blah.cfg "@$"

Due to intermediate shell the environment variables which were not exported would be lost. This solution is OK for build system but from viewpoint of compiler, which must work in any build system, it is too fragile.

I think you are wrong here, or I'm missing something. When clang is spawned, only exported variables are passed to it. An intermediate shell layer won't change this.

However, I don't think forcing an additional wrapper for all clang invocations is a neat idea.

sepavloff updated this revision to Diff 81055.Dec 12 2016, 1:08 AM
sepavloff marked 4 inline comments as done.

Updated patch.

  • Directories which are searched for config files are now defined during project configuration process using special cmake options.
  • Driver do not warn on unused arguments that come from config file.
hfinkel added inline comments.Dec 12 2016, 1:18 AM
lib/Driver/Driver.cpp
2695 ↗(On Diff #81055)

Grammar here is a bit odd, how about:

// Claim all arguments that come from a configuration file so that the driver does not warn on any that are unused.
sepavloff updated this revision to Diff 81057.Dec 12 2016, 1:42 AM

Fixed grammar in comment, thanks to Hal Finkel.

bruno added a subscriber: bruno.Dec 12 2016, 11:11 AM
bruno added inline comments.
include/clang/Driver/Driver.h
287 ↗(On Diff #81057)

x -> FileName

lib/Driver/Driver.cpp
172 ↗(On Diff #81057)

If NumCfgArgs == 0 it would be nice to warn that the config file is empty?

2698 ↗(On Diff #81057)

Why shouldn't we warn about those? Should clang warn and point to the config file instead?

tools/driver/driver.cpp
318 ↗(On Diff #81057)

Use \brief here?

333 ↗(On Diff #81057)

You can early return false here if Pos == StringRef::npos

sepavloff updated this revision to Diff 81227.Dec 13 2016, 6:26 AM

Addressed review notes.

sepavloff added inline comments.Dec 13 2016, 6:53 AM
include/clang/Driver/Driver.h
287 ↗(On Diff #81057)

Fixed.

lib/Driver/Driver.cpp
172 ↗(On Diff #81057)

Not sure if it makes sense. Usually warning are used to attract attention to the things that potentially may be harmful. An empty config file is strange but does not look dangerous.

2698 ↗(On Diff #81057)

If clang is called to compile files only, it will warn on options like -L/usr/local/lib as they are unused. We don't want such warnings for options in config file, as it is used for all invocations of clang.

tools/driver/driver.cpp
318 ↗(On Diff #81057)

Doxygen is run with AUTO_BRIEF options turned on (enabled in r242485 - Doxygen: Enable autobrief feature, matching llvm config/coding standards).

333 ↗(On Diff #81057)

Fixed.

EricWF added a subscriber: EricWF.Jan 30 2017, 11:33 AM
hfinkel added inline comments.Jan 30 2017, 7:24 PM
lib/Driver/Driver.cpp
2695 ↗(On Diff #81227)

Clain -> Claim

172 ↗(On Diff #81057)

I agree. The config files might be automatically generated by wrapper scripts, and adding special cases to avoid creating empty files seems like an unnecessary complication.

tools/driver/driver.cpp
332 ↗(On Diff #81227)

Looking for "-clang" is too specific; our driver-name suffix processing allows more general naming than this. For one thing, it will not pick up armv7l-cpp correctly (which would be a problem because the config files likely contain options affecting preprocessing). I also have users which use symlinks to clang named fooclang.

I think that an easy way to do this is to use the result from ToolChain::getTargetAndModeFromProgramName, which we call from the caller of this function:

std::string ProgName = argv[0];
std::pair<std::string, std::string> TargetAndMode =
    ToolChain::getTargetAndModeFromProgramName(ProgName);

We can use TargetAndMode.first here, instead of looking for the string before "-clang", and that should be consistent with the rest of our processing.

sepavloff updated this revision to Diff 86761.Feb 1 2017, 7:21 PM

Updated patch

Use more robust algorithm to determine custom compiler prefix.
Bring the code in sync with changes in llvm patch.

hfinkel added inline comments.Feb 2 2017, 7:07 PM
include/clang/Driver/Driver.h
219 ↗(On Diff #86761)

requires -> require

lib/Driver/ToolChain.cpp
183 ↗(On Diff #86761)

I don't think that we can do it this way; it is a behavior change (we now might try to set the target to some string which did not validate as a known target, whereas we did not previously).

How about you always return the prefix, but also return a boolean indicating whether or not the prefix is a valid target? Then, after processing the config file, you can clear out the string if it was not a valid target.

tools/driver/driver.cpp
363 ↗(On Diff #86761)

I don't think that you can clear the string here. We might need it later to call insertTargetAndModeArgs.

449 ↗(On Diff #86761)

Replace

after them, - config file

with

after them because the config file
sepavloff updated this revision to Diff 87229.Feb 6 2017, 7:31 AM

Updated patch

sepavloff marked 2 inline comments as done.Feb 6 2017, 7:37 AM
sepavloff added inline comments.
lib/Driver/ToolChain.cpp
183 ↗(On Diff #86761)

Changed implementation of this function. Indeed using explicit flag looks more clear than conditional setting target name.

tools/driver/driver.cpp
363 ↗(On Diff #86761)

With new implementation of ToolChain::getTargetAndModeFromProgramName it is not needed anymore.

Hello,
I work on WebAssembly toolchains (including emscripten, which is more or less the "cross-compiler SDK" use case). There's a lot of history in this review and I haven't looked at the details yet, but does the current summary text reflect the current proposal and/or content of the review?
Thanks!

Glad to know that someone is interested in this feature!
Below is actual proposal.

Adding named configuration files to clang driver

A configuration file is a collection of driver options, which are inserted into command line before other options specified in the clang invocation. It groups related options together and allows specifying them in simpler, more flexible and less error prone way than just listing the options somewhere in build scripts. Configuration file may be thought as a "macro" that names an option set and is expanded when the driver is called. This feature must be helpful when a user need to specify many options, cross compilation is likely to be such case.

Configuration file can be specified by either of two methods:

  • by command line option --config <config_file>, or
  • by using special executable file names, such as armv7l-clang.

If the option --config is used, its argument is treated as a path to configuration file if it contains a directory separator, otherwise the file is searched for in the set of directories described below. If option --config is absent and clang executable has name in the form armv7l-clang, driver will search for file armv7l.cfg in the same set of directories. Similar encoding is already used by clang to specify target.

The set of directories where configuration files are searched for consists of at most three directories, checked in this order:

  • user directory (like ~/.llvm),
  • system directory (like /etc/llvm),
  • the directory where clang executable resides.

User and system directories are optional, they are reserved for distribution or SDK suppliers. By default they are absent, corresponding directories can be specified by cmake arguments CLANG_CONFIG_FILE_SYSTEM_DIR and CLANG_CONFIG_FILE_USER_DIR. The first found file is used.

Format of configuration file is similar to file used in the construct @file, it is a set of options. Configuration file have advantage over this construct:

  • it is searched for in well-known places rather than in current directory,
  • it may contain comments, long options may be split between lines using trailing backslashes,
  • other files may be included by @file and they will be resolved relative to the including file,
  • the file may be encoded in executable name,
  • unused options from configuration file do not produce warnings.
sepavloff updated this revision to Diff 98330.May 9 2017, 11:42 AM

Updated patch

sepavloff edited the summary of this revision. (Show Details)

Updated patch

  • Some functionality, not related to config files directly, is move to separate changes
  • Cleanup of tests.

This change does not implement using construct @file to apply config
file yet.

Here is a list of design solutions used in this implementation of config files.

How config file is specified

There are two ways to specify config file:

  • To encode it into executable file name, such as foo-clang,
  • To pass config file in command line arguments.

There were no objections to the variant foo-clang. It can be considered as a natural extension of the existing mechanism, in which invocation of foo-clang is equivalent to specifying target in command line: clang --target=foo. Config file allows to specify more than one option and its name is not confined to the registered targets.

As for specifying config file in command line, there are two variants:

  • Use existing construct @foo.
  • Use special command line option --config foo.

Each way has own advantages.

Construct @file allows to reuse existing command line syntax. Indeed, config file is a collection of command line arguments and @file is just a way to inserts such arguments from a file. Config file may include other files and it uses @file with the exception that file is resolved relative to the including file, not to current directory. Config file could be considered as an extension of existing mechanism provided by @file.

Using @file creates compatibility issues, because existing use of @file must not be broken. Obviously the file in @file may be treated as configuration only if it cannot be treated according to the existing semantics. Possible solution is to try loading file as configuration if it does not contain path separator and is not found in current directory.

The drawback of this syntax is that the meaning of @foo in the invocation clang @foo abc.cpp depends on the content of current directory. If it contains file foo, @foo is an expansion of response file, otherwise it specifies a config file. This behavior causes error if current directory accidentally contains a file with the same name as the specified config file.

Using dedicated option to apply config file makes the intention explicit. It also allow to use config files from arbitrary places. For instance, invocation clang --config ./foo allows to treat file foo in current directory as config file.

Although config file contains command line arguments as conventional response file, it differs from the latter:

  • It resolves nested @file constructs differently, relative to including file, not current directory.
  • File is searched for in predefined set of directories, not in the current only.
  • Config file is more like a part of working environment. For instance, clang based SDK supplier could deliver a set config files as a part of their product. Response file in contrast is more close to transient build data, often generated by some tool.
  • Warning about unused options are suppressed in config files.
  • There was a proposal to extend syntax of config file to enable comments and using trailing backslash to split long lines, although these extensions are useful for response files as well.

So, maybe, loading config file deserves a special option. This way has advantages:

  • it expresses intentions explicitly and reduce risk of accidental errors,
  • it allows using absolute paths to specify config file.

Where config files reside

There may be several places where config files can be kept:

  • Directory where clang executable resides. It is convenient for SDK developers as it simplifies packaging. User can use several instances of clang at the same time, they still may use their own set of config files without conflicts.
  • System directory (for instance, /etc/llvm), that keeps config files for use by several users. Such case is interesting for OS distribution maintainers and SDK developers.
  • User directory (for instance, ~/.llvm). A user can collect config file that tune compiler for their tasks in this directory and use them to select particular option set.
  • Config file can be specified by path, as in clang --config ./foo. This is convenient for developers to ensure that particular configuration is selected.

For the sake of flexibility it make sense to enable all these locations as they are useful in different scenarios. Location of user and system directories are specified at configuration, by default they are absent. If user directory is specified, it should have higher priority over other places for search so that user could correct system supplied option sets.

Driver mode

If config file is encoded in executable name, such as foo-clang, there is concern of using different driver modes. What config file should be searched for if compiler is called as foo-cpp, foo-cl etc? These tools support different set of options, so a flexible solution should provide possibility to specify different config files for different driver modes.

Clang implements flexible scheme of tool naming, in which a tool name has components:

<arch>-<something>-<driver-mode>[-<optional version suffix>][<optional version number>]

The part of executable name that precedes the driver-mode suffix can be arbitrary. It make sense to not analyze the executable name by components but use entire name without version as a base name of config file. So executables:

i686-linux-android-g++
i686-linux-android-g++5.0
i686-linux-android-g++-release

would search for file i686-linux-android-g++.cfg, while

foo-clang
foo-gcc
foo-s++

would search files foo-clang.cfg, foo-gcc.cfg and foo-s++.cfg respectively.

On the other hand, important use of config file is tuning compiler options for cross compilations. In such case it is likely that different tools would use the same options. Cloning config file for each tool is odd, so natural solution is a single file for a target.

The flexible solution is to search long name (like i686-linux-android-g++.cfg) first and if it is not found, look for short name based on 'target' only (such as i686.cfg and foo.cfg).

It make sense to use the same rule for the case when config file is specified explicitly (but not as path), so that invocation foo-clang be equivalent to clang --config foo. In this case the invocation:

clang-cpp --config foo abc.c

would first search for the file foo-clang-cpp, then for foo.cfg.

Target reloading

Configuration file is a general mechanism but it was proposed as a solution for cross compilation problems. In this case config file holds options required to tune compilation for particular target. There is a difficulty here because some command line options like -m32 effectively change the target and the new target may require different settings than those contained in config file.

The proposed solution is to reload config file. If:

  • config file starts with architecture component (like x86_64-),
  • command line contains option(s) that effectively changes target (like -m32),

then the driver tries to load config file with name obtained by replacing architecture component with the actual architecture. For instance, if config file was x86_64-clang.cfg the driver looks for i686-clang.cfg. If proper config file is found, options read from the previous config file are removed and content of new config file is inserted at the beginning of effective command line. If such file is not found, it is not an error.

Effect of target reloading must be exactly the same as if actual target were initially specified. For instance, invocation x86_64-clang -m32 must be equivalent to i686-clang. It means that:

  • Options read from previous config file are removed entirely.
  • The search for new config file is made by the same rule as original file, that is first i686-clang.cfg then i686.cfg.

Conflicting settings

It is possible that clang is requested to load several config files. Consider this possibility for the case of target, which represents more general case.

There are three ways to specify target for compilation:

  • config file, like clang --config i686,
  • executable prefix, like i686-clang,
  • command line option, like --target i686.

All may be combined.

Using --target is existing way to reload target. It must be able to use it in combination with config files, the only difference is possible target reloading. It has precedence over other ways due to compatibility.

Combining executable prefix and explicit config file may create conflicting choice, for instance:

mips64-clang --config x86_64

Possible solutions are:

  • emit error,
  • treat command line option as having higher priority.

The latter solution looks more appropriate, as it is consistent with the way clang processes command line options, --target in particular. Such combination may appear if --config comes from the set of compilation flags, while compiler is specified in different way. User may use --config instead of --target in existing build system just to set several target specific options, so this combination can be allowed for compatibility reason.

Another conflicting case is two config files specified in command line:

clang --config mips64 --config x86_64

Again, we have possible solutions:

  • treat such case as an error,
  • treat the second option as having higher priority.

The latter way is consistent with the general option treatment, but we have no compatibility reason to enable it. As this combination may be a result of error, probably it is better to prohibit it.

Some suggested improvements to the English in the documentation...

docs/UsersManual.rst
700 ↗(On Diff #110909)

Configuration files group command-line options and allow all of them to be specified just by ...

702 ↗(On Diff #110909)

instance -> example

"for instance" and "for example" are essentially the same, but for consistency, "for example" is much more common in this document, so we should standardize.

703 ↗(On Diff #110909)

options, etc.

706 ↗(On Diff #110909)

instance -> example

714 ↗(On Diff #110909)

and options are

719 ↗(On Diff #110909)

directory -> directories

720 ↗(On Diff #110909)

cmake -> CMake

721 ↗(On Diff #110909)

The first file found is used.

724 ↗(On Diff #110909)

specify a configuration file

725 ↗(On Diff #110909)

if the Clang executable

726 ↗(On Diff #110909)

will search for file

729 ↗(On Diff #110909)

If a driver mode

729 ↗(On Diff #110909)

to find a file

730 ↗(On Diff #110909)

if the executable is named x86_64-clang-cl

733 ↗(On Diff #110909)

If the command line

733 ↗(On Diff #110909)

changes -> change

734 ↗(On Diff #110909)

-EL, and some others

734 ↗(On Diff #110909)

and the configuration file starts with an architecture name

735 ↗(On Diff #110909)

load the configuration file for the effective

735 ↗(On Diff #110909)

instance -> example

742 ↗(On Diff #110909)

causes Clang search for a file named i368.cfg first, and if no such file is found, Clang looks for the file x86_64.cfg.

745 ↗(On Diff #110909)

command-line options

745 ↗(On Diff #110909)

several -> more

(several generally implies more than two, so saying "one or several" actually excludes two)

748 ↗(On Diff #110909)

by a trailing

748 ↗(On Diff #110909)

example of a configuration file

762 ↗(On Diff #110909)

included by @file directives

763 ↗(On Diff #110909)

For example, if a configuration file

763 ↗(On Diff #110909)

contains the directive

sepavloff updated this revision to Diff 118315.Oct 9 2017, 10:28 PM

Updated patch according to reviewer's notes

sepavloff marked 28 inline comments as done.Oct 9 2017, 10:37 PM

@hfinkel Thank you for explanations!

hfinkel added inline comments.Oct 25 2017, 8:30 PM
lib/Driver/Driver.cpp
630 ↗(On Diff #118315)
assert(Dir && "Directory path should not be null");
637 ↗(On Diff #118315)

Why do we do clear FilePath here?

657 ↗(On Diff #118315)

Is this necessary? I'd rather have a contract for these functions be that they always clear the FilePath if necessary, and then the caller will not need to worry about it.

661 ↗(On Diff #118315)

Why not just add this directory to the end of the list of directories?

hfinkel added inline comments.Oct 25 2017, 8:32 PM
lib/Driver/Driver.cpp
661 ↗(On Diff #118315)

(by which I mean, why not add BinDirectory to the list of directories instead of having special code for it?)

sepavloff updated this revision to Diff 120403.Oct 26 2017, 6:22 AM

Addressed reviewer's notes

  • Do not process binary directory separately but add it to the array of search directories.
  • Do not use result file name buffer for temporary data.
sepavloff marked 4 inline comments as done.Oct 26 2017, 6:26 AM
sepavloff added inline comments.
lib/Driver/Driver.cpp
637 ↗(On Diff #118315)

The intent was to avoid copying file name buffer. It however makes code less readable. Used temporary buffer instead.

657 ↗(On Diff #118315)

Agree. Changed accordingly.

661 ↗(On Diff #118315)

Now BinDirectory is added to the list of directories.

hfinkel added inline comments.Oct 26 2017, 7:18 PM
lib/Driver/Driver.cpp
706 ↗(On Diff #120403)

if it presents. -> if it is present.

739 ↗(On Diff #120403)

I don't see how this length check makes sense in all cases. If CfgFileName came from a --config command-line options, whether or not it is longer or shoter than ArchPrefixLen seems irrelevant. Do you need to do some prefix check here?

sepavloff marked 3 inline comments as done.Oct 27 2017, 11:50 AM
sepavloff added inline comments.
lib/Driver/Driver.cpp
739 ↗(On Diff #120403)

Indeed, this feature require further elaboration. We are going to treat config file specified by --config with higher priority, but actually we can have even more complex combination, something like:

mips64-clang --config mips32 -target x86 -m32

Nevertheless such call must be consistently processed.

I will implement the relevant functionality two weeks later, after return from vacation.

sepavloff updated this revision to Diff 123503.Nov 19 2017, 9:36 AM

Updated patch

  • Added command line option for setting directories where config files are searched for,
  • Fixed architecture calculation,
  • Option --config in config file is now diagnosed,
  • Output made by option -v contains config file search directories,
  • Test are changed so that they do not fail on non-x86 architectures,
  • Added new tests.
hfinkel added inline comments.Dec 14 2017, 4:42 PM
lib/Driver/Driver.cpp
641 ↗(On Diff #123503)

Use llvm::sys::fs::make_absolute instead of this.

hintonda added inline comments.
include/clang/Config/config.h.cmake
40 ↗(On Diff #123503)

These need to be in quotes since you assign them to a std::string, i.e.:

#cmakedefine CLANG_CONFIG_FILE_SYSTEM_DIR "${CLANG_CONFIG_FILE_SYSTEM_DIR}"
#cmakedefine CLANG_CONFIG_FILE_USER_DIR "${CLANG_CONFIG_FILE_USER_DIR}"
sepavloff updated this revision to Diff 127109.Dec 15 2017, 5:07 AM

Updated patch

Fixed cmake config file definition.
Use llvm::sys::fs::make_absolute to get absolute path.

sepavloff marked 2 inline comments as done.Dec 15 2017, 5:10 AM
sepavloff added inline comments.
include/clang/Config/config.h.cmake
40 ↗(On Diff #123503)

Fixed, thank you.

lib/Driver/Driver.cpp
641 ↗(On Diff #123503)

Removed getAbsolutePath.

hfinkel accepted this revision.Dec 20 2017, 7:23 PM

LGTM

lib/Driver/Driver.cpp
1508 ↗(On Diff #127109)

configuration file directory -> configuration-file directory

1511 ↗(On Diff #127109)

configuration file directory -> configuration-file directory

This revision is now accepted and ready to land.Dec 20 2017, 7:23 PM
sepavloff marked 2 inline comments as done.

Small corrections to the patch

  • Avoid quick return in the case of error, it may cause fails of clang tools.
  • Fixed typo.
This revision was automatically updated to reflect the committed changes.