Page MenuHomePhabricator

Enable configuration files in clang
ClosedPublic

Authored by sepavloff on Sep 26 2016, 10:51 AM.

Details

Summary

Clang is inherently a cross compiler and can generate code for any target
enabled during build. It however requires to specify many parameters in the
invocation, which could be hardcoded during configuration process in the
case of single-target compiler. The purpose of configuration files is to
make specifying clang arguments easier.

A configuration file is a collection of driver options, which are inserted
into command line before other options specified in the clang invocation.
It groups related options together and allows specifying them in simpler,
more flexible and less error prone way than just listing the options
somewhere in build scripts. Configuration file may be thought as a "macro"
that names an option set and is expanded when the driver is called.

Use of configuration files is described in UserManual.rst.

The implementation follows the discussion in mail list
(http://lists.llvm.org/pipermail/cfe-dev/2016-September/050776.html).

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Updated patch

Use more robust algorithm to determine custom compiler prefix.
Bring the code in sync with changes in llvm patch.

hfinkel added inline comments.Feb 2 2017, 7:07 PM
include/clang/Driver/Driver.h
219 ↗(On Diff #86761)

requires -> require

lib/Driver/ToolChain.cpp
183 ↗(On Diff #86761)

I don't think that we can do it this way; it is a behavior change (we now might try to set the target to some string which did not validate as a known target, whereas we did not previously).

How about you always return the prefix, but also return a boolean indicating whether or not the prefix is a valid target? Then, after processing the config file, you can clear out the string if it was not a valid target.

tools/driver/driver.cpp
363 ↗(On Diff #86761)

I don't think that you can clear the string here. We might need it later to call insertTargetAndModeArgs.

449 ↗(On Diff #86761)

Replace

after them, - config file

with

after them because the config file
sepavloff updated this revision to Diff 87229.Feb 6 2017, 7:31 AM

Updated patch

sepavloff marked 2 inline comments as done.Feb 6 2017, 7:37 AM
sepavloff added inline comments.
lib/Driver/ToolChain.cpp
183 ↗(On Diff #86761)

Changed implementation of this function. Indeed using explicit flag looks more clear than conditional setting target name.

tools/driver/driver.cpp
363 ↗(On Diff #86761)

With new implementation of ToolChain::getTargetAndModeFromProgramName it is not needed anymore.

Hello,
I work on WebAssembly toolchains (including emscripten, which is more or less the "cross-compiler SDK" use case). There's a lot of history in this review and I haven't looked at the details yet, but does the current summary text reflect the current proposal and/or content of the review?
Thanks!

Glad to know that someone is interested in this feature!
Below is actual proposal.

Adding named configuration files to clang driver

A configuration file is a collection of driver options, which are inserted into command line before other options specified in the clang invocation. It groups related options together and allows specifying them in simpler, more flexible and less error prone way than just listing the options somewhere in build scripts. Configuration file may be thought as a "macro" that names an option set and is expanded when the driver is called. This feature must be helpful when a user need to specify many options, cross compilation is likely to be such case.

Configuration file can be specified by either of two methods:

  • by command line option --config <config_file>, or
  • by using special executable file names, such as armv7l-clang.

If the option --config is used, its argument is treated as a path to configuration file if it contains a directory separator, otherwise the file is searched for in the set of directories described below. If option --config is absent and clang executable has name in the form armv7l-clang, driver will search for file armv7l.cfg in the same set of directories. Similar encoding is already used by clang to specify target.

The set of directories where configuration files are searched for consists of at most three directories, checked in this order:

  • user directory (like ~/.llvm),
  • system directory (like /etc/llvm),
  • the directory where clang executable resides.

User and system directories are optional, they are reserved for distribution or SDK suppliers. By default they are absent, corresponding directories can be specified by cmake arguments CLANG_CONFIG_FILE_SYSTEM_DIR and CLANG_CONFIG_FILE_USER_DIR. The first found file is used.

Format of configuration file is similar to file used in the construct @file, it is a set of options. Configuration file have advantage over this construct:

  • it is searched for in well-known places rather than in current directory,
  • it may contain comments, long options may be split between lines using trailing backslashes,
  • other files may be included by @file and they will be resolved relative to the including file,
  • the file may be encoded in executable name,
  • unused options from configuration file do not produce warnings.
sepavloff updated this revision to Diff 98330.May 9 2017, 11:42 AM

Updated patch

sepavloff edited the summary of this revision. (Show Details)

Updated patch

  • Some functionality, not related to config files directly, is move to separate changes
  • Cleanup of tests.

This change does not implement using construct @file to apply config
file yet.

Here is a list of design solutions used in this implementation of config files.

How config file is specified

There are two ways to specify config file:

  • To encode it into executable file name, such as foo-clang,
  • To pass config file in command line arguments.

There were no objections to the variant foo-clang. It can be considered as a natural extension of the existing mechanism, in which invocation of foo-clang is equivalent to specifying target in command line: clang --target=foo. Config file allows to specify more than one option and its name is not confined to the registered targets.

As for specifying config file in command line, there are two variants:

  • Use existing construct @foo.
  • Use special command line option --config foo.

Each way has own advantages.

Construct @file allows to reuse existing command line syntax. Indeed, config file is a collection of command line arguments and @file is just a way to inserts such arguments from a file. Config file may include other files and it uses @file with the exception that file is resolved relative to the including file, not to current directory. Config file could be considered as an extension of existing mechanism provided by @file.

Using @file creates compatibility issues, because existing use of @file must not be broken. Obviously the file in @file may be treated as configuration only if it cannot be treated according to the existing semantics. Possible solution is to try loading file as configuration if it does not contain path separator and is not found in current directory.

The drawback of this syntax is that the meaning of @foo in the invocation clang @foo abc.cpp depends on the content of current directory. If it contains file foo, @foo is an expansion of response file, otherwise it specifies a config file. This behavior causes error if current directory accidentally contains a file with the same name as the specified config file.

Using dedicated option to apply config file makes the intention explicit. It also allow to use config files from arbitrary places. For instance, invocation clang --config ./foo allows to treat file foo in current directory as config file.

Although config file contains command line arguments as conventional response file, it differs from the latter:

  • It resolves nested @file constructs differently, relative to including file, not current directory.
  • File is searched for in predefined set of directories, not in the current only.
  • Config file is more like a part of working environment. For instance, clang based SDK supplier could deliver a set config files as a part of their product. Response file in contrast is more close to transient build data, often generated by some tool.
  • Warning about unused options are suppressed in config files.
  • There was a proposal to extend syntax of config file to enable comments and using trailing backslash to split long lines, although these extensions are useful for response files as well.

So, maybe, loading config file deserves a special option. This way has advantages:

  • it expresses intentions explicitly and reduce risk of accidental errors,
  • it allows using absolute paths to specify config file.

Where config files reside

There may be several places where config files can be kept:

  • Directory where clang executable resides. It is convenient for SDK developers as it simplifies packaging. User can use several instances of clang at the same time, they still may use their own set of config files without conflicts.
  • System directory (for instance, /etc/llvm), that keeps config files for use by several users. Such case is interesting for OS distribution maintainers and SDK developers.
  • User directory (for instance, ~/.llvm). A user can collect config file that tune compiler for their tasks in this directory and use them to select particular option set.
  • Config file can be specified by path, as in clang --config ./foo. This is convenient for developers to ensure that particular configuration is selected.

For the sake of flexibility it make sense to enable all these locations as they are useful in different scenarios. Location of user and system directories are specified at configuration, by default they are absent. If user directory is specified, it should have higher priority over other places for search so that user could correct system supplied option sets.

Driver mode

If config file is encoded in executable name, such as foo-clang, there is concern of using different driver modes. What config file should be searched for if compiler is called as foo-cpp, foo-cl etc? These tools support different set of options, so a flexible solution should provide possibility to specify different config files for different driver modes.

Clang implements flexible scheme of tool naming, in which a tool name has components:

<arch>-<something>-<driver-mode>[-<optional version suffix>][<optional version number>]

The part of executable name that precedes the driver-mode suffix can be arbitrary. It make sense to not analyze the executable name by components but use entire name without version as a base name of config file. So executables:

i686-linux-android-g++
i686-linux-android-g++5.0
i686-linux-android-g++-release

would search for file i686-linux-android-g++.cfg, while

foo-clang
foo-gcc
foo-s++

would search files foo-clang.cfg, foo-gcc.cfg and foo-s++.cfg respectively.

On the other hand, important use of config file is tuning compiler options for cross compilations. In such case it is likely that different tools would use the same options. Cloning config file for each tool is odd, so natural solution is a single file for a target.

The flexible solution is to search long name (like i686-linux-android-g++.cfg) first and if it is not found, look for short name based on 'target' only (such as i686.cfg and foo.cfg).

It make sense to use the same rule for the case when config file is specified explicitly (but not as path), so that invocation foo-clang be equivalent to clang --config foo. In this case the invocation:

clang-cpp --config foo abc.c

would first search for the file foo-clang-cpp, then for foo.cfg.

Target reloading

Configuration file is a general mechanism but it was proposed as a solution for cross compilation problems. In this case config file holds options required to tune compilation for particular target. There is a difficulty here because some command line options like -m32 effectively change the target and the new target may require different settings than those contained in config file.

The proposed solution is to reload config file. If:

  • config file starts with architecture component (like x86_64-),
  • command line contains option(s) that effectively changes target (like -m32),

then the driver tries to load config file with name obtained by replacing architecture component with the actual architecture. For instance, if config file was x86_64-clang.cfg the driver looks for i686-clang.cfg. If proper config file is found, options read from the previous config file are removed and content of new config file is inserted at the beginning of effective command line. If such file is not found, it is not an error.

Effect of target reloading must be exactly the same as if actual target were initially specified. For instance, invocation x86_64-clang -m32 must be equivalent to i686-clang. It means that:

  • Options read from previous config file are removed entirely.
  • The search for new config file is made by the same rule as original file, that is first i686-clang.cfg then i686.cfg.

Conflicting settings

It is possible that clang is requested to load several config files. Consider this possibility for the case of target, which represents more general case.

There are three ways to specify target for compilation:

  • config file, like clang --config i686,
  • executable prefix, like i686-clang,
  • command line option, like --target i686.

All may be combined.

Using --target is existing way to reload target. It must be able to use it in combination with config files, the only difference is possible target reloading. It has precedence over other ways due to compatibility.

Combining executable prefix and explicit config file may create conflicting choice, for instance:

mips64-clang --config x86_64

Possible solutions are:

  • emit error,
  • treat command line option as having higher priority.

The latter solution looks more appropriate, as it is consistent with the way clang processes command line options, --target in particular. Such combination may appear if --config comes from the set of compilation flags, while compiler is specified in different way. User may use --config instead of --target in existing build system just to set several target specific options, so this combination can be allowed for compatibility reason.

Another conflicting case is two config files specified in command line:

clang --config mips64 --config x86_64

Again, we have possible solutions:

  • treat such case as an error,
  • treat the second option as having higher priority.

The latter way is consistent with the general option treatment, but we have no compatibility reason to enable it. As this combination may be a result of error, probably it is better to prohibit it.

Some suggested improvements to the English in the documentation...

docs/UsersManual.rst
700 ↗(On Diff #110909)

Configuration files group command-line options and allow all of them to be specified just by ...

702 ↗(On Diff #110909)

instance -> example

"for instance" and "for example" are essentially the same, but for consistency, "for example" is much more common in this document, so we should standardize.

703 ↗(On Diff #110909)

options, etc.

706 ↗(On Diff #110909)

instance -> example

714 ↗(On Diff #110909)

and options are

719 ↗(On Diff #110909)

directory -> directories

720 ↗(On Diff #110909)

cmake -> CMake

721 ↗(On Diff #110909)

The first file found is used.

724 ↗(On Diff #110909)

specify a configuration file

725 ↗(On Diff #110909)

if the Clang executable

726 ↗(On Diff #110909)

will search for file

729 ↗(On Diff #110909)

If a driver mode

729 ↗(On Diff #110909)

to find a file

730 ↗(On Diff #110909)

if the executable is named x86_64-clang-cl

733 ↗(On Diff #110909)

If the command line

733 ↗(On Diff #110909)

changes -> change

734 ↗(On Diff #110909)

-EL, and some others

734 ↗(On Diff #110909)

and the configuration file starts with an architecture name

735 ↗(On Diff #110909)

load the configuration file for the effective

735 ↗(On Diff #110909)

instance -> example

742 ↗(On Diff #110909)

causes Clang search for a file named i368.cfg first, and if no such file is found, Clang looks for the file x86_64.cfg.

745 ↗(On Diff #110909)

command-line options

745 ↗(On Diff #110909)

several -> more

(several generally implies more than two, so saying "one or several" actually excludes two)

748 ↗(On Diff #110909)

by a trailing

748 ↗(On Diff #110909)

example of a configuration file

762 ↗(On Diff #110909)

included by @file directives

763 ↗(On Diff #110909)

For example, if a configuration file

763 ↗(On Diff #110909)

contains the directive

sepavloff updated this revision to Diff 118315.Oct 9 2017, 10:28 PM

Updated patch according to reviewer's notes

sepavloff marked 28 inline comments as done.Oct 9 2017, 10:37 PM

@hfinkel Thank you for explanations!

hfinkel added inline comments.Oct 25 2017, 8:30 PM
lib/Driver/Driver.cpp
630 ↗(On Diff #118315)
assert(Dir && "Directory path should not be null");
637 ↗(On Diff #118315)

Why do we do clear FilePath here?

657 ↗(On Diff #118315)

Is this necessary? I'd rather have a contract for these functions be that they always clear the FilePath if necessary, and then the caller will not need to worry about it.

661 ↗(On Diff #118315)

Why not just add this directory to the end of the list of directories?

hfinkel added inline comments.Oct 25 2017, 8:32 PM
lib/Driver/Driver.cpp
661 ↗(On Diff #118315)

(by which I mean, why not add BinDirectory to the list of directories instead of having special code for it?)

sepavloff updated this revision to Diff 120403.Oct 26 2017, 6:22 AM

Addressed reviewer's notes

  • Do not process binary directory separately but add it to the array of search directories.
  • Do not use result file name buffer for temporary data.
sepavloff marked 4 inline comments as done.Oct 26 2017, 6:26 AM
sepavloff added inline comments.
lib/Driver/Driver.cpp
637 ↗(On Diff #118315)

The intent was to avoid copying file name buffer. It however makes code less readable. Used temporary buffer instead.

657 ↗(On Diff #118315)

Agree. Changed accordingly.

661 ↗(On Diff #118315)

Now BinDirectory is added to the list of directories.

hfinkel added inline comments.Oct 26 2017, 7:18 PM
lib/Driver/Driver.cpp
706 ↗(On Diff #120403)

if it presents. -> if it is present.

739 ↗(On Diff #120403)

I don't see how this length check makes sense in all cases. If CfgFileName came from a --config command-line options, whether or not it is longer or shoter than ArchPrefixLen seems irrelevant. Do you need to do some prefix check here?

sepavloff marked 3 inline comments as done.Oct 27 2017, 11:50 AM
sepavloff added inline comments.
lib/Driver/Driver.cpp
739 ↗(On Diff #120403)

Indeed, this feature require further elaboration. We are going to treat config file specified by --config with higher priority, but actually we can have even more complex combination, something like:

mips64-clang --config mips32 -target x86 -m32

Nevertheless such call must be consistently processed.

I will implement the relevant functionality two weeks later, after return from vacation.

sepavloff updated this revision to Diff 123503.Nov 19 2017, 9:36 AM

Updated patch

  • Added command line option for setting directories where config files are searched for,
  • Fixed architecture calculation,
  • Option --config in config file is now diagnosed,
  • Output made by option -v contains config file search directories,
  • Test are changed so that they do not fail on non-x86 architectures,
  • Added new tests.
hfinkel added inline comments.Dec 14 2017, 4:42 PM
lib/Driver/Driver.cpp
641 ↗(On Diff #123503)

Use llvm::sys::fs::make_absolute instead of this.

hintonda added inline comments.
include/clang/Config/config.h.cmake
40 ↗(On Diff #123503)

These need to be in quotes since you assign them to a std::string, i.e.:

#cmakedefine CLANG_CONFIG_FILE_SYSTEM_DIR "${CLANG_CONFIG_FILE_SYSTEM_DIR}"
#cmakedefine CLANG_CONFIG_FILE_USER_DIR "${CLANG_CONFIG_FILE_USER_DIR}"
sepavloff updated this revision to Diff 127109.Dec 15 2017, 5:07 AM

Updated patch

Fixed cmake config file definition.
Use llvm::sys::fs::make_absolute to get absolute path.

sepavloff marked 2 inline comments as done.Dec 15 2017, 5:10 AM
sepavloff added inline comments.
include/clang/Config/config.h.cmake
40 ↗(On Diff #123503)

Fixed, thank you.

lib/Driver/Driver.cpp
641 ↗(On Diff #123503)

Removed getAbsolutePath.

hfinkel accepted this revision.Dec 20 2017, 7:23 PM

LGTM

lib/Driver/Driver.cpp
1508 ↗(On Diff #127109)

configuration file directory -> configuration-file directory

1511 ↗(On Diff #127109)

configuration file directory -> configuration-file directory

This revision is now accepted and ready to land.Dec 20 2017, 7:23 PM
sepavloff marked 2 inline comments as done.

Small corrections to the patch

  • Avoid quick return in the case of error, it may cause fails of clang tools.
  • Fixed typo.
This revision was automatically updated to reflect the committed changes.