This is an archive of the discontinued LLVM Phabricator instance.

[Support] Expand `<CFGDIR>` as the base directory in configuration files.
ClosedPublic

Authored by jackoalan on Dec 12 2021, 1:42 PM.

Details

Summary

Extends response file expansion to recognize <CFGDIR> and expand to the
current file's directory. This makes it much easier to author clang config
files rooted in portable, potentially not-installed SDK directories.

A typical use case may be something like the following:

# sample_sdk.cfg
--target=sample
-isystem <CFGDIR>/include
-L <CFGDIR>/lib
-T <CFGDIR>/ldscripts/link.ld

Diff Detail

Event Timeline

jackoalan created this revision.Dec 12 2021, 1:42 PM
jackoalan requested review of this revision.Dec 12 2021, 1:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2021, 1:42 PM

It should be evaluated if <@> is a sufficiently unique token to not accidentally break existing users' response files. The use of shell operator characters would likely ensure this.

jackoalan updated this revision to Diff 393777.Dec 12 2021, 2:04 PM

Add entry to user manual explaining the <@> token.

Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2021, 2:04 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jackoalan updated this revision to Diff 393781.Dec 12 2021, 3:39 PM

Fix ResponseFiles test for Windows.

jackoalan updated this revision to Diff 393785.Dec 12 2021, 5:36 PM

Test coverage for more than one <@> per arg.

IIUC, the new behavior being introduced by this patch is not the ability of having a way to refer to other files in a config/response file relative way, but rather extending that ability from only the options that start with @ to options that contain @ as a sub-string in any place.

If so, why do we need that exactly? What's wrong with specifying the config/response file as:

--target=sample
-isystem
 @include
-L
 @lib
-T 
@ldscripts/link.ld

instead of having them on the flag and the value on the same line? are there any "real" cases in which you need a foo=@something today and that command line option doesn't have a form without the = in between?

IIUC, the new behavior being introduced by this patch is not the ability of having a way to refer to other files in a config/response file relative way, but rather extending that ability from only the options that start with @ to options that contain @ as a sub-string in any place.

@ actually behaves as an "include" operation (i.e. the contents of the relative file are tokenized into more response file args). It will fail if a directory is passed because it is only used by FileSystem::getBufferForFile.

The <@> operation I am proposing does not perform any file load and simply expands to the base path as the function already knows it.

bump, review requested

sepavloff added inline comments.Dec 22 2021, 9:05 AM
clang/docs/UsersManual.rst
920 ↗(On Diff #393785)

Response and configuration files are different things, so the documentation must more specific what facility is extended.

According to this implementation the response file treatment is extended. During development of configuration files couple of attempts were taken to extend the functionality of response files (see for example https://reviews.llvm.org/D36045). They were rejected as breaking compatibility with gcc. It does not mean that such extending is not possible, but it should be discussed in wider community. There must be serious reason for such extension and significant use case.

As for using this feature for configuration files, it does not look as very helpful. Configuration files are searched for in several places and the first found file is used. In this case specification of various locations relative to the config-file does not look neither safe nor natural. Specific use case also would be helpful to ground the need for such extension.

jackoalan added inline comments.Dec 22 2021, 9:55 AM
clang/docs/UsersManual.rst
920 ↗(On Diff #393785)

Specific use case also would be helpful to ground the need for such extension.

An embedded architecture's community members may wish to deploy their own portable (non-installed) platform packages, while minimizing the user burden of configuring the build system of their choice to set up all the search paths. Ideally, just the --config option should be enough to handle everything necessary for full use of the package.

Consider an embedded architecture "foo". It deploys a common directory which contains a shared configuration file and headers:

-foo-common
  -config.cfg
  -include
    -foo.h
    -stdlib.h
    -string.h
    -...

foo-common/config.cfg:

--target=foo
-isystem <@>/include

foo is implemented by a family of boards. Each board SDK is contained in a directory located next to the common one. Each has their own device libraries and some require special feature and machine flags.

Here is the directory layout for a board named "bar":

-foo-bar
  -config.cfg
  -include
    -bar.h
  -lib
    -libbar.a
    -libc.a
    -libm.a
  -ldscripts
    -link.ld

foo-bar/config.cfg:

@../foo-common/config.cfg
-mcpu=<foo-variant-used-in-bar>
-isystem <@>/include
-L <@>/lib
-T <@>/ldscripts/link.ld

Configuration files are very convenient for setting up compiler options in an accessible way (i.e. not having to depend on hard-coded logic in the driver or cmake/makefiles/etc to set up the environment). However, the inability to express config-relative paths is a major burden for setting up search paths in this use case.

Thank you for the detailed explanation. Now I see how you are going to use this facility and think it worth implementation. A couple of questions.

  1. It it possible to limit the new syntax to config files only? It would avoid concerns of gcc compatibility.
  1. The new token is a weird combination of special symbols, it cannot be understood without reading documentation. Is it possible to use more understandable designation, like <CONFIG> or something similar? It would make config files more readable and allow future extensions.
  1. It it possible to limit the new syntax to config files only? It would avoid concerns of gcc compatibility.

Yes, ultimately this use case only calls for extending config files.

Is it possible to use more understandable designation, like <CONFIG> or something similar?

I agree, keyword directives like this would be more flexible in the long term and the use of shell redirection characters would theoretically protect any general purpose word. For now, I think it is acceptable to consume <CONFIG> as a singular token, but in the future, generic <(.*)> tokenization may be worth adding.

Make expansion token a parameter of ExpandResponseFiles and limit use to readConfigFile. Elaborate manual entry further and add entry to release notes.

jackoalan retitled this revision from [Support] Expand `<@>` as the base directory in response files. to [Support] Expand `<CONFIG>` as the base directory in response files..Dec 23 2021, 12:03 PM
jackoalan edited the summary of this revision. (Show Details)
jackoalan edited the summary of this revision. (Show Details)

Rebase, use the slightly more intuitive <CFGDIR> token to expand base paths.

jackoalan retitled this revision from [Support] Expand `<CONFIG>` as the base directory in response files. to [Support] Expand `<CFGDIR>` as the base directory in response files..Dec 28 2021, 10:06 AM
jackoalan edited the summary of this revision. (Show Details)

The name of the patch could be more precise if you change response files to configuration files.

clang/docs/UsersManual.rst
926–927 ↗(On Diff #396410)

It does not describe the use case clearly. Such macro is useful in the cases when configuration file is kept together with SDK content so that location of various SDK component can be expressed using the file location. Something about that could be provided here.

llvm/include/llvm/Support/CommandLine.h
2117 ↗(On Diff #396410)

It seems that this parameter can be boolean. It only instructs to call ExpandBasePaths, which knows what to expand.

2126 ↗(On Diff #396410)

The same about boolean parameter.

llvm/lib/Support/CommandLine.cpp
1082

It seems that the parameter Token is not needed, as it is always <CFGDIR>.

1122

It also could be boolean.

1182

It could be boolean.

jackoalan retitled this revision from [Support] Expand `<CFGDIR>` as the base directory in response files. to [Support] Expand `<CFGDIR>` as the base directory in configuration files..

Make <CFGDIR> constant at point of expansion. Use bool parameter to activate <CFGDIR> expansion.

jackoalan edited the summary of this revision. (Show Details)Dec 29 2021, 11:53 AM
jackoalan updated this revision to Diff 396576.Dec 29 2021, 3:59 PM

Update call parameters in ExpandResponseFilesDatabase::expand

sepavloff added inline comments.Dec 29 2021, 11:29 PM
llvm/lib/Support/CommandLine.cpp
1099

What happens if <CFGDIR> is used without trailing path? Such line:

--sysroot=<CFGDIR> -abc

would be processed correctly?

jackoalan updated this revision to Diff 396667.Dec 30 2021, 6:49 AM
jackoalan marked 5 inline comments as done.

Update ReadConfigFile test with additional <CFGDIR> expansion contexts (non-prefixed, non-suffixed, escaped in middle).

jackoalan marked an inline comment as done.Dec 30 2021, 6:54 AM
jackoalan added inline comments.
llvm/lib/Support/CommandLine.cpp
1099

Yes, if (!Remaining.empty()) handles this case. I've added more test coverage for varying expansion contexts.

jackoalan marked 3 inline comments as done.Dec 30 2021, 6:56 AM
jackoalan updated this revision to Diff 396670.Dec 30 2021, 7:18 AM

Add ReadConfigFile test case for multiple <CFGDIR> in one arg.

Actually, I just thought of a possible limitation of using path-append when suffixing with something that isn't actually a path component. However, I cannot say how critical this limitation is.

-Wl,-rpath,<CFGDIR>,foo.o

This will cause a trailing / to be inserted after <CFGDIR>. However, since <CFGDIR> is guaranteed to be a directory, a trailing slash should not be harmful in most cases.

The primary motivation of using path-append is to tidy cases that would otherwise expand as dir//file. Although most practical filesystems handle paths with empty components gracefully, vfs::InMemoryFileSystem does not.

Weighing these two issues, I still think it is worth using path-append.

sepavloff accepted this revision.Dec 30 2021, 9:55 AM

LGTM

Thanks!

This revision is now accepted and ready to land.Dec 30 2021, 9:55 AM
This revision was landed with ongoing or failed builds.Dec 30 2021, 10:45 AM
This revision was automatically updated to reflect the committed changes.