Page MenuHomePhabricator

[clangd] Enable extraction of system includes from custom toolchains
ClosedPublic

Authored by kadircet on Jun 3 2019, 1:43 AM.

Details

Summary

Some custom toolchains come with their own header files and compiler
drivers. Those compiler drivers implicitly know about include search path for
those headers. This patch aims to extract that information from drivers and add
it to the command line when invoking clang frontend.

Diff Detail

Repository
rL LLVM

Event Timeline

kadircet created this revision.Jun 3 2019, 1:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2019, 1:43 AM

Implementation looks good. I can't see a better way to solve this problem, it's just a bit unfortunate to have a sophisticated solution but not be able to turn it on by default.

I think naming is important here: it's a fairly complicated feature that (I suppose) relatively few will use, so having an unambiguous way to refer to it e.g. in docs will reduce the cost/confusion.
I suggested "QueryDriver" below, but we might be able to come up with something better offline.

clang-tools-extra/clangd/ClangdServer.h
127 ↗(On Diff #202663)

not a regex

127 ↗(On Diff #202663)

I think this probably wants to be a vector<string> (comma-separated on command-line)?

Clangd-level settings aren't easy to customize per-project.

clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
65 ↗(On Diff #202663)

Can we move the implementation out to a separate file?
It's pretty logically separate, and about as much code as everything else together.
It could also use a file-level comment describing how the scheme works, and also the reasons it's not always on (security, and not all argv0's are gcc-compatible)
(Sharing the same header seems fine)

71 ↗(On Diff #202663)

I'd consider std::search for the delimiters explicitly - I think it's clearer what happens when you find one but not the other.

(And you probably want to log an error and do nothing unless you find both)

82 ↗(On Diff #202663)

Is this check important? (What happens if non-existent dirs are on the include path?)

If it is, maybe we should pass in the VFS (factory?) here.

90 ↗(On Diff #202663)

some explanation of what this is doing?

e.g. "Many compilers have default system includes... these are known by the driver and passed to cc1 ... gcc and compilers with compatible CLI can dump the cc1 invocation information with this syntax... we do so and parse the result."

96 ↗(On Diff #202663)

nit: please don't bother with const on locals (style is debatable, but it's hardly used anywhere and adds little value unless used consistently)

96 ↗(On Diff #202663)

we need to remove this file at some point

99 ↗(On Diff #202663)

log messages should contain more context, e.g. "Driver flags extraction: failed to create temporary file"

108 ↗(On Diff #202663)

this will crash if the lookup didn't provide a "real" type

112 ↗(On Diff #202663)

should we check the driver exists before executing it? the main advantage would be that we avoid logging this as if it were an error (or we do so with a better error message)

125 ↗(On Diff #202663)

I think we should log the success case too... remember this is only going to happen once per driver (or {driver, filetype}).

129 ↗(On Diff #202663)

just take cmd by reference and modify it?

140 ↗(On Diff #202663)

please use llvm::Regex::escape() on the chunks between *, so we're sure to be consistent

178 ↗(On Diff #202663)

I think you need to subscribe to changes in the base and broadcast them to your own subscribers.
As it stands this breaks background indexing I think.
This pattern is error prone, as it's too easy to forget :-(

190 ↗(On Diff #202663)

check the CommandLine isn't empty? (I know, silly...)

190 ↗(On Diff #202663)

The flag values will be absolute paths (or must be, to make any security sense), so we need to make Driver absolute before checking against it (using the command's working directory). And we need to make sure the string we exec is the same as the one we check, so that needs to be absolute too.

195 ↗(On Diff #202663)

regex check is part of the cacheable computation.

198 ↗(On Diff #202663)

you're computing based on the filename, but caching based only on the driver.

If the filename is important, I think we should probably restructure so you get the filetype here and cache by it instead.

210 ↗(On Diff #202663)

put "cache" in the name?

clang-tools-extra/clangd/GlobalCompilationDatabase.h
96 ↗(On Diff #202663)

If the trusted list is empty, it seems reasonable just to return Base. Then the caller can skip the check.

96 ↗(On Diff #202663)

if null base is forbidden (seems to be the intent here), just say so, don't specify what happens in that case (and probably add an assert)

clang-tools-extra/clangd/tool/ClangdMain.cpp
271 ↗(On Diff #202663)

I'm not sure this is the right name, because "trusted" doesn't indficate what the feature does or when you might need it. And we've mitigated most of the security risk by turning this off by default, I'm not sure we actually need a warning in the flag name.

Maybe -query-driver=?

(commenting here because it's the main user-visible documentation, but I think we should use consistent names throughout, e.g. QueryDriverGlob).

273 ↗(On Diff #202663)

nit: drop "tells clangd to", it's implied

273 ↗(On Diff #202663)

from gcc-compatible drivers

274 ↗(On Diff #202663)

I think it's more useful to give an example (/usr/bin/**/clang-*) than try to describe glob syntax.

276 ↗(On Diff #202663)

is there something useful we should set this to by default? like /usr/bin/gcc,/usr/bin/g++?

Or is the assumption that the standard/system compilers will never have weird information to extract?

Could you give more context on what the custom toolchains are?
One feasible alternative is to move this detection to clang's driver (where the rest of include path detection lives), why won't that work?

kadircet updated this revision to Diff 204066.Jun 11 2019, 7:30 AM
kadircet marked 27 inline comments as done.
  • Address comments

Could you give more context on what the custom toolchains are?
One feasible alternative is to move this detection to clang's driver (where the rest of include path detection lives), why won't that work?

Moving this logic into clang's driver should also work there is nothing specific to clangd. This patch just fetches those headers and adds them to clang invocation command line with a "-isystem" in front of them.
But we execute an arbitrary binary to fetch system includes coming from the custom toolchain drivers. Therefore we decided to rather keep it contained in clangd hidden behind a flag during offline discussions.

For example a gcc cross compiling to arm comes with its own system includes and has some mechanisms to discover that implicitly, without requiring any "-I" flags.
Hence when used with clangd, we make use of system includes instead of the target's include library. This patch aims to solve that issue, tools like cquery also handles that problem in a similar way.

clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
82 ↗(On Diff #202663)

Existence of the directory doesn't really matter, I've put this as a debug measure, in case we get a "non-path"(malformed) entry.
But I suppose we can already see this in the "Updating a.cc with command ...." log.

So removing it.

90 ↗(On Diff #202663)

I've aleady mentioned this in file comment, do you think it is necessary to mention that in here as well?

clang-tools-extra/clangd/tool/ClangdMain.cpp
276 ↗(On Diff #202663)

I believe this is not an issue with default gcc/g++ since we never heard such complaints. I would rather leave the default empty until we observe some wide usage, due to security concerns

For example a gcc cross compiling to arm comes with its own system includes and has some mechanisms to discover that implicitly, without requiring any "-I" flags.
Hence when used with clangd, we make use of system includes instead of the target's include library. This patch aims to solve that issue, tools like cquery also handles that problem in a similar way.

That's exactly what driver is about. The approach is slightly different, though. Instead of executing a binary, one has to mimic the toolchain search logic of a particular toolchain by hand.
In addition to includes, it also handles adding the corresponding -D flags and anything else that the cross-compile toolchain does. Is this toolchain not currently supported by the driver? Is adding it so much work that we would choose to workaround like this instead?

kadircet updated this revision to Diff 204243.Jun 12 2019, 2:57 AM
  • Fix off-by-one bug and improve lit test

For example a gcc cross compiling to arm comes with its own system includes and has some mechanisms to discover that implicitly, without requiring any "-I" flags.
Hence when used with clangd, we make use of system includes instead of the target's include library. This patch aims to solve that issue, tools like cquery also handles that problem in a similar way.

That's exactly what driver is about. The approach is slightly different, though. Instead of executing a binary, one has to mimic the toolchain search logic of a particular toolchain by hand.
In addition to includes, it also handles adding the corresponding -D flags and anything else that the cross-compile toolchain does. Is this toolchain not currently supported by the driver? Is adding it so much work that we would choose to workaround like this instead?

As discussed offline, this is also a possible way to go. but looks like it requires making sure every toolchain has correct logic to deduce include paths which is more work and could become outdated, in addition to that it is also possible to introduce breakages due to new search logic. whereas this solution fixes the problem for all target.
So we decided to keep it clangd specific.

sammccall accepted this revision.Mon, Jun 24, 2:35 AM
sammccall added inline comments.
clang-tools-extra/clangd/ClangdLSPServer.cpp
345 ↗(On Diff #204243)

if statement is not needed I think, as you return base if empty

clang-tools-extra/clangd/SystemIncludeExtractor.cpp
1 ↗(On Diff #204243)

This name is a little limiting (e.g. if we want to pull more info from the driver).

Not a big problem, but especially if you anticipate pulling out more properties, query-driver or similar might be clearer.

66 ↗(On Diff #204243)

also log output here?

91 ↗(On Diff #204243)

This isn't an error - probably vlog, *maybe* log

143 ↗(On Diff #204243)

also include extracted includes - the info for this weird setup is important to preserve in the logs I think

152 ↗(On Diff #204243)

Technically this doesn't work if the command is clang --driver-mode=cl ... - you'll be able to query the info from clang using the default gcc-compatible syntax, but the actual command-line won't support the -isystem flag.

Not worth fixing I think but a comment?

254 ↗(On Diff #204243)

docs says it returns base if globs are empty?

clang-tools-extra/clangd/test/system-include-extractor.test
1 ↗(On Diff #204243)

Wow, I'm impressed you managed to test this :-)

Test needs quite a lot of comments to explain what it's doing though, I think.

This revision is now accepted and ready to land.Mon, Jun 24, 2:35 AM
kadircet updated this revision to Diff 206231.Mon, Jun 24, 8:51 AM
kadircet marked 9 inline comments as done.
  • Rename SystemIncludeExtractor to QueryDriverDatabase
  • Address comments
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptWed, Jun 26, 12:48 AM