Some custom toolchains come with their own header files and compiler
drivers. Those compiler drivers implicitly know about include search path for
those headers. This patch aims to extract that information from drivers and add
it to the command line when invoking clang frontend.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 33799 Build 33798: arc lint + arc unit
Event Timeline
Implementation looks good. I can't see a better way to solve this problem, it's just a bit unfortunate to have a sophisticated solution but not be able to turn it on by default.
I think naming is important here: it's a fairly complicated feature that (I suppose) relatively few will use, so having an unambiguous way to refer to it e.g. in docs will reduce the cost/confusion.
I suggested "QueryDriver" below, but we might be able to come up with something better offline.
clang-tools-extra/clangd/ClangdServer.h | ||
---|---|---|
128 | not a regex | |
128 | I think this probably wants to be a vector<string> (comma-separated on command-line)? Clangd-level settings aren't easy to customize per-project. | |
clang-tools-extra/clangd/GlobalCompilationDatabase.cpp | ||
65 ↗ | (On Diff #202663) | Can we move the implementation out to a separate file? |
71 ↗ | (On Diff #202663) | I'd consider std::search for the delimiters explicitly - I think it's clearer what happens when you find one but not the other. (And you probably want to log an error and do nothing unless you find both) |
82 ↗ | (On Diff #202663) | Is this check important? (What happens if non-existent dirs are on the include path?) If it is, maybe we should pass in the VFS (factory?) here. |
90 ↗ | (On Diff #202663) | some explanation of what this is doing? e.g. "Many compilers have default system includes... these are known by the driver and passed to cc1 ... gcc and compilers with compatible CLI can dump the cc1 invocation information with this syntax... we do so and parse the result." |
96 ↗ | (On Diff #202663) | nit: please don't bother with const on locals (style is debatable, but it's hardly used anywhere and adds little value unless used consistently) |
96 ↗ | (On Diff #202663) | we need to remove this file at some point |
99 ↗ | (On Diff #202663) | log messages should contain more context, e.g. "Driver flags extraction: failed to create temporary file" |
108 ↗ | (On Diff #202663) | this will crash if the lookup didn't provide a "real" type |
112 ↗ | (On Diff #202663) | should we check the driver exists before executing it? the main advantage would be that we avoid logging this as if it were an error (or we do so with a better error message) |
125 ↗ | (On Diff #202663) | I think we should log the success case too... remember this is only going to happen once per driver (or {driver, filetype}). |
129 ↗ | (On Diff #202663) | just take cmd by reference and modify it? |
140 ↗ | (On Diff #202663) | please use llvm::Regex::escape() on the chunks between *, so we're sure to be consistent |
178 ↗ | (On Diff #202663) | I think you need to subscribe to changes in the base and broadcast them to your own subscribers. |
190 ↗ | (On Diff #202663) | check the CommandLine isn't empty? (I know, silly...) |
190 ↗ | (On Diff #202663) | The flag values will be absolute paths (or must be, to make any security sense), so we need to make Driver absolute before checking against it (using the command's working directory). And we need to make sure the string we exec is the same as the one we check, so that needs to be absolute too. |
195 ↗ | (On Diff #202663) | regex check is part of the cacheable computation. |
198 ↗ | (On Diff #202663) | you're computing based on the filename, but caching based only on the driver. If the filename is important, I think we should probably restructure so you get the filetype here and cache by it instead. |
210 ↗ | (On Diff #202663) | put "cache" in the name? |
clang-tools-extra/clangd/GlobalCompilationDatabase.h | ||
96 | If the trusted list is empty, it seems reasonable just to return Base. Then the caller can skip the check. | |
96 | if null base is forbidden (seems to be the intent here), just say so, don't specify what happens in that case (and probably add an assert) | |
clang-tools-extra/clangd/tool/ClangdMain.cpp | ||
271 | I'm not sure this is the right name, because "trusted" doesn't indficate what the feature does or when you might need it. And we've mitigated most of the security risk by turning this off by default, I'm not sure we actually need a warning in the flag name. Maybe -query-driver=? (commenting here because it's the main user-visible documentation, but I think we should use consistent names throughout, e.g. QueryDriverGlob). | |
273 | nit: drop "tells clangd to", it's implied | |
273 | from gcc-compatible drivers | |
274 | I think it's more useful to give an example (/usr/bin/**/clang-*) than try to describe glob syntax. | |
276 | is there something useful we should set this to by default? like /usr/bin/gcc,/usr/bin/g++? Or is the assumption that the standard/system compilers will never have weird information to extract? |
Could you give more context on what the custom toolchains are?
One feasible alternative is to move this detection to clang's driver (where the rest of include path detection lives), why won't that work?
Moving this logic into clang's driver should also work there is nothing specific to clangd. This patch just fetches those headers and adds them to clang invocation command line with a "-isystem" in front of them.
But we execute an arbitrary binary to fetch system includes coming from the custom toolchain drivers. Therefore we decided to rather keep it contained in clangd hidden behind a flag during offline discussions.
For example a gcc cross compiling to arm comes with its own system includes and has some mechanisms to discover that implicitly, without requiring any "-I" flags.
Hence when used with clangd, we make use of system includes instead of the target's include library. This patch aims to solve that issue, tools like cquery also handles that problem in a similar way.
clang-tools-extra/clangd/GlobalCompilationDatabase.cpp | ||
---|---|---|
82 ↗ | (On Diff #202663) | Existence of the directory doesn't really matter, I've put this as a debug measure, in case we get a "non-path"(malformed) entry. So removing it. |
90 ↗ | (On Diff #202663) | I've aleady mentioned this in file comment, do you think it is necessary to mention that in here as well? |
clang-tools-extra/clangd/tool/ClangdMain.cpp | ||
276 | I believe this is not an issue with default gcc/g++ since we never heard such complaints. I would rather leave the default empty until we observe some wide usage, due to security concerns |
That's exactly what driver is about. The approach is slightly different, though. Instead of executing a binary, one has to mimic the toolchain search logic of a particular toolchain by hand.
In addition to includes, it also handles adding the corresponding -D flags and anything else that the cross-compile toolchain does. Is this toolchain not currently supported by the driver? Is adding it so much work that we would choose to workaround like this instead?
As discussed offline, this is also a possible way to go. but looks like it requires making sure every toolchain has correct logic to deduce include paths which is more work and could become outdated, in addition to that it is also possible to introduce breakages due to new search logic. whereas this solution fixes the problem for all target.
So we decided to keep it clangd specific.
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
345 | if statement is not needed I think, as you return base if empty | |
clang-tools-extra/clangd/SystemIncludeExtractor.cpp | ||
1 ↗ | (On Diff #204243) | This name is a little limiting (e.g. if we want to pull more info from the driver). Not a big problem, but especially if you anticipate pulling out more properties, query-driver or similar might be clearer. |
66 ↗ | (On Diff #204243) | also log output here? |
91 ↗ | (On Diff #204243) | This isn't an error - probably vlog, *maybe* log |
143 ↗ | (On Diff #204243) | also include extracted includes - the info for this weird setup is important to preserve in the logs I think |
152 ↗ | (On Diff #204243) | Technically this doesn't work if the command is clang --driver-mode=cl ... - you'll be able to query the info from clang using the default gcc-compatible syntax, but the actual command-line won't support the -isystem flag. Not worth fixing I think but a comment? |
254 ↗ | (On Diff #204243) | docs says it returns base if globs are empty? |
clang-tools-extra/clangd/test/system-include-extractor.test | ||
2 | Wow, I'm impressed you managed to test this :-) Test needs quite a lot of comments to explain what it's doing though, I think. |
if statement is not needed I think, as you return base if empty