This is an archive of the discontinued LLVM Phabricator instance.

clangd: Provide the resource dir via environment variable
Needs ReviewPublic

Authored by madscientist on Jul 10 2023, 3:37 PM.

Details

Reviewers
sammccall
Summary

Set an environment variable CLANGD_RESOURCE_DIR to the path of the Clang
resource directory, so that the compiler driver can (a) know it's being
invoked from clangd and (b) massage its system include paths to use the
Clang resources if it wishes to.

This addresses clangd/clangd#1695

Diff Detail

Event Timeline

madscientist created this revision.Jul 10 2023, 3:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 3:37 PM
madscientist published this revision for review.Jul 10 2023, 3:43 PM
nridge added a subscriber: nridge.

Looks reasonable to me but Sam should OK this.

Thanks for adding Sam. I tried to do this but failed: his Phabricator handle isn't available in CODE_OWNERS.txt and my attempts to add him via his email address failed. I have no Phabricator fu!

sammccall added a comment.EditedJul 16 2023, 1:13 AM

I'm a little nervous about this. On the one hand it's simple and I want to get you unblocked but:

  • the more ways to initialize resource dir and the more places this needs to be propagated to, the more likely this is to get out of sync
  • we might want to remove the need for resource-dir one day and embed the built-in-headers in the binary, it's not clear what this would do in such a world.
  • I don't understand how the technique this enables can be robust. Using the built-in headers from one compiler with a different compiler may work sometimes but is going to break in mysterious ways. I don't want to stop you using this technique, but if we add a feature for this people will want support in using it correctly.

It seems like you can *almost* do this with no new support though - the driver will always be invoked by clangd, so you can ask the OS where the invoking binary is. This assumes:

  • you're on an OS where this is reasonably possible
  • you're not customizing the resource-dir path, so we can use the same heuristics clangd does to find it

Proof of concept (works on my machine!) to access the resource dir from the driver - this just sets it as an include path but you could do whatever:

#!/bin/bash
# /home/sammccall/qd/driver.sh
echo "Target: x86_64-linux-gnu" >&2
echo "#include \"...\" search starts here:" >&2
echo "#include <...> search starts here:" >&2
echo $(dirname $(dirname $(readlink /proc/$PPID/exe)))/lib/clang/*/include >&2
echo "End of search list." >&2
# /home/sammccall/qd/.clangd
CompileFlags:
  Compiler: /home/sammccall/qd/driver.sh
$ env CLANGD_FLAGS=--query-driver="/home/sammccall/qd/driver.sh" clangd --check=/home/sammccall/qd/hello.cc -log=verbose
...
I[10:09:51.174] Failed to find compilation database for /home/sammccall/qd/hello.cc
V[10:09:51.177] System include extraction: target extracted: "x86_64-linux-gnu"
V[10:09:51.177] System include extraction: adding /usr/lib/clangd/lib/clang/16/include
I[10:09:51.177] System includes extractor: successfully executed /home/sammccall/qd/driver.sh
	got includes: "/usr/lib/clangd/lib/clang/16/include"
	got target: "x86_64-linux-gnu"
I[10:09:51.177] Generic fallback command is: [/home/sammccall/qd] /home/sammccall/qd/driver.sh -isystem /usr/lib/clangd/lib/clang/16/include --target=x86_64-linux-gnu -resource-dir=/usr/lib/clangd/lib/clang/16 -- /home/sammccall/qd/hello.cc
...

This is of course pretty coupled to implementation details, but I think we're squarely in that zone anyway. The ../lib/clang/*/include path is very stable in practice, assuming there's only one match for * (i.e. this is a standalone clangd install, not trying to share files with clang which may have various versions).

If this doesn't seem workable, can you explain a bit more about your setup?

Sorry for the delay in replying. I'll try to respond to the various points you made:

  • I don't quite understand the first bullet, "more ways to initialize the resource dir". As far as I can see there's only one place it's set. If my change is using a "non-official" alternative to obtaining the resource dir then certainly I would be happy to change my code to fix that: really the goal of my patch is to ensure that there is one single way to find this directory and this is preserved by telling the compiler driver explicitly where it is, rather than having multiple places attempting to figure it out for themselves.
  • If the resource-dir were to become built in then I would imagine that this variable would be changed to either be empty or (preferable) contain a special token such as "builtin" so that the compiler driver could know this. How this would or could work would depend greatly on exactly how the built-in facilities were implemented, but if for example all that's needed is to avoid adding GCC's intrinsics headers to the list, then the compiler driver that saw a CLANGD_RESOURCE_DIR value of builtin (for example) would remove the GCC intrinsics directory completely, rather than replacing it with the clangd resource directory.

If using the built-in headers from one compiler with a different compiler is fated to break in mysterious ways, does that mean you feel it's not useful to try to use clangd in any project which doesn't use clang as its compiler? I hope that this is not the case since (although the idea has been raised on their mailing lists) there is not much work going on in GCC to create a GCC-based LSP server. Although I do agree that there are no guarantees, in reality clangd works very well in my GCC-based project if I replace the GCC intrinsics directory with the clangd resource directory when the compiler driver generates its list of built-in locations.

Regarding alternative solutions:

  • My environment is cross-platform between GNU/Linux (with various distributions and on various platforms such as x86_64, arm64, and power9 and power10), MacOS on both x86_64 and m1/m2, and Windows (10+) on x86_64. So relying on things like /proc/$PPID/exe (or even bash!) is not ideal.
  • Your solution requires me to know implicitly where (compared to the clangd binary) the resource directory lives; that seems less reliable than having clangd tell me. I don't quite understand the reassurance that these implementation details are stable enough to assume to be "fact", compared with the concern in the first bullet above about how they might change.

Your solution also assumes there is only one version of clang installed on the system; in fact my system has multiple versions so .../lib/clang/*/include expands to multiple paths which parses incorrectly (clangd assumes one directory per line). Of course I could have my compiler driver run clangd -version and parse the output to find the right version; my resource directory is /usr/lib/clang/16 but the version is 16.0.0... it is do-able but it just seems increasingly hairy.

At the moment I don't have a different compiler driver for clangd than the standard compiler wrapper script that is provided via compile_commands.json although of course I could create one and change my .clangd to point to it. I was going to simply modify my compiler wrapper to DTRT if it discovered it was being invoked with -v and CLANGD_RESOURCE_DIR was set.

Also, on one of my systems clangd is apparently configured to use posix_spawn() to invoke the compiler driver, which does not work with shell scripts. But that just seems like a bug somewhere. I get:

E[17:23:24.164] System include extraction: driver execution failed with return code: -1 - 'posix_spawn failed: Exec format error'. Args: [/home/pds/src/clangd/bin/qdriver.sh -E -x c++ - -v]

although running that command from the shell prompt works fine.

TL;DR: can you try just excluding GCC's resource dir, rather than replacing it with clang's?
I think that might solve your problem, and is probably something we should be doing automatically.

Sorry for the delay in replying. I'll try to respond to the various points you made:

  • I don't quite understand the first bullet, "more ways to initialize the resource dir". As far as I can see there's only one place it's set. If my change is using a "non-official" alternative to obtaining the resource dir then certainly I would be happy to change my code to fix that

The resource dir for a parse can be set in the following ways AFAIK, some of which are hopefully unreachable:

  • it can be set by passing -resource-dir to clangd, which gets propagated into the compile flags by CommandMangler
  • CommandMangler::detect() can choose it
  • the compile flags from the CDB can contain -resource-dir, overriding CommandMangler
  • the clang driver that we invoke as a library can detect it (hopefully never happens as a production CommandMangler always ensures -resource-dir is set)
  • it could be left unset if the clang driver fails to detect it (ditto)

I'm not saying you're injecting it into the wrong place, I'm just saying adding a sixth path has a cost.

really the goal of my patch is to ensure that there is one single way to find this directory and this is preserved by telling the compiler driver explicitly where it is, rather than having multiple places attempting to figure it out for themselves.

As far as the driver is concerned, that's already the case before this patch: we endeavour to always pass -resource-dir to the driver.
This patch attempts to sync that setting to/from another place, and doesn't entirely succeed (in the third case the resource dir used for parsing does not match the environment variable).

If using the built-in headers from one compiler with a different compiler is fated to break in mysterious ways, does that mean you feel it's not useful to try to use clangd in any project which doesn't use clang as its compiler?

Not at all, but clangd should use clangd's built-in headers even if you're parsing a project that builds with another compiler.

The built-in headers expose a (fairly) standard interface that code is expected to depend on, and their implementation is tightly coupled to the parser they ship with.
For example, Clang's <tgmath.h> uses __attribute__((__overloadable__)), which GCC doesn't support, while gcc's <tgmath.h> uses __builtin_tgmath, which Clang doesn't support. These are used to implement e.g. generic acosh(), which is a public interface code can rely on.
This is the difference between the built-in headers and the (rest of the) standard library: stdlibs generally try to be mostly portable between compilers and versions, the built-in headers are coupled.
(In principle it's just as incorrect to try to use e.g. clang-12's builtin headers with clangd-13 - these really don't try to be portable).

I hope that this is not the case since (although the idea has been raised on their mailing lists) there is not much work going on in GCC to create a GCC-based LSP server. Although I do agree that there are no guarantees, in reality clangd works very well in my GCC-based project if I replace the GCC intrinsics directory with the clangd resource directory when the compiler driver generates its list of built-in locations.

Ahh, I finally think I understand what you're trying to do here... is this right?

  • you have a project that builds with GCC, using GCC's include path
  • you want to reuse that config with clangd, so you're using --query-driver to extract the include path list
  • the extracted list includes GCC's built-in headers, and these get inserted onto clangd's regular include path
  • now GCC's built-in headers shadow clangd's, so we have the "doesn't work" scenario described above
  • so you want to rewrite the list GCC's driver reports, replacing GCC's built-in headers with clangd's
  • and for that you need clangd's resource dir

That would work but it's more than you need to do: clangd's resource dir is still on the search path (at the end), so if you simply remove GCC's built-in headers from the list then they'll stop being shadowed.
Can you test if this works?

This seems like a design oversight in query-driver.
It should probably be doing this filtering by default, if there's a reasonable way to detect which entries are builtin includes.
clang supports -nobuiltininc but GCC does not.
On the other hand clang --print-file-name=include gives the built-in headers path, and the same works with GCC. Maybe we can invoke that separately and exclude it.

Regarding alternative solutions:

  • My environment is cross-platform between GNU/Linux (with various distributions and on various platforms such as x86_64, arm64, and power9 and power10), MacOS on both x86_64 and m1/m2, and Windows (10+) on x86_64. So relying on things like /proc/$PPID/exe (or even bash!) is not ideal.
  • Your solution requires me to know implicitly where (compared to the clangd binary) the resource directory lives; that seems less reliable than having clangd tell me. I don't quite understand the reassurance that these implementation details are stable enough to assume to be "fact", compared with the concern in the first bullet above about how they might change.

Your solution also assumes there is only one version of clang installed on the system; in fact my system has multiple versions so .../lib/clang/*/include expands to multiple paths which parses incorrectly (clangd assumes one directory per line). Of course I could have my compiler driver run clangd -version and parse the output to find the right version; my resource directory is /usr/lib/clang/16 but the version is 16.0.0... it is do-able but it just seems increasingly hairy.

At the moment I don't have a different compiler driver for clangd than the standard compiler wrapper script that is provided via compile_commands.json although of course I could create one and change my .clangd to point to it. I was going to simply modify my compiler wrapper to DTRT if it discovered it was being invoked with -v and CLANGD_RESOURCE_DIR was set.

This all makes sense, working around this outside clangd is indeed going to get messy, especially when cross-platform.
For general unsupportable messing-with-resource-dir stuff this might still be where we'd end up, but it sounds like we're actually talking about a problem that should be fixed for all users.

Also, on one of my systems clangd is apparently configured to use posix_spawn() to invoke the compiler driver, which does not work with shell scripts. But that just seems like a bug somewhere. I get:

E[17:23:24.164] System include extraction: driver execution failed with return code: -1 - 'posix_spawn failed: Exec format error'. Args: [/home/pds/src/clangd/bin/qdriver.sh -E -x c++ - -v]

although running that command from the shell prompt works fine.

Yikes, from a cursory look it seems glibc posix_spawn is a bit of a mess...

D156044 attempts to have clangd always exclude the builtin headers when querying the driver.
Would be great to know if this fixes your problem.

(But if you're not in a position to try a patch like that, filtering the path out in your driver wrapper is a good proxy)