This is an archive of the discontinued LLVM Phabricator instance.

Load plugins when creating a CompilerInvocation.
Needs ReviewPublic

Authored by psionic12 on Nov 25 2020, 11:03 PM.

Details

Summary

Since frontend plugins can report diagnostics nowadays,
the old way of loading plugins only fits for clang it self,
tools like clang-check and clangd will not load plugins
even if the getFrontendOpts().Plugins has paths of plugins.

To make all tools to report diagnostics from plugins,
move the loading into CompilerInvocation::CreateFromArgs().

Diff Detail

Event Timeline

psionic12 created this revision.Nov 25 2020, 11:03 PM
psionic12 requested review of this revision.Nov 25 2020, 11:03 PM

Fix some syntax

Is it possible to write tests for this, obviously they will only work with plugin support on the current platform.
Also now that this is loaded from the command line, how will this play nicely with other tools that haven't been built with plugin support?

Is it possible to write tests for this, obviously they will only work with plugin support on the current platform.
Also now that this is loaded from the command line, how will this play nicely with other tools that haven't been built with plugin support?

  1. I'll try to write a test case for this.
  2. If a tool is built with CLANG_PLUGIN_SUPPORT=OFF, then the executable will not expose the symbols, then if a user does send a plugin to the command line as a parameter, the diagnostic will warning "undefined symbol". This will case the plugin unloaded, but not involve the main clang procedure. Do you suggested that this is not an elegant way?

And I checked the CMakeList.txt for the driver, if CLANG_PLUGIN_SUPPORT is off, I guess the "undefined symbol` problem will also happens, I think this maybe one reason that option CLANG_PLUGIN_SUPPORT is as ON by default.

I don't have a lot of background in clang plugins, either on the technical side or how they're used in practice, so some of this may be a bit underinformed.
This sounds OK for experiments as long as it's off by default, but there may be barriers to going any further.

Some questions to try to understand how this is going to play with clangd:

  • what are the plugins you want to use with clangd? (i.e what kinds of plugins are most important, what are they useful for)
  • are plugins always threadsafe?
  • are there any meaningful restrictions from a security perspective (e.g. can it load arbitrary .sos from disk, given control over compile args)
  • how robust are plugins when the main frontendaction is nonstandard (e.g. uses preambles, skips function bodies)

Other points that i'm fairly sure will be problematic:

  • the filesystem access to the libraries isn't virtualized (vfs) and I don't know if it really can be.
  • consequences of bad behavior/crash are worse, because clangd is long-running
  • if a plugin works with clang but not clangd, it's unclear whether/where there's a bug
clang-tools-extra/clangd/tool/CMakeLists.txt
44

This looks like it's on by default - it should be off at least by default until:

  • support surface of this is understood by maintainers
  • security is understood and safe (command-line arguments are effectively attacker-controlled for clangd)
  • we've verified that this actually works well with important plugins
clang/lib/Frontend/CompilerInvocation.cpp
3902

how does this code behave if CLANG_PLUGIN_SUPPORT is off?

3905

is this threadsafe on all platforms?

3914

we can't support replacement of the main action in clangd. What's the plan there - ignore the plugin?

  • what are the plugins you want to use with clangd? (i.e what kinds of plugins are most important, what are they useful for)

Any plugins which will report diagnostics, especially customized attribute plugins, you can check clang/examples/ for details. Examples like AnnotationFunctions, Attribute and CallSuperAttribute(which is wrote by me) all defined a customized attribute and apply some rules on it, if the code do not followed the rule, diagnostics are reported.

  • are plugins always threadsafe?

Could you give an example about what kind of thread-safety problem you are imagining? As long as I wrote clang specific plugins, only callbacks are needed to implement, no thread functionalities are used, and as far as I know, clang use single thread to deal with lex, parser, and sema. So I think thread safety won't be a problem here.

  • are there any meaningful restrictions from a security perspective (e.g. can it load arbitrary .sos from disk, given control over compile args)

Sorry I can't tell, but I think this patch is some sort of NFC patch, what you concern will or will not exist with or without this patch.

  • how robust are plugins when the main frontendaction is nonstandard (e.g. uses preambles, skips function bodies)

Different plugins have different callbacks, these callbacks will only get called if the compile work flow goes there, that is to say if the frontendcation skip the bodies somehow(sorry I don't know how could it be done, could you help to explain more?), the callback will not triggered.

  • the filesystem access to the libraries isn't virtualized (vfs) and I don't know if it really can be.

Sorry I don't quite understand, as long as I know the plugin is no more than a shared library, and the driver is no more than a executable, the load procedure seems no different with other dlopen cases.

  • consequences of bad behavior/crash are worse, because clangd is long-running

Does clangd shares the same process with an FrontendAction? Because in clang, if a plugin carshes, the whole process does terminated, clang didn't handle this yet(just a core dump).

  • if a plugin works with clang but not clangd, it's unclear whether/where there's a bug

I think that depends. Since clangd use syntax-only, plugins other than frontendAction seems won't get called. As to frontendActions, if it works with clang, it should works with clangd, because frontendActions is more about syntax check before generate IR code, clang or clangd or chang-check makes no difference from an frontendActions perspective.

Anyway, some of your concerns seems out of my scope, since clangd have never worked with plugins before, do you think it's better to add some one who knows these? (I just don't know who to add...)

psionic12 added inline comments.Nov 26 2020, 8:11 AM
clang/lib/Frontend/CompilerInvocation.cpp
3902

If CLANG_PLUGIN_SUPPORT is off, than no symbols are exported from a executable, than load will failed with undefined symbols, a diagnostic will be reported and the plugin will not be loaded. The main procedure will not be affected.

3905

While I can't tell, clang driver use this for a while and seems no problem, could you help to point out what's the worst case your concerned about?

3914

Could you help to explain why action replacements are not supported?

As far as I know, replacement of actions is related with Actions, which does not have directly relationship with clangd,

  • what are the plugins you want to use with clangd? (i.e what kinds of plugins are most important, what are they useful for)

Any plugins which will report diagnostics, especially customized attribute plugins, you can check clang/examples/ for details. Examples like AnnotationFunctions, Attribute and CallSuperAttribute(which is wrote by me) all defined a customized attribute and apply some rules on it, if the code do not followed the rule, diagnostics are reported.

Right, I understand (a little bit, at least!) what plugins *can* do. What I'm asking specifically is: this feature has a cost, how important is supporting it? Are there codebases where these attributes are widely used, and enforcement at development time is particularly valuable?

  • are plugins always threadsafe?

Could you give an example about what kind of thread-safety problem you are imagining? As long as I wrote clang specific plugins, only callbacks are needed to implement, no thread functionalities are used, and as far as I know, clang use single thread to deal with lex, parser, and sema. So I think thread safety won't be a problem here.

Sorry, I probably should have said "can plugins be thread-hostile".
Clangd runs several parsing actions in parallel, each on a single separate thread.
If a plugin assumes there's only a single instance of it running and uses static data accordingly, we're going to hit problems in clangd.
The problem is that precisely because clang uses a single thread, (some) plugins may think this is a reasonable assumption. And it's not possible to tell "from the outside" whether a given plugin is unsafe in this way.

  • are there any meaningful restrictions from a security perspective (e.g. can it load arbitrary .sos from disk, given control over compile args)

Sorry I can't tell, but I think this patch is some sort of NFC patch, what you concern will or will not exist with or without this patch.

It looks like this patch moves LoadLibraryPermanently to a different location, per the patch description clangd does not currently load plugins. That's why I'm concerned this patch may introduce unsafe loading of untrusted .sos.

  • how robust are plugins when the main frontendaction is nonstandard (e.g. uses preambles, skips function bodies)

Different plugins have different callbacks, these callbacks will only get called if the compile work flow goes there, that is to say if the frontendcation skip the bodies somehow(sorry I don't know how could it be done, could you help to explain more?), the callback will not triggered.

SkipFunctionBodies is an option in FrontendOpts that produces a different AST that clang itself (which does not set this option) can't produce. Basically I'm saying "can we tell if a plugin is going to work with our weird ASTs, or do we just have to try and maybe crash".

  • the filesystem access to the libraries isn't virtualized (vfs) and I don't know if it really can be.

Sorry I don't quite understand, as long as I know the plugin is no more than a shared library, and the driver is no more than a executable, the load procedure seems no different with other dlopen cases.

We don't support any other dlopen cases in clangd.

  • consequences of bad behavior/crash are worse, because clangd is long-running

Does clangd shares the same process with an FrontendAction? Because in clang, if a plugin carshes, the whole process does terminated, clang didn't handle this yet(just a core dump).

Yes. And I'm not convinced it's possible to handle this safely without large architectural changes.

Anyway, some of your concerns seems out of my scope, since clangd have never worked with plugins before, do you think it's better to add some one who knows these? (I just don't know who to add...)

I think we need someone who understands plugins well why they're safe (or how they can be made safe) to work with clangd, and who's willing to do much of the work to do so.
Absent that, I think we probably shouldn't enable them (beyond an experiment).

Right, I understand (a little bit, at least!) what plugins *can* do. What I'm asking specifically is: this feature has a cost, how important is supporting it? Are there codebases where these attributes are widely used, and enforcement at development time is particularly valuable?

Well, I can't tell you how widely plugin used, because they often used in third-party projects. Firefox team seems use them a lot, the ParsedAttrInfo plugin implementation seems write by them if I remember right. And our team use both plugins and clangd a lot, that's why I'm willing to contribute to make them work together. As about the importance, my personal opinion is that since there's a feature, we should try to polish it...

Sorry, I probably should have said "can plugins be thread-hostile".
Clangd runs several parsing actions in parallel, each on a single separate thread.
If a plugin assumes there's only a single instance of it running and uses static data accordingly, we're going to hit problems in clangd.
The problem is that precisely because clang uses a single thread, (some) plugins may think this is a reasonable assumption. And it's not possible to tell "from the outside" whether a given plugin is unsafe in this way.

That seems a serious problem, but I got a question, what if the clang itself declared some static data (I'll investigate later, just ask for now)?

It looks like this patch moves LoadLibraryPermanently to a different location, per the patch description clangd does not currently load plugins. That's why I'm concerned this patch may introduce unsafe loading of untrusted .sos.

Well, since it called plugin, it's users choose to use it or not, I personally think we can't check if an .so is trusted, and neither should we care...

SkipFunctionBodies is an option in FrontendOpts that produces a different AST that clang itself (which does not set this option) can't produce. Basically I'm saying "can we tell if a plugin is going to work with our weird ASTs, or do we just have to try and maybe crash".

I'll test this.

  • the filesystem access to the libraries isn't virtualized (vfs) and I don't know if it really can be.

Sorry I still can't understand this, could you help to explain more? What I mean before is that since this part of code is used in clang driver as well, it should be no problem get called by clangd.

  • consequences of bad behavior/crash are worse, because clangd is long-running

Does clangd shares the same process with an FrontendAction? Because in clang, if a plugin carshes, the whole process does terminated, clang didn't handle this yet(just a core dump).

Yes. And I'm not convinced it's possible to handle this safely without large architectural changes.

Well, I guess currently I can do nothing to improve this, but if a user uses a plugin, it crashed, that's should be the plugin's problem...

Absent that, I think we probably shouldn't enable them (beyond an experiment).

How about capsule the loading functionalities to a function, and call it from clangd if experimental feature is checked? In this way nothing changed if the clangd's experimental feature is off.

TL;DR: clangd has some different concerns than clang, so we really need to approach this as a new feature with some design required (even if a mechanism already exists).
The most promising ideas I can think of involve doing the loading at clangd startup based on some trusted configuration, and *not* dynamically loading in the driver, instead allowing the use of plugins already loaded.

Right, I understand (a little bit, at least!) what plugins *can* do. What I'm asking specifically is: this feature has a cost, how important is supporting it? Are there codebases where these attributes are widely used, and enforcement at development time is particularly valuable?

Well, I can't tell you how widely plugin used, because they often used in third-party projects. Firefox team seems use them a lot, the ParsedAttrInfo plugin implementation seems write by them if I remember right. And our team use both plugins and clangd a lot, that's why I'm willing to contribute to make them work together. As about the importance, my personal opinion is that since there's a feature, we should try to polish it...

Ah, thanks! If the widely-used plugins live in-tree, then it's more plausible to support just those, e.g.

  • we don't need to consider the security concerns of loading unknown code
  • we can fix them if they become e.g. thread-hostile

Sorry, I probably should have said "can plugins be thread-hostile".
Clangd runs several parsing actions in parallel, each on a single separate thread.
If a plugin assumes there's only a single instance of it running and uses static data accordingly, we're going to hit problems in clangd.
The problem is that precisely because clang uses a single thread, (some) plugins may think this is a reasonable assumption. And it's not possible to tell "from the outside" whether a given plugin is unsafe in this way.

That seems a serious problem, but I got a question, what if the clang itself declared some static data (I'll investigate later, just ask for now)?

You mean if clang itself were thread-hostile? We've found and fixed such problems over the years as part of developing clangd and other tools that use the clang frontend in a multithreaded environment. This cost some time and effort, but had a high reward, we had access to all the code, and the problem was bounded. So it was feasible and worthwhile!

It looks like this patch moves LoadLibraryPermanently to a different location, per the patch description clangd does not currently load plugins. That's why I'm concerned this patch may introduce unsafe loading of untrusted .sos.

Well, since it called plugin, it's users choose to use it or not, I personally think we can't check if an .so is trusted, and neither should we care...

We need to be careful about what "user" and "choose" means.
e.g. if you clone an untrusted git repo and open src/foo.cpp in your editor...
(and unknown to you, src/compile_flags.txt contains -plugin=../lib/pwn.so)
(and unknown to you, lib/pwn.so wipes your hard disk when loaded)
Have you made a meaningful choice to use that plugin? That's the security context clangd is in - the decision to parse the file is implicit, and the compile flags are potentially attacker-controlled.

While the *clang* flags to parse a particular file are attacker-controlled, the *clangd* flags are not, and are a viable place to put security-sensitive configuration. So one potential design idea is to pass -clang-plugin=foo.so to clangd, and load foo.so on startup (not in the clang driver). Then when it comes time to parse a file, we can allow its args to activate plugins that are already loaded, but not to load new ones.
(There's precedent for this: the clangd --query-driver flag is designed to mitigate a similar clang-flags-lead-to-arbitary-code-execution concern)

I think this might be viable but as you can imagine it's considerably more involved than just enabling the plugin code in the driver.

  • the filesystem access to the libraries isn't virtualized (vfs) and I don't know if it really can be.

Sorry I still can't understand this, could you help to explain more?

clangd (and other clang tools) get deployed in environments where all access to the filesystem goes through a llvm::vfs::Filesystem, and all filenames referred to in the compile command are within that virtual filesystem rather than the real one.
(Again, this isn't an issue if we require these paths to be specified on the *clangd* startup command line, as opposed to the clang flags that form the compile command)

What I mean before is that since this part of code is used in clang driver as well, it should be no problem get called by clangd.

To my knowledge, clang itself is never used in such an environment, so only the parts that already run in clang tools have been made VFS-clean.

  • consequences of bad behavior/crash are worse, because clangd is long-running

Does clangd shares the same process with an FrontendAction? Because in clang, if a plugin carshes, the whole process does terminated, clang didn't handle this yet(just a core dump).

Yes. And I'm not convinced it's possible to handle this safely without large architectural changes.

Well, I guess currently I can do nothing to improve this, but if a user uses a plugin, it crashed, that's should be the plugin's problem...

I'm not sure that expecting everyone writing clang plugins to understand and care about clangd's quirks is reasonable or realistic. If we set the expectation that plugins should work, there will be some pressure to "emulate" clang sufficiently accurately. We do this for clang-tidy checks (but again, the value there is very high).

Absent that, I think we probably shouldn't enable them (beyond an experiment).

How about capsule the loading functionalities to a function, and call it from clangd if experimental feature is checked? In this way nothing changed if the clangd's experimental feature is off.

Yeah, a clangd flag to allow switching this on/off at without rebuilding could definitely be an option.

clang/lib/Frontend/CompilerInvocation.cpp
3905

clang-driver isn't multithreaded, so wouldn't show up problems here.

I'm concerned that if thread A loads plugin X, thread B loads plugin X, and thread C loads plugin Y, all concurrently and using this code path, whether they will all load correctly.

In clangd these threads could correspond to two open files and a background-index worker.

(I don't know much about dynamic loading, this may be totally fine)

3914

Clangd uses some custom FrontendActions (different ones for different ways we're parsing the file) to implement its features (e.g. track which parts of the AST are part of the main-file without deserializing the preamble, and to do indexing).
If you replace these actions, normal features will not work.

e.g.:

If we replace these with the plugin's action, clangd won't work

clangd (and other clang tools) get deployed in environments where all access to the filesystem goes through a llvm::vfs::Filesystem, and all filenames referred to in the compile command are within that virtual filesystem rather than the real one.
(Again, this isn't an issue if we require these paths to be specified on the *clangd* startup command line, as opposed to the clang flags that form the compile command)

Yes I know how clang manager files through vfs, it just that loading libraries does not involve vfs at all, the path of a lib is passed directly to the system level API (eg, dlopen on Linux, System::Load on Windows), so I still can't understand what you are concerning, maybe you could point out a more specific example? Or, could you help to point out what's the difference between passing a plugin path through *clangd* startup command line and through clang flags?

clang/lib/Frontend/CompilerInvocation.cpp
3905

In this case I can make sure multiple thread works just fine with LoadLibraryPermanently, I checked all the dynamic loading API on most platforms and all of the are thread-safe(it's rare to see system APIs which are not thread safe, to me).

3914

I think this is the part I didn't expected, seems a standalone action loading logic needs to exist.

I'll try to come up a patch which has standalone plugin loading and guard it with experimental checking.

What I'm asking specifically is: this feature has a cost, how important is supporting it? Are there codebases where these attributes are widely used, and enforcement at development time is particularly valuable?

FWIW, one of the codebases I work on uses a clang plugin to report a number of additional diagnostics, and I think it would be quite useful to have those show up as you're editing (comparable to clang-tidy diagnostics, except even more relevant as they're domain-specific).

I think making plugin support opt-in (including by requiring the clangd flags to mention the plugin, as suggested) would be fine. I think it's also fine to make clangd's support for plugins best-effort, in the sense that if a plugin doesn't play nicely in one of the mentioned ways (e.g. it uses static storage, or does not expect SkipFunctionBodies), the onus is on the plugin to fix those behaviours.

clangd (and other clang tools) get deployed in environments where all access to the filesystem goes through a llvm::vfs::Filesystem, and all filenames referred to in the compile command are within that virtual filesystem rather than the real one.
(Again, this isn't an issue if we require these paths to be specified on the *clangd* startup command line, as opposed to the clang flags that form the compile command)

Yes I know how clang manager files through vfs, it just that loading libraries does not involve vfs at all, the path of a lib is passed directly to the system level API (eg, dlopen on Linux, System::Load on Windows), so I still can't understand what you are concerning,

This is exactly the problem, filenames specified in *clang* flags are *supposed* to be read from the VFS. (In practice this probably just means we'd need to disable this feature in environments where VFS is used)

Or, could you help to point out what's the difference between passing a plugin path through *clangd* startup command line and through clang flags?

Sure. TL;DR is: clangd flags are configured by the user, user can be fully responsible for security/stability.
clang flags are configured by the project. If they're bad, we can e.g. give bad diagnostics, but can't crash or compromise security.

More detail:

In the simplest possible case, clangd is configured as follows:

  1. user downloads clangd binary
  2. user installs an LSP plugin for their editor, and configures the plugin to use /usr/bin/clangd for C++ files. clangd starts when the editor does
  3. the build system for $PROJECT generates $PROJECT/compile_commands.json
  4. when the user opens $PROJECT/src/foo.cpp in the editor, it notifies clangd. clangd searches for $PROJECT/compile_commands.json, finds the clang arguments, and uses them to parse foo.cpp

*clangd* command-line flags would be added explicitly by the user at step 2. We can reasonably ask the user to be aware/responsible for security/stability implications of doing this, including with their particular clangd version. We can also ask them to run clangd --check without the plugin flag to test whether the plugin is causing a stability problem.

*clang* command-line flags are added implicitly in step 3. Or they could simply be checked into the repository - nothing ensures they were generated locally by the build system. The point is in typical usage they are not controlled by the user directly, and from a security perspective are not trusted (as safely opening files from untrusted repos is a reasonable expectation). So if we're loading plugins based on instructions in clang command-line flags, clangd bears most of the responsibility for making sure that's safe and correct (and I don't see a way to do that).

clang/lib/Frontend/CompilerInvocation.cpp
3905

That sounds good! I do see mutexes in the posix implementation so I guess the LLVM wrapper is intended to be threadsafe.

3914

It seems plugins can run before/after/replace the main action.

I wonder if it'd break things to silently "promote" a replace plugin to e.g. an after one, which I guess would solve this.

Another consideration I haven't mentioned: clangd parses files in a few different modes.

  • code completion: this shouldn't run plugins I believe: latency is critical and we don't emit any diagnostics
  • main file AST build: Important to run plugins here but it's critical they don't traverse the whole preamble (can degrade latency by orders of magnitude).
  • preamble build: this is tricky, we *rarely* emit diagnostics from this step, and it's very performance-sensitive. It's a separate action from the main file AST build so plugins won't be able to see the results of analysis. I think we shouldn't run plugins here.
  • background indexing: again, no reason to run plugins here

This is similar to clang-tidy, which we run on the main file AST build only.
ASTContext::TraversalScope was implemented to make clang-tidy run effectively in clangd (provide a way to avoid traversing and deserializing the preamble).
So:

  • some plugins will need changes to run efficiently, but these changes are probably simple. e.g. CallSuperAttrInfo.cpp needs Visitor.TraverseDecl(Context.getTranslationUnitDecl()); to be replaced by Visitor.TraverseAST(Context).
  • we need to ensure the traversal scope is already set by the time plugin hooks that may perform traversals (e.g. ASTConsumer::handleTranslationUnit) are called

This is exactly the problem, filenames specified in *clang* flags are *supposed* to be read from the VFS. (In practice this probably just means we'd need to disable this feature in environments where VFS is used)

I don't thinks so, plugins are loaded by using -fplugin=xxxx.so, this is a flag being parsed and saved in FrontendOptions::Plugins, which is a string vector. As you can see, xxx.so is treat as a string, not a file, vfs didn't load it to memory like normal files (for example main.cpp) . So I think whether VFS is used or not has nothing to do with plugin loading procedure.

Sure. TL;DR is: clangd flags are configured by the user, user can be fully responsible for security/stability.
clang flags are configured by the project. If they're bad, we can e.g. give bad diagnostics, but can't crash or compromise security.

More detail:

In the simplest possible case, clangd is configured as follows:

  1. user downloads clangd binary
  2. user installs an LSP plugin for their editor, and configures the plugin to use /usr/bin/clangd for C++ files. clangd starts when the editor does
  3. the build system for $PROJECT generates $PROJECT/compile_commands.json
  4. when the user opens $PROJECT/src/foo.cpp in the editor, it notifies clangd. clangd searches for $PROJECT/compile_commands.json, finds the clang arguments, and uses them to parse foo.cpp

*clangd* command-line flags would be added explicitly by the user at step 2. We can reasonably ask the user to be aware/responsible for security/stability implications of doing this, including with their particular clangd version. We can also ask them to run clangd --check without the plugin flag to test whether the plugin is causing a stability problem.

*clang* command-line flags are added implicitly in step 3. Or they could simply be checked into the repository - nothing ensures they were generated locally by the build system. The point is in typical usage they are not controlled by the user directly, and from a security perspective are not trusted (as safely opening files from untrusted repos is a reasonable expectation). So if we're loading plugins based on instructions in clang command-line flags, clangd bears most of the responsibility for making sure that's safe and correct (and I don't see a way to do that).

I got you point on this, using a clangd command-line flags means users are aware of the the plugin loading. Currently my idea is that if a plugin is listed in the compile_command.json, it will only be loaded if a flag with the same plugin name passed in from clangd command-line.

psionic12 added a comment.EditedNov 27 2020, 1:25 AM
  • code completion: this shouldn't run plugins I believe: latency is critical and we don't emit any diagnostics

Totally agree.

  • main file AST build: Important to run plugins here but it's critical they don't traverse the whole preamble (can degrade latency by orders of magnitude).

Well, Sound reasonable, but what if getTranslationUnitDecl() is indeed needed instead of using TraverseAST (I didn't came up with an example yet)? In this case I think clangd should allow a plugin to do its job, even if it is performance hostile.

  • preamble build: this is tricky, we *rarely* emit diagnostics from this step, and it's very performance-sensitive. It's a separate action from the main file AST build so plugins won't be able to see the results of analysis. I think we shouldn't run plugins here.

I got a question, if "we *rarely* emit diagnostics from this step", what will happen if I open the source file of the header? Can I consider that the diagnostic from the header is not generated by creating PCH (Simply speaking preamble build do not generate diags, and diags from a header file do not generated from preamble build stage)? If so, I think no plugin loading in this step is fine.

  • background indexing: again, no reason to run plugins here

Agree, but I got a question, what if a error diagnostic exist in the code, will the index procedure do?

  • we need to ensure the traversal scope is already set by the time plugin hooks that may perform traversals (e.g. ASTConsumer::handleTranslationUnit) are called

I think it's hard to implement, and even so a plugin may still need to traversal out of scope, and I think we should let them do that.

  • main file AST build: Important to run plugins here but it's critical they don't traverse the whole preamble (can degrade latency by orders of magnitude).

Well, Sound reasonable, but what if getTranslationUnitDecl() is indeed needed instead of using TraverseAST (I didn't came up with an example yet)? In this case I think clangd should allow a plugin to do its job, even if it is performance hostile.

To a first approximation, I think we *should* disable features that don't perform well, as the overall responsiveness is more important than any one feature.
We make numerous functionality tradeoffs in clangd consistent with this.

As a practical concern, I'm not sure what we can reasonably do to enforce this though.

(One idea we've been playing with is that traversal of the AST usually goes through TranslationUnitDecls::decls_begin(), which triggers loading from the preamble. We could add a stateful flag to DeclContext that causes this to behave like noload_decls_begin instead...)

(One idea we've been playing with is that traversal of the AST usually goes through TranslationUnitDecls::decls_begin(), which triggers loading from the preamble. We could add a stateful flag to DeclContext that causes this to behave like noload_decls_begin instead...)

I agree with that, since the C++20 released the "module" feature, projects with PCHs should be a little more common than before, the traverse algorithm should not load external source by default... I tried to modify the code, but it seems no easy as I though... Currently out team set scopes manually on an AST to avoid decls_begin() get called. Maybe we should mention this on the documents?

nridge added a comment.EditedDec 12 2020, 12:16 PM

Or, could you help to point out what's the difference between passing a plugin path through *clangd* startup command line and through clang flags?

Sure. TL;DR is: clangd flags are configured by the user, user can be fully responsible for security/stability.
clang flags are configured by the project. If they're bad, we can e.g. give bad diagnostics, but can't crash or compromise security.

More detail:

In the simplest possible case, clangd is configured as follows:

  1. user downloads clangd binary
  2. user installs an LSP plugin for their editor, and configures the plugin to use /usr/bin/clangd for C++ files. clangd starts when the editor does
  3. the build system for $PROJECT generates $PROJECT/compile_commands.json
  4. when the user opens $PROJECT/src/foo.cpp in the editor, it notifies clangd. clangd searches for $PROJECT/compile_commands.json, finds the clang arguments, and uses them to parse foo.cpp

*clangd* command-line flags would be added explicitly by the user at step 2. We can reasonably ask the user to be aware/responsible for security/stability implications of doing this, including with their particular clangd version. We can also ask them to run clangd --check without the plugin flag to test whether the plugin is causing a stability problem.

*clang* command-line flags are added implicitly in step 3. Or they could simply be checked into the repository - nothing ensures they were generated locally by the build system. The point is in typical usage they are not controlled by the user directly, and from a security perspective are not trusted (as safely opening files from untrusted repos is a reasonable expectation). So if we're loading plugins based on instructions in clang command-line flags, clangd bears most of the responsibility for making sure that's safe and correct (and I don't see a way to do that).

Something just occurred to me: can't clangd arguments also be controlled by the untrusted repository by having a .vscode/settings.json file with specificed "clangd.arguments" checked in?

Or, could you help to point out what's the difference between passing a plugin path through *clangd* startup command line and through clang flags?

Sure. TL;DR is: clangd flags are configured by the user, user can be fully responsible for security/stability.
clang flags are configured by the project. If they're bad, we can e.g. give bad diagnostics, but can't crash or compromise security.

More detail:

In the simplest possible case, clangd is configured as follows:

  1. user downloads clangd binary
  2. user installs an LSP plugin for their editor, and configures the plugin to use /usr/bin/clangd for C++ files. clangd starts when the editor does
  3. the build system for $PROJECT generates $PROJECT/compile_commands.json
  4. when the user opens $PROJECT/src/foo.cpp in the editor, it notifies clangd. clangd searches for $PROJECT/compile_commands.json, finds the clang arguments, and uses them to parse foo.cpp

*clangd* command-line flags would be added explicitly by the user at step 2. We can reasonably ask the user to be aware/responsible for security/stability implications of doing this, including with their particular clangd version. We can also ask them to run clangd --check without the plugin flag to test whether the plugin is causing a stability problem.

*clang* command-line flags are added implicitly in step 3. Or they could simply be checked into the repository - nothing ensures they were generated locally by the build system. The point is in typical usage they are not controlled by the user directly, and from a security perspective are not trusted (as safely opening files from untrusted repos is a reasonable expectation). So if we're loading plugins based on instructions in clang command-line flags, clangd bears most of the responsibility for making sure that's safe and correct (and I don't see a way to do that).

Something just occurred to me: can't clangd arguments also be controlled by the untrusted repository by having a .vscode/settings.json file with specificed "clangd.arguments" checked in?

Yeah, there are too many ways to pass an argument without user's awareness, all the safety protections we talked about aren't help much, and I think this is not the clangd's problem, it exists in clang as well. So I think loading plugin codes guarded with CLANG_PLUGIN_SUPPORT on is enough, no more complicity protections should added.

Something just occurred to me: can't clangd arguments also be controlled by the untrusted repository by having a .vscode/settings.json file with specificed "clangd.arguments" checked in?

Hmm, maybe... I thought getConfiguration('clangd') with no scope specified was supposed to be global (i.e. not a workspace-specific setting). There's a scope you can pass in, and we're not providing one.

Nevertheless testing it locally these flags do seem to be used. We should fix this, I think workspace.getConfiguration('clangd').inspect('arguments') and then applying the components manually makes it possible. This is a horrible breaking change, though :-(

Alternatives would be treating clangd args as untrusted too, conceding that opening a file in vscode can own the user, some list of safe flags...

Yeah, there are too many ways to pass an argument without user's awareness, all the safety protections we talked about aren't help much, and I think this is not the clangd's problem, it exists in clang as well. So I think loading plugin codes guarded with CLANG_PLUGIN_SUPPORT on is enough, no more complicity protections should added.

I can't agree with this, even if other problems exist, it's not OK to introduce a big problem with a broad scope, when we have alternatives.

I thought getConfiguration('clangd') with no scope specified was supposed to be global (i.e. not a workspace-specific setting). There's a scope you can pass in, and we're not providing one.

Nevertheless testing it locally these flags do seem to be used. We should fix this, I think workspace.getConfiguration('clangd').inspect('arguments') and then applying the components manually makes it possible. This is a horrible breaking change, though :-(

Apart from the breaking nature of the change, I think being able to specify "clangd.arguments" per-workspace is a very important use case (to take the most obvious example, --compile-commands-dir is workspace-specific).