Page MenuHomePhabricator

[clangd] Make signatureHelp work with stale preambles

Authored by kadircet on Apr 3 2020, 6:35 AM.



This is achieved by calculating newly added includes and implicitly
parsing them as if they were part of the main file.

This also gets rid of the need for consistent preamble reads.

Diff Detail

Event Timeline

kadircet created this revision.Apr 3 2020, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 3 2020, 6:35 AM
kadircet retitled this revision from [clangd] Make signatureHelp work with stale preambles to [WIP][clangd] Make signatureHelp work with stale preambles.Apr 6 2020, 1:40 AM

This is the perfect feature to start with (doesn't actually need any location transforms, lets us drop TUScheduler features), so well done for finding that.

That said, high-level comments mostly about preamble patching in general rather than this particular case.


This is going to be used in other places where we parse too, we'll want some abstraction to encapsulate the analysis, the tweaks to CompilerInvocation, and (for other uses) mapping of locations from the preamble file and injected regions.

As discussed offline, I think injecting by synthesizing a header snippet which is mapped into the VFS and then added to PreprocessorOpts.Includes is preferable to tweaking the CompilerInvocation directly:

  • it extends to arbitrary directives, e.g. there's no way today to inject an #import directive via CompilerInvocation, no control over <> vs "", ...
  • it's easier to debug as the injected region can easily be dumped as text
  • it's easier to for code to recognize locations in our injected region if we're not sharing it with e.g. -Ds coming from the commandline
  • (I think) it lets us use the PresumedLocation mechanism to record the association of each injected line with the corresponding line in the mainfile (not sure if this is valuable if we also need to translate the other way)

So I'd probably suggest something like

class PreamblePatch { // ends up stored in ParsedAST
  PreamblePatch(); // empty, used when no patching is done
  static PreamblePatch compute(const ParseInputs&);

  void apply(CompilerInvocation&); // adjusts PPOptions.{RemappedFileBuffers, Includes} to include injected header

  friend operator<<(raw_ostream&, const PreamblePatch &); // probably just dumps InjectedHeaderContent

  // later... not sure about signature/names here
  SourceLocation parsedToUserLoc(SourceLocation);
  SourceLocation userToParsedLoc(SourceLocation); // is this also needed?

  std::string InjectedHeaderContent; // with #line directives, or at least comments 
  // some data structure to map between locations in the stale preamble and their user locations
  // some data structure to map between locations in the injected header and their user locations
237 ↗(On Diff #254779)

The tradeoff between raw-lexing and running the actual preprocessor in SingleFileParseMode seems pretty interesting.
(I'm not suggesting any particular change here, but I think it's worth exploring at some point. If the raw lexing is "for now" let's call that out, if it's a decision let's justify it. Also just curious what you think at this point - I know we've discussed this in the past and I pushed for raw-lexing)

The SingleFileParseMode is more accurate:

  • it's probably much easier to handle more directives correctly
  • it will handle ifdefs correctly where possible without reading files
  • it will tell us what includes actually resolve to (I'm not sure if this is useful though, when you can inject an #include with the right spelling and check that later)
  • ??? maybe we can recognize bad directives that we want to avoid injecting for better results ???
  • we don't have to do weird stuff like track the raw-lexed includes in preambledata

But potentially expensive with IO:

  • it will stat all the files along the search path.
  • it will ::realpath all the files that were found

Possible workaround for stat along the search path: most of the time, the #include directives are going to refer to files that were already included in the previous preamble (i.e. the preamble we're trying to patch) so we (heuristically) know what they resolve to.
If we used the HeaderSearchInfo's alias map to redirect <foo.h> to </absolute/path/to/foo.h> then #include <foo.h> will just cost a single stat. And when our heuristic is wrong and <foo.h> resolves to something else, we'll just get an include-not-found, we can still add the #include.

More radical workaround: give it a completely null VFS so it can't do any IO. All includes will fail to resolve, but if we don't need their resolved path at this point we don't care, right?

198 ↗(On Diff #254779)

I'm guessing the APIs we end up with probably belong more in Preamble.h rather than here, both thematically and in terms of layering

kadircet updated this revision to Diff 256359.Apr 9 2020, 12:57 PM
  • Encapsulate logic into PreamblePatch
kadircet marked 3 inline comments as done.Apr 9 2020, 1:07 PM
kadircet added inline comments.

implemented an initial version of this, no sourcelocation handling yet.

compute also takes the preamble we are patching though.

237 ↗(On Diff #254779)

the choice of raw lexer was more of a shortcut. i was just trying to get an initial version, tried to put some ideas in comments.

Will experiment with couple of options, but using alias mapping to map written includes to resolved absolute paths and also using the statcached vfs sounds like the best thing we can do for now.

We should also put this behind a "raw string equality" check to not perform any IO in the most common case(preamble is same).

Moreover, we can also put this logic into TUScheduler to not even perform string equality checks in case the contents haven't been modified. Not sure how useful that'll be though, as we usually receive a signaturehelp request after a modification to the source code.

kadircet updated this revision to Diff 258064.Apr 16 2020, 8:40 AM
  • Use preprocessor instead of raw lexer

This is ready for another round. To summarize the current state:

  • PreamblePatch can be create from a ParseInput and a PreambleData,
    • It will preprocessor preamble section of current file contents using a dummy FS to reduce cost of stat/realpaths
    • It won't populate Resolved path for inclusions, as we don't need them at this stage, and even later on. As we'll run the real preprocessor when building the ParsedAST, and we can collect resolved paths at that stage instead.
    • Afterwards it will diff the includes found in current contents with includes found in the PreambleData, and create a patch containing all the new includes
  • PreamblePatch can be applied to a CompilerInvocation:
    • It is a no-op if contents are empty, i.e it was default constructed or there were no new includes.
    • Otherwise it will create an implicit include which contains patch created with new includes.

Things that I don't like about current state:

  • We'll run processor to collect includes even when the preamble section of current file hasn't changed.
    • PrecompiledPreamble contains the contents of the preamble but it is not exposed, I suppose we can either expose that to get rid of extra preprocessor creation overhead.
  • We create an extra CompilerInvocation and CompilerInstance just to run preprocessor and throw them away later on. I am not sure what to do about this :/

This looks pretty good! Haven't reviewed the tests or removal of consistent preamble support yet. (Mostly note-to-self)


Given your getDecomposedTok before, you might want to assert Tok is in the main file inside the loop


I'd call this scanPreambleIncludes or something just slightly less generic - again here we don't want it to be accidentally reused for something that needs to be accurate.


Something needs to check that there's one input etc.
(This could maybe be moved into buildCompilerInvocation I guess, but currently the caller does it)


I'd stick to LexedIncludes or ApproxmimateIncludes or BaselineIncludes - it excludes stuff under ifdefs that *can* be resolved, and we don't want it to be used for other purposes...


I'm slightly nervous about incorporating the filename itself, not sure why but it feels unneccesary.
WDYT about "dir/__preamble_patch__.h"?


Why not a DenseSet<pair<PPKeywordKind, StringRef>>?
(The copies probably don't matter, but I think it'd be a bit clearer and more typesafe)


worth a comment indicating exactly where this is going to be inserted into the parsing stream.


Maybe "Used as a baseline for creating a PreamblePatch"?
Since this is a bit unusual.


Since computing these inclusions seems to be cheap (you've disabled all the IO), would it be clearer and more flexible to store the preamble text as a string and compute both the baseline & new includes at the same time?
I'm thinking if other preamble constructs (macros?) are added, needing to put more data structures in PreambleData, where if you store the text only those can be private details.

(In fact the preamble text is already stored in PrecompiledPreamble, you could just expose it there)


This is probably a good place for a high-level overview of the concept, and maybe an example.

In particular, *something* should mention that this is approximate, and what we use it for/don't use it for.


// With an empty patch, the preamble is used verbatim.


nit: it'd be good to give these "version"s names. e.g. PI -> Modified, Preamble -> Baseline. And maybe mention these in the class description?


This explanation is a little low-level, I'd add a first sentence explaining the goal.

e.g. "Adjusts CI (which compiles the modified inputs) to be used with the baseline preamble".

kadircet marked 15 inline comments as done.Apr 20 2020, 1:44 PM
kadircet added inline comments.

it was done to get rid of path manipulations, but not that important.


PPKeywordKind didn't have a DenseMapInfo, adding one. It already has two invalid enum values.


since a preamble can be used with multiple preamble patches i wanted to reduce lexing overhead. I suppose we can address that later if it shows up in the traces. exposing the contents in PrecompiledPreamble.

kadircet updated this revision to Diff 258833.Apr 20 2020, 1:44 PM
kadircet marked 2 inline comments as done.
  • Address comments
  • Don't store LexedInclude in PreambleData, expose the contents in PrecompiledPreamble instead.
  • Introduce DenseMapInfo for PPKeywordKind
sammccall marked an inline comment as done.Apr 20 2020, 3:06 PM

Great stuff!


The error-handling paths here just return {}, same as an empty preamble.

if both baseline & current fail, then we'll generate no patch --> fine
if baseline is ok and current fails, we'll consider all headers, we'll consider all headers removed and generate no new includes --> fine for now (added-includes-only model)
if baseline fails and current is OK, we'll create a patch that adds all the headers --> really slow, we'd be better off creating an empty patch

clarity and debuggability:
behaviors are implicit and silent. I think we should write out these three cases explicitly in the code, and log them at least at -log/verbose. These should be rare conditions I think. (If they're not, we should be able to track down & handle the common causes I think)

I think this function should probably return Expected.


Ugh, I thought enums had those implicitly. The need for two invalid values is really annoying, it's probably why we don't have more implicit ones.

I wonder whether it's feasible (technically and legally) to replace DenseHashMap with a fork of absl::flat_hash_map. I'm pretty sure it's faster, and it doesn't have these weird API requirements.

sammccall accepted this revision.Apr 20 2020, 3:07 PM

Keep missing the "ship it" button...

This revision is now accepted and ready to land.Apr 20 2020, 3:07 PM
kadircet updated this revision to Diff 258915.Apr 21 2020, 12:47 AM
kadircet marked 2 inline comments as done.
  • Make scanPreambleIncludes return an Expected<std::vector<Inclusion>>
  • Bail out in case of errors, document the rational.
kadircet marked an inline comment as done.Apr 21 2020, 12:51 AM
kadircet added inline comments.

IANAL, but they both seemed to have apache v2 :D

as for technicalities, how hard can it be to replace one of the most basic data structures in a code base :P

kadircet retitled this revision from [WIP][clangd] Make signatureHelp work with stale preambles to [clangd] Make signatureHelp work with stale preambles.Apr 21 2020, 1:19 AM
This revision was automatically updated to reflect the committed changes.