Page MenuHomePhabricator

[clangd] Make AST-based signals available to runWithPreamble.

Authored by usaxena95 on Mon, Jan 11, 9:06 AM.



Changes for new AST signals for TUScheduler.

Many useful signals can be derived from a valid AST which is regularly updated by
the ASTWorker. runWithPreamble does not have access to the ParsedAST
but it can be provided access to some signals derived from a (possibly
stale) AST.

Diff Detail

Event Timeline

usaxena95 created this revision.Mon, Jan 11, 9:06 AM
usaxena95 requested review of this revision.Mon, Jan 11, 9:06 AM
usaxena95 edited the summary of this revision. (Show Details)
usaxena95 updated this revision to Diff 315839.Mon, Jan 11, 9:27 AM

Documentation change.

adamcz added inline comments.Tue, Jan 12, 8:15 AM

nit: I think you can just use OS.str() here and below instead of RelatedNS to avoid extra string copy.


bar or tar?


Can you add anonymous namespaces as well?



usaxena95 updated this revision to Diff 316379.Wed, Jan 13, 6:30 AM
usaxena95 marked 4 inline comments as done.

Addressed comments.


Good point. Avoided capturing anonymous namespaces.

adamcz accepted this revision.Wed, Jan 13, 10:32 AM
This revision is now accepted and ready to land.Wed, Jan 13, 10:32 AM

Thanks, this looks pretty solid!

I want to be pretty careful about APIs/naming/code organization here (even moreso than usual!) because TUScheduler is a big complex beast as it is, and it's easy to get lost.


In practice, this is always called together with getPossiblyStalePreamble() for runWithPreamble (2 callsites). The two feel fairly related, and they each lock the same mutex.

The other two callsites of getPossiblyStalePreamble() are measuring memory (which in principle should also get the ASTSignals, though in practice it's likely small enough not to matter), and building ASTs, which can ignore it.

I'd suggest combining them into the somewhat awkward signature

shared_ptr<const PreambleData> getPossiblyStalePreamble(shared_ptr<const ASTSignals> *PossiblyStaleASTSignals = nullptr)

why is this public?


This implementation doesn't belong in TUScheduler, which is all about managing lifecycles and threading, rather than inspecting ASTs.

Best suggestion I have is to create a separate header ASTSignals.h, which defines the ASTSignals struct and a static ASTSignals ASTSignals::derive(const ParsedAST&) function to extract them.

(It's pretty awkward to even directly depend on it, since this feels more of a "feature" like go-to-definition that's independent of to TUScheduler and injected by ClangdServer, rather than core things like ParsedAST that are layered below TUScheduler. But fundamentally we can't avoid this without somehow obfuscating the data structure we're storing).


why not?


as mentioned above, you may want to do this only if the per-symbol-count was incremented from 0 to 1.
You'd have to ignore namespaces for symbols with no symbolID, but that doesn't matter.


why not NSD->isAnonymous()?

(and generally prefer early continue rather than indenting the whole loop body)


rather than checking/printing the namespace every time it's encountered, collect a DenseMap<NamespaceDecl*, unsigned> and then run through them once at the end?


use printNamespaceScope() from AST.h, which correctly handles inline namespace, adds trailing "::" etc.


nit: to minimize the length of the locked section:

  • move the new signals into the shared_ptr first (here I think you can just allocate it with make_shared in the first place). This avoids locking the allocation + move constructor.
  • introduce a new scope for the locked section, and std::swap the shared_ptrs rather than moving. Together, this avoids locking the destructor of the old signals (it runs when the pointer you swapped it into goes out of scope, instead).

Neither the comment or the name really explain what this is or what's unusual about it.
I can't think of a great name, but the comment should mention:

  • the purpose (provide information that can only be extracted from the AST to actions that can't access an AST)
  • the limitations (may be absent, always stale)
  • example usage (information about what declarations are used in a file affects code-completion ranking)



I have a hard time parsing this comment (what does it mean for a namespace to be present in the file?) Maybe:

/// Namespaces whose symbols are used in the file, and the number of such symbols.

Looking at the implementation, it looks like it's rather the number of symbol *occurrences*. I suspect distinct symbols as your comment here suggests would be better though. If one symbol from a namespace is used 100 times, that should boost that symbol a lot but not necessarily the namespace a lot.


(untangling ASTSignals from TUScheduler should make this clearer to test directly, though you may still want to keep a simple smoke test here)

usaxena95 updated this revision to Diff 316669.Thu, Jan 14, 8:19 AM
usaxena95 marked 13 inline comments as done.

Addressed comments.

usaxena95 added inline comments.Thu, Jan 14, 8:19 AM

Moved struct and calculation to a separate file ASTSignals.h/cpp


My mental model suggested these must be namespaces from outside the file which are relevant to this file.
I suppose we can take care of this while using this (in CC signals calculation) and let these represent all namespaces including the primary one.

On a second thought: If we do not explicitly exclude primary namespace, the results from primary namespace will be boosted much more than the results from index which might be acceptable too.


Ignored NS without a valid SymbolID.
Counted distinct number of Symbols used per namespace.


I guess isAnonymous would cover most cases. I thought of excluding named NS in an anonymous NS but those are super rare I suppose and including those is not harmful too.

usaxena95 updated this revision to Diff 316670.Thu, Jan 14, 8:25 AM

Removed unused headers.

sammccall accepted this revision.Thu, Jan 14, 8:26 AM



this requires storing all the SymbolIDs again.

You could just have a counter instead. The trick is that you're already counting the number of times a symbol has been seen...

unsigned &SymbolCount = Signals.ReferencedSymbols[ID];
if (SymbolCount == 1) { // first time seeing the symbol
  // increment counter for its namespace
Harbormaster completed remote builds in B85178: Diff 316670.
usaxena95 updated this revision to Diff 316681.Thu, Jan 14, 9:27 AM

Addressed comments. Ready for landing.

usaxena95 updated this revision to Diff 316683.Thu, Jan 14, 9:31 AM
usaxena95 marked an inline comment as done.

Added newline at end of new file.

This revision was landed with ongoing or failed builds.Thu, Jan 14, 9:35 AM
This revision was automatically updated to reflect the committed changes.
Harbormaster completed remote builds in B85187: Diff 316681.