This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Make AST-based signals available to runWithPreamble.
ClosedPublic

Authored by usaxena95 on Jan 11 2021, 9:06 AM.

Details

Summary

Changes for new AST signals for TUScheduler.

Many useful signals can be derived from a valid AST which is regularly updated by
the ASTWorker. runWithPreamble does not have access to the ParsedAST
but it can be provided access to some signals derived from a (possibly
stale) AST.

Diff Detail

Event Timeline

usaxena95 created this revision.Jan 11 2021, 9:06 AM
usaxena95 requested review of this revision.Jan 11 2021, 9:06 AM
usaxena95 edited the summary of this revision. (Show Details)
usaxena95 updated this revision to Diff 315839.Jan 11 2021, 9:27 AM

Documentation change.

adamcz added inline comments.Jan 12 2021, 8:15 AM
clang-tools-extra/clangd/TUScheduler.cpp
857

nit: I think you can just use OS.str() here and below instead of RelatedNS to avoid extra string copy.

clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
716

bar or tar?

720

Can you add anonymous namespaces as well?

733

s/for/while?

usaxena95 updated this revision to Diff 316379.Jan 13 2021, 6:30 AM
usaxena95 marked 4 inline comments as done.

Addressed comments.

clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
720

Good point. Avoided capturing anonymous namespaces.

adamcz accepted this revision.Jan 13 2021, 10:32 AM
This revision is now accepted and ready to land.Jan 13 2021, 10:32 AM

Thanks, this looks pretty solid!

I want to be pretty careful about APIs/naming/code organization here (even moreso than usual!) because TUScheduler is a big complex beast as it is, and it's easy to get lost.

clang-tools-extra/clangd/TUScheduler.cpp
422

In practice, this is always called together with getPossiblyStalePreamble() for runWithPreamble (2 callsites). The two feel fairly related, and they each lock the same mutex.

The other two callsites of getPossiblyStalePreamble() are measuring memory (which in principle should also get the ASTSignals, though in practice it's likely small enough not to matter), and building ASTs, which can ignore it.

I'd suggest combining them into the somewhat awkward signature

shared_ptr<const PreambleData> getPossiblyStalePreamble(shared_ptr<const ASTSignals> *PossiblyStaleASTSignals = nullptr)
424

why is this public?

840

This implementation doesn't belong in TUScheduler, which is all about managing lifecycles and threading, rather than inspecting ASTs.

Best suggestion I have is to create a separate header ASTSignals.h, which defines the ASTSignals struct and a static ASTSignals ASTSignals::derive(const ParsedAST&) function to extract them.

(It's pretty awkward to even directly depend on it, since this feels more of a "feature" like go-to-definition that's independent of to TUScheduler and injected by ClangdServer, rather than core things like ParsedAST that are layered below TUScheduler. But fundamentally we can't avoid this without somehow obfuscating the data structure we're storing).

850

why not?

852

as mentioned above, you may want to do this only if the per-symbol-count was incremented from 0 to 1.
You'd have to ignore namespaces for symbols with no symbolID, but that doesn't matter.

853

why not NSD->isAnonymous()?

(and generally prefer early continue rather than indenting the whole loop body)

854

rather than checking/printing the namespace every time it's encountered, collect a DenseMap<NamespaceDecl*, unsigned> and then run through them once at the end?

856

use printNamespaceScope() from AST.h, which correctly handles inline namespace, adds trailing "::" etc.

865

nit: to minimize the length of the locked section:

  • move the new signals into the shared_ptr first (here I think you can just allocate it with make_shared in the first place). This avoids locking the allocation + move constructor.
  • introduce a new scope for the locked section, and std::swap the shared_ptrs rather than moving. Together, this avoids locking the destructor of the old signals (it runs when the pointer you swapped it into goes out of scope, instead).
clang-tools-extra/clangd/TUScheduler.h
37

Neither the comment or the name really explain what this is or what's unusual about it.
I can't think of a great name, but the comment should mention:

  • the purpose (provide information that can only be extracted from the AST to actions that can't access an AST)
  • the limitations (may be absent, always stale)
  • example usage (information about what declarations are used in a file affects code-completion ranking)
40

ReferencedSymbols?

41

I have a hard time parsing this comment (what does it mean for a namespace to be present in the file?) Maybe:

/// Namespaces whose symbols are used in the file, and the number of such symbols.

Looking at the implementation, it looks like it's rather the number of symbol *occurrences*. I suspect distinct symbols as your comment here suggests would be better though. If one symbol from a namespace is used 100 times, that should boost that symbol a lot but not necessarily the namespace a lot.

clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp
738

(untangling ASTSignals from TUScheduler should make this clearer to test directly, though you may still want to keep a simple smoke test here)

usaxena95 updated this revision to Diff 316669.Jan 14 2021, 8:19 AM
usaxena95 marked 13 inline comments as done.

Addressed comments.

usaxena95 added inline comments.Jan 14 2021, 8:19 AM
clang-tools-extra/clangd/TUScheduler.cpp
840

Moved struct and calculation to a separate file ASTSignals.h/cpp

850

My mental model suggested these must be namespaces from outside the file which are relevant to this file.
I suppose we can take care of this while using this (in CC signals calculation) and let these represent all namespaces including the primary one.

On a second thought: If we do not explicitly exclude primary namespace, the results from primary namespace will be boosted much more than the results from index which might be acceptable too.

852

Ignored NS without a valid SymbolID.
Counted distinct number of Symbols used per namespace.

853

I guess isAnonymous would cover most cases. I thought of excluding named NS in an anonymous NS but those are super rare I suppose and including those is not harmful too.

usaxena95 updated this revision to Diff 316670.Jan 14 2021, 8:25 AM

Removed unused headers.

sammccall accepted this revision.Jan 14 2021, 8:26 AM

Thanks!

clang-tools-extra/clangd/ASTSignals.cpp
9

this requires storing all the SymbolIDs again.

You could just have a counter instead. The trick is that you're already counting the number of times a symbol has been seen...

unsigned &SymbolCount = Signals.ReferencedSymbols[ID];
++SymbolCount;
if (SymbolCount == 1) { // first time seeing the symbol
  // increment counter for its namespace
}
Harbormaster completed remote builds in B85178: Diff 316670.
usaxena95 updated this revision to Diff 316681.Jan 14 2021, 9:27 AM

Addressed comments. Ready for landing.

usaxena95 updated this revision to Diff 316683.Jan 14 2021, 9:31 AM
usaxena95 marked an inline comment as done.

Added newline at end of new file.

This revision was landed with ongoing or failed builds.Jan 14 2021, 9:35 AM
This revision was automatically updated to reflect the committed changes.
Harbormaster completed remote builds in B85187: Diff 316681.