This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add instrumentation mode in clangd for metrics collection.
ClosedPublic

Authored by usaxena95 on Mar 4 2020, 5:11 AM.

Details

Summary

This patch adds an instrumentation mode for clangd (enabled by
corresponding option in cc_opts).
If this mode is enabled then user can specify callbacks to run on the
final code completion result.

Moreover the CodeCompletion::Score will contain the detailed Quality and
Relevance signals used to compute the score when this mode is enabled.
These are required because we do not any place in which the final
candidates (scored and sorted) are available along with the above
signals. The signals are temporary structures in addCandidate.

The callback is needed as it gives access to many data structures that
are internal to CodeCompleteFlow and are available once Sema has run. Eg:
ScopeDistnace and FileDistance.

If this mode is disabled (as in default) then Score would just contain 2
shared pointers (null). Thus cost(memory/time) increase for the default
mode would be fairly cheap and insignificant.

Diff Detail

Event Timeline

usaxena95 created this revision.Mar 4 2020, 5:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2020, 5:11 AM
usaxena95 updated this revision to Diff 248160.Mar 4 2020, 6:01 AM

Addressed linter issues.

sammccall added inline comments.Mar 4 2020, 6:21 AM
clang-tools-extra/clangd/CodeComplete.h
140

I'm wondering if we can simplify this interface a bit. I'm not sure why we need another callback rather than just returning the CodeCompleteResult in the usual way.

Another option:

  • if we invoke a callback for each completion instead of for the result set as a whole, we don't need to work out where to stash anything.
  • the work done after addCandidate is pretty trivial, so invoking a callback there provides basically all the information about the result set. The Top-N truncation is probably something you'd rather *not* have for analysis.
  • code completion always ends with the callback being invoked, so cross-result analysis can be done at that point.

So I think this could just be a single
std::function<void(const CodeCompletion&, const SymbolQualitySignals &, const SymbolRelevanceSignals &)>.
If it's non-null, addCandidate would call toCodeCompletion on the bundle and pass it to the callback at the end.

216

why shared rather than unique?

kadircet added inline comments.Mar 4 2020, 6:33 AM
clang-tools-extra/clangd/CodeComplete.cpp
1463

can't we make use of the trace::Span instead ?

usaxena95 updated this revision to Diff 248233.Mar 4 2020, 10:20 AM
usaxena95 marked 3 inline comments as done.

Changed to invoke callback on all code completion items.

usaxena95 marked 3 inline comments as done.Mar 4 2020, 10:26 AM
usaxena95 added inline comments.
clang-tools-extra/clangd/CodeComplete.cpp
1463

CMIIW. I believe with trace::Span we can send only JSON messages from clangd to the tool running it. This doesn't allow us to get access to internal DS of CodeCompleteFlow that are used along with Quality and Relevance signals (like Distances of file and scopes).
You can argue that all this information can be serialized as a JSON (including features derived from these internal DS) but this then must be done on clangd side (not on tools side).

IMO this gives more freedom to the tool to use and derive more features which makes experimentation easier.

clang-tools-extra/clangd/CodeComplete.h
140

why we need another callback rather than just returning the CodeCompleteResult in the usual way.

There are some data structures from CodeCompleteFlow referred to in Reference signals like ScopeDistance which were needed to compute distances. But think can be addressed in your suggested "per completion" callback approach.

Another option: ...

I had given this approach some thought previously and had concerns about mapping the items to the final ranked items in the TopN result. But on a second thought we can completely ignore the final result (assuming no significant changes are done after addCandidate) and score and rank the results ourselves in the FlumeTool.

216

A not-so-proud hack to keep Score copyable (this is removed now).

usaxena95 updated this revision to Diff 248240.Mar 4 2020, 10:32 AM
usaxena95 marked 2 inline comments as done.

Remove ununsed import.

sammccall accepted this revision.Mar 4 2020, 10:58 AM
sammccall added inline comments.
clang-tools-extra/clangd/CodeComplete.h
136

First say what the behavior/API is (called once for each result...), Then justify it :)

139

I'd suggest including the final score in the signature rather than recompute it, just so the contract is really clear and simple (results yielded in arbitrary order, will be ranked by -score). Please spell this out.

140

This doesn't need to be a pointer, std::function is copy/movable and supports nullptr as a sentinel value.

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
1048

Nit: typically auto here (the anonymous lambda type) and let it convert to function implicitly when needed.

No need for -> type in trivial cases

This revision is now accepted and ready to land.Mar 4 2020, 10:58 AM
Harbormaster completed remote builds in B48073: Diff 248233.

Forgot to mention: I also think the trace approach certainly has things going for it, or even parsing out the messages from the existing logs.
But in this particular case the callback happens to be extremely convenient and also not invasive (since the data structures are already exposed, code complete has an opts struct etc). And since this is for analysis we have a lot of flexibility to rework if it stops being easy to maintain.

Forgot to mention: I also think the trace approach certainly has things going for it, or even parsing out the messages from the existing logs.
But in this particular case the callback happens to be extremely convenient and also not invasive (since the data structures are already exposed, code complete has an opts struct etc). And since this is for analysis we have a lot of flexibility to rework if it stops being easy to maintain.

as discussed offline, I was rather afraid of the initial version of the patch, but the final version seems ok as it only adds a single field to codecompleteopts.

usaxena95 updated this revision to Diff 248416.Mar 5 2020, 3:03 AM
usaxena95 marked 5 inline comments as done.

Addressed comments.

  • Populated score in CodeCompletion before invoking the callback.
  • Tested that CodeCompletion is scored
  • Updated comment for callback.
usaxena95 added inline comments.Mar 5 2020, 3:07 AM
clang-tools-extra/clangd/CodeComplete.h
139

Couldn't add it to the signature since inner classes cannot be forward declared.
Since CodeCompletion contains the Score, I have populated this field in the CodeCompletion (and also CompletionTokenRange) as done in toCodeCompleteResult to be consistent.

Also tested that the CodeCompletion is scored.

usaxena95 updated this revision to Diff 248422.Mar 5 2020, 3:23 AM

Passed score as a float as an explicit argument of the callback.

This revision was automatically updated to reflect the committed changes.