Introduces an endpoint to Tracer for tracking metrics on internal events.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
- Introduce latency tracking through context.
- Add metrics for code actions and rename.
Thanks for tackling this! Mostly API stuff, obviously :-)
clang-tools-extra/clangd/Trace.h | ||
---|---|---|
35 ↗ | (On Diff #258595) | Conceptually there's a difference between a metric (the thing being measured, such as "latency distribution by method") and the measurement (such as "code completion took 90ms this time"). For the API (to emit data) it seems natural to model this split using constexpr objects for the metrics, and methods for the measurement: constexpr Metric IncomingLatencyMS("/clangd/latency", Distribution); ... void handleCodeCompletion() { ... IncomingLatencyMS.record(90); } (IOW I think our favorite metrics framework got this right) For the SPI (to consume data), things are a bit muddier. For I think we want to avoid having a protocol to explicitly register metrics, so there's an attraction to just reporting the flat (name, type, label, value) tuple. I also quite like the idea of reusing the type above and reporting this as (const Metric&, label, value) which makes extending Metric more natural. |
36 ↗ | (On Diff #258595) | I think we're missing an important type: a value directly provided by the measurement, such as memory usage. |
36 ↗ | (On Diff #258595) | This enum is about defining the relationship between the metric and the measurements, but also about the (related) nature of the metric itself. The current names are measurement-centric, which is good for the former but not so much for the latter, and a bit confusing if we set this enum when initializing the *metric*. I'd consider having this be metric-centric and documenting the measurements instead. e.g. enum MetricType { /// A number whose value is meaningful, and may vary over time. /// Each measurement replaces the current value. Value, /// An aggregate number whose rate of change over time is meaningful. /// Each measurement is an increment for the counter. Counter, /// A distribution of values with a meaningful mean and count. /// Each measured value is a sample for the distribution. /// The distribution is assumed not to vary, samples are aggregated over time. Distribution, }; |
42 ↗ | (On Diff #258595) | An example to show the naming scheme? |
46 ↗ | (On Diff #258595) | (Not clear whether this is the allowed set of labels for a metric, bag of tags for a measurement, tuple of coordinates for a measurement...) What are the design goals around labels? Can we make them as simple as possible (or remove them) for now? Take the example of cache hit/miss.
The latter seems to capture the structure pretty well, but doesn't generalize (what if there are three categories?). The former seems almost equivalent to labels though. Benefits of labels:
The first benefit seems pretty strong actually. I would suggest only supporting a single label though - it's a way simpler interface, we avoid complicated metrics whose schema might need documentation and enforcement, and we avoid any hint of memory allocation in the common case (just passing around a stringref that points to a literal, a filename we own, etc). |
48 ↗ | (On Diff #258595) | nit: since you have an unscoped enum, give the good name to the member and the bad name to the enum (which users need rarely spell) |
57 ↗ | (On Diff #258595) | Events we measure the duration of seem so closely related to trace spans that I don't think we can provide both without making it clear how they should relate to each other. How horrible do you think it would be to have a second Metric* Measure = nullptr in the Span constructor? It's hopelessly coupling together these two things that could otherwise be split, but... |
clang-tools-extra/clangd/Trace.h | ||
---|---|---|
35 ↗ | (On Diff #258595) | this looks a lot better, thanks! Changing the order of label and value to make it nicer for non-labeled metrics. |
36 ↗ | (On Diff #258595) |
I was planning to extend with this later on when we added memory tracking metrics. |
I like this a lot!
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
45 | comment should mention the label schema if it's not configured programmatically (You could also do a trick of having a StringLiteral parameter for the label, even if all you store is bool WantsLabel or nothing at all) | |
45 | units. We should decide whether units go in the metric name or just a comment. | |
46 | nit: I'm not sure pluralizing metric names makes sense/is helpful, seems likely to make them less regular | |
clang-tools-extra/clangd/ClangdServer.cpp | ||
366 | We're not counting symbols, we're counting occurrences. | |
366 | static? and below | |
430 | this is an internal name, mixing terminology seems unneccesary I also like noun first then verb, so we get related metrics grouped alphabetically | |
463 | suggest "tweak_success" or "tweak_attempt" depending on what we're measuring | |
474 | we return early here, skipping the metric increment, but not for error-conditions below. We should be either measuring attempts or successes, consistently. | |
496 | nit: this should be outside the if | |
clang-tools-extra/clangd/TUScheduler.cpp | ||
145–148 | you could consider passing the metric in here so we can distinguish between retrieval for diagnostics and retrieval for actions | |
clang-tools-extra/clangd/support/Trace.cpp | ||
222 | don't we want a catch-all metric so we get the existing span latencies in some form? | |
227 | if we're recording floating point numbers, seconds seems less arbitrary than ms, and mayxe less prone to confusion | |
clang-tools-extra/clangd/support/Trace.h | ||
33–72 | trace events and metric measurements.... | |
53 | define label here or as part of the class definition | |
57 | text says slashes, example has dots | |
83 | now we have several logically-independent functions here, consider providing a no-op implementation (return Context::current().clone()) for this and the other methods (and remove them from the test) | |
94 | event -> metric | |
131 | units (suggest seconds) | |
132 | I'd consider making these overloads so we don't have to use a pointer. |
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
45 |
did that and put an assertion to ensure labels are provided when needed and vice versa. | |
clang-tools-extra/clangd/ClangdServer.cpp | ||
366 | why? i thought constexpr already enforced internal linkage(at least for top level decls). are you suggesting moving it into the function body and marking static constexpr? | |
474 | yes i was trying to record only the successful executions. but attempts sounds better. | |
clang-tools-extra/clangd/TUScheduler.cpp | ||
145–148 | did that. apart from tracking read/diag access it is also important to not track accesses when we are trying to evict the cache. | |
clang-tools-extra/clangd/support/Trace.cpp | ||
222 | i put a metric to track all spans that wasn't created with an explicit metric. |
Awesome, ship it!
... though, how do you feel about testing the actual metrics we export?
Suggest a slightly generalized TestTracer that installs itself (RAII), and has a
// Returns recorded values for a metric, and clears then. vector<double> TestTracer::take(StringRef Metric, StringRef Label="");
Then it should be pretty simple to add assertions to a few existing tests.
clang-tools-extra/clangd/ClangdLSPServer.cpp | ||
---|---|---|
48 | Nit: operation->method, as they're called in JSON-RPC | |
clang-tools-extra/clangd/ClangdServer.cpp | ||
366 | You're right, I was confused about constexpr. That said, for the ones used inside one function, restricting the scope as you suggest seems nice. (We can also inline them entirely if only used once, but maybe some level of consistency is better) | |
clang-tools-extra/clangd/support/Trace.h | ||
95 | Remove the default argument here. Virtual + default is a little confusing and we're always going to pass it anyway |
Done, except tweak related metrics. We are not testing them through clangdserver, and i shied away from doing the plumbing :/ Let me know if you think it is important.
Also while writing the test for rename I've noticed we were actually counting renamed files rather than occurrences, had to put a loop that would go over each file, PTAL.
Thanks!
Oh, I'd forgotten this. This was actually deliberate IIRC, I think the idea was much of the "cost" of a large rename grows with files rather than occurrences, so the limit of 50 applies to files.
I'd probably leave as-is and rename the metric to rename_files, but up to you.
clang-tools-extra/clangd/unittests/ClangdLSPServerTests.cpp | ||
---|---|---|
159 | nit: ElementsAre(_) or SizeIs(1)? | |
clang-tools-extra/clangd/unittests/TUSchedulerTests.cpp | ||
523 | SizeIs(0)/SizeIs(1) might be more expressive here | |
clang-tools-extra/clangd/unittests/TestTracer.h | ||
36 | ah, this should probably be takeMetric or so, because we might extend this class to cover non-metric tracing stuff. (a slight misnomer, but "take measurement" collides with an english idiom and is also a bit unwieldy) | |
clang-tools-extra/clangd/unittests/support/TestTracer.cpp | ||
19 | this needs a lock, the interface is documented as threadsafe. |
For anyone running into the same problem, this broke the build with GCC 5:
/work/llvm.monorepo/clang-tools-extra/clangd/ClangdServer.cpp:374:75: error: could not convert ‘(const char*)""’ from ‘const char*’ to ‘llvm::StringLiteral’ trace::Metric::Distribution); ^
Hopefully 3c2c7760d9eea0236c5c54e8939b3901f4208835 fixes it.
comment should mention the label schema if it's not configured programmatically
(You could also do a trick of having a StringLiteral parameter for the label, even if all you store is bool WantsLabel or nothing at all)