This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Track time spent in filesystem ops during preamble builds
ClosedPublic

Authored by adamcz on Mar 15 2022, 9:23 AM.

Details

Summary

In some deployments, for example when running on FUSE or using some
network-based VFS implementation, the filesystem operations might add up
to a significant fraction of preamble build time. This change allows us
to track time spent in FS operations to better understand the problem.

Diff Detail

Event Timeline

adamcz created this revision.Mar 15 2022, 9:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2022, 9:23 AM
adamcz requested review of this revision.Mar 15 2022, 9:23 AM

Hmm just a few curious questions from the sidelines.
Why a "custom system" instead of something -ftime-trace based?
How much overhead does this introduce, esp. for normal use-cases?
What's the gain? The information is very coarse and general-purpose system profiling tools should give you much better information regarding file system perf.

Thanks, LG in general. I suppose the extra timing overhead should be negligible, considering the preamble build times but would be nice to see some numbers if you have any.
Apart from that I've raised the concern around multiple preamble builds finishing in parallel and poisoning the first_build metric, wanted to raise it here again to make sure it isn't lost in details.

clang-tools-extra/clangd/FS.cpp
128

nit: Timer(std::move(Timer))

141

i guess this is negligible but, timing close operations too shouldn't hurt?

clang-tools-extra/clangd/FS.h
16

i don't think we need this one

71

I don't think mentioning FS here is worth it. Can we rather emphasize on this being an accumulating timer?

71

what about making all of this an implementation detail of Preamble.cpp ? we can lift it up once we have other places that we want to make use of.

73

maybe put a FIXME:

78

nit: we use /// for public comments.

82

what about returning an int with ms granularity ?

86

what about just storing double/int ?

91

i feel like this deserves the FS specific comment mentioned above. maybe something like:

This will record all time spent on IO operations in \p Timer.
91

i don't think we ever want concurrent access to Timer, i.e. startTime should never be called without a matching call to stopTime first. passing it in as a shared_ptr somehow gives the feeling that it might be shared across multiple objects, which might do whatever they want with the object.
maybe just pass in a reference ?

clang-tools-extra/clangd/Preamble.cpp
379

what about only conditionally doing these when Stats is not null?

382

why use an expensive llvm::Timer if all we care is walltime?

385

it might be worth leaving some comments around this TimedFS being exposed to outside through PreambleCallbacks. We provide access to Preprocessor, which keeps the underlying FileManager alive.
Today all of this happens in serial hence all is fine, but we were discussing the possibility of performing the indexing and preamble serialization in parallel, it might result in some surprising race conditions if we do that.

clang-tools-extra/clangd/Preamble.h
82

nit: again /// (i see that fields of PreambleData doesn't follow the style here, but it's the outlier)

95

nit: maybe make this the last parameter and default to nullptr to get rid of changes in tests.

clang-tools-extra/clangd/TUScheduler.cpp
100

can we move all of these into an anonymous namespace instead?

109

what about just tracking the total build time here? we can get the ratio afterwards.

113

nit: llvm style doesn't really use const if the parameter is being copied anyway

121

this can still be called in parallel as we have multiple preamble threads. moreover this also makes "first_build" tracking somewhat tricky. do you think this metric would still be useful in the presence of such noise? e.g. a respawning clangd instance might receive addDocument requests for all the previously open files and the first preamble to build might've benefited a lot from cached file IO that was done by other preamble builds.

i suppose we can have something like following to at least address multiple preamble threads accessing FirstReport issue.

static bool ReportedFirstBuild = [&]{
  PreambleBuildFilesystemLatency.record(Stats.FileSystemTime, "first_build"); // I think it's fine to report twice here, as we've both performed a first build for a file, and it was the first build of clangd instance.
  return true;
}();
llvm::StringLiteral Label = IsFirstPreamble ? "first_build_for_file" : "rebuild";
PreambleBuildFilesystemLatency.record(Stats.FileSystemTime, Label);
if (Stats.TotalBuildTime > 0) // Avoid division by zero.
    PreambleBuildFilesystemLatencyRatio.record(
        Stats.FileSystemTime / Stats.TotalBuildTime, Label);
1008

nit: drop the const per style.

1015

nit:

if(!LatestBuild)
  return;
reportPreambleBuild(Stats, IsFirstPreamble);
if(isReliable(LatestBuild->CompileCommand))
  HeaderIncluders.update(FileName, LatestBuild->Includes.allHeaders());
adamcz updated this revision to Diff 415919.Mar 16 2022, 11:38 AM
adamcz marked 12 inline comments as done.

addressed review comments

Hmm just a few curious questions from the sidelines.
Why a "custom system" instead of something -ftime-trace based?

I'm not sure if I understand. -ftime-trace is a one-off. I want to monitor these values across multiple clangd instances, over time.

How much overhead does this introduce, esp. for normal use-cases?

On my machine it takes about 3.5 milliseconds to do 100k startTimer(); stopTimer(); calls in a loop. Building preamble for SemaOverload.cpp (random file I checked) takes about ~2200 such calls. Basically the impact of this should not be noticeable.

What's the gain? The information is very coarse and general-purpose system profiling tools should give you much better information regarding file system perf.

The idea is to use this to monitor large deployments of clangd on many machines over long period of time. One question to answer is whether some sort of prefetching of header files (with parallel I/O) would be beneficial. That depends on what fraction of the time we spend waiting on the serial file reads and how warm the caches are, on average.
It will also help us catch regressions, like one we had recently (that went unnoticed for quite a while) where adding some -fmodule-map-file flags to do a layering check caused a lot of stat() calls, which can be quite expensive.

Does that make sense? Does it answer your questions?

clang-tools-extra/clangd/FS.h
71

Before I make the change, let me clarify: are you suggesting moving the whole TimedFS into Preamble.cpp?

Definitely possible.

73

IMHO FIXME should be actionable. This is not - we shouldn't do it unless something changes. It's just a note that if someone ever wonders "could I just move this to support/ and re-use it", the answer is yes.

Does that make sense?

82

We could, but why? llvm::Timer returns WallTime as double so this is somewhat consistent with it. Not a strong argument, of course, since that consistency doesn't really matter, but I'm not sure what benefit using int here offers?

86

Again, why?

(I'm not strictly against it, just trying to understand why you're asking for this)

91

It's not about concurrency, it's about live time. This timer needs to have the same lifetime as the entire VFS, which is also ref-counted.

Alternative would be for the TimerFS to own this timer and expose it's own getTime() method. That means TimerFS must now be visible outside of FS.cpp, but if we're moving it to Preamble.cpp per your other comment that's fine.

clang-tools-extra/clangd/Preamble.cpp
385

Hmm...I added a comment, but nothing really changes here, right? VFS in general is not thread safe, so if we add parallelism we need to either make sure no file access happens there or make all VFSs used here thread safe.

clang-tools-extra/clangd/Preamble.h
95

I'd rather not, unless you insist. Besides not having to modify tests (which I already did anyway), what's the benefit of having it be default? Do you think it's more readable?

clang-tools-extra/clangd/TUScheduler.cpp
100

Done. I moved them into anonymous namespace, but I'm not sure what you mean by "instead"?

109

How would you get the ratio then?

113

Right, and I'll never get used to that. Sorry ;-)

121

That's an excellent point. Fixed.

I do believe that this extra information is worth it, even if it's noisy. If the fact that we're building multiple TUs in parallel and thus, essentially, have parallel file reading, significantly improves this metric then perhaps lack of parallel reads is not such a big deal after all.

1008

Hmm...is that actually documented somewhere? There's definitely many cases of "const bool" in LLVM codebase. I think the "const" improves readability.

kadircet added inline comments.Mar 21 2022, 1:28 AM
clang-tools-extra/clangd/FS.h
71

Before I make the change, let me clarify: are you suggesting moving the whole TimedFS into Preamble.cpp?

Yes, exactly. as neither the FS nor the walltimer is needed by anything else (and we probably want to be more cautious when we want to make use of them in other components).

73

SG.

82

it's mostly personal preference i suppose, it just feels reasoning about discrete durations are easier than doubles. feel free to leave as is.

86

it just feels easier to reason about builtin types, than a template alias like std::chrono::steady_clock::duration, at least on the interface level. but again, probably just a personal preference that I don't feel strongly about, so feel free to ignore. (usually with what abouts i try to signal this :D)

91

It's not about concurrency, it's about live time. This timer needs to have the same lifetime as the entire VFS, which is also ref-counted.

Right, I've noticed that as I was going through the rest of the review, but forgot to delete these as it was split into two days. sorry for the churn.

Alternative would be for the TimerFS to own this timer and expose it's own getTime() method. That means TimerFS must now be visible outside of FS.cpp, but if we're moving it to Preamble.cpp per your other comment that's fine.

yes, i think that would be a nice simplification.

clang-tools-extra/clangd/Preamble.cpp
385

right. in theory we have that more loudly spelled out by having a ThreadSafeFS, but as you pointed out that wouldn't be the only thing to consider when such a shift happens.

389

is this call intentional ?

clang-tools-extra/clangd/Preamble.h
95

right, i think it's more readable, and moreover it will reduce the need for typing that parameter more times in the future (mostly in tests again).
at the very least, what's the reason for not inserting it at the last position but rather before PreambleCallback?

clang-tools-extra/clangd/TUScheduler.cpp
100

instead of having static in front of the void reportPreambleBuild :D (sorry the comment was definitely misplaced).

109

we could just divide the distributions point by point. it isn't as accurate since we won't have the exact association, but something we've been doing in the past.
but no matter what, i forgot that we actually track the total build time through the span in buildPreamble so nvm. no need to give up the accuracy here.

142

we can drop the static now.

144

doesn't this need to be at least static ?

1008

is that actually documented somewhere

nothing that I can find either.

There's definitely many cases of "const bool" in LLVM codebase. I think the "const" improves readability.

yes, but I think the majority is still not having "const" in front of bool. it's at least the convention in clangd.
I also agree that the const improves readability, but for a local variable it carries much less importance and being consistent with the majority of the cases here is a lot more important.
because seeing occurrences of both locals with const and non-const will eventually make the reasoning hard and each case surprising.

if you think there's enough value to having consts for locals for readability, i think we should figure out a way to make the codebase (at least the clangd part) consistent with consts first.

adamcz updated this revision to Diff 416964.Mar 21 2022, 8:28 AM
adamcz marked 3 inline comments as done.

more review comments

clang-tools-extra/clangd/FS.h
91

It's not about concurrency, it's about live time. This timer needs to have the same lifetime as the entire VFS, which is also ref-counted.

Right, I've noticed that as I was going through the rest of the review, but forgot to delete these as it was split into two days. sorry for the churn.

Alternative would be for the TimerFS to own this timer and expose it's own getTime() method. That means TimerFS must now be visible outside of FS.cpp, but if we're moving it to Preamble.cpp per your other comment that's fine.

yes, i think that would be a nice simplification.

Done.

clang-tools-extra/clangd/Preamble.cpp
389

Oops, no, it's not.

clang-tools-extra/clangd/Preamble.h
95

OK, I made it default to nullptr.

The logic behind it not being last was that it's usual (though not a hard rule) for callbacks to be the last argument, probably to make passing lambdas look nicer. Not really important.

clang-tools-extra/clangd/TUScheduler.cpp
1008

OK, I'll drop const for now then.

kadircet accepted this revision.Mar 21 2022, 9:37 AM

thanks, lgtm! sorry for the long round trip.

clang-tools-extra/clangd/Preamble.cpp
315

nit: double slashes instead of triple now (as it's no longer public interface), sorry :(
same for members

This revision is now accepted and ready to land.Mar 21 2022, 9:37 AM
adamcz updated this revision to Diff 417006.Mar 21 2022, 10:29 AM
adamcz marked an inline comment as done.

fix comments

This revision was landed with ongoing or failed builds.Mar 21 2022, 10:34 AM
This revision was automatically updated to reflect the committed changes.