This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Extract common file-caching logic from ConfigProvider.
ClosedPublic

Authored by sammccall on Sep 23 2020, 11:01 AM.

Details

Summary

The plan is to use this to use this for .clang-format, .clang-tidy, and
compile_commands.json. (Currently the former two are reparsed every
time, and the latter is cached forever and changes are never seen).

Diff Detail

Event Timeline

sammccall created this revision.Sep 23 2020, 11:01 AM
sammccall requested review of this revision.Sep 23 2020, 11:01 AM

To give a bit of context here: the end-goal here is hot-reload of compile_commands.json. (This covers our *two* top bugs: #192 and #83).

I'm very happy with the hot-reloading of config files (.clangd etc) and I think it's a good fit to generalize.
In particular, config parsing shouldn't be repeated unneccesarily as it has side-effects (diagnostics). Similarly, compile_commands parsing shouldn't be repeated as it's expensive (the file is often huge).
So this is better than other approaches like caching at the FS level.

We could reasonably apply this to .clang-format and .clang-tidy as well. Currently we stat up the directory chain, and then read and parse these every time. It's not very expensive as these are small files, but it's also not free.

In all of these cases, the costs are going to be in eschewing the simple APIs that hide all the IO. For compile_commands.json, we need to extend or stop using the generic compilation-database plugin mechanism. For clang-tidy/format we need to introduce a tree-shaped cache (we have one for CDBs already).

@kbobyrev ping... I think we do actually want to land this, for use with .clang-tidy files after D91029

@kbobyrev ping... I think we do actually want to land this, for use with .clang-tidy files after D91029

Yes, I was looking at copying the config cache for use with .clang-tidy

clang-tools-extra/clangd/support/FileCache.h
44

nit: Is it more descriptive to have this spelled as PathRef, same for path() below.

Sorry for such a tremendous delay.

Mostly looks good functionally, I just have a few questions about semantics and several stylistic nits.

Also, thanks for the amount of documentation - it makes it really nice to navigate around and review.

clang-tools-extra/clangd/support/FileCache.cpp
19

Uhhh. Doesn't Clang-Tidy think this is a bad idea? IIUC this is not UB but it looks quite scary regardless. Maybe use std::numeric_limits<uint64_t>::max() here?

45

Wait, I thought ValidTime is the last time point when the file could be correctly parsed (i.e. the file was valid). But if the parsing could not be done then this would be bumped regardless, right? So what's the "validity" semantics here?

clang-tools-extra/clangd/support/FileCache.h
24

s/live/fresh?

59

So the client is responsible for keeping FreshTime updated, right? I think it might be worth mentioning in the comments.

71

I assume MTime is ModifiedTime - took me some time to realize it (getting into the code), maybe expand or add some comment. Also, why are the types different with ValidTime?

clang-tools-extra/clangd/unittests/support/FileCacheTests.cpp
16

Please sort the includes here (clangd should have a Code Action here).

njames93 added inline comments.Nov 25 2020, 1:39 AM
clang-tools-extra/clangd/unittests/support/FileCacheTests.cpp
16

The includes are sorted, this is a bug in the include order check that was fixed in D91602, but the pre merge bot isnt using the updated version.

kbobyrev added inline comments.Nov 25 2020, 1:41 AM
clang-tools-extra/clangd/unittests/support/FileCacheTests.cpp
16

Ahh, I didn't know that and I had some of the valid warnings in my patches. I see, thanks!

sammccall marked 5 inline comments as done.Nov 25 2020, 1:55 AM
sammccall added inline comments.
clang-tools-extra/clangd/support/FileCache.cpp
19

Done.
(FWIW I prefer the -1 idiom as you don't have to repeat the type so there's no chance of a mismatch)

45

Ah, "valid" refers to the cache, not the file. The cache is valid if it represents what's on disk, whether that's good, bad or missing. (Note that this class doesn't actually know anything about the semantics of the file content, or whether it's valid).

Added a comment to the member.

clang-tools-extra/clangd/support/FileCache.h
24

I guess it depends which way you look at it:
You want the *configuration file* to be "live" in the sense that it takes immediate effect.
You want the *configuration* to be "fresh" in the sense that it reflects the latest source from somewhere.

71

Renamed to ModifiedTime. (mtime is a term of art: see mtime/ctime/atime. But spelling it out is nearly as precise and less cryptic).

The types are different because they are different coordinate systems: steady_clock is approximately "nanos since boot" which is the best way to compare timestamps captured within a program, and system_clock has semantics that match mtime stored in the filesystem (though actually we just compare for equality, so it's just the most convenient type).

Changed ModifiedTime to spell the underlying std type directly to make this difference clearer (though the API we get it from uses the LLVM typedef).

sammccall marked 3 inline comments as done.Nov 25 2020, 1:56 AM

(oops, missed a merge conflict. New snapshot shortly)

sammccall updated this revision to Diff 307551.Nov 25 2020, 2:17 AM
sammccall marked 2 inline comments as done.

Address review comments

sammccall added inline comments.Nov 25 2020, 2:19 AM
clang-tools-extra/clangd/support/FileCache.h
59

I'm not really following - FreshTime is a parameter and must be provided each time, so I'm not sure what "keep it updated" means.

It's true that clients likely want to pass now() + some_duration, I've added an example explaining that to the docs.
But I don't think "if you store the value of now() for a long time and use it later, it may refer to a time way in the past" isn't a trap we need to warn against.

clang-tools-extra/clangd/unittests/support/FileCacheTests.cpp
16

I can apply the fix, but then arc diff wants to revert it. I don't think this is important enough to fight the tools over.

kbobyrev accepted this revision.Nov 25 2020, 2:21 AM

LGTM with std::numeric_limits in the second place :) Thanks!

clang-tools-extra/clangd/support/FileCache.cpp
19

Yeah I guess one could use reflection for that but that but that's more code.

22

I think makes sense to do std::numeric_limits() here, too for consistency?

This revision is now accepted and ready to land.Nov 25 2020, 2:21 AM
njames93 added inline comments.Nov 25 2020, 2:32 AM
clang-tools-extra/clangd/support/FileCache.cpp
60

Is tracking the size really that important?
Modified time should be good enough. If someone modifies the file but ensures the mtime doesnt change you have to think there is a reason for that. Also a modification could preserve file size, in which case that wouldn't track anything.

sammccall marked an inline comment as done.Nov 25 2020, 2:53 AM
sammccall added inline comments.
clang-tools-extra/clangd/support/FileCache.cpp
60

If the content changes without changing mtime, and we don't detect it using size, we're stuck with an invalid cache until it changes again (which could be forever).

you have to think there is a reason for that

My best guess is if the FS has a low resolution (e.g. FAT32, though hopefully people aren't putting config files on USB sticks...) or the file was written twice e.g. by a script fast enough to beat the timer. In both cases I'd want the changes to be revealed.
What's the reason you're thinking of?

Also a modification could preserve file size, in which case that wouldn't track anything

Yup, I don't have a good answer for that case. Hopefully it's very rare. I don't think we should let the perfect be the enemy of the good.

This revision was landed with ongoing or failed builds.Nov 25 2020, 3:09 AM
This revision was automatically updated to reflect the committed changes.