Page MenuHomePhabricator

[clangd] Add a Filesystem that overlays Dirty files.

Authored by njames93 on Jan 12 2021, 2:20 PM.



Create a ThreadsafeFS in the DraftStore that overlays the dirty file contents over another ThreadsafeFS.
This provides a nice thread-safe interface for using dirty file contents throughout the codebase, for example cross file refactoring.
Creating a Filesystem view will overlay a snapshot of the current contents, so if the draft store is updated while the view is being used, it will contain stale contents.

Diff Detail

Event Timeline

njames93 created this revision.Jan 12 2021, 2:20 PM
njames93 requested review of this revision.Jan 12 2021, 2:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2021, 2:20 PM
njames93 planned changes to this revision.Jan 12 2021, 2:37 PM
njames93 updated this revision to Diff 316357.Jan 13 2021, 3:35 AM

Remove some unrelated changes

njames93 updated this revision to Diff 316599.Jan 14 2021, 2:28 AM

Refactor RefCntString out of DraftStore class
Remove MTime from Draft, other users of DraftStore don't need to see it.

njames93 updated this revision to Diff 317856.Jan 20 2021, 6:09 AM

Start Splitting this patch up

njames93 retitled this revision from [clangd][WIP] Add a Filesystem that overlays Dirty files. to [clangd] Add a Filesystem that overlays Dirty files..Jan 20 2021, 6:47 AM
njames93 edited the summary of this revision. (Show Details)

Sorry for the long delay here. I've gotten myself buried in too many different things.

Overall the idea of exposing drafts as an overlay FS is definitely growing on me - it makes it easier and clearer to control which feature is seeing which version of the files.

At a high level my main requests are:

  • we should move maintenance of the drafts from ClangdLSPServer into ClangdServer. (This should maybe have been done a while ago, but this change makes it more complicated and therefore pressing)
  • we should simplify the implementation as much as possible, and lean on existing infrastructure - there's a lot of custom pieces here like RefCntString and DraftStoreFileSystem that seem like small optimizations over existing types

(I might take a crack at moving DraftStore by itself, as there probably are issues and I want to understand what they are...)


Passing in FS + DirtyFS doesn't seem quite right, because we also pass in all the mutations that create the differences.

It seem rather that *ClangdServer* should maintain the overlay containing the dirty buffers, and expose it (for use in the few places that ClangdLSPServer currently uses DraftStore).

(the fact that DraftStore isn't accessible from ClangdServer did also come up when trying to split the *Server classes into Modules, so this seems like a helpful direction from that perspective too)


These classes need comments describing why they exist :-)
(I think you're right that buffers may outlive the VFS)


Is there any reason to think we need this micro-optimization? It's quite invasive.


can't we use InMemoryFileSystem for this?


doing IO in view() doesn't seem obviously OK.

what's the practical consequence of the overlay's inodes not matching that of the underlying FS? (It seems reasonable to say that the files always have different identity, and may/may not have the same content)


shared_ptr<const string> seems like a cheap alternative here.

However if the point is ultimately to obtain MemoryBuffers that we can hand out with the underlying data being refcounted, can we do that directly?

class SharedBuffer : public MemoryBuffer {
  std::shared_ptr<const std::string> Data;
  Shared(StringRef Name, std::string Data);
  Shared(const RefcountBuffer &); // copyable

providing non-explicit string conversion operators is IME very likely to cause ambiguous overload errors when using this in variaous places :-(


avoid assert in headers for ODR reasons (ooh, we have a couple of violations i should clean up)

njames93 marked 3 inline comments as done.Mar 1 2021, 1:50 PM
njames93 added inline comments.

I was never a fan of this tbh. But moving DraftStore would result in this being too big,


Its basically copied from llvm/support/MemoryBuffer.cpp


InMemoryFileSystem Would work in theory. However using that requires building a lot of unnecessary nodes for thing like intermediate directories.


It's probably not necessary, but I know preambles use the UniqueID. It may just be safer to create a uniqueID for each item in the DraftStore and leave it at that.

njames93 updated this revision to Diff 327280.Mar 1 2021, 1:50 PM
njames93 marked an inline comment as done.

Address some comments, though I have a feeling @sammccall, I may have to wait until DraftStore has been refactored first before continuing on with this.

Address some comments, though I have a feeling @sammccall, I may have to wait until DraftStore has been refactored first before continuing on with this.

D97738 is an attempt at this. It is... sprawly, but it's an overdue cleanup that should fit together well with this patch, I think.

There may be some conflicts but I don't think it'll be terrible actually... I think we should keep going here, it looks really good.


Sure, but this doesn't really answer the question! Can we keep this trivial until we have a reason to optimize it? (Clang typically holds tens of thousands of MemoryBuffers, but we're only likely to have a few hundred of these)


Probably, but it would keep this code simpler, supports directory iteration and status (used in code completion), and I'm fairly confident its path handling is solid.

It's certainly possible to add this optimization later if it seems important, but in this patch I think it distracts from the design questions.


The biggest risks I can see with this approach:

  • in build-preamble-from-dirty-FS mode, would opening a draft cause the preamble to be considered invalid and needing rebuilding? My read of PrecompiledPreamble::CanReuse is no (UniqueIDs are not stored)
  • when parsing, if the file can be read through the overlay *and* underlying via different paths (e.g. hardlinks), these won't be deduplicated. However giving them the same inode only changes the nature of the bug: the file would get whichever content was read first.

So I think it's as good as we can expect.


having DraftStore sit on top of TFS seems a bit inside-out, giving it bigger scope than necessary.

What about giving DraftStore an asVFS() method that returns an in-memory filesystem?

then separately we can have a separate DirtyFS : TFS, that has a DraftStore& and a TFS &Base and implements viewImpl() on top of their public interfaces. Having the dependency in that direction seems more natural to me.


alternatively you can define whichever internal struct you like, like

struct FileData {
  std::string Filename;
  std::string Contents;
  // and MTime and UID

and store DenseMap<StringRef, shared_ptr<FileData>> (with the keys pointing into the values).

You can still hand out shared_ptr<const std::string> using the aliasing constructor of shared_ptr.

The advantages of this:

  • it avoids the internal representation from being too constrained by the public API
  • SharedStringBuffer can easily wrap the whole shared_ptr<FileData>, and so you doesn't need a *copy* of the filename

this is a little bit of complexity though, up to you

njames93 updated this revision to Diff 327646.Mar 2 2021, 6:34 PM

Getting there.

njames93 added inline comments.Mar 2 2021, 6:39 PM

How I had it was a symptom of using the underlying TFS for getting the UniqueID of files. But yes now that's gone this approach works.

njames93 added inline comments.Mar 3 2021, 4:14 AM

Probably needs moving to its own place, but wasn't sure where best to put it.


Unfortunately in the current implementation, opening a draft will invalidate and preambles that use the underlying file.
This is due to the last modification times not matching.
This could be addressed by querying the TFS on the textDocument/didOpen request to get the actual last modification time, but we shouldn't really be doing any kind of IO on that thread.

sammccall accepted this revision.Mar 8 2021, 8:33 AM

Thanks, LG!


I think ClangdServer is fine, but this can be a unique_ptr<ThreadsafeFS> with the class moved to the implementation file


we don't need try_emplace here, we overwrite the whole value regardless.
Can we use Draft &D = Drafts[File] again?


nit: std::time() (and std::time_t)


nit: /*RequiresNullTerminator=*/

This revision is now accepted and ready to land.Mar 8 2021, 8:33 AM
njames93 marked 11 inline comments as done.Mar 8 2021, 10:43 AM
njames93 added inline comments.

How about privately inheriting from ThreadsafeFS :P
I agree, hiding the impl in the cpp is probably a much nicer way.


I changed this because in an older version I wanted to set the UniqueID on the first time the file was added, however now we don't do that (or check the return value for insertion at all) its safe to go back to the operator[] access.


@sammccall Any further comments with regard to this. Is calling IO to get modification time in didOpen ok, or just accept opening a file would invalidate a preamble if the DirtyFS was used for preambles.

njames93 updated this revision to Diff 329070.Mar 8 2021, 10:46 AM
njames93 marked 2 inline comments as done.

Address comments.

sammccall added inline comments.Mar 9 2021, 2:40 AM

There's no urgency in resolving this:

  • preambles-from-dirty-fs doesn't exist in this patch
  • even once it does, we can introduce the problem first (in this experimental mode) and then solve it

That said, my suspicion is that having ClangdLSPServer stat the file, conditional on being in preambles-from-dirty-fs mode, would be OK.
(If the filesize matches I think it's safe to say the content does, and copy the timestamp. If the filesize mismatches... well, there are a number of scenarios involving encodings, and it's hard to say which will matter in practice, so we'd probably want to log and wait for the bug reports)

sammccall accepted this revision.Mar 9 2021, 3:32 AM

Still LG btw!


Maybe add an explicit comment that this returns nullptr if the file is not open.
(When the type was Optional this was fairly self-explanatory)


nit: we don't use const on pointer members, just on the pointee where appropriate.
Maybe not the best style in a vacuum but it's fairly consistent and diverging from it is a bit confusing.

njames93 marked 4 inline comments as done.Mar 9 2021, 6:34 AM
This revision was automatically updated to reflect the committed changes.