Page MenuHomePhabricator

[clangd] Refactor background-index shard loading

Authored by kadircet on Jul 15 2019, 2:02 AM.

Diff Detail


Event Timeline

kadircet created this revision.Jul 15 2019, 2:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2019, 2:02 AM
kadircet retitled this revision from [clangd] Refactor background-index shard loading to [clangd][NFC] Refactor background-index shard loading.Jul 15 2019, 2:03 AM

So I've stared at this refactoring for a while, and I still don't totally get it.
It seems like a class that really wants to be a function. It's a complicated function though - maybe a separate cpp file does make sense.

We discussed moving ShardVersions and Writes into the loader, splitting ShardVersions from IndexSymbols seems pretty odd though.

What's the advantage of making this stateful?

kadircet updated this revision to Diff 210303.Jul 17 2019, 6:07 AM
  • As discussed offline, moved the "staleness" detection back to background index
  • A renaming within Rebuilder, "LoadedTUs -> LoadedShards"

Refactoring generally looks good.
You're replacing a lot of documented code with new undocumented code, can we add some high-level comments?

130 ↗(On Diff #210303)

nit: unclear what's changed with respect to what.

I'd suggest shardIsStale() and swap the param order?

133 ↗(On Diff #210303)

While here: this message lacks context and is unnecessarily jargony: "Background-indexing: couldn't read {0} to validate stored index"?

455 ↗(On Diff #210303)

This function now has almost no comments, can you explain the major steps?

481 ↗(On Diff #210303)

move this outside the loop and use a counter? it's an extra lock/unlock for no reason

490 ↗(On Diff #210303)

nit: again, this patch is adding Path/PathRef aliases to parts of the code that don't use them

177 ↗(On Diff #210303)

and their dependencies

77 ↗(On Diff #210303)

you can't call this an (opaque?) ShardIdentifier and then use it as a path - maybe SourceFile?

1 ↗(On Diff #210303)

header is out of date. The new file/structs/functions need documentation

25 ↗(On Diff #210303)


30 ↗(On Diff #210303)

IndexFileIn is just a struct - optional<>? (And need a comment explaining when it can be null)

36 ↗(On Diff #210303)

no, they're not :-)

we talked about making the values refs to ShardInfo.AbsolutePath.

37 ↗(On Diff #210303)

why are these outside the ShardInfo?

42 ↗(On Diff #210303)

clangd::load seems like an insufficiently precise name for this function


kadircet retitled this revision from [clangd][NFC] Refactor background-index shard loading to [clangd] Refactor background-index shard loading.Jul 18 2019, 12:54 AM
kadircet updated this revision to Diff 210511.Jul 18 2019, 3:00 AM
kadircet marked 13 inline comments as done.
  • Address comments
kadircet added inline comments.Jul 18 2019, 3:00 AM
490 ↗(On Diff #210303)

I believe they represent the intention better then string/StringRefthat was the reason I was using them.

sammccall added inline comments.Jul 18 2019, 4:26 AM
133 ↗(On Diff #210511)

{0 is missing its }

497 ↗(On Diff #210511)

// We'll accept data from stale shards, but ensure the files get reindexed soon.

93 ↗(On Diff #210511)

This handles only one main file, I can't see where you skip loading shards for headers that were loaded for a previous file

104 ↗(On Diff #210511)

ShardIdentifier still here

126 ↗(On Diff #210511)

I don't think this works, the storage for these StringRefs is the LoadedShards you just moved from.
(In practice, maybe you get away with it if the strings are longer than the SSO?)

kadircet updated this revision to Diff 210541.Jul 18 2019, 6:00 AM
kadircet marked 5 inline comments as done.
  • Address comments
93 ↗(On Diff #210511)

The real loading happens down the line at const CachedShard &CS = loadShard(ShardIdentifier, Storage);, which returns shard from cache if it was already loaded.

126 ↗(On Diff #210511)

Yes you are right, I don't see a good way out of it with current data model. I suppose we can keep an "intern table :)" for the strings, but then LoadedShards would be meaningless without that intern table so we would end up returning another struct from loadIndexShards that will wrap std::vector<LoadedShards> and the intern table(but we tried to mitigate from that struct in previous iteration).

So instead I am changing the model to return only the first translation unit and actually this makes a lot of sense when we think about shards for standard library, in a code base like chromium vector could easily have ~40k dependent TUs and even if we just returned a vector<StringRef> it would still be around 640kB, and we have a lot of standard library headers. this can quite easily add hundreds of MB to clangd's memory footprint while loading shards.

kadircet updated this revision to Diff 210564.Jul 18 2019, 7:35 AM
  • Add comments
sammccall added inline comments.Jul 18 2019, 7:42 AM
47 ↗(On Diff #210541)

I think we only need to traverse this when *missing* the cache, so it can be a return value of loadShard rather than stored in CachedShard (so CachedShard becomes just LoadedShard)

sammccall accepted this revision.Jul 18 2019, 8:16 AM
sammccall added inline comments.
93 ↗(On Diff #210511)

OK, then we still traverse the whole subtree but hit every node.

I think it'd be nicer to avoid the traversal (as noted above) if that works well, up to you.

This revision is now accepted and ready to land.Jul 18 2019, 8:16 AM
kadircet updated this revision to Diff 210589.Jul 18 2019, 8:46 AM
kadircet marked an inline comment as done.
  • Get rid of CachedShard
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2019, 9:25 AM