This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Extend SymbolOrigin, stop serializing it
ClosedPublic

Authored by sammccall on Dec 7 2021, 5:11 AM.

Details

Summary

New values:

  • Split Dynamic into Open/Preamble
  • Add Background (previously was just Unknown)
  • Soon: stdlib index

This requires extending to 16 bits, which fits within the padding of Symbol.
Unfortunately we're also *serializing* SymbolOrigin as a fixed 8 bits.

Stop serializing SymbolOrigin:

  • conceptually, the source is whoever indexes or *deserializes* a symbol
  • deserialization takes SymbolOrigin as a parameter and stamps it on each sym
  • this is a breaking format change

Diff Detail

Event Timeline

sammccall created this revision.Dec 7 2021, 5:11 AM
sammccall requested review of this revision.Dec 7 2021, 5:11 AM

I think we also need to update index/remote/Server.cpp && FileSymbols (and FileIndex too).

Regarding updates to loadIndex, I actually think it makes sense for that index the always retrieve symbols as Static origin, then whoever makes use of that (we always have either a FileSymbol or a complete remote-index marshaling on top) should overwrite the origin while either building the final merged index or during marshaling. That would get rid of lots of extra plumbing.
WDYT?

clang-tools-extra/clangd/index/SymbolOrigin.cpp
17

s/P/9 && s/6/P

clang-tools-extra/clangd/index/SymbolOrigin.h
26

do we think this still provides a meaningful signal ?
It only provides value only when multiple indices of same type was involved in providing the result (and only that single type of indices). I suppose after this patch it can only happen for SM (as there are certain remote-index implementations that mark themselves as static) or RM (well this is not possible today, but some day we might have a stack of remote-indices).
As soon as there's a different type of index is involved M no longer has any meanings, as it's clear that there was some sort of merge happening (or am I missing something obvious?)


Before this patch situation is a little bit different since FileIndex just said D for both main file and preamble, but that's changing.

30

thanks for remembering that! :D

sammccall marked 2 inline comments as done.Dec 7 2021, 5:58 PM

I think we also need to update index/remote/Server.cpp && FileSymbols (and FileIndex too).

Regarding updates to loadIndex, I actually think it makes sense for that index the always retrieve symbols as Static origin, then whoever makes use of that (we always have either a FileSymbol or a complete remote-index marshaling on top) should overwrite the origin while either building the final merged index or during marshaling. That would get rid of lots of extra plumbing.
WDYT?

The problem is these methods yield SymbolSlabs, and the symbols within them are frozen/const. There's no provision to "just tweak some bitfield" - we'd have to copy the whole slab.
In general the const-ness is useful safety: it ensures strings are interned and point to the slab's storage. But it's annoying here.
(The amount of extra plumbing seemed modest enough that I didn't try very hard to hack around this, there may be a neat solution).

clang-tools-extra/clangd/index/SymbolOrigin.cpp
17

Whoops, good catch!

clang-tools-extra/clangd/index/SymbolOrigin.h
26

Hmm, we can also get M when merging happens at indexing time due to multiple visits of the same symbol (DuplicateHandling::Merge, which is used by background indexing).

I don't have a strong opinion on its usefulness though. IIRC we added it (and origins) when trying to understand the paths various symbols took through indexes. As a debugging tool some redundancy maybe has some sanity-check value, but we could also drop it.

I don't think it has so much to do with this patch (unless you think we should reuse the bit instead of expanding to 16)...

Before this patch situation is a little bit different since FileIndex just said D for both main file and preamble

...it makes sense that this could have been a reason for the flag, but I don't think it actually was :-)

sammccall updated this revision to Diff 392609.Dec 7 2021, 5:58 PM
sammccall marked an inline comment as done.

fix sigils

sammccall updated this revision to Diff 392612.Dec 7 2021, 6:03 PM

Update remote index server and FileIndex

I think we also need to update index/remote/Server.cpp && FileSymbols (and FileIndex too).

Done. I wonder why premerge tests didn't fail for FileIndex not being updated?

kadircet accepted this revision.Dec 10 2021, 6:44 AM

Thanks, LGTM!


The problem is these methods yield SymbolSlabs, and the symbols within them are frozen/const. There's no provision to "just tweak some bitfield" - we'd have to copy the whole slab.

Yes and no. As mentioned SymbolSlabs are frozen once created and I totally agree with the reasons there. But AFAICT in none of the indexes (apart from monolithic FileIndex) we ever have a single SymbolSlab.
We always have multiple of those and periodically build a single slab out of them, by copying/merging each symbol so that we can build a SymbolIndex on top of the final merged SymbolSlab. I was suggesting to perform ultimate Origin tweaking there (we can even respect the Origins of each individual slabs too).

Couple reasons:

  • Definitely less plumbing, we just let FileSymbol now what it should be called during construction. Background/RemoteIndex already has this knowledge hard coded.
  • In theory those indexes that operate on top of a multiple SymbolSlabs should be inserting their own "origin" into the final merged slab as it contains some extra logic, just like MergeIndex.

Downside is the possible discrepancy between the invocation of SymbolCollector and the serving of the SymbolSlab acquired from it.
The discrepancy is only possible when we serialize/deserialize the slab between collection and serving, I'd say in those cases the symbol origin should just be Static (and the server should add any extra semantics).

No matter where we do the "overriding" this discrepancy will become an issue today anyways, the consumer of the serialized slabs needs to overwrite them.
I just feel like keeping that semantic to FileSymbol/Background/RemoteIndex isolates the logic better.


I don't feel too strongly about it though, feel free to roll with this version if you don't think that isolation is worth the changes.

clang-tools-extra/clangd/index/SymbolOrigin.h
26

I don't think it has so much to do with this patch (unless you think we should reuse the bit instead of expanding to 16)...

Nope it isn't directly related to this patch (also, we soon plan to introduce the stdlib kind here, so doesn't really let us get back to 8).
Mostly checking if we can get rid of it and some code governing it.

Hmm, we can also get M when merging happens at indexing time due to multiple visits of the same symbol (DuplicateHandling::Merge, which is used by background indexing).
I don't have a strong opinion on its usefulness though. IIRC we added it (and origins) when trying to understand the paths various symbols took through indexes. As a debugging tool some redundancy maybe has some sanity-check value, but we could also drop it.

I see, yeah I am not sure about its usefulness either, sounds like there might still be rare cases this provides value (like debugging some behaviour in background index).
Thanks for the thoughts, definitely no action needed here.

This revision is now accepted and ready to land.Dec 10 2021, 6:44 AM

Yes and no. As mentioned SymbolSlabs are frozen once created and I totally agree with the reasons there. But AFAICT in none of the indexes (apart from monolithic FileIndex) we ever have a single SymbolSlab.
We always have multiple of those and periodically build a single slab out of them, by copying/merging each symbol so that we can build a SymbolIndex on top of the final merged SymbolSlab. I was suggesting to perform ultimate Origin tweaking there (we can even respect the Origins of each individual slabs too).

Discussed this offline a bit.

We could pass around pairs of (SymbolSlab, Origin) and make it the index's responsibility to set the origin on each symbol. But I don't think this communicates so clearly that the origin reflects where the symbol came from/passed through, rather than what's serving it. (Given that Symbol::Origin exists in the slabs too).

  • Definitely less plumbing, we just let FileSymbol now what it should be called during construction. Background/RemoteIndex already has this knowledge hard coded.

Origin for an index isn't monolithic. Even a slab can contain symbols that are the result of merging, but FileIndex will soon have symbols from stdlib index + preamble index too.
Given this, I'm not sure it's much less plumbing - it's not just passing a constructor parameter to the index.

  • In theory those indexes that operate on top of a multiple SymbolSlabs should be inserting their own "origin" into the final merged slab as it contains some extra logic, just like MergeIndex.

This makes sense to me if it's useful, but I think this is *additional* to the origin reflecting the source of the slabs.

(Our plan is to land this after new year some time, so that remote index servers don't get disrupted by format changes while people are on holiday)

This revision was automatically updated to reflect the committed changes.