This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Use xxh3_64bits for background index file digests
ClosedPublic

Authored by MaskRay on Jul 21 2023, 1:55 PM.

Details

Summary

Many sources show that xxh3 is much better than xxh64. This particular
instance may or may not have noticeable difference, but this change
moves us toward removing xxHash64.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 21 2023, 1:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 1:55 PM
Herald added a subscriber: arphaman. · View Herald Transcript
MaskRay requested review of this revision.Jul 21 2023, 1:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2023, 1:55 PM

These hashes are used in the filenames of index shards, and in the metadata describing dependencies between shards.

So this is an incompatible change to indexing format, you also need to increment constexpr static uint32_t Version in Serialization.cpp:460.

(I think this is fine, but not stamping just yet as I want to sync with Kadir on monday - I don't think we have anything that GCs the old shards but probably that shouldn't block this change)

MaskRay updated this revision to Diff 543200.Jul 22 2023, 9:05 AM

bump version

thanks for doing this!

as Sam pointed out this will result in invalidation of all the index shards, but that's not something new. we already don't clean up non-relevant index shards when people delete sources over time and rely on people having new checkouts or clearing things up manually. this will be somewhat more severe, as we'll double the index size all of a sudden.

but i'd like to keep this patch from landing until llvm-17 is cut (which should be tomorrow). we didn't have any changes to our index shard serialization since llvm-16 release, so forcing everyone to face this invalidation purely for the sake of a clean up that might never happen in the codebase doesn't feel enough of a justification. whereas it's more likely that we'll have changes to serialization format throughout the next release cycle and can hit users only once with big cache invalidations.

sammccall accepted this revision.Jul 24 2023, 1:53 AM

Agree - LGTM but please land once the branch release/17.x exists. Thanks!

This revision is now accepted and ready to land.Jul 24 2023, 1:53 AM

thanks for doing this!

as Sam pointed out this will result in invalidation of all the index shards, but that's not something new. we already don't clean up non-relevant index shards when people delete sources over time and rely on people having new checkouts or clearing things up manually. this will be somewhat more severe, as we'll double the index size all of a sudden.

but i'd like to keep this patch from landing until llvm-17 is cut (which should be tomorrow). we didn't have any changes to our index shard serialization since llvm-16 release, so forcing everyone to face this invalidation purely for the sake of a clean up that might never happen in the codebase doesn't feel enough of a justification. whereas it's more likely that we'll have changes to serialization format throughout the next release cycle and can hit users only once with big cache invalidations.

Sounds good. I'll wait after the branch (today) :)