
[Serialization] Delta-encode consecutive SourceLocations in TypeLoc
ClosedPublic

Authored by sammccall on May 11 2022, 10:34 AM.

Details

Summary

Much of the size of PCH/PCM files comes from stored SourceLocations.
These are encoded using (almost) their raw value, VBR-encoded. Absolute
SourceLocations can be relatively large numbers, so this commonly takes
20-30 bits per location.

We can reduce this by exploiting redundancy: SourceLocations stored near each
other often differ only slightly, so they can be delta-encoded.
Randam-access loading of AST nodes constrains how long these sequences
can be, but we can do it at least within a node that always gets
deserialized as an atomic unit.
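
To make the size argument concrete, here is a minimal sketch of delta-encoding a run of raw location values. It is not the code from this patch: the helper names (zigZagEncode, deltaEncode, vbrBits) are hypothetical, locations are assumed to be plain 32-bit integers, and the 6-bit VBR chunk width is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not Clang's actual API.

// Map a signed delta to an unsigned value so that small magnitudes,
// positive or negative, become small numbers (zigzag encoding).
inline uint32_t zigZagEncode(int32_t V) {
  uint32_t Mask = V < 0 ? 0xFFFFFFFFu : 0u;
  return (static_cast<uint32_t>(V) << 1) ^ Mask;
}

inline int32_t zigZagDecode(uint32_t V) {
  return static_cast<int32_t>((V >> 1) ^ (0u - (V & 1)));
}

// Replace each raw value with the zigzagged difference from the previous
// one. The first value is encoded relative to 0.
inline std::vector<uint32_t> deltaEncode(const std::vector<uint32_t> &Locs) {
  std::vector<uint32_t> Out;
  uint32_t Prev = 0;
  for (uint32_t Loc : Locs) {
    Out.push_back(zigZagEncode(static_cast<int32_t>(Loc - Prev)));
    Prev = Loc;
  }
  return Out;
}

inline std::vector<uint32_t> deltaDecode(const std::vector<uint32_t> &Enc) {
  std::vector<uint32_t> Out;
  uint32_t Prev = 0;
  for (uint32_t E : Enc) {
    Prev += static_cast<uint32_t>(zigZagDecode(E));
    Out.push_back(Prev);
  }
  return Out;
}

// Bits used by a VBR encoding with Width-bit chunks, where each chunk
// carries Width-1 payload bits and one continuation bit.
inline unsigned vbrBits(uint32_t V, unsigned Width = 6) {
  unsigned Bits = Width;
  while (V >>= (Width - 1))
    Bits += Width;
  return Bits;
}
```

Under these assumptions a raw value around 1000000 costs 24 bits on its own, while a following location 4 units later costs 6 bits as a delta. Decoding needs the previous value, which is why the sequence has to stay within a unit that is always deserialized together.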

TypeLoc is implemented in this patch as it's a relatively small change
that shows most of the API.
This saves ~3.5% of PCH size. I have local changes applying this technique
further that save another 3%, and I think it's possible to get to 10% total.

Event Timeline

sammccall created this revision. May 11 2022, 10:34 AM
Herald added a project: Restricted Project. May 11 2022, 10:34 AM
Herald added a subscriber: mgorny.
sammccall requested review of this revision. May 11 2022, 10:34 AM
Herald added a project: Restricted Project. May 11 2022, 10:34 AM
Herald added a subscriber: cfe-commits.
ilya-biryukov accepted this revision. May 18 2022, 4:54 AM

NIT: typo in the change description

Randam-access

Random

LGTM overall, the improvements in PCH and PCM sizes seem worthwhile. Please see the NIT about the naming before landing, maybe there are better options.

clang/include/clang/Serialization/SourceLocationEncoding.h
145

NIT: Naming feels a bit confusing here.
Root that may have Parent seems a bit weird to me conceptually.

One idea that I have in mind is Sequence::State, which allows continuing an existing sequence or establishing a new one. Note that we will need to do something about the clashing private member.

This revision is now accepted and ready to land. May 18 2022, 4:54 AM
sammccall marked an inline comment as done. May 19 2022, 12:25 AM
sammccall added inline comments.
clang/include/clang/Serialization/SourceLocationEncoding.h
145

Fair enough. Renamed Root -> State, and State -> Prev.
(Prev is not a great name, but it's private with small scope)
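
For readers without the diff at hand, the naming discussed above could look roughly like the sketch below. This is an illustration, not the committed code: the class name Sequence, the method encodeDelta, and the member Own are hypothetical, and locations are simplified to 32-bit integers. The idea is that State owns the storage for a new sequence, or continues an existing one when given a parent, while the private Prev reference tracks the last value seen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the State/Prev naming; not the code in the patch.
class Sequence {
  uint32_t &Prev; // Last value seen; storage lives in some State.
  explicit Sequence(uint32_t &Prev) : Prev(Prev) {}

public:
  class State;

  // Return the (wrapping) difference from the previous value and advance.
  uint32_t encodeDelta(uint32_t Loc) {
    uint32_t Delta = Loc - Prev;
    Prev = Loc;
    return Delta;
  }
};

// Establishes a new sequence, or continues Parent's sequence if provided.
class Sequence::State {
  uint32_t Own = 0; // Used only when there is no parent sequence.
  Sequence Seq;

public:
  explicit State(Sequence *Parent = nullptr)
      : Seq(Parent ? Parent->Prev : Own) {}
  State(const State &) = delete; // Seq may refer to this object's Own.
  State &operator=(const State &) = delete;

  operator Sequence *() { return &Seq; }
};
```

In this sketch, a State constructed with a parent shares the parent's Prev, so deltas keep flowing across nested calls. A default-constructed State starts again from 0.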

This revision was landed with ongoing or failed builds. May 19 2022, 12:40 AM
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.