This is an archive of the discontinued LLVM Phabricator instance.

[clang][AST] Use 64 bit integers for bit offsets inside AST file
AbandonedPublic

Authored by DmitryPolukhin on Mar 17 2020, 9:44 AM.

Details

Reviewers
None
Summary

This diff is for discussion how to address problem with large preamble file.

32 bit offsets can be used for PCH/preamble files below 512M. This diff fixes
crashes and asserts on clangd when preamble file exceeds 512M limit. The asserts
usually look like "LLVM ERROR: Invalid abbrev number". On about 700M preamble
files this patch increases file size on about 4%. I tested this diff on
Clang 8, 9 and master.

Test Plan:
Tested on clangd with 700M preamble file. 

Diff Detail

Event Timeline

DmitryPolukhin created this revision.Mar 17 2020, 9:44 AM
aganea added a subscriber: aganea.EditedMar 17 2020, 10:35 AM

Hello Dmitry! Could we use varints instead of uint64_t if size on disk is a concern?

Hello Dmitry! Could we use varints instead of uint64_t if size on disk is a concern?

Variants are very close to LLVM Bitcode Variable Width Integers that can be even smaller than byte. So I think the answer is yes but it will require replacing blobs with encoded values and I'm not sure why it was not implemented this way from the beginning taking into account that mechanism exists. Perhaps it happens because indices can be memory mapped and used as is without expansion in memory.

lebedev.ri retitled this revision from Use 64 bit integers for bit offsets inside AST file to [clang][AST] Use 64 bit integers for bit offsets inside AST file.Mar 17 2020, 1:35 PM