When emitting bitcode for a C++ file, TYPE.STRUCT_NAME entries are a significant
part of the size. A typical name is "struct.std::_Vector_base.618", and
the record contents are the sequence of characters.
These records are efficiently encoded as arrays of 6-bit chars if each
char is representable in char6 encoding: [A-Za-z0-9._]
This does not include ":", so very few C++ names are encoded this way: 0.4% in
the file I checked. ("<", ">" and space are also common and not encodable.)
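
As a rough illustration of the char6 restriction (not the writer's actual code; the helper names here are made up for this sketch), the check amounts to testing every character against the [A-Za-z0-9._] alphabet:

```python
import string

# The char6 alphabet as described above: letters, digits, '.' and '_'.
CHAR6_ALPHABET = set(string.ascii_letters + string.digits + "._")

def is_char6_encodable(name):
    """Return True if every character of `name` is in the char6 alphabet."""
    return all(c in CHAR6_ALPHABET for c in name)

def offending_chars(name):
    """Return the sorted distinct characters that rule out char6 encoding."""
    return sorted({c for c in name if c not in CHAR6_ALPHABET})

if __name__ == "__main__":
    name = "struct.std::_Vector_base.618"
    print(is_char6_encodable(name), offending_chars(name))
```

For the example name, the only offending character is ":", which is enough to push the whole record onto the fallback path.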
Before this patch, the fallback was to use unabbreviated encoding: each
character is a vbr6. A vbr6 chunk carries only 5 payload bits, so nearly
all characters (ASCII >= 0x20) need two chunks, i.e. 12 bits per character.
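
To see where the 12 bits come from, here is a minimal sketch of VBR chunking (the function name is hypothetical, and this models only the chunk values, not the bit stream itself). Each chunk of width N carries N-1 payload bits, with the top bit set when more chunks follow:

```python
def vbr_chunks(value, width=6):
    """Split `value` into VBR chunks of `width` bits, low payload bits first.

    Each chunk holds width-1 payload bits; the top bit marks continuation.
    """
    payload_bits = width - 1
    continue_bit = 1 << payload_bits
    chunks = []
    while value >= continue_bit:
        chunks.append((value & (continue_bit - 1)) | continue_bit)
        value >>= payload_bits
    chunks.append(value)
    return chunks

if __name__ == "__main__":
    for ch in ("\n", " ", "A", ":"):
        chunks = vbr_chunks(ord(ch))
        print(repr(ch), chunks, 6 * len(chunks), "bits")
```

Values below 32 fit in a single 6-bit chunk; anything from 0x20 upward in the ASCII range spills into a second chunk.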
After this patch, the fallback is to encode the characters as fixed8
arrays. This saves 4 bits per character (and also 6 bits per
unabbreviated record).
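
A back-of-the-envelope comparison for a single name, counting only the character payload and ignoring record headers and length fields (the helper names are invented for this sketch, and fixed8 assumes every character fits in one byte):

```python
def vbr6_bits(ch):
    """Bits needed for one character as a vbr6 (5 payload bits per 6-bit chunk)."""
    value, chunks = ord(ch), 1
    while value >= 32:
        value >>= 5
        chunks += 1
    return 6 * chunks

def name_payload_bits(name):
    """Return (vbr6_total, fixed8_total) for the characters of `name` only."""
    # Assumption: all characters are < 256, so fixed8 costs exactly 8 bits each.
    return sum(vbr6_bits(c) for c in name), 8 * len(name)

if __name__ == "__main__":
    before, after = name_payload_bits("struct.std::_Vector_base.618")
    print(before, after, before - after)  # 336 224 112
```

For the 28-character example name this gives 336 bits before and 224 bits after, i.e. the 4 bits per character mentioned above.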
On my test file (bitcode from clang-tools-extra/clangd/ParsedAST.cpp):
  overall size: -18% (113 => 93kB)
  STRUCT_NAME fraction: 47% => 37%
  STRUCT_NAME average size: -33% (451 => 301)