Currently, context strings contain a lot of duplicated function names, which significantly increases the profile size. This change splits the context into a series of {name, offset, discriminator} tuples so that function names used in the context can be replaced by an index into the name table, which significantly reduces the size consumed by contexts.
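To illustrate the split, here is a minimal standalone sketch, not the code in this change. It assumes a context written as `main:3 @ foo:2.1 @ bar`, where the part after the colon is `offset[.discriminator]`. `Frame`, `NameTable`, and `splitContext` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical frame layout: stores a name-table index instead of the name.
struct Frame {
  uint32_t NameIdx;
  uint32_t LineOffset;
  uint32_t Discriminator;
};

// Hypothetical name table that hands out one index per distinct name.
struct NameTable {
  std::vector<std::string> Names;
  std::unordered_map<std::string, uint32_t> Index;

  uint32_t intern(const std::string &Name) {
    auto It = Index.find(Name);
    if (It != Index.end())
      return It->second;
    uint32_t Idx = static_cast<uint32_t>(Names.size());
    Names.push_back(Name);
    Index.emplace(Name, Idx);
    return Idx;
  }
};

// Splits "main:3 @ foo:2.1 @ bar" into frames, interning each function name.
std::vector<Frame> splitContext(const std::string &Ctx, NameTable &Table) {
  std::vector<Frame> Frames;
  if (Ctx.empty())
    return Frames;
  const std::string Sep = " @ ";
  size_t Pos = 0;
  while (Pos <= Ctx.size()) {
    size_t End = Ctx.find(Sep, Pos);
    if (End == std::string::npos)
      End = Ctx.size();
    std::string Part = Ctx.substr(Pos, End - Pos);
    assert(!Part.empty() && "empty frame in context string");

    Frame F{0, 0, 0};
    size_t Colon = Part.rfind(':');
    // Without a colon, the whole part is the (leaf) function name.
    F.NameIdx = Table.intern(Part.substr(0, Colon));
    if (Colon != std::string::npos) {
      std::string Loc = Part.substr(Colon + 1);
      size_t Dot = Loc.find('.');
      F.LineOffset = static_cast<uint32_t>(std::stoul(Loc.substr(0, Dot)));
      if (Dot != std::string::npos)
        F.Discriminator = static_cast<uint32_t>(std::stoul(Loc.substr(Dot + 1)));
    }
    Frames.push_back(F);
    Pos = End + Sep.size();
  }
  return Frames;
}
```

With this layout, two contexts that share `main` and `foo` store each name once in the table and refer to it by a small integer in every frame.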
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings, which is time- and memory-consuming. Instead, a context vector of StringRef is adopted to represent the full context in all scenarios. As a result, the previously prevalent profile map, which was keyed by a StringRef context string, is now an unordered map keyed by SampleContext. SampleContext is reshaped to use an ArrayRef to represent a full context for CS profiles. For non-CS profiles, it falls back to using a StringRef to represent a context-less function name. Both the ArrayRef and StringRef objects are backed by real array and string objects that are stored in producer buffers. For the compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in ProfiledBinary and ProfileGenerator. Full context strings are generated only for debugging and printing.
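The ownership model above can be sketched in plain C++ without LLVM headers. This is only an illustration under stated assumptions: `ContextKey` stands in for a SampleContext backed by an ArrayRef, `ContextBuffer` stands in for a producer buffer, and all type names here are hypothetical rather than taken from the change.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical frame: the name is borrowed from the buffer, not owned.
struct Frame {
  const std::string *Name;
  uint32_t LineOffset;
  uint32_t Discriminator;
};

// Hypothetical non-owning key (pointer + length), similar in spirit to a
// context that views its frames through an array reference.
struct ContextKey {
  const Frame *Data;
  size_t Size;

  bool operator==(const ContextKey &O) const {
    if (Size != O.Size)
      return false;
    for (size_t I = 0; I < Size; ++I)
      if (*Data[I].Name != *O.Data[I].Name ||
          Data[I].LineOffset != O.Data[I].LineOffset ||
          Data[I].Discriminator != O.Data[I].Discriminator)
        return false;
    return true;
  }
};

// Hashes the frame contents, so equal contexts land in the same bucket.
struct ContextKeyHash {
  size_t operator()(const ContextKey &K) const {
    size_t H = 0;
    for (size_t I = 0; I < K.Size; ++I) {
      H = H * 31 + std::hash<std::string>()(*K.Data[I].Name);
      H = H * 31 + K.Data[I].LineOffset;
      H = H * 31 + K.Data[I].Discriminator;
    }
    return H;
  }
};

// Producer-side storage. std::deque::push_back keeps references to existing
// elements valid, so keys handed out earlier stay usable.
struct ContextBuffer {
  std::deque<std::string> Names;
  std::deque<std::vector<Frame>> Contexts;

  const std::string *addName(const std::string &N) {
    Names.push_back(N);
    return &Names.back();
  }
  ContextKey addContext(std::vector<Frame> Frames) {
    Contexts.push_back(std::move(Frames));
    return ContextKey{Contexts.back().data(), Contexts.back().size()};
  }
};

// Profile map keyed by context; the mapped value is a stand-in sample count.
using ProfileMap = std::unordered_map<ContextKey, uint64_t, ContextKeyHash>;
```

The map never copies or concatenates names; a key is valid only as long as the buffer that produced it is alive, which is why the buffers need a clear owner.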
When it comes to the profile format, nothing has changed in the text format, though internally the CS context is implemented as a vector. The extbinary format is only changed for CS profiles, with an additional SecCSNameTable section that stores all full contexts, logically in the form of vector<int>, where each element is an offset into SecNameTable. All occurrences of contexts elsewhere are redirected to use an offset into SecCSNameTable.
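The two-level indirection can be pictured with the following simplified in-memory sketch. It is not the on-disk encoding: line offsets and discriminators are left out, and `NameTables` and `renderContext` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct NameTables {
  // Stand-in for SecNameTable: each function name is stored once.
  std::vector<std::string> NameTable;
  // Stand-in for SecCSNameTable: each context is a list of indices into
  // NameTable. Other sections would refer to a context by its index here.
  std::vector<std::vector<uint32_t>> CSNameTable;
};

// Rebuilds a printable context, as would only be needed for debugging or
// printing. Out-of-range indices throw std::out_of_range via at().
std::string renderContext(const NameTables &T, size_t CtxIdx) {
  std::string Out;
  const std::vector<uint32_t> &Ctx = T.CSNameTable.at(CtxIdx);
  for (size_t I = 0; I < Ctx.size(); ++I) {
    if (I != 0)
      Out += " @ ";
    Out += T.NameTable.at(Ctx[I]);
  }
  return Out;
}
```

A context of depth N then costs N small integers rather than N copies of possibly long mangled names.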
Testing
This is a no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut in half, with a 20x smaller string-based extbinary format generated.
The compile time of ads drops by 25%.