String-based CS profiles can suffer severe size inflation for C++ programs with very long function names. In some extreme cases we have seen the name table section take up 99% of the size of an extbinary profile, which in turn caused the compiler to run out of memory or slow down. To address that issue, we are enabling MD5-based CS profiles.
Unlike an MD5 non-CS profile, where MD5 codes are stored as integers in the name table section and can be used with extbinary only, an MD5 CS profile keeps the profile context in string form, with MD5 codes used to represent the functions in the context. Therefore it can be used with both text and extbinary profiles.
Here is an example of a name-based CS text profile and the MD5 counterpart:
[main:3.1 @ _Z5funcBi]:120:19
 0: 19
 1: 19 _Z8funcLeafi:20
 3: 12

[0xdb956436e78dd5fa:3.1 @ 0x630ba95aaba8cb5]:120:19
 0: 19
 1: 19 0x62919f2827854931:20
 3: 12
Note that in the MD5 profile all function names are replaced by their MD5 codes.
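As a rough illustration of the name-to-MD5 mapping in the example above, the sketch below rewrites a context string frame by frame. It assumes the MD5 code is the low 64 bits of the MD5 digest read as a little-endian integer (this is consistent with the 0xdb956436e78dd5fa shown for main) and that it is printed as 0x-prefixed hex without zero padding. The helper names md5_guid and md5_context are hypothetical and are not part of LLVM.

```python
import hashlib


def md5_guid(name: str) -> int:
    # Assumption: the code is the first 8 bytes of the MD5 digest,
    # interpreted as a little-endian 64-bit integer.
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")


def md5_context(context: str) -> str:
    # A context looks like "caller:line.disc @ callee"; only the function
    # name in each frame is replaced, the call-site location is kept.
    frames = []
    for frame in context.split(" @ "):
        name, sep, loc = frame.partition(":")
        frames.append(f"{md5_guid(name):#x}{sep}{loc}")
    return " @ ".join(frames)


# Under the assumption above, the first frame comes out as
# 0xdb956436e78dd5fa:3.1, matching the MD5 example.
print(md5_context("main:3.1 @ _Z5funcBi"))
```

The call-site locations (3.1 here) are left untouched, so the MD5 form has the same shape as the name-based form and the same text parser can handle both.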
The extbinary equivalents work similarly, storing the context (with either real names or MD5 codes) in the name table section. The main benefit is that the sample loader does not have to reconstruct the context string. As a bonus, this allows real names and MD5 codes to be mixed, which will be needed once we start squeezing the size of pseudo probe descs.
Implementation
To support this string-based MD5 profile, we reuse part of the work done for the non-CS profile and diverge from the rest. The profile producers, i.e., llvm-profgen and llvm-profdata, need a special flag --use-md5 to generate an MD5 profile, so the internal flag FunctionSamples::UseMD5 needs to be set on that side. The profile consumer, i.e., the sample profile loader, on the other hand, determines automatically whether a function name is a real name or an MD5 code based on the 0x prefix, and does not need FunctionSamples::UseMD5.
The sample context tracker is tweaked to operate on integral MD5 codes internally, so that it can support contexts that contain both real names and MD5 codes. A GUIDToFuncNameMap, which is always built for CS profiles, can be used to look up real names for debugging.
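The consumer-side behaviour described above can be sketched as follows: a token with a 0x prefix is parsed as an MD5 code, anything else is hashed, and every context is reduced to a tuple of integers so that mixed contexts compare equal. This is only an illustration under the assumption that the MD5 code is the low 64 bits of the MD5 digest read little-endian. The names to_guid, context_key and guid_to_name are hypothetical stand-ins, not the actual loader or tracker code.

```python
import hashlib


def md5_guid(name: str) -> int:
    # Assumption: low 64 bits of the MD5 digest, little-endian.
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")


def to_guid(token: str) -> int:
    # The 0x prefix alone tells a real name from an MD5 code,
    # so no global "use MD5" switch is needed on this side.
    if token.startswith("0x"):
        return int(token, 16)
    return md5_guid(token)


def context_key(context: str) -> tuple:
    # Reduce each "name:loc" frame to (integer code, loc) so that
    # name-based, MD5-based and mixed contexts share one key.
    key = []
    for frame in context.split(" @ "):
        name, _, loc = frame.partition(":")
        key.append((to_guid(name), loc))
    return tuple(key)


# Hypothetical stand-in for a GUID-to-name map used for debugging.
guid_to_name = {md5_guid(n): n for n in ("main", "_Z5funcBi")}

mixed = context_key("0xdb956436e78dd5fa:3.1 @ _Z5funcBi")
named = context_key("main:3.1 @ _Z5funcBi")
print(mixed == named)
print([guid_to_name.get(code, hex(code)) for code, _ in mixed])
```

Keying on integers keeps the tracker independent of how the profile spelled each frame, while the map recovers readable names when they are needed.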
Testing
With the current change, the compiler generates exactly the same code with MD5 and non-MD5 CS profiles. Tested with SPEC and an internal large service. For the large service, the extbinary profile size went down by 10x and the build time was cut in half.
The ordering-based hash in FuncToCtxtProfiles is mainly there to achieve consistent context promotion between MD5 and non-MD5 profiles, which in turn gives consistent codegen. However, it is expensive. I tried sorting by the combination of total sample counts and head sample counts, but still could not get every case covered. I think we might want to do this for non-MD5 profiles only, to favor MD5 performance.