Pre-instrumentation inline (pre-inliner) greatly improves the IR instrumentation code performance, among other benefits. One issue of the pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This is due to the fact that the same COMDAT function may have different early inline decisions across different modules -- that means different copies of COMDAT functions will have different CFG checksum.
A simple example:
COMDAT function foo() --> bar()
foo() is defined in f1.cc and f2.cc. If bar() is available in f1.cc and is small enough to inline in pre-inliner, and it's not available in f2.cc. After pre-inline pass, f1.cc and f2.cc each has its own foo() with different IR (different CFG). If the version in f1.cc is chosen, we only have this version of counters in the profile (as we merge the COMDAT profile variables). We will have checksum mismatch when do a profile-use compilation in f2.cc.
In this patch, we propose a partially renaming the COMDAT group and its member function/variable so we have different profile counter for each version. We will post-fix the COMDAT function and the group name with its FunctionHash.
For the above case, if the foo() in f1.cc has a hash of 123456 and the foo() in f2.cc has a hash of 234567. foo() in f1.cc will be rename to foo.123456() (as well as the COMDAT name, and the profile variables in that function). foo() in f2.cc will be rename to foo.234567() (as well as the COMDAT name, and the profile variables in that function).
There are cases where two functions with the same FunctionHash might not have the same IR. For example,
foo() { bar(); goo(); }
bar() { a++; }
goo() { b++; }
in f1.cc bar() is inlined while in f2.cc goo() is inlined. From edge profile point of view, nothing will go wrong as both version has the same profile variables and all the counter updates will be captured.
The only potential problem is the indirect-call profiling. For example, if bar() contains an indirect-call. The version of foo() in f1.cc will also contain an indirect-call counter, while the version in f2.cc does not have. This will create a mismatch when reading value profiles. To address this, we add the number of indirect-calls to the function hash.
This is not bullet-proof solution. As same number of indirect-calls does not guarantee the indirect-calls are the same. For example, if goo() contains a different indirect-call. Both version in f1.cc and f2.cc has one indirect call, even though they have different call-sites. Our proposed method will treat the two versions of foo() as identical as they have the same function hash. But we believe this happens rarely. Also note that the worse case here is that we will promote a wrong indirect-target. There are not correctness issues.
Some implementation details:
(1) The mismatch also applies to AvailableExternallyLinkage functions.
We will convert AvailableExternallyLinkage functions to COMDAT functions, but not ExternalWeakLinkage functions.
(2) COMDAT group with multiple functions.
We only handle COMDAT group that having one COMDAT function to reduce the complexity. If a COMDAT group has multiple functions, we need to have a unique post-fix for all the functions. To do this, it requires to collect all the member functions and their hash, which is costly.
An alternate way is to do a post instrumentation fix-up on the instrumentation intrinsics. This is costly too.
We find multiple functions COMDAT groups are relatively rare (mostly in global static initializer). So we decided to only handle single member COMDAT groups.
Future work:
(1) Reduce the number of renamings
One optimization that reduces the number of renamings is to only apply the renaming to the COMDAT functions that preinline occurs. Unfortunately, currently we does not have an attribute for this.
(2) Darwin
Darwin does not use COMDAT, instead, it uses LinkOnce linkage. We will have a separated patch to deal the mismatch introduced by pre-inliner.
Change the name to "DoComdatRenaming"