sample-profile-top-down-load is an internal option that enables top-down ordering of inlining and profile annotation in the sample profile loader pass. It was previously found to improve profile annotation.
Recently we found that it can also resolve a build-time issue. Suppose function A has many call sites in function B. In the last release binary, from which the sample profile was collected, the outlined copy of A is large because many other functions were inlined into it. Every call to A in B was also inlined, but each inlined body is small because A was inlined into B before the other functions were inlined into A, so the last release had no build-time issue.
In an optimized build that uses the sample profile collected from the last release, without top-down inlining, we saw a case where A first grew very large through inlining and then multiple call sites of A were inlined into B. The result was a huge B, which caused a significant build-time issue in addition to the profile annotation issue.
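The size effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the loader's actual algorithm: function size is a single number, inlining adds the callee's current size to the caller, and in the top-down case the copies of A inlined into B do not pull in A's helpers (matching the small inlined bodies seen in the last release). The function names and all numbers are hypothetical.

```python
def bottom_up_sizes(size_a, size_b, helper_sizes, calls_to_a_in_b):
    """Callees are processed before callers.

    A absorbs its helpers first, then every call site of A in B
    receives the already-grown body of A.
    """
    grown_a = size_a + sum(helper_sizes)
    grown_b = size_b + calls_to_a_in_b * grown_a
    return grown_a, grown_b


def top_down_sizes(size_a, size_b, helper_sizes, calls_to_a_in_b):
    """Callers are processed before callees.

    B receives the original, small body of A at every call site;
    only afterwards does the outlined copy of A absorb its helpers.
    """
    grown_b = size_b + calls_to_a_in_b * size_a
    grown_a = size_a + sum(helper_sizes)
    return grown_a, grown_b


if __name__ == "__main__":
    # Hypothetical sizes: A is small, has 5 helpers of size 50,
    # and is called from 30 call sites in B.
    example = dict(size_a=10, size_b=20, helper_sizes=[50] * 5, calls_to_a_in_b=30)
    print("bottom-up (A, B):", bottom_up_sizes(**example))
    print("top-down  (A, B):", top_down_sizes(**example))
```

In this model the outlined copy of A ends up the same size in both orders; only B differs, because under bottom-up ordering each of its call sites pays for everything that was already inlined into A.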
To solve this problem, this patch enables the sample-profile-top-down-load flag by default.
I re-evaluated performance on two server benchmarks, running each one 6 times. The first showed no performance change in 4 runs and a 0.2% improvement in 2 runs. The second showed no performance change.
I don't fully understand the patch, TBH. ProfileMergeInlinee is also set to false when CG == nullptr, which is the case when the legacy pass manager is used. Is this intended?