This is an archive of the discontinued LLVM Phabricator instance.

[AutoFDO] Remove a broken assert in merging inlinee samples
ClosedPublic

Authored by hoy on Oct 23 2020, 9:49 AM.

Details

Summary

Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced ealier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions A, B and C. A calls B twice and B calls C once. Some optimize performed prior to the sample profile loader duplicates first callsite to B and the program may look like

A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}

For some reason, the sample profile loader inliner then decides to only inline the first callsite in A and transforms A into

A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}

Here is what happens next:

  1. Failing to inline the callsite C() results in C1's samples returned to C's base (outlined) profile. In the meantime, C1's head samples are updated to C1's entry sample. This also affects the profile of the middle callsite which shares C1 with the first callsite.
  2. Failing to inline the middle callsite results in B1 returned to B's base profile, which in turn will cause C1 merged into B's base profile. Note that the nest C profile in B's base has a non-zero head sample count now. The value actually equals to C1's entry count.
  3. Failing to inline last callsite results in B2 returned to B's base profile. Note that the nested C profile in B's base now has an entry count equal to the sum of that of C1 and C2, with the head count equal to that of C1. This will trigger the assert later on.
    1. Compiling B using B's base profile. Failing to inline C there triggers the returning of the nested C profile. Since the nested C profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that C1 is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite B() and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Diff Detail

Event Timeline

hoy created this revision.Oct 23 2020, 9:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 23 2020, 9:49 AM
hoy requested review of this revision.Oct 23 2020, 9:49 AM
hoy edited the summary of this revision. (Show Details)Oct 23 2020, 10:00 AM
hoy edited the summary of this revision. (Show Details)
hoy added reviewers: wenlei, wmi, davidxl.
hoy retitled this revision from [AutoFDO] Remove an assert when merging inlinee samples to [AutoFDO] Remove a broken assert in merging inlinee samples.
hoy edited the summary of this revision. (Show Details)Oct 23 2020, 10:13 AM
wenlei accepted this revision.Oct 23 2020, 10:14 AM

Thanks for tracking down the assertion. The context profile adjustment and merging for AutoFDO today is a best effort approximation. Given the case we ran into, which is hard to handle perfectly and also very rare, removing the assertion makes sense to me.

This revision is now accepted and ready to land.Oct 23 2020, 10:14 AM
wmi added a comment.Oct 23 2020, 11:20 AM

LGTM. The assertion is indeed fragile. I can think of another case to break the assertion. getEntrySamples depends on what is the first source line or the callsite in the inline instance. Suppose a FunctionSample is duplicated so initally getEntrySamples value of all duplicates are all equal. However it is easy for the value of getEntrySamples to change after some optimization changes the first line/callsite in a duplicate.

This revision was landed with ongoing or failed builds.Oct 23 2020, 5:42 PM
This revision was automatically updated to reflect the committed changes.