This is an archive of the discontinued LLVM Phabricator instance.

[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function
ClosedPublic

Authored by wenlei on Nov 24 2019, 11:34 PM.

Details

Summary

When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases:

  • Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile.
  • Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay.

A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually.

Diff Detail

Event Timeline

wenlei created this revision.Nov 24 2019, 11:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2019, 11:34 PM
wmi added a comment.Nov 25 2019, 12:15 PM

The change makes sense to me. The only concern is for inline instance getEntrySamples is not precise and sometimes can have a large difference with the actual head count. It will be interesting to see the actual effect. I will test it combined with D70655 on my side.

The only concern is for inline instance getEntrySamples is not precise and sometimes can have a large difference with the actual head count

Agreed. I think this problem can be mitigated if we run profile inference on inlinee as well before using the count to drive inline decision and then also use the post-inference entry count of inlinee as call site count for profile adjustment. But that itself deserves a separate change, and needs to be evaluated.

For the reason you mentioned, total samples had to be used to drive inline replay here instead of relying on inlinee's entry count which would be a better proximation of call site count. Ideally, if we run profile inference before early inlining, inlinee's entry count can be more reliable and we may use that instead of total samples.

wmi added a comment.Nov 27 2019, 9:35 AM

I did performance test combing D70653 and D70655, there was ~0.2% latency improvement in a benchmark, so that is good.

llvm/include/llvm/ProfileData/SampleProfReader.h
363

I found a potential problem here. When we create a new profile for a function when merging inline instance profile back, if the function somehow has no debug information, the annotation will fail and a warning will be issued. We may want to skip those functions without debug information (The warning is issued in SampleProfileLoader::getFunctionLoc)

wenlei marked an inline comment as done.Nov 27 2019, 10:17 AM
In D70653#1761978, @wmi wrote:

I did performance test combing D70653 and D70655, there was ~0.2% latency improvement in a benchmark, so that is good.

Thanks for code review and measurement. Glad to know that it helped your workload too.

llvm/include/llvm/ProfileData/SampleProfReader.h
363

For these cases, if the function without debug info wasn't inlined, we would still get the warning, right? I though the warning in this case is no different and as helpful comparing to other cases, or did I miss anything?

wmi added inline comments.Nov 27 2019, 10:39 AM
llvm/include/llvm/ProfileData/SampleProfReader.h
363

Although without the patch, we may also see the warning because of the source drift or compiler upgrade, in reality we rarely see that happens. I am worried that the patch may increase the chance of such warning and break many builds.

So I will look into it and see what causes the warning and whether it is a rare case or not.

wmi added inline comments.Dec 2 2019, 4:02 PM
llvm/include/llvm/ProfileData/SampleProfReader.h
363

I looked into it and found the case was not rare.

For the function without debug information, in afdo profile there was no inline instance. The function was added to localNotInlinedCallSites because findFunctionSamplesAt returned a function sample as long as the location matched (the imprecise matching is designed for indirect promotion). In this case, the FunctionSamples returned didn't belong to the function without debug information.

I applied file specific -Wno-error to disable the warning whenever I saw a case like this in a large benchmark build, and I found the same problem in at least three files (If I continue the process, I may find more).

To solve the problem, maybe check debug information in findFunctionSamplesAt before return a FunctionSamples when callee name doesn't match the name in FunctionSamples. If debug information doesn't exists for a function, inlining it won't bring more context sensitive information because no profile annotation is possible.

wenlei marked an inline comment as done.Dec 3 2019, 4:46 PM
wenlei added inline comments.
llvm/include/llvm/ProfileData/SampleProfReader.h
363

Thanks for looking! We only call getOrCreateSamplesFor when a callee is found from call instruction, i.e., it's a direct call. So in this case, it seems a direct call's callee got samples from a different function (inlinee)? Sounds like it can be caused by source change?

If imprecise matching is only meant for indirect call, I can turn it off for direct call - inside findFunctionSamplesAt, only do the fuzzy matching when CalleeName is empty string.

wenlei updated this revision to Diff 232019.Dec 3 2019, 4:52 PM

address review feedback - only do fuzzy match for call site targets for indirect calls, as a result, I had to fix a few test cases.

wmi accepted this revision.Dec 5 2019, 9:06 AM

I rerun perf test and I don't see any performance change. Last run I saw very small improvement on latency in a benchmark. This is fine since the benchmark has some fluctuation by itself.

llvm/include/llvm/ProfileData/SampleProf.h
400

Nit: have an early return when CalleeName.empty() is true.

llvm/test/Transforms/SampleProfile/inline-mergeprof.ll
34–35

run opt -instnamer for the test.

This revision is now accepted and ready to land.Dec 5 2019, 9:06 AM
wenlei updated this revision to Diff 232395.Dec 5 2019, 10:46 AM
wenlei marked 2 inline comments as done.

address feedback

wenlei added inline comments.Dec 5 2019, 10:47 AM
llvm/include/llvm/ProfileData/SampleProf.h
400

done

llvm/test/Transforms/SampleProfile/inline-mergeprof.ll
34–35

done. i need to develop this habit. :)

This revision was automatically updated to reflect the committed changes.