This is an archive of the discontinued LLVM Phabricator instance.

[AutoFDO] Statistic for context sensitive profile guided inlining
ClosedPublic

Authored by wenlei on Nov 22 2019, 12:03 AM.

Details

Summary

AutoFDO compilation has two places that do inlining: the sample profile loader, which inlines using the context-sensitive profile, and the regular inliner, which runs as a CGSCC pass. Ideally we want most inlining to come from the sample profile loader, as that is driven by the context-sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from the sample profile loader and help move more inlining there, I'm adding statistics and optimization remarks for the sample profile loader's inlining.

Event Timeline

wenlei created this revision. Nov 22 2019, 12:03 AM
Herald added a project: Restricted Project. Nov 22 2019, 12:03 AM
wmi added a comment. Nov 25 2019, 10:08 AM

However the reality is most of the inlining actually happens during regular inliner.

I would imagine among all the functions with inline instance in profile, only those small and warm/cold functions which are not inlined in early inliner will be inlined in regular inliner. The number of those small/warm and small/cold functions may be large. We previously found it helpful to inline warm functions, though at some cost in code size, so it is better to inline only small/warm functions. For small and cold functions, I think it doesn't matter whether they are inlined early or late.

It is helpful to collect some optimization remarks here, so thanks for the patch.

llvm/lib/Transforms/IPO/SampleProfile.cpp
906–907

The first parameter in the declaration of OptimizationRemark is "const char *PassName", so why not use DEBUG_TYPE?

I feel "NeverInline" may be clearer than "NotInline" in terms of showing it is illegal to inline.

1010–1021

A candidate that is not inlined may be reported multiple times here because of the iterative outer loop.

I guess you put the OptimizationRemark here because you want to know the exact reason why the candidate with an inline instance in the profile is not inlined (here the reason is that it is not hot enough). If so, some more information should be emitted to explain it.

If you don't care about the exact reason, then it is better to generate the optimization remark in the loop iterating over localNotInlinedCallSites. localNotInlinedCallSites contains all the candidates that have an inline instance in the profile but were not inlined for whatever reason, including "not hot enough".
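To illustrate the duplicate-report concern, here is a minimal standalone sketch. It is not the SampleProfile.cpp code: `Candidate`, `reportInsideLoop`, `reportAfterLoop`, and the fixed iteration count are hypothetical stand-ins for the real iterative loop. Reporting inside the loop emits one remark per revisit, while collecting rejected candidates and reporting after the loop emits one remark per candidate, at the cost of losing the specific reason.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a call site that has an inline instance in the profile.
struct Candidate {
  std::string Name;
  bool HotEnough;
};

// Reports from inside the iterative loop: a rejected candidate is revisited
// on every iteration, so it is reported once per iteration.
std::vector<std::string> reportInsideLoop(const std::vector<Candidate> &Cands,
                                          int Iterations) {
  std::vector<std::string> Remarks;
  for (int I = 0; I < Iterations; ++I)
    for (const Candidate &C : Cands)
      if (!C.HotEnough)
        Remarks.push_back(C.Name + ": not inlined (not hot enough)");
  return Remarks;
}

// Collects rejected candidates in a set during the loop and reports once
// afterwards: each candidate appears exactly once, without a specific reason.
std::vector<std::string> reportAfterLoop(const std::vector<Candidate> &Cands,
                                         int Iterations) {
  std::set<std::string> NotInlined;
  for (int I = 0; I < Iterations; ++I)
    for (const Candidate &C : Cands)
      if (!C.HotEnough)
        NotInlined.insert(C.Name);

  std::vector<std::string> Remarks;
  for (const std::string &Name : NotInlined)
    Remarks.push_back(Name + ": not inlined");
  return Remarks;
}
```

With one rejected candidate and three iterations, the first function produces three remarks and the second produces one.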

llvm/test/Transforms/SampleProfile/inline-stats.ll
3

Test new pass manager as well.

30

Rename the variables that are named with just a number.

wenlei marked 4 inline comments as done. Nov 25 2019, 3:14 PM

I would imagine among all the functions with inline instance in profile, only those small and warm/cold functions which are not inlined in early inliner will be inlined in regular inliner.

Yes, small functions are the majority. But there are still others.

For small and cold functions, I think it doesn't matter whether they are inlined early or late.

It matters for post-inline profile quality. If a function is inlined early by the replay inliner of the sample loader, the context-sensitive profile is kept. However, if the replay inliner rejects it and it later gets inlined by the CGSCC inliner, we have to do count scaling for the inlinee, which is not as accurate as inlining early and preserving the context-sensitive profile. This doesn't matter much for the result of the inline decision, but it matters for post-inline profile quality, which can affect block layout later.
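A toy example of why count scaling is less accurate, as a standalone sketch rather than LLVM code (`BlockCounts`, `scaleFlatProfile`, and the numbers are made up for illustration). Suppose a callee is entered 200 times in total, and its flat profile shows the "then" and "else" blocks at 100 each. If one call site accounts for 100 of those entries and in that context always takes "then", the context-sensitive profile says then=100, else=0, whereas scaling the flat profile can only give an even split.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Block name -> sample count. A hypothetical, simplified profile representation.
using BlockCounts = std::map<std::string, uint64_t>;

// Scales a context-insensitive (flat) callee profile down to one call site,
// using the ratio CallSiteCount / CalleeEntryCount. This is the kind of
// approximation needed when the context-sensitive profile is no longer available.
BlockCounts scaleFlatProfile(const BlockCounts &Flat, uint64_t CalleeEntryCount,
                             uint64_t CallSiteCount) {
  BlockCounts Scaled;
  for (const auto &Entry : Flat)
    Scaled[Entry.first] =
        CalleeEntryCount ? Entry.second * CallSiteCount / CalleeEntryCount : 0;
  return Scaled;
}
```

Calling `scaleFlatProfile({{"then", 100}, {"else", 100}}, 200, 100)` yields 50 for both blocks, so the branch bias seen in that context is lost, which is what can hurt block layout later.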

llvm/lib/Transforms/IPO/SampleProfile.cpp
906–907

DEBUG_TYPE is "sample-profile" and I feel it's too broad for practical uses as it covers sample usages, weight propagation, and inlining. I wanted to have something that only gives me inlining remarks, thus using "sample-profile-inline" instead here.

Good point about "NeverInline", will change.

1010–1021

Yes, I care about the reasons. I will change the message to make the reason explicit in the output remarks.

wenlei updated this revision to Diff 232708. Dec 7 2019, 9:12 AM

Restructured remarks for sample profile loader inlining:

  1. Split into InlineAttempt, InlineSuccess and InlineFail for remark names.
  2. Use OptimizationRemark only for InlineSuccess and OptimizationRemarkAnalysis for the rest.
wenlei updated this revision to Diff 232709. Dec 7 2019, 9:23 AM

update test case

wmi accepted this revision. Dec 11 2019, 4:04 PM
wmi added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
906–907

Then maybe define a macro for it like #define CSINLINE DEBUG_TYPE "-inline".
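For reference, a minimal standalone sketch of how such a macro expands. It relies on the compiler concatenating adjacent string literals; the `DEBUG_TYPE` value is the one quoted earlier in this thread, and the helper `inlineRemarkPassName` is hypothetical, added only to make the result observable.

```cpp
#include <string>

// As stated above in this review, DEBUG_TYPE in SampleProfile.cpp is "sample-profile".
#define DEBUG_TYPE "sample-profile"

// Adjacent string literals are concatenated at compile time, so this
// expands to the single literal "sample-profile-inline".
#define CSINLINE DEBUG_TYPE "-inline"

// Hypothetical helper that returns the pass name used for the inlining remarks.
std::string inlineRemarkPassName() { return CSINLINE; }
```

This keeps the inlining remarks under a narrower name while still deriving it from DEBUG_TYPE, so the two cannot drift apart.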

This revision is now accepted and ready to land. Dec 11 2019, 4:04 PM
wenlei updated this revision to Diff 233496. Dec 11 2019, 9:35 PM

address feedback

wenlei marked an inline comment as done. Dec 11 2019, 9:36 PM
wenlei added inline comments.
llvm/lib/Transforms/IPO/SampleProfile.cpp
906–907

thanks, macro added.

This revision was automatically updated to reflect the committed changes.