AutoFDO compilation has two places that do inlining: the sample profile loader, which inlines using the context-sensitive profile, and the regular inliner, which runs as a CGSCC pass. Ideally we want most inlining to come from the sample profile loader, as that is driven by the context-sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from the sample profile loader and help move more inlining to it, I'm adding statistics and optimization remarks for the sample profile loader's inlining.
However the reality is most of the inlining actually happens during regular inliner.
I would imagine among all the functions with inline instance in profile, only those small and warm/cold functions which are not inlined in early inliner will be inlined in regular inliner. The number of those small/warm and small/cold functions may be large. We previously found it helpful to inline warm functions, but that comes at some cost in code size, so it is better to inline only small/warm functions. For small and cold functions, I think it doesn't matter whether they are inlined early or late.
It is helpful to collect some optimization remarks here, so thanks for the patch.
The first parameter in the declaration of OptimizationRemark is "const char *PassName", so why not use DEBUG_TYPE?
I feel "NeverInline" may be clearer than "NotInline" in showing that it is illegal to inline.
A candidate that is not inlined may be reported multiple times here because of the iterative outer loop.
I guess you put the OptimizationRemark here because you want to know the exact reason why a candidate with an inline instance in the profile is not inlined (here the reason is that it is not hot enough). If so, more information should be emitted to explain it.
If you don't care about the exact reason, then it is better to generate the optimization remark in the loop iterating over localNotInlinedCallSites. localNotInlinedCallSites contains all the candidates that have an inline instance in the profile but were not inlined for whatever reason, including "not hot enough".
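To illustrate the difference between the two placements, here is a minimal, self-contained sketch. It is not the patch's code: the function name, the string call-site keys, and the use of `std::set` as a stand-in for localNotInlinedCallSites are all assumptions made for the example. It only shows that emitting from the collected container after the outer loop yields one remark per candidate, even when the same candidate is rejected in several iterations.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each element of RejectedPerIteration lists the call
// sites that one pass of the iterative outer loop declined to inline.
// A std::set stands in for localNotInlinedCallSites (assumption).
inline std::vector<std::string> remarksAfterLoop(
    const std::vector<std::vector<std::string>> &RejectedPerIteration) {
  std::set<std::string> NotInlined;
  for (const auto &Rejected : RejectedPerIteration)
    NotInlined.insert(Rejected.begin(), Rejected.end());

  // Emit exactly one remark per distinct call site, after the loop is done.
  std::vector<std::string> Remarks;
  for (const auto &CallSite : NotInlined)
    Remarks.push_back("'" + CallSite + "' not inlined");
  return Remarks;
}
```

Emitting inside the outer loop instead would produce one remark per rejection, so a call site rejected in two iterations would be reported twice.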
Please test with the new pass manager as well.
Please rename the variables whose names are just a number.
I would imagine among all the functions with inline instance in profile, only those small and warm/cold functions which are not inlined in early inliner will be inlined in regular inliner.
Yes, small functions are the majority. But there are still others.
For small and cold functions, I think it doesn't matter whether they are inlined early or late.
It matters for post-inline profile quality. If a function is inlined early by the replay inliner of the sample loader, the context-sensitive profile is kept. However, if the replay inliner rejects it and it is later inlined by the CGSCC inliner, we have to do count scaling for the inlinee, which is not as accurate as inlining early and preserving the context-sensitive profile. This doesn't matter much for the inline decision itself, but it matters for post-inline profile quality, which can affect block layout later.
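A small worked example may make the accuracy loss concrete. This is a simplified model, not LLVM's implementation: `FunctionProfile` and `scaleForCallSite` are hypothetical names, and the scaling rule (block count times call-site count divided by callee entry count) is an assumed approximation of what late inlining has to do when only a merged, context-insensitive profile is available.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified profile: an entry count plus per-block counts.
struct FunctionProfile {
  uint64_t EntryCount;
  std::vector<uint64_t> BlockCounts;
};

// Late inlining without context: scale the callee's merged profile by
// CallSiteCount / EntryCount. Overflow is ignored in this sketch.
inline std::vector<uint64_t> scaleForCallSite(const FunctionProfile &Merged,
                                              uint64_t CallSiteCount) {
  std::vector<uint64_t> Scaled;
  Scaled.reserve(Merged.BlockCounts.size());
  for (uint64_t Count : Merged.BlockCounts)
    Scaled.push_back(Merged.EntryCount == 0
                         ? 0
                         : Count * CallSiteCount / Merged.EntryCount);
  return Scaled;
}
```

Suppose a callee has a "then" block and an "else" block. Caller A calls it 100 times and always takes "then"; caller B calls it 100 times and always takes "else". The merged profile is entry 200 with block counts {100, 100}. Scaling that for A's call site gives {50, 50}, whereas the context-sensitive profile for A's call site is {100, 0}. Block layout based on the scaled counts sees an even branch where the real behavior is fully biased.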
DEBUG_TYPE is "sample-profile", and I feel it's too broad for practical use, as it covers sample usage, weight propagation, and inlining. I wanted something that gives me only inlining remarks, so I'm using "sample-profile-inline" here instead.
Good point about "NeverInline", will change.
Yes, I care about the reasons. I will change the message to make the reason explicit in the output remarks.
Restructured the remarks for sample profile loader inlining:
- Split into InlineAttempt, InlineSuccess and InlineFail for remark names.
- Use OptimizationRemark only for InlineSuccess and OptimizationRemarkAnalysis for the rest.
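
The split above can be summarized in a small self-contained model. This is only a sketch of the naming rule described in the list, not code from the patch: `RemarkKind` and `remarkKindFor` are hypothetical names. In LLVM, OptimizationRemark is the "passed" remark kind (surfaced by -Rpass) and OptimizationRemarkAnalysis is the analysis kind (surfaced by -Rpass-analysis).

```cpp
#include <string>

// Hypothetical stand-ins for the two remark classes named above:
// Passed   ~ OptimizationRemark          (shown with -Rpass)
// Analysis ~ OptimizationRemarkAnalysis  (shown with -Rpass-analysis)
enum class RemarkKind { Passed, Analysis };

// Only a successful inline is reported as a "passed" remark; attempts and
// failures are reported as analysis remarks.
inline RemarkKind remarkKindFor(const std::string &RemarkName) {
  return RemarkName == "InlineSuccess" ? RemarkKind::Passed
                                       : RemarkKind::Analysis;
}
```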