This is an archive of the discontinued LLVM Phabricator instance.

[SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
ClosedPublic

Authored by modimo on Oct 18 2021, 2:24 PM.

Details

Summary

Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO.

Testing:
ninja check-all

Diff Detail

Event Timeline

modimo created this revision.Oct 18 2021, 2:24 PM
modimo requested review of this revision.Oct 18 2021, 2:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2021, 2:24 PM
modimo edited the summary of this revision. (Show Details)Oct 18 2021, 2:33 PM
modimo added reviewers: wenlei, mtrofin.
modimo edited the summary of this revision. (Show Details)
hoy added a comment.Oct 18 2021, 2:52 PM

Good catch, thanks for working on this.

llvm/include/llvm/Analysis/ReplayInlineAdvisor.h
36 ↗(On Diff #380519)

Wondering if CallersToReplay could contain function names with those deduplicating suffixes, like .llvm. and if we should exclude those suffixes from F.getName using FunctionSamples::getCanonicalFnName. If the replay file from dwarf dump or Rpass?

llvm/lib/Transforms/IPO/SampleProfile.cpp
1112

Nit: This code doesn't exist on the inlineHotFunctionsWithPriority path which mainly serves csspgo. We should bring up the support there, could be in a separate change.

modimo updated this revision to Diff 380569.Oct 18 2021, 7:18 PM

Use getCanonicalFnName, add same path to inlineHotFunctionsWithPriority

modimo marked an inline comment as not done.Oct 18 2021, 7:20 PM
modimo added inline comments.
llvm/include/llvm/Analysis/ReplayInlineAdvisor.h
36 ↗(On Diff #380519)

Both dwarf (from llvm-symbolizer) and Rpass will emit .llvm. suffixes. Added getCanonicalFnName to uses.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1112

Makes sense. I added the code path but not sure how to generate a csspgo profile for function_metadata.ll to test it out. Is there a simple way to do that or should I create a test case from scratch?

hoy added a comment.Oct 18 2021, 10:13 PM

Add a test case for the inlining (non-importing) path? to the existing test llvm/test/Transforms/SampleProfile/inline-replay.ll ?

llvm/include/llvm/Analysis/ReplayInlineAdvisor.h
36 ↗(On Diff #380519)

Thanks. How about move the getCanonicalFnName calls here?

llvm/lib/Transforms/IPO/SampleProfile.cpp
1112

Thanks for working on the csspgo support. To test thinlto function import, here is an existing test case
llvm/test/Transforms/SampleProfile/csspgo-import-list.ll. It can be mocked to take in a replay file. I'm fine with make a separate change for csspgo work.

wenlei added inline comments.Oct 20 2021, 5:21 PM
llvm/include/llvm/Analysis/ReplayInlineAdvisor.h
34 ↗(On Diff #380569)

nit: the name is not accurate. it's more like hasInlineAdvice? When scope is module, this can return true while there's no remark for the function.

36 ↗(On Diff #380519)

How does this work with current version where uses have suffix stripped by getCanonicalFnName? CallersToReplay can still have the original names with suffixes from remarks so they won't match.

Moving FunctionSamples::getCanonicalFnName causes a weird dependency from InlineAdvisor to FunctionSamples. But not moving it here make things inconsistent between sample loader replay and cgscc replay. I actually don't think handling of suffix is critical for reply - consider the case with funique-internal-linkage-name. So we can leave out the canonicalization if possible.

Both dwarf (from llvm-symbolizer) and Rpass will emit .llvm. suffixes.

I thought dwarf names won't have the llvm suffix, but only Rpass will have it since it's using symbol names which is changed by ThinLTO local to global promotion.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1106

Can we move the following into a helper SampleProfileLoader::ExternalInlineAdviceAvailableForFunction(..), it's used in a few places.

ExternalInlineAdvisor &&
    ExternalInlineAdvisor->hasRemarksForFunction(F)
1112

Good point Hongtao. I think it's better if we include the change for inlineHotFunctionsWithPriority in this patch - it feels incomplete otherwise.

1161

When no recommendation is available from replay advisor, this should be a no-op. However with the way it's currently implemented, this will actually fall back to run call analyzer for regular inline decision in shouldInlineCandidate? May the checks above guarantees the advice will always be available here, but the way it's setup seems a bit fragile and not fail-safe..

1162–1163

There're two issues:

  • If we rely on this path to influence importing using replay input, we would lose that influence completely for non-FDO path, but replay is not meant for FDO only.
  • On the other hand, for FDO, we have the opportunity to look deeper into the call tree and do better, i.e following findExternalInlineCandidate to import lower level candidates too.

For #2, with current implementation, even if the inline replay provides the exact same decision as the native inline decision, we could miss out inlining due to missing importing here.

And this added complexity logically still belongs to findExternalInlineCandidate, I suggest we move the logic there to keep the main flow cleaner here.

All above applies to inlineHotFunctionsWithPriority as well.

1285–1288

Add a comment please.

1888

With the only callsite for getReplayInlineAdvisor removed, the function and all stuff around HasReplayRemarks can be removed too? We now assume ReplayInlineAdvisor always have remarks loaded, right?

modimo updated this revision to Diff 381401.Oct 21 2021, 2:49 PM
modimo marked 2 inline comments as done and an inline comment as not done.

Refactor uses of replay advisor into getExternalInlineAdvisorCost and getExternalInlineAdvisorShouldInline. Move importing code to findExternalInlineCandidate

@wenlei I refactored the implementation so that all queries to the external advisor route to getExternalInlineAdvisorCost or getExternalInlineAdvisorShouldInline depending on if we want the actual cost or just if inlining occurs, respectively. This unifies every location that queries for the existence of remarks (the original one in hasInlineAdvice and the new ones that add sites as candidates) and also means I don't need to expose a hasRemarksForFunction from the ReplayAdvisor and can do everything using the original InlineAdvisor interface. I also edited how importing is done:

  1. If samples are not available then we can't transitively import but can at least import the symbol itself
  2. If samples are present and external advisor wants to inline drop the import threshold to 0 so we have the opportunity to replay post-link with imported call sites

Does it make sense to drop the threshold to allow importing and are there any potential recursive issues with that?

llvm/include/llvm/Analysis/ReplayInlineAdvisor.h
36 ↗(On Diff #380519)

I thought dwarf names won't have the llvm suffix, but only Rpass will have it since it's using symbol names which is changed by ThinLTO local to global promotion.

I looked into it more since the behavior is odd, when I query llvm-symbolizer on an address I get the .llvm. suffixes even though AFAIK it should use DWARF information:

llvm-symbolizer --inlining -e llvm-objdump 0x998400 --no-demangle
_ZL14getSegmentNamePKN4llvm6object15MachOObjectFileERKNS0_10SectionRefE.llvm.5974130588735922841

However, using GDB which directly queries dwarf:

(gdb) disas/s 0x998400
Dump of assembler code for function getSegmentName(llvm::object::MachOObjectFile const*, llvm::object::SectionRef const&):

Which doesn't contain any suffixes. Looking into it, llvm-symbolizer by default actually prefers using the symbol table over the dwarf information: https://reviews.llvm.org/D83530#inline-789802 and the option --use-symbol-table is now a no-op (D87067) so it always uses symbol table information. I manually edited Opts.UseSymbolTable = false; and confirm that the suffix does get removed by llvm-symbolizer.

What that means is that both sources I'm using (llvm-symbolizer and Rpass) will have these suffixes. Canonicalization isn't necessary though, implementing it when we need it makes sense rather than anticipating its usage. I don't particularly like the weird dependency of InlineAdvisor to FunctionSamples as well.

wenlei accepted this revision.Oct 21 2021, 3:35 PM

@wenlei I refactored the implementation so that all queries to the external advisor route to getExternalInlineAdvisorCost or getExternalInlineAdvisorShouldInline depending on if we want the actual cost or just if inlining occurs, respectively. This unifies every location that queries for the existence of remarks (the original one in hasInlineAdvice and the new ones that add sites as candidates) and also means I don't need to expose a hasRemarksForFunction from the ReplayAdvisor and can do everything using the original InlineAdvisor interface. I also edited how importing is done:

  1. If samples are not available then we can't transitively import but can at least import the symbol itself
  2. If samples are present and external advisor wants to inline drop the import threshold to 0 so we have the opportunity to replay post-link with imported call sites

Does it make sense to drop the threshold to allow importing and are there any potential recursive issues with that?

That sounds good to me. It makes the implementation cleaner. Setting threshold to 0 may lead to more importing than needed which can slow down build a bit, but I don't think that's a blocker for inline replay. Lgtm now.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1285–1288

Maybe you missed this one, a comment will be helpful. When sample loader inliner is used for reply, we want to inline callee if replay advisor say so even when the callee doesn't have profile.

One thing I'm not exactly sure is how robust is the sample loader inliner for handling candidate without a profile. For testing, maybe you want to try removing the check altogether for a regular (non-replay) build for a larger component, and see if there's any trouble.

This revision is now accepted and ready to land.Oct 21 2021, 3:35 PM
modimo updated this revision to Diff 381438.Oct 21 2021, 5:16 PM

Add comment to SampleProfileLoader::getInlineCandidate. Add inline-topdown-missing.prof file that wasn't part of the diff before.

llvm/lib/Transforms/IPO/SampleProfile.cpp
1285–1288

Maybe you missed this one, a comment will be helpful. When sample loader inliner is used for reply, we want to inline callee if replay advisor say so even when the callee doesn't have profile.

Added

One thing I'm not exactly sure is how robust is the sample loader inliner for handling candidate without a profile. For testing, maybe you want to try removing the check altogether for a regular (non-replay) build for a larger component, and see if there's any trouble.

That's a good idea to validate robustness, I'm run with this check removed on CSSPGO clang build and see what shakes loose.

llvm/test/Transforms/SampleProfile/Inputs/csspgo-import-list.prof
3

FYI I was debugging through this and found that sample associated with CallBase expected a discriminator for _Z5funcAi. I'm assuming these were meant to fully match against the source. I added csspgo-import-list-no-funca.prof as an explicit test case where _Z5funcAi samples are missing.

modimo added inline comments.Oct 28 2021, 4:18 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
1285–1288

Building Clang with this early return commented out was successful.