Page MenuHomePhabricator

[SampleFDO] Enhance profile remapping support for inline instance

Authored by wmi on Aug 20 2020, 11:06 PM.



Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change.

However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change.

To add that support, the patch introduces a temporary remapper in addition to the current remapper used for outline function profile. The temporary remapper will be populated with remapping rules at the same time as the regular remapper. At a callsite when compiler tries to find an inline instance profile to match with the callee, it will be populated with all the inline instance profiles at the same location as the callsite first, then check whether the callee name exists in the remapper. After the matching is done, the remapper should be cleared except that the remapping rules should be kept, that is why it is called temporary remapper. The clearing is necessary to prevent the callee name from being mistakenly matched with an inline instance profile at a different location or with an outline profile.

In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. That was very significant for us. If current profile remapping was enabled, the performance degradation dropped to 1.2%, and it was still significant. If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14%, and that was small enough to be acceptable.

Diff Detail

Event Timeline

wmi created this revision.Aug 20 2020, 11:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 20 2020, 11:06 PM
wmi requested review of this revision.Aug 20 2020, 11:06 PM
wmi updated this revision to Diff 286956.Aug 20 2020, 11:16 PM

Remove an unrelated file being accidently included.

hoy added a subscriber: hoy.Aug 21 2020, 12:51 PM

Nice to see the perf regression drops to around 1% with this change!


Nit: Update comment with Remapper?


Does this remove only the last callee name added to the the remapper? Maybe give a more explicit name for that?

wmi added a comment.Aug 21 2020, 8:54 PM

Hongtao, thanks for the review!




Yes, it only removes the last callee name. I cannot find a good name so I just change the remapper to CalleeRemapper.

wmi updated this revision to Diff 287155.Aug 21 2020, 8:55 PM

Address Hongtao's comment.

hoy accepted this revision.Aug 22 2020, 11:11 AM
This revision is now accepted and ready to land.Aug 22 2020, 11:11 AM
hoy removed a reviewer: hoyFB.Aug 22 2020, 11:14 AM
wenlei added inline comments.Aug 22 2020, 8:47 PM

I guess I didn't fully understand the clearing part - if a function (inlinee) is renamed, why do we not want to apply the remapping globally? E.g. for A->B, and C->B inlined profile, if B is renamed to D, we would want to match D to B for both A->D profile and C->D path, right? Then a global remapper including all names should suffice.

Could you share a concrete example as to why each function needs its own temp remapper for all its inlinees?

wmi added inline comments.Aug 23 2020, 9:44 AM

Right, given two names B and D, ideally we want the remapper to tell us whether they are identical by returning true or false for remapper.equivalent(B, D). However, remapper doesn't work like this. To check whether B and D are identical, remapper needs to insert B into a foldingset first by parsing it, then check whether D has already existed in it by parsing D and trying to insert D (but not truely insert it).

Similarly, if we want to check whether an equivalent name exists in a nameset, remapper need to insert alll the names in the nameset before it can answer the query. So to answer the query whether the callee's name matches an inline instance profile at a specific callsite, it needs to clear names in the remapper inserted at other callsites before it inserts the callee's name at the current callsite into remapper.

Existing remapper doesn't have to be cleared throughout the sample profile loading pass because it insert all the names of outline instance in the sample profile, and the question it needs to answer in the whole pass is whether a function name has a match in the outline instances. So because of the way how remapper works, we need different remappers to answer different types of remapping queries.

wenlei added inline comments.Aug 23 2020, 11:49 PM

Maybe I'm still missing something, but the part that I don't understand is why would we need to have boundary for nameset and the queries? For mangle names, they all live in a single conceptual "namespace", regardless of whether it's name for inlinee or outlined function.

For inlinee's name, it's possible that we've inserted their names in global remapper when initializing with all outlined functions' names. If not, we could also insert it into global remapper right before checking for alternative names? To me the only difference for inlinee's name is that, we might discover those names a bit later as we may not see them when initializing the global remapper with outlined function names, but that doesn't seem to warrant creating boundaries for nameset (between outline vs inlinee, inlinees for one functions vs inlinees for another).

wmi added inline comments.Aug 24 2020, 8:20 AM

why would we need to have boundary for nameset and the queries?

That is because the query we want to have is whether a callee matches any inline instance at a specific callsite location, not the inline stances at any other callsites. That is why we want boundary for nameset related with the current callsite, and need to clear the names related with other callsites.

For inlinee's name, it's possible that we've inserted their names in global remapper

Yes, many inline instances have already exist as outline instance in global remapper. If we use global remapper, we can know a callee match an inline/outline instance somewhere in the profile, but don't know whether the callee match an inline instance at the current callsite.

wenlei added inline comments.Aug 24 2020, 10:40 AM

Ok, thanks for clarification. Looking closer at the code, I think I understand the reason now. You're iterating over the old names to see if anyone of them matches variant of new name (callee name at the call site), then you have to limit the temp remapper to contain only the new name of interest (callee name).

For outlined functions profiles, the remapper provides new_name -> old_profile mapping (SampleProfileReaderItaniumRemapper::getSamplesFor(StringRef Fname)). Dealing with name->profile mapping is tricky for inlinee's profiles. But would it be possible to tweak the remapper to support new_name->old_name lookup? With that, inlinee profile can be handled universally by retrieving old name for new callee name, then iterate the call site profiles to get inlinee's profile.

And to be consistent with SampleProfileReaderItaniumRemapper::getSamplesFor for outline profile, handling of call site profile can also be abstracted away into the remapper?

Specifically, I'm thinking about handling inlinees in SampleProfileReaderItaniumRemapper::applyRemapping too and keeping a Key to old_name map, instead of Key to samples map.

wmi added inline comments.Aug 24 2020, 10:53 PM

That is a very good suggestion! Following the suggestion I get the same remapping result with simpler code, and I find I can also add support for indirect call with the simpler implementation and am working on it.

wmi updated this revision to Diff 287804.Aug 25 2020, 5:25 PM

Address Wenlei's comment and add the support to search indirect call promotion candidate using remapper.

Great - thanks for making the changes. Looks good with two minor comments.


Do we actually need this extra check? Renaming won't change wether a call is self-recursive, so the check on line 1059 should be good enough?


hoist Remapper = Reader->getRemapper() out of the loop?

wmi added inline comments.Aug 25 2020, 9:15 PM

Line 1059 only checks whether an inline instance at the callsite has the identical name with the parent function F. It is possible that the inline instance has a different but equivalent name as F. In that case, SymbolMap may map the name of the inline instance back to the definition of F and use F as the indirect call promotion target.

I can add one more check "R->getValue()->getName() != F.getName()" to the if (R != SymbolMap.end() ...) below and simplify the code a little bit.


Good point.

wenlei added inline comments.Aug 25 2020, 9:25 PM

I see. Or R->getValue() != &F would work too?

wmi added inline comments.Aug 25 2020, 10:03 PM

Yes, that is better.

wmi updated this revision to Diff 287826.Aug 25 2020, 10:04 PM

Address Wenlei's comment.

wenlei accepted this revision.Aug 25 2020, 10:07 PM

LGTM, thanks!

wmi added a comment.Aug 25 2020, 10:10 PM

Thanks Wenlei for the review and the insightful suggestions!

hoyFB added a subscriber: hoyFB.Aug 26 2020, 9:04 AM

Looks even better now!