This is an archive of the discontinued LLVM Phabricator instance.

SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name
ClosedPublic

Authored by danielcdh on Apr 11 2017, 10:30 AM.

Details

Summary

For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.

Event Timeline

danielcdh created this revision.Apr 11 2017, 10:30 AM
davidxl added inline comments.Apr 13 2017, 9:46 AM
include/llvm/ProfileData/SampleProf.h
276

do you need to use const & here? just 'auto FS = ..'

281

What does this loop do?

lib/Transforms/IPO/SampleProfile.cpp
597

early return when M == null.

danielcdh marked an inline comment as done.Apr 13 2017, 10:02 AM
danielcdh added inline comments.
include/llvm/ProfileData/SampleProf.h
281

If we cannot find exact match, use the FS with the max total count. Updated the code to add the comment. And also updated the function level comment.

lib/Transforms/IPO/SampleProfile.cpp
597

Do you mean M->size() == 0? early return added for that condition.

davidxl added inline comments.Apr 13 2017, 10:05 AM
include/llvm/ProfileData/SampleProf.h
281

Is it correct to return the max count samples? What is the reason this is needed?

danielcdh added inline comments.Apr 13 2017, 10:13 AM
include/llvm/ProfileData/SampleProf.h
281

Sometimes, the callsite does not have full callee name in the debug info/afdo profile because the linkage name for internal functions were not emitted. There are also cases that the function name get changed during refactoring. So if we cannot find exact match, we want to have the best-effort matching.

There is no correctness concern because even if the profile is completely incorrect, we can still emit correct (but inefficient code).

And for this specific case, different callee_name with the same callsite_loc only applies to profiles collected from speculatively promoted indirect call (with > 1 targets), which is rare. So the best-effort match here is most likely one-target-match, which falls back to the case without the patch.

davidxl accepted this revision.Apr 13 2017, 10:28 AM

lgtm (with more testing (internal)).

This revision is now accepted and ready to land.Apr 13 2017, 10:28 AM
danielcdh closed this revision.Apr 13 2017, 1:04 PM