Page MenuHomePhabricator

[MemProf] Memprof profile matching and annotation
Needs ReviewPublic

Authored by tejohnson on Jun 19 2022, 10:18 AM.



Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
alloction type. This is the same attributed that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.


Diff Detail

Event Timeline

tejohnson created this revision.Jun 19 2022, 10:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2022, 10:18 AM
tejohnson requested review of this revision.Jun 19 2022, 10:18 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 19 2022, 10:18 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MTC added a subscriber: MTC.Jun 21 2022, 8:15 PM

I'm still going through PGOInstrumentation.cpp ...


Just -no-pie is better (details:

1 ↗(On Diff #438201)

Can you split out the memory profile info parts into a separate patch? Also I think the interface is simple enough to be able to unit test. Something set up like [1] would be great. Then we can call addCallstack with different annotations followed by buildAndAttachMIBMetaData followed by checking the metadata annotated on the call inst(s). What do you think?


40 ↗(On Diff #438201)

Should this be a doxygen comment block with ///?


The RawInstrProfReader shouldn't have the memprof mask set. We have a separate raw binary format which is independent. So this should always return false. Also maybe add a comment to document the fact?

19 ↗(On Diff #438201)

We use MemoryProfile and MemProf interchangeably. Does it make sense to pick one and make it consistent throughout? Here for eg. the flags begin with "memprof-*" but the debug type is "memory-profile-*".

102 ↗(On Diff #438201)

nit: prefer static_cast<uint8_t> here and elsewhere.

tejohnson marked 3 inline comments as done.Jun 29 2022, 1:44 PM
tejohnson added inline comments.
1 ↗(On Diff #438201)

See D128854. I'll try to rebase this and the follow on inliner patch on top of that when I get a chance.

40 ↗(On Diff #438201)

Fixed in new patch.

19 ↗(On Diff #438201)

The clang option is -fmemory-profile for instrumentation, so I've used that some places (e.g. the file names too) for clarity. MemProf is a nice short hand and used in the metadata. I don't have a strong opinion about which name should be used where. I've kept this as is for now in the new patch, let me know what your thoughts are on what is clearer.

102 ↗(On Diff #438201)

Fixed in new patch.

tejohnson updated this revision to Diff 441529.Jun 30 2022, 2:57 PM
tejohnson marked 3 inline comments as done.

Rebase on top of D128854 which now includes the extracted Analysis utilities.
I have not yet addressed the other comments on this patch.

Enna1 added inline comments.Jul 1 2022, 3:05 AM

nit: place the definition of llvm::isNewLikeFn just after llvm::isAllocationFn(const Value *V, function_ref<const TargetLibraryInfo &(Function &)> GetTLI) ?

tejohnson updated this revision to Diff 446986.Fri, Jul 22, 2:45 PM
tejohnson marked 3 inline comments as done.

Address comments


Fixed, here and elsewhere


Also added an assert.

snehasish added inline comments.Mon, Jul 25, 4:10 PM

Prefer moving this code after the loop, close to where AllocType is used.


I think if you use an llvm::SetVector here instead then you don't need the StackHashSet std::set below. CallstackTrie::addCallstack already accepts an ArrayRef so it won't need to change if we use a SetVector.


nit: It doesn't look like we #include <set> in this file so we are probably relying on having it transitively being included from somewhere.


Consider defining the lambda outside above the condition to reduce indentation. IMO it will be a little easier to follow if it wasn't inlined into the if statement itself.


Is an llvm::Twine a better choice here instead of std::string? I guess it doesn't matter much in error handling code.


LocHashToCallSiteFrame to indicate the value in the map corresponds to an individual frame?


Not using auto over here would be helpful to know that we are indexing into the map below using an uint64_t. Same below.


Should we assert that it was actually found?


Can you add an assert for this?


DIL != nullptr is a little easier to follow.


Prefer moving this to a static helper method to reduce the size of the loop body, reduce indentation for this logic and make it more readable overall. Probably creating an functor object on the stack for each instruction that we process is not efficient either.


"First add !memprof metadata ..." -- the ordering of the if-else condition isn't necessary though since only one of the iters can be non-null? We could rewrite the else condition first to reduce the complexity here a bit. Eg --

if (CallSitesIter != LocHashToCallSites.end()) {

// Flip the conditions here
if (!isNewLikeFn() || AllocInfoIter == LocHashToAllocInfo.end()) {

CallStackTrie AllocTrie;



--check-prefixes=MEMPROF,ALL can be used instead.


I suspect that the check lines are redundant. I think FileCheck scans the entire file and groups conditions by prefix. So we could have the 3 run lines followed by a group of prefix checks.

; ALL-NOT: memprof record not found for function hash
; ALL-NOT: no profile data available for function
; MEMPROF-NOT: !prof
; PGOONLY-NOT: !memprof
; PGOONLY-NOT: !callsite

Should we use a regex here to make it more resilient since we don't care about the exact hash?