This is an archive of the discontinued LLVM Phabricator instance.

[MemProf] Memprof profile matching and annotation
ClosedPublic

Authored by tejohnson on Jun 19 2022, 10:18 AM.

Details

Summary

Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceeding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
alloction type. This is the same attributed that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] https://github.com/google/tcmalloc/commit/ab87cf382dc56784f783f3aaa43d6d0465d5f385

Diff Detail

Event Timeline

tejohnson created this revision.Jun 19 2022, 10:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2022, 10:18 AM
tejohnson requested review of this revision.Jun 19 2022, 10:18 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 19 2022, 10:18 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MTC added a subscriber: MTC.Jun 21 2022, 8:15 PM

I'm still going through PGOInstrumentation.cpp ...

clang/test/CodeGen/memprof.cpp
16

Just -no-pie is better (details: https://reviews.llvm.org/rG103b28902fd6)

llvm/include/llvm/Analysis/MemoryProfileInfo.h
1 ↗(On Diff #438201)

Can you split out the memory profile info parts into a separate patch? Also I think the interface is simple enough to be able to unit test. Something set up like [1] would be great. Then we can call addCallstack with different annotations followed by buildAndAttachMIBMetaData followed by checking the metadata annotated on the call inst(s). What do you think?

[1] https://github.com/llvm/llvm-project/blob/3f8e4169c1c390fd086658c1e51983ee61bff9bc/llvm/unittests/Analysis/FunctionPropertiesAnalysisTest.cpp#L71

40 ↗(On Diff #438201)

Should this be a doxygen comment block with ///?

llvm/include/llvm/ProfileData/InstrProfReader.h
333

The RawInstrProfReader shouldn't have the memprof mask set. We have a separate raw binary format which is independent. So this should always return false. Also maybe add a comment to document the fact?

llvm/lib/Analysis/MemoryProfileInfo.cpp
19 ↗(On Diff #438201)

We use MemoryProfile and MemProf interchangeably. Does it make sense to pick one and make it consistent throughout? Here for eg. the flags begin with "memprof-*" but the debug type is "memory-profile-*".

102 ↗(On Diff #438201)

nit: prefer static_cast<uint8_t> here and elsewhere.

tejohnson marked 3 inline comments as done.Jun 29 2022, 1:44 PM
tejohnson added inline comments.
llvm/include/llvm/Analysis/MemoryProfileInfo.h
1 ↗(On Diff #438201)

See D128854. I'll try to rebase this and the follow on inliner patch on top of that when I get a chance.

40 ↗(On Diff #438201)

Fixed in new patch.

llvm/lib/Analysis/MemoryProfileInfo.cpp
19 ↗(On Diff #438201)

The clang option is -fmemory-profile for instrumentation, so I've used that some places (e.g. the file names too) for clarity. MemProf is a nice short hand and used in the metadata. I don't have a strong opinion about which name should be used where. I've kept this as is for now in the new patch, let me know what your thoughts are on what is clearer.

102 ↗(On Diff #438201)

Fixed in new patch.

tejohnson updated this revision to Diff 441529.Jun 30 2022, 2:57 PM
tejohnson marked 3 inline comments as done.

Rebase on top of D128854 which now includes the extracted Analysis utilities.
I have not yet addressed the other comments on this patch.

Enna1 added inline comments.Jul 1 2022, 3:05 AM
llvm/lib/Analysis/MemoryBuiltins.cpp
320–322

nit: place the definition of llvm::isNewLikeFn just after llvm::isAllocationFn(const Value *V, function_ref<const TargetLibraryInfo &(Function &)> GetTLI) ?

tejohnson updated this revision to Diff 446986.Jul 22 2022, 2:45 PM
tejohnson marked 3 inline comments as done.

Address comments

clang/test/CodeGen/memprof.cpp
16

Fixed, here and elsewhere

llvm/include/llvm/ProfileData/InstrProfReader.h
333

Also added an assert.

snehasish added inline comments.Jul 25 2022, 4:10 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1255

Prefer moving this code after the loop, close to where AllocType is used.

1258

I think if you use an llvm::SetVector here instead then you don't need the StackHashSet std::set below. CallstackTrie::addCallstack already accepts an ArrayRef so it won't need to change if we use a SetVector.

1259

nit: It doesn't look like we #include <set> in this file so we are probably relying on having it transitively being included from somewhere.

1282

Consider defining the lambda outside above the condition to reduce indentation. IMO it will be a little easier to follow if it wasn't inlined into the if statement itself.

1303

Is an llvm::Twine a better choice here instead of std::string? I guess it doesn't matter much in error handling code.

1319

LocHashToCallSiteFrame to indicate the value in the map corresponds to an individual frame?

1325

Not using auto over here would be helpful to know that we are indexing into the map below using an uint64_t. Same below.

1336

Should we assert that it was actually found?

1366

Can you add an assert for this?

1371

DIL != nullptr is a little easier to follow.

1403

Prefer moving this to a static helper method to reduce the size of the loop body, reduce indentation for this logic and make it more readable overall. Probably creating an functor object on the stack for each instruction that we process is not efficient either.

1420

"First add !memprof metadata ..." -- the ordering of the if-else condition isn't necessary though since only one of the iters can be non-null? We could rewrite the else condition first to reduce the complexity here a bit. Eg --

if (CallSitesIter != LocHashToCallSites.end()) {
  ...
  continue
}

// Flip the conditions here
if (!isNewLikeFn() || AllocInfoIter == LocHashToAllocInfo.end()) {
   continue
}

CallStackTrie AllocTrie;
...
llvm/test/Transforms/PGOProfile/memprof.ll
83

./pgo.exe

96

--check-prefixes=MEMPROF,ALL can be used instead.

109

I suspect that the check lines are redundant. I think FileCheck scans the entire file and groups conditions by prefix. So we could have the 3 run lines followed by a group of prefix checks.

; ALL-NOT: memprof record not found for function hash
; ALL-NOT: no profile data available for function
; MEMPROF-NOT: !prof
; PGOONLY-NOT: !memprof
; PGOONLY-NOT: !callsite
llvm/test/Transforms/PGOProfile/memprofmissingfunc.ll
14

Should we use a regex here to make it more resilient since we don't care about the exact hash?

tejohnson marked 12 inline comments as done.Aug 30 2022, 10:15 AM
tejohnson added inline comments.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1258

Noted, but moot as this has been removed (see below)

1259

Although this std::set has been removed, I use it elsewhere so added the include.

1262

I have removed this handling. It is not correct in the case of mutual recursion involving more than one function where it will result in non-sensical stack traces. I am deferring handling recursion until later during the LTO phase.

1282

I could do this, but as is it mirrors the structure of the similar handling in readCounters, which has some advantages. wdyt?

1319

The key is the location hash (stack id) of a single frame, but the value is a set of all the CallSites from the profile that reference it.

1336

Added assert after the loop.

1366

In this case "may only" meant "might only", not "may only at most". So I can't assert on anything. This can happen for example if we have a location that corresponds to both an allocation call and another callsite (I've seen this periodically, and can reproduce e.g. with a macro). We would need to use discriminators more widely to better distinguish them in that case (with the handling here we will only match to the allocation call for now - edit: a slight change noted further below ensures this is the case). Will change /may/might/ and add a note.

1420

As noted earlier, it might be in more than one map. But I realized we could sometimes add the callsite metadata, instead of the memprof metadata, to a non-new allocation call (e.g. malloc) when there is a matching location in both maps given the structuring of the handling below. I've changed it so we handle all instructions with matching allocation profile data in the below if statement, and skip adding any metadata if there is matching allocation profile data but it is not isNewLikeFn. I've made the allocation profile matching if statement below have a continue at the end, so that I can remove the indentation further below for the callsite-only matched situation and added an assert there.

llvm/test/Transforms/PGOProfile/memprof.ll
96

Done here and elsewhere.

109

It doesn't group them by prefix, but you are right there are a lot of redundant checks in this test. And one that is not correct (see below). I have cleaned this up

116

The comment here and below and one of the checks is incorrect. This case is testing a pgo+memprof profile.

124

This is not correct for this test. But worked because it matched the first !prof against the first PGO label below after ALL-LABEL. This was just uselessly checking that there were no additional !prof before that. I have removed this and made the earlier similar check a MEMPROFONLY check.

tejohnson marked 5 inline comments as done.

Address comments

snehasish accepted this revision.Sep 14 2022, 8:58 AM

lgtm

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1282

I wasn't a big fan of the existing structure in readCounters but I didn't want to ask you to change the other code. Let's leave it as is for now.

1366

Thanks for the explanation.

This revision is now accepted and ready to land.Sep 14 2022, 8:58 AM
This revision was landed with ongoing or failed builds.Sep 22 2022, 12:49 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247

You may use BLAKE3 instead of MD5. BLAKE3 is much faster than LLVM's slow MD5 implementation.

MaskRay added inline comments.Sep 22 2022, 6:30 PM
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247

llvm/include/llvm/Support/xxhash.h is also a good choice.

tejohnson marked an inline comment as done.Sep 23 2022, 11:29 AM
tejohnson added inline comments.
llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
1247

Thanks for the tip! I used BLAKE3.

eopXD added a subscriber: eopXD.EditedOct 4 2022, 7:04 PM

I see that this revision is reverted along with reversion of D128143, I see that D128143 is re-committed.
The commit message of D128143 says that it depends on this patch. Does the dependency still exist?

eopXD added a comment.Oct 4 2022, 7:05 PM
This comment was removed by eopXD.