This is an archive of the discontinued LLVM Phabricator instance.

[memprof] Store callsite metadata with memprof records.
ClosedPublic

Authored by snehasish on Mar 7 2022, 6:45 PM.

Details

Summary

To ease profile annotation, each of the callsites in a function can be
annotated with profile data - "IR metadata format for MemProf" [1]. This
patch extends the on-disk serialized record format to store the debug
information for allocation callsites incl inline frames. This change is
incompatible with the existing format i.e. indexed profiles must be
regenerated, raw profiles are unaffected.

[1] https://groups.google.com/g/llvm-dev/c/aWHsdMxKAfE/m/WtEmRqyhAgAJ

Diff Detail

Event Timeline

snehasish created this revision.Mar 7 2022, 6:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 6:45 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
snehasish requested review of this revision.Mar 7 2022, 6:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 6:45 PM
snehasish updated this revision to Diff 413679.Mar 7 2022, 6:52 PM

Add a comment and remove some commented code.

tejohnson added inline comments.Mar 8 2022, 2:58 PM
llvm/include/llvm/ProfileData/MemProf.h
209

It would be good to document the data structure choice - i.e. why this is a vector of vectors

311

Should this be comparing CallSites as well?

Also, it might be nice to refactor some of the code below into an operator== in the AllocationInfo struct.

llvm/include/llvm/ProfileData/RawMemProfReader.h
104

missing word(s) after "that"?

llvm/lib/ProfileData/RawMemProfReader.cpp
296

Document what the uint64_t values are.

319

Why can't the leaf function be an inline frame? I.e. what if we have:

foo() { malloc(); }
bar() { foo(); }

and we inline foo into bar.

Won't the leaf be foo which will be an inline frame?

327

If we want to avoid a hash table lookup to SymbolizedFrame for each of these addresses in the later loop, we could store a pointer to the the DenseMap iterator value_type for SymbolizedFrame[Address].

llvm/test/tools/llvm-profdata/memprof-basic.test
50

Can this test now also look for the CallSites?

llvm/unittests/ProfileData/InstrProfTest.cpp
228

Should this file also be testing CallSites?

303–304

Why is this test commented out now?

llvm/unittests/ProfileData/MemProfTest.cpp
188

Should this callstack now have 4 entries and can all be checked?

Additionally, should the record for foo now have CallSites since it shows up in the call stack for the allocation in CSM[0x2]?

snehasish updated this revision to Diff 414121.Mar 9 2022, 9:12 AM

Address comments.

snehasish marked 5 inline comments as done.Mar 9 2022, 9:38 AM

Thanks for the detailed review! PTAL.

llvm/lib/ProfileData/RawMemProfReader.cpp
296

The comment here should really have been down below before the loop. Updated the comment since we now use a pointer to avoid another lookup in the map.

319

My usage of the term "leaf" is the cause of confusion, here I identify "last non-inlined" function in the callstack as leaf. Since we will annotate the profile post-inlining, this is the function where the records will be placed. In the example above, bar will have the memprof record entry. Let me know how you want me to reword this to make it clearer.

Also a note about ordering of entries in the callstack which represented by a list of MemProfRecord::Frames. For example

qux() { malloc(); }
inline foo() { qux(); }
bar() { foo(); }

is represented as:

bar (isInline: false); foo (isInline: true); qux (isInline:false)

as opposed to the expectation:

foo (isInline: true); bar (isInline: false); qux (isInline:false)

Based on your comment, this may be surprising since foo is the "leaf" function. I chose this representation to have deterministic access to the function id where the memprof record is to be applied, i.e. it is always at index 0. However, now that we have a map of function ids to memprof records in the reader (previously we just had a list of memprof records) we can change the order of to be more intuitive (i.e. on L339 we will append the frames in reverse). Wdyt?

llvm/test/tools/llvm-profdata/memprof-basic.test
50

The input for this test case only has main which calls malloc directly so there is no callsite data to add in this profile. I'll follow up with another llvm-profdata test which has some inlined allocation calls (also see discussion in leaf function comment).

llvm/unittests/ProfileData/InstrProfTest.cpp
303–304

Meant to remove it altogether since the failure path is no longer exercised by this test. The error is now guarded by an assert in mapRawProfileToRecords L331.

llvm/unittests/ProfileData/MemProfTest.cpp
188

Sure, added the check below.

tejohnson added inline comments.Mar 9 2022, 12:14 PM
llvm/lib/ProfileData/RawMemProfReader.cpp
319

My usage of the term "leaf" is the cause of confusion, here I identify "last non-inlined" function in the callstack as leaf. Since we will annotate the profile post-inlining, this is the function where the records will be placed. In the example above, bar will have the memprof record entry. Let me know how you want me to reword this to make it clearer.

An issue with this is that foo may have been inlined into bar before or after the point of matching. I.e. the inline frames in the profile are those inlined anywhere in the compiler: either in early inlining (which is done before typical PGO matching), or in the usual inline optimization pass (or in the rarer but supported case of an instrumented build with ThinLTO, in the LTO backend's invocation of the inliner).

So in my example, when we match we may or may not have inlined foo into bar at that point. Which raises the question of which function(s) should get the memprof data in the profile. I would argue that at the very least foo() should, even if it is marked as an inline frame (because when we match it may not yet have been inlined, and in fact based on profile info we may or may not make the same inlining decisions later in the compile either). Possibly it should also be on the non-inlined caller (and any other intervening inline frames), to make matching easier, in the case where the inline was done by the early inliner before matching.

Actually, just realized this same question affects where we attach the callsites profiles below given a set of inline frames.

Also a note about ordering of entries in the callstack which represented by a list of MemProfRecord::Frames. For example

qux() { malloc(); }
inline foo() { qux(); }
bar() { foo(); }

is represented as:

bar (isInline: false); foo (isInline: true); qux (isInline:false)

This makes sense to me since bar calls foo calls qux. Also, just a note that the isInline here assumes we only inlined foo into bar (which is suggested but not technically guaranteed by the inline keyword, and also we could have additional inlining by the optimizer).

as opposed to the expectation:

foo (isInline: true); bar (isInline: false); qux (isInline:false)

I'm not sure why this one would be the expectation? It is neither bottom up nor top down.

Based on your comment, this may be surprising since foo is the "leaf" function. I chose this representation to have deterministic access to the function id where the memprof record is to be applied, i.e. it is always at index 0. However, now that we have a map of function ids to memprof records in the reader (previously we just had a list of memprof records) we can change the order of to be more intuitive (i.e. on L339 we will append the frames in reverse). Wdyt?

snehasish planned changes to this revision.Mar 9 2022, 4:15 PM

Discussed with @tejohnson offline how we can make the matching more robust.

snehasish updated this revision to Diff 416022.Mar 16 2022, 4:26 PM

Update allocsite and callsite profile contents.

@tejohnson I've updated the logic to track and annotate additional functions with the allocation site profiles. Please take a look at the example in MemProfTest.cpp L176 to see if it matches your expectations. Thanks!

tejohnson added inline comments.Mar 17 2022, 2:07 PM
llvm/include/llvm/ProfileData/MemProf.h
259

Specify ordering of inline frames (i.e. leaf to root).

llvm/lib/ProfileData/RawMemProfReader.cpp
294

maybe replace "to each callsite location" with "to the set of callsite locations"

328

I think we decided we need to do the same thing for the callsites that you are doing below for the memprof data on allocations, and put the callsites metadata on the inline functions as well, since the inlining at the point of matching may be less than the inlining in the final instrumented binary. Although in that case in the inline functions we only need the frames in Frames up to and including that function.

e.g.

void qux() { ... }
void foo() { qux(); }
void bar() { foo(); }

If instrumented binary inlined foo -> bar (but not qux->foo), Frames for the address in bar would include:
foo:0:13 (IsInlineFrame = true)
bar:0:13 (IsInlineFrame = false)

and we would want:

foo:

CallSites:
   [ foo:0:13 ]

bar:

CallSites:
   [ foo:0:13, bar:0:13 ]
340

It would be clearer I think to have this comment just above the start of this loop, to explain what the loop is doing.

llvm/unittests/ProfileData/MemProfTest.cpp
172

Is this printing meant to be here?

178

This is similar to the case I described in my comment in mapRawProfileToRecords(), where I think we should get CallSite metadata in foo() and in xyz() below. Of course, with the "inline" keyword it is likely to be early inlined before we do matching. But in the general case the symbolized stack frame may contain inline frames from a later inlining pass, which would not have been inlined at the point of matching yet. And we can't distinguish these cases when doing the profile symbolization / generation.

snehasish updated this revision to Diff 416588.Mar 18 2022, 1:16 PM

Address comments.

snehasish updated this revision to Diff 416594.Mar 18 2022, 1:28 PM

Fix lit test and some cleanup.

Updated the patch to track callsites for all functions in the frame. PTAL, thanks!

This revision is now accepted and ready to land.Mar 21 2022, 10:55 AM

With this patch on an internal binary, the time taken to merge indexed and raw profile increases to 85s (up from 48s in D120430). The zip profile size increases to ~520M (up from 247M in D118653). I'll look into de-duplicating the inline storage of callstacks and their frames in followup patches. Thanks for the review!

This revision was landed with ongoing or failed builds.Mar 21 2022, 1:58 PM
This revision was automatically updated to reflect the committed changes.
hctim added a subscriber: hctim.Mar 21 2022, 3:57 PM
hctim added inline comments.
llvm/include/llvm/ProfileData/MemProf.h
236

Hi, looks like this patch causes the MemorySanitizer build bot to fail:

See instructions here on how to reproduce:

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: LLVM-Unit :: ProfileData/./ProfileDataTests/MemProf.RecordSerializationRoundTrip (78166 of 82953)
******************** TEST 'LLVM-Unit :: ProfileData/./ProfileDataTests/MemProf.RecordSerializationRoundTrip' FAILED ********************
Script:
--
/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/ProfileData/./ProfileDataTests --gtest_filter=MemProf.RecordSerializationRoundTrip
--
Note: Google Test filter = MemProf.RecordSerializationRoundTrip
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from MemProf
[ RUN      ] MemProf.RecordSerializationRoundTrip
==47398==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x4ae582 in llvm::memprof::MemProfRecord::AllocationInfo::operator==(llvm::memprof::MemProfRecord::AllocationInfo const&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ProfileData/MemProf.h:236:11
    #1 0x4adb6a in operator!= /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ProfileData/MemProf.h:250:15
    #2 0x4adb6a in llvm::memprof::MemProfRecord::operator==(llvm::memprof::MemProfRecord const&) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ProfileData/MemProf.h:319:25
    #3 0x589fc6 in (anonymous namespace)::EqualsRecordMatcherP<llvm::memprof::MemProfRecord>::gmock_Impl<llvm::memprof::MemProfRecord const&>::MatchAndExplain(llvm::memprof::MemProfRecord const&, testing::MatchResultListener*) const /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/ProfileData/MemProfTest.cpp:111:11
    #4 0x585c50 in MatchAndExplain /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest-matchers.h:264:19
    #5 0x585c50 in Matches /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest-matchers.h:270:12
    #6 0x585c50 in operator()<llvm::memprof::MemProfRecord> /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googlemock/include/gmock/gmock-matchers.h:1327:17
    #7 0x585c50 in (anonymous namespace)::MemProf_RecordSerializationRoundTrip_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/ProfileData/MemProfTest.cpp:300:3
    #8 0xa56c69 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #9 0xa56c69 in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2508:5
    #10 0xa5ba53 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11
    #11 0xa5d14e in testing::TestSuite::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28
    #12 0xa8be08 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44
    #13 0xa8a443 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #14 0xa8a443 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10
    #15 0xa3b70e in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2472:46
    #16 0xa3b70e in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:55:10
    #17 0x7fbb240a409a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) (BuildId: 18b9a9a8c523e5cfe5b5d946d605d09242f09798)
    #18 0x379e39 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/ProfileData/ProfileDataTests+0x379e39)