This is an archive of the discontinued LLVM Phabricator instance.

[memprof] Extend the index prof format to include memory profiles.
ClosedPublic

Authored by snehasish on Jan 31 2022, 1:51 PM.

Details

Summary

This patch adds support for optional memory profile information to be
included with and indexed profile. The indexed profile header adds a new
field which points to the offset of the memory profile section (if
present) in the indexed profile. For users who do not utilize this
feature the only overhead is a 64-bit offset in the header.

The memory profile section contains (1) profile metadata describing the
information recorded for each entry (2) an on-disk hashtable containing
the profile records indexed via llvm::md5(function_name). We chose to
introduce a separate hash table instead of the existing one since the
indexing for the instrumented fdo hash table is based on a CFG hash
which itself is perturbed by memprof instrumentation.

Diff Detail

Event Timeline

snehasish created this revision.Jan 31 2022, 1:51 PM
snehasish requested review of this revision.Jan 31 2022, 1:51 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 31 2022, 1:51 PM
Herald added subscribers: llvm-commits, Restricted Project. · View Herald Transcript
snehasish updated this revision to Diff 404734.Jan 31 2022, 2:34 PM

Rebase and fix error handling in llvm-profdata.

snehasish planned changes to this revision.Feb 1 2022, 7:54 PM

Found an issue with llvm-profdata merge. I'll update the patch with the fix and more tests.

snehasish updated this revision to Diff 405732.Feb 3 2022, 11:53 AM

Changes:

  • Removed summary for memprof for now. We will add a custom summary later.
  • Fixed issue, added llvm-profdata merge lit test.
snehasish updated this revision to Diff 405738.Feb 3 2022, 12:05 PM

Remove some more leftover code from reverted summary changes.

snehasish updated this revision to Diff 405740.Feb 3 2022, 12:12 PM

Add static checks to header reading code for v8.

Updated the patch with a llvm-profdata test and some cleanup. PTAL, thanks!

tejohnson added inline comments.Feb 3 2022, 4:01 PM
llvm/include/llvm/ProfileData/InstrProfData.inc
665

I noticed this change and the one below are not in compiler-rt/include/profile/InstrProfData.inc, I assume these two files should be identical?

llvm/include/llvm/ProfileData/InstrProfReader.h
621

typo: recrods

llvm/include/llvm/ProfileData/MemProf.h
65

Note the return value?

143

You could just return false if they are unequal, then return true if we fall through, to avoid extra comparisons

275

expected that second parameter unused?

279

ditto - will we get unused warnings about N?

331

expected that last parameter not used (here and in EmitData)?

llvm/lib/ProfileData/InstrProfReader.cpp
1042

Maybe include the hash value in the error?

llvm/lib/ProfileData/InstrProfWriter.cpp
267

What is the intention of the code - this will overwrite Key for every non-inline frame. Should it break after Key is assigned the first time? Or do we want the last one, in which case you could walk the CallStack in reverse order and again break when Key is set for the first time.

359

"these fields"

371

s/spec/space/ ?

llvm/tools/llvm-profdata/llvm-profdata.cpp
255

When would this fail?

davidxl added inline comments.Feb 4 2022, 9:23 PM
llvm/include/llvm/ProfileData/InstrProfData.inc
665

yes.

llvm/include/llvm/ProfileData/InstrProfReader.h
624

single line?

llvm/include/llvm/ProfileData/InstrProfWriter.h
44

Why not StringMap?

66

Should there be a merge interface like the addRecord for Edge profile data above?

llvm/include/llvm/ProfileData/MemProf.h
139

Document type -- MD5 Hash

llvm/test/tools/llvm-profdata/memprof-merge.test
40

Requires text format reader and writer for memprof

tejohnson added inline comments.Feb 5 2022, 1:42 PM
llvm/include/llvm/ProfileData/MemProf.h
139

Can we use GlobalValue::GUID as the type (which is uint64_t)? That's already used elsewhere in this directory (at least in the SampleProfiler). If so, then the later md5 hash computation can call the static helper Function::getGUID on the string (which is just a wrapper around MD5Hash but matches the type name).

snehasish updated this revision to Diff 406629.Feb 7 2022, 3:26 PM

Address comments.

snehasish updated this revision to Diff 406635.Feb 7 2022, 3:45 PM
snehasish marked 8 inline comments as done.

Remove whitespace.

Thanks for the detailed review. PTAL!

llvm/include/llvm/ProfileData/InstrProfReader.h
624

clang-format puts it on a separate line since the length is 85 chars.

llvm/include/llvm/ProfileData/InstrProfWriter.h
44

The memprof data is indexed via uint64_t obtained from llvm::md5(FunctionName). The instrprof data is first indexed by function name as string and then via the cfg hash thus uses a StringMap.

66

The existing mergeRecordsFromWriter function defined below has been extended to merge memprof data if necessary. See the changes in lib/ProfileData/InstrProfWriter.cpp:L287.

llvm/include/llvm/ProfileData/MemProf.h
65

I decided to rewrite this function to accept raw_ostream& OS instead of char *Ptr. This allowed me to remove a memory allocation TODO in a higher up callee. The function no longer returns a value.

139

Documented the frame and all the members.

Changed the type to GlobalValue::GUID. While I don't expect the type to change, I added a static assert in the writer to ensure that the writer and reader code does not bitrot in case it does.

275

Yes, added /*Unused*/ for everywhere this is applicable to make it clear.

279

No warnings for unused parameters AFAICT.

llvm/lib/ProfileData/InstrProfWriter.cpp
267

It is the latter. I changed the loop to iterate in reverse and break early.

llvm/test/tools/llvm-profdata/memprof-merge.test
40

We have a text format writer, it prints it to YAML right now. The reader from YAML has not been implemented yet. This comment however alludes to the lack of the iterator interface to the memprof data. That requires a little bit of code which I wanted to keep separate to minimize the size of this patch.

llvm/tools/llvm-profdata/llvm-profdata.cpp
255

For now, when clang frontend profiles are merged with memprof. Updated the comment to clarify.

tejohnson accepted this revision.Feb 8 2022, 9:52 AM

lgtm. Slight preference for using Function::getGUID instead of llvm::MD5Hash to obfuscate the details of what a GUID is, but ok as is if you want to be explicit.

This revision is now accepted and ready to land.Feb 8 2022, 9:52 AM
snehasish updated this revision to Diff 406990.Feb 8 2022, 3:04 PM

Rebase after landing D116784.
Use Function::getGUID instead of llvm::MD5 directly.

Seems like a good idea to keep it consistent so I updated the code and test to use Function::getGUID. Added a comment to note that it is effectively llvm::md5. Thanks!

snehasish updated this revision to Diff 407211.Feb 9 2022, 10:46 AM

Update PortableMemInfoBlock::deserialize to remove ub when reading unaligned 32b types.

I tried this out on an internal workload. Sharing some characteristics of the new indexed profile containing IR-level instr profiling data and memprof data:

Raw file sizes (directly written out from the runtime)
Instr prof: 264M
MemProf: 215M
Instr prof zip: 80M
MemProf zip: 45M

Indexed format file sizes (after running llvm-profdata merge)
Indexed instr prof data only: 505M
Indexed instr prof data only zip: 139M
Indexed instr + memprof: 1010M
Indexed instr + memprof zip: 247M

llvm-profdata merge runtime
Indexed instr prof data only: 10s
Indexed instr + memprof data: 1m 53s

Looking into the profile using perf record -e cycles:up, I found that (unsurprisingly) a significant amount is spent in symbolization. We can significantly reduce the runtime here with caching of PC -> Frame symbolization.

llvm-profdata merge peak memory usage
Indexed instr prof data only: 1.4G
Indexed instr + memprof data: 8.16G

Looking into the profile of memory allocations using massif, I saw that it is from DwarfContext::create in the initialize method. This is a little concerning so I'll have to investigate further how we can reduce the reduce the overhead. The debug info generated for this binary is the same as what is used for Sample FDO (-gmlt and -fdebug-info-for-profiling). The debug information was stored separately using -gsplit-dwarf (fission).

I'll follow up after submitting this patch to improve upon the runtime and peak memory usage.

davidxl accepted this revision.Feb 9 2022, 4:17 PM

lgtm

snehasish updated this revision to Diff 407699.Feb 10 2022, 3:16 PM

Rebase and cleanup.

  • Use 0x40 for MemProf in InstrProf.h
  • Sync minor differences MemProfData.inc.
snehasish updated this revision to Diff 408466.Feb 14 2022, 9:44 AM

Rebase after moving some PortableMemInfoBlock changes to D117256.

This revision was landed with ongoing or failed builds.Feb 14 2022, 9:55 AM
This revision was automatically updated to reflect the committed changes.