This is an archive of the discontinued LLVM Phabricator instance.

[MemProf] ThinLTO summary support
ClosedPublic

Authored by tejohnson on Oct 11 2022, 2:06 PM.

Details

Summary

Implements the ThinLTO summary support for memprof related metadata.

This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.

To reduce space in both the bitcode format and the in memory index,
we do 2 things:

  1. We keep a single vector of all uniq stack id hashes, and record the index into this vector in the callsite and allocation memprof summaries.
  2. When building the combined index during the LTO link, the callsite and allocation memprof summaries are only kept on the FunctionSummary of the prevailing copy.

Diff Detail

Event Timeline

tejohnson created this revision.Oct 11 2022, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 2:06 PM
tejohnson requested review of this revision.Oct 11 2022, 2:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 2:06 PM
tejohnson edited the summary of this revision. (Show Details)Oct 11 2022, 2:12 PM

Probably want to start with the changes in ModuleSummaryIndex.h, which describes the summary records.

snehasish added inline comments.Oct 12 2022, 10:01 AM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
7128–7129

Can you split out the type change to tuple (along with all the get<X> changes) in a separate patch and submit it as NFC. Then for this patch we will only need to extend the contents of the tuple and touch fewer files?

tejohnson added inline comments.Oct 12 2022, 10:14 AM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
7128–7129

Will do. Note that only affects this file so while it makes sense and will reduce the reviewing diffs here in BitcodeReader.cpp, it won't result in this patch touching fewer files, unfortunately.

tejohnson updated this revision to Diff 469037.Oct 19 2022, 1:50 PM

Rebase on top of BitcodeReader NFC changes (646e25d05146d2f0d517364cf78a6fa68d77bf71)

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
7128–7129

Still reading to understand how things work here ...

llvm/include/llvm/IR/ModuleSummaryIndex.h
311

We can specify this in the declaration with an initializer list instead of calling push_back.

319

Can we reorder AllocInfo and MIBInfo to avoid this forward reference?

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
923

Instead of unconditionally returning true, if we modify the call to IsPrevailing and check whether IsPrevailing is empty then the codegen at the callsite is a little better.

Example - Line L43-L46 in the codegen is eliminated.
https://godbolt.org/z/qfodoc3T9

llvm/lib/IR/AsmWriter.cpp
3188–3267

Can you add a comment here indicating that the AllocationType identifiers capture the profiled context behaviour. Thus "notcoldandcold" implies there are multiple contexts which reach this site, some of which are cold.

llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
424

I think this will segfault the test if makeLLVMIndex fails and returns a nullptr.

439

/*guid=*/23 to make it clear what we are referring to.

snehasish added inline comments.Oct 25 2022, 5:09 PM
llvm/include/llvm/AsmParser/LLToken.h
397

Should we add a comment indicating these are used by memprof?

llvm/include/llvm/IR/ModuleSummaryIndex.h
301

Call this Clones instead of Versions to avoid confusion with AllocType Version? Also see the related comment below about introducing a separate kw_clones token.

1332

We can replace 2 lookups in the map with just one if we use insert and check the boolean indicating whether an insertion happened?

1352

Maybe call StackIds.shrink_to_fit() as well to trim the vector too?

llvm/lib/AsmParser/LLParser.cpp
9570

nit: Given the small number of items to parse using llvm::SmallVector would probably eliminate dynamic allocations?

9707

It would be great to use a separate token such as kw_clones here instead of reusing version for clarity. It caused a bit of confusion for me when reading the bitcode in the test.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
7478

Rewriting this as for(unsigned I = 1; I < Record.size; I++) would reduce the scope of I a bit more and make it readable overall I think.

Alternatively consider using an iterator for Record so we don't have to worry about indices? This might be a better approach for the code below.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3906

Does this need to be templated? Looks like the uses have the same signature.

L3974: Fn1 ValueInfo -> unsigned, Fn2 unsigned -> unsigned
L4434: Fn1 ValueInfo -> unsigned, Fn2 unsigned -> unsigned

3912

I think we can let the SmallVector infer the number of elements automatically unless we have a good idea of how many records we might have.

https://llvm.org/doxygen/classllvm_1_1SmallVector.html

tejohnson marked 15 inline comments as done.Oct 27 2022, 12:52 PM
tejohnson added inline comments.
llvm/lib/AsmParser/LLParser.cpp
9570

Ended up changing the type of Versions/Clones/StackIdIndices in the index and throughout.

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
3906

Yeah I don't remember why I templated it in the first place!

llvm/unittests/Analysis/MemoryProfileInfoTest.cpp
424

Inserted an ASSERT_NE with nullptr above this.

tejohnson marked 3 inline comments as done.

Address comments

Enna1 added a subscriber: Enna1.Oct 28 2022, 1:04 AM
snehasish accepted this revision.Oct 28 2022, 12:46 PM

lgtm, thanks for your patience with the slow review.

llvm/include/llvm/Bitcode/BitcodeReader.h
124–125

Can this go away now that we check before calling the functor?

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
6628–6629

Is this a TODO?

Also I would write this with template specialization to avoid the need to #ifdef in the decl and the definition.

template<bool AllowNullValueInfo = false>
ModuleSummaryIndexBitcodeReader::getValueInfoFromValueId(unsigned ValueId);

Then the callsite at L7510 can just be getValueInfoFromValueId</*AllowNullValueInfo*/=true> without having a separate function signature in the debug and non-debug builds.

7487

I was thinking about using the iterator here, so it would look like the following

auto RecordIter = Record.begin();
const unsigned ValueID = *RecordIter;
const unsigned NumStackIds = *RecordIter++;
const unsigned NumVersions = *RecordIter++;
...

IMO this pattern is a little better than using an Index variable but I don't have a strong opinion if you would like to keep as is.

This revision is now accepted and ready to land.Oct 28 2022, 12:46 PM
tejohnson marked 3 inline comments as done.Nov 14 2022, 3:02 PM
tejohnson added inline comments.
llvm/include/llvm/Bitcode/BitcodeReader.h
124–125

Yes, we can make it default to nullptr as we do in the ModuleSummaryIndexBitcodeReader constructor.

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
6628–6629

That was a stale note to self from an older version that didn't pass in AllowNullValueInfo and simply disabled the assert, that I forgot to clean up.

Good idea on the template! In fact, the callsite was incorrect as it was always passing the additional parameter without checking for build mode, and would have failed release builds...

tejohnson updated this revision to Diff 475290.Nov 14 2022, 3:02 PM
tejohnson marked 2 inline comments as done.

Address comments

This revision was landed with ongoing or failed builds.Nov 15 2022, 6:45 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Nov 15 2022, 7:32 AM

This doesn't build on non-linux:

http://45.33.8.238/macm1/48650/step_4.txt
http://45.33.8.238/win/70010/step_4.txt

Please take a look and revert for now if it takes a while to fix.

tejohnson reopened this revision.Nov 15 2022, 8:03 AM

I have fixes for the 2 unique bot failures I saw (one of which should fix the issues reported by @thakis). Going to upload new patch in a min, doing another round of testing before I push it upstream again.

This revision is now accepted and ready to land.Nov 15 2022, 8:03 AM
tejohnson updated this revision to Diff 475487.Nov 15 2022, 8:04 AM

Fixes for a couple of bot failures

tejohnson closed this revision.Nov 15 2022, 9:06 AM

This was resubmitted with fixes in 98ed423361de2f9dc0113a31be2aa04524489ca9. Please let me know if you still see any issues.

jmorse added a subscriber: jmorse.Nov 16 2022, 3:27 AM

Hi Teresa,

I've reverted this (and Maskray's followup) in rG5d938eb6f, as MSVC doesn't seem to be able to link it:

https://lab.llvm.org/buildbot/#/builders/123/builds/14137

LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::CallStackIterator(class llvm::MDNode const *,bool)"
LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: unsigned __int64 __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::operator*(void)"
bin\llvm-extract.exe : fatal error LNK1120: 2 unresolved externals

I'm not completely sure why MSVC is unhappy, but it seems to have something to do with the template specialisation of CallStackIterator's constructor -- a reduced example here https://godbolt.org/z/EoEez3Wo5 doesn't generate any code with MSVC, wheras with clang and gcc it does.

Hi Teresa,

I've reverted this (and Maskray's followup) in rG5d938eb6f, as MSVC doesn't seem to be able to link it:

https://lab.llvm.org/buildbot/#/builders/123/builds/14137

LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::CallStackIterator(class llvm::MDNode const *,bool)"
LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: unsigned __int64 __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::operator*(void)"
bin\llvm-extract.exe : fatal error LNK1120: 2 unresolved externals

I'm not completely sure why MSVC is unhappy, but it seems to have something to do with the template specialisation of CallStackIterator's constructor -- a reduced example here https://godbolt.org/z/EoEez3Wo5 doesn't generate any code with MSVC, wheras with clang and gcc it does.

Sorry, I had missed that bot failure. Hmm, not sure what is wrong with the specialization, but thanks for the reduced example in godbolt, will use that to test a fix.

Hi Teresa,

I've reverted this (and Maskray's followup) in rG5d938eb6f, as MSVC doesn't seem to be able to link it:

https://lab.llvm.org/buildbot/#/builders/123/builds/14137

LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::CallStackIterator(class llvm::MDNode const *,bool)"
LLVMAnalysis.lib(ModuleSummaryAnalysis.cpp.obj) : error LNK2019: unresolved external symbol "public: unsigned __int64 __cdecl llvm::memprof::CallStack<class llvm::MDNode,class llvm::MDOperand const *>::CallStackIterator::operator*(void)"
bin\llvm-extract.exe : fatal error LNK1120: 2 unresolved externals

I'm not completely sure why MSVC is unhappy, but it seems to have something to do with the template specialisation of CallStackIterator's constructor -- a reduced example here https://godbolt.org/z/EoEez3Wo5 doesn't generate any code with MSVC, wheras with clang and gcc it does.

Sorry, I had missed that bot failure. Hmm, not sure what is wrong with the specialization, but thanks for the reduced example in godbolt, will use that to test a fix.

MSVC is happy if I define the unspecialized primary implementation outside the class: https://godbolt.org/z/h4bd568ba

Will incorporate that change before resubmitting.

tejohnson reopened this revision.Nov 16 2022, 7:44 AM

Tested fix in godbolt with separate source and header file too: https://godbolt.org/z/KMYKhfYW3

This revision is now accepted and ready to land.Nov 16 2022, 7:44 AM
tejohnson updated this revision to Diff 475827.Nov 16 2022, 7:45 AM

Work around MSVC template specialization issue

tejohnson updated this revision to Diff 475851.Nov 16 2022, 9:42 AM

Sigh, forgot that template function defs need to go in header...fixed

This revision was landed with ongoing or failed builds.Nov 16 2022, 9:42 AM
This revision was automatically updated to reflect the committed changes.