This is an archive of the discontinued LLVM Phabricator instance.

[Coverage] Load records immediately
ClosedPublic

Authored by tunz on Mar 22 2021, 2:17 PM.

Details

Summary

The current implementation keeps buffers generated for each object file
until it completes loading of all files. This approach requires a lot of memory
if there are a lot of huge object files. Thus, make it to load coverage records
immediately rather than waiting for other binaries to be loaded.

This reduces memory usage of llvm-cov from >128GB to 5GB when
loading Chromium binaries in Windows.

Diff Detail

Event Timeline

tunz created this revision.Mar 22 2021, 2:17 PM
tunz requested review of this revision.Mar 22 2021, 2:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2021, 2:17 PM
tunz edited the summary of this revision. (Show Details)Mar 22 2021, 2:24 PM
vsk added a comment.Mar 22 2021, 2:32 PM

Have you determined why opening a readonly binary costs memory?

I think I see the problem, have you tried disabling the RequiresNullTerminator bit?

diff --git a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
index cdbcde50d33a..f34e29397b3a 100644
--- a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
+++ b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
@@ -318,11 +318,12 @@ CoverageMapping::load(ArrayRef<StringRef> ObjectFilenames,
   auto ProfileReader = std::move(ProfileReaderOrErr.get());

   SmallVector<std::unique_ptr<CoverageMappingReader>, 4> Readers;
   SmallVector<std::unique_ptr<MemoryBuffer>, 4> Buffers;
   for (const auto &File : llvm::enumerate(ObjectFilenames)) {
-    auto CovMappingBufOrErr = MemoryBuffer::getFileOrSTDIN(File.value());
+    auto CovMappingBufOrErr = MemoryBuffer::getFileOrSTDIN(
+        File.value(), /*FileSize=*/-1, /*RequiresNullTerminator=*/false);
     if (std::error_code EC = CovMappingBufOrErr.getError())
       return errorCodeToError(EC);
     StringRef Arch = Arches.empty() ? StringRef() : Arches[File.index()];
     MemoryBufferRef CovMappingBufRef =
         CovMappingBufOrErr.get()->getMemBufferRef();

This should prevent the MemoryBuffer API from allocating a buffer for the object file, just to add a '\0' at the end. If the OS can just mmap() the buffer readonly, that should be essentially free.

llvm/include/llvm/ProfileData/Coverage/CoverageMapping.h
593

Public APIs shouldn't invite the user to make invalid state representable (https://hugotunius.se/2020/05/16/making-invalid-state-unrepresentable.html).

vsk added inline comments.Mar 22 2021, 2:36 PM
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
1064 ↗(On Diff #332424)

Do the thin archive llvm-cov tests pass after deleting this? How?

tunz added a comment.Mar 22 2021, 3:00 PM
In D99110#2642603, @vsk wrote:

Have you determined why opening a readonly binary costs memory?

I think I see the problem, have you tried disabling the RequiresNullTerminator bit?

diff --git a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
index cdbcde50d33a..f34e29397b3a 100644
--- a/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
+++ b/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
@@ -318,11 +318,12 @@ CoverageMapping::load(ArrayRef<StringRef> ObjectFilenames,
   auto ProfileReader = std::move(ProfileReaderOrErr.get());

   SmallVector<std::unique_ptr<CoverageMappingReader>, 4> Readers;
   SmallVector<std::unique_ptr<MemoryBuffer>, 4> Buffers;
   for (const auto &File : llvm::enumerate(ObjectFilenames)) {
-    auto CovMappingBufOrErr = MemoryBuffer::getFileOrSTDIN(File.value());
+    auto CovMappingBufOrErr = MemoryBuffer::getFileOrSTDIN(
+        File.value(), /*FileSize=*/-1, /*RequiresNullTerminator=*/false);
     if (std::error_code EC = CovMappingBufOrErr.getError())
       return errorCodeToError(EC);
     StringRef Arch = Arches.empty() ? StringRef() : Arches[File.index()];
     MemoryBufferRef CovMappingBufRef =
         CovMappingBufOrErr.get()->getMemBufferRef();

This should prevent the MemoryBuffer API from allocating a buffer for the object file, just to add a '\0' at the end. If the OS can just mmap() the buffer readonly, that should be essentially free.

Yes, I have tried disabling the RequiresNullTerminator bit, but it didn't help. Actually, the readonly buffer itself was fine. There are some places reading and storing data from them.
For example, https://github.com/llvm/llvm-project/blob/95f7f7c21b47f1739e4a901d428af970b7d7c0b9/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L980

std::string FuncRecords;
...
for (SectionRef Section : *CoverageRecordsSections) {
  auto CoverageRecordsOrErr = Section.getContents();
  if (!CoverageRecordsOrErr)
    return CoverageRecordsOrErr.takeError();
  FuncRecords += CoverageRecordsOrErr.get();
  while (FuncRecords.size() % 8 != 0)
    FuncRecords += '\0';
}

FuncRecords is stored as a new string. When I removed this part, I could save about 10-20% of memory. And, there are some other similar things such as uncompressed data. So, I just came back to this approach.

vsk added a comment.Mar 22 2021, 3:54 PM

This should prevent the MemoryBuffer API from allocating a buffer for the object file, just to add a '\0' at the end. If the OS can just mmap() the buffer readonly, that should be essentially free.

Yes, I have tried disabling the RequiresNullTerminator bit, but it didn't help. Actually, the readonly buffer itself was fine. There are some places reading and storing data from them.

I'm not sure where the memory problem is. From your patch summary, it sounded like it was from the open objectfile buffers, but it sounds like that's not the case. This is confusing by itself, because without the RequiresNullTerminator change, I'd expect a large part of the memory cost to show up here.

It would help if you could pause the program while it's at peak RSS and share the results from a memory profiler, so we can see what's taking up the most space on the heap.

std::string FuncRecords;
...
for (SectionRef Section : *CoverageRecordsSections) {
  auto CoverageRecordsOrErr = Section.getContents();
  if (!CoverageRecordsOrErr)
    return CoverageRecordsOrErr.takeError();
  FuncRecords += CoverageRecordsOrErr.get();
  while (FuncRecords.size() % 8 != 0)
    FuncRecords += '\0';
}

FuncRecords is stored as a new string. When I removed this part, I could save about 10-20% of memory. And, there are some other similar things such as uncompressed data. So, I just came back to this approach.

I don't understand what you replaced this with. For some context: this logic is only needed for non-MachO (non-Darwin) file formats, because the use of comdats splits function records into multiple sections. See 80cd518b809dd. Looking at this some more, I admit it's possible to optimize, but I don't see how this patch does it.

tunz marked 2 inline comments as done.Mar 22 2021, 4:15 PM
tunz added inline comments.
llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
1064 ↗(On Diff #332424)

It was my mistake. The ownership should be transferred to the caller. The test passed in my local machine just because the freed value was not overwritten.

tunz updated this revision to Diff 332467.Mar 22 2021, 4:16 PM
tunz marked an inline comment as done.

Addressing comments

tunz added a comment.Mar 22 2021, 4:43 PM
std::string FuncRecords;
...
for (SectionRef Section : *CoverageRecordsSections) {
  auto CoverageRecordsOrErr = Section.getContents();
  if (!CoverageRecordsOrErr)
    return CoverageRecordsOrErr.takeError();
  FuncRecords += CoverageRecordsOrErr.get();
  while (FuncRecords.size() % 8 != 0)
    FuncRecords += '\0';
}

FuncRecords is stored as a new string. When I removed this part, I could save about 10-20% of memory. And, there are some other similar things such as uncompressed data. So, I just came back to this approach.

I don't understand what you replaced this with. For some context: this logic is only needed for non-MachO (non-Darwin) file formats, because the use of comdats splits function records into multiple sections. See 80cd518b809dd. Looking at this some more, I admit it's possible to optimize, but I don't see how this patch does it.

The patch I did was replacing std::string to std::vector<StringRef> so that it can use the existing buffer rather than creating a new buffer. This is just one example of the increasing memory behind the readonly buffer. I've also observed some other places increasing the memory such as RawCoverageFilenamesReader.

I was measuring coverage of 500~600 binaries whose total binary size is about 100GB. So, I thought it kind of makes sense that it's taking quite a lot of memory since it loads all of them and processes them. I'll try to profile the memory.

tunz edited the summary of this revision. (Show Details)Mar 22 2021, 4:45 PM
tunz added a comment.Mar 23 2021, 9:33 AM

Here is the memory profile. I couldn't take a snapshot at the peak RSS, so I took a snapshot at the first 6GB.
The followings are the top 5 heap objects and their callstack.

  • [1] void (size: 2.5GB) (count: 8,889,503)
  • [2] char[] (size: 1GB) (count: 2)
  • [3] char[] (size: 777MB) (count: 11)
  • [4] llvm-cov.exe!llvm::coverage::BinaryCoverageReader::ProfileMappingRecord[] (size: 300MB) (count: 10)
  • [5] llvm-cov.exe!std::pair<unsigned __int64,llvm::StringRef>[] (size: 278MB) (count: 11)

[1]

llvm-cov.exe!operator new() - Line 35
llvm-cov.exe!llvm::allocate_buffer() - Line 21
llvm-cov.exe!llvm::MallocAllocator::Allocate() - Line 86
llvm-cov.exe!llvm::StringMapEntry<enum llvm::NoneType>::Create<llvm::MallocAllocator>() - Line 100
llvm-cov.exe!llvm::StringMap<enum llvm::NoneType,llvm::MallocAllocator>::try_emplace<>() - Line 324
llvm-cov.exe!llvm::StringSet<llvm::MallocAllocator>::insert() - Line 34
llvm-cov.exe!llvm::InstrProfSymtab::addFuncName() - Line 483
llvm-cov.exe!llvm::readPGOFuncNameStrings() - Line 494
llvm-cov.exe!llvm::InstrProfSymtab::create() - Line 538
llvm-cov.exe!`anonymous namespace'::CovMapFuncRecordReader::get<unsigned __int64,1>() - Line 751
llvm-cov.exe!readCoverageMappingData<unsigned __int64,1>() - Line 788
llvm-cov.exe!llvm::coverage::BinaryCoverageReader::createCoverageReaderFromBuffer() - Line 837
llvm-cov.exe!loadBinaryFormat() - Line 986

or

llvm-cov.exe!llvm::StringMapImpl::RehashTable() - Line 219
llvm-cov.exe!llvm::readPGOFuncNameStrings() - Line 494

[2]

llvm-cov.exe!operator new() - Line 30
llvm-cov.exe!llvm::WritableMemoryBuffer::getNewUninitMemBuffer() - Line 298
llvm-cov.exe!getOpenFileImpl<llvm::MemoryBuffer>() - Line 470
llvm-cov.exe!getFileAux<llvm::MemoryBuffer>() - Line 268
llvm-cov.exe!llvm::MemoryBuffer::getFile() - Line 247
llvm-cov.exe!llvm::MemoryBuffer::getFileOrSTDIN() - Line 151
llvm-cov.exe!llvm::coverage::CoverageMapping::load() - Line 324

[3]

llvm-cov.exe!operator new() - Line 35
llvm-cov.exe!std::_Default_allocate_traits::_Allocate() - Line 78
llvm-cov.exe!std::_Allocate_manually_vector_aligned<std::_Default_allocate_traits>() - Line 120
llvm-cov.exe!std::_Allocate<16,std::_Default_allocate_traits,0>() - Line 201
llvm-cov.exe!std::allocator<char>::allocate() - Line 815
llvm-cov.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::_Reallocate_grow_by<<lambda_65e615be2a453ca0576c979606f46740>,char const *,unsigned __int64>() - Line 4341
llvm-cov.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append() - Line 2955
llvm-cov.exe!llvm::operator+=() - Line 924
llvm-cov.exe!loadBinaryFormat() - Line 981

[4]

llvm-cov.exe!operator new() - Line 35
llvm-cov.exe!std::_Allocate_manually_vector_aligned<std::_Default_allocate_traits>() - Line 120
llvm-cov.exe!`anonymous namespace'::VersionedCovMapFuncRecordReader<5,unsigned __int64,1>::insertFunctionRecordIfNeeded() - Line 565
llvm-cov.exe!`anonymous namespace'::VersionedCovMapFuncRecordReader<5,unsigned __int64,1>::readFunctionRecords() - Line 722
llvm-cov.exe!readCoverageMappingData<unsigned __int64,1>() - Line 810
llvm-cov.exe!llvm::coverage::BinaryCoverageReader::createCoverageReaderFromBuffer() - Line 838
llvm-cov.exe!loadBinaryFormat() - Line 986

[5]

llvm-cov.exe!operator new() - Line 35
llvm-cov.exe!std::_Allocate_manually_vector_aligned<std::_Default_allocate_traits>() - Line 120
llvm-cov.exe!std::vector<std::pair<unsigned __int64,llvm::StringRef>,std::allocator<std::pair<unsigned __int64,llvm::StringRef> > >::_Emplace_reallocate<std::pair<unsigned __int64,llvm::StringRef> >() - Line 749
llvm-cov.exe!llvm::readPGOFuncNameStrings() - Line 494
llvm-cov.exe!`anonymous namespace'::CovMapFuncRecordReader::get<unsigned __int64,1>() - Line 751
llvm-cov.exe!readCoverageMappingData<unsigned __int64,1>() - Line 788
llvm-cov.exe!llvm::coverage::BinaryCoverageReader::createCoverageReaderFromBuffer() - Line 838
llvm-cov.exe!loadBinaryFormat() - Line 986

Seems like the RequiresNullTerminator setting is one of the causes [2]. But, only a few binaries were affected by the RequiresNullTerminator and were not using mmap (VirtualAlloc) in my case when I tested with all the binaries. So, it was not a main issue for me, but I also agree that it's something we want to fix.

tunz added a comment.Mar 23 2021, 10:14 AM

In summary, there are three places allocating the memories: PGOFuncNames (string set), RequiresNullTerminator (writable file buffer), FuncRecords (string).
Since I don't fully understand the internals of profile data yet, I'm not sure if the high memory usage of PGFuncNames is the expected behavior, but anyway this change will fix not to keep those memories owned by Readers and Buffers.

vsk added a comment.Mar 23 2021, 10:48 AM

Thanks for bearing with me :). The memory profile is really helpful.

Based on the data, I think you should go ahead and fold in the /*RequiresNullTerminator=*/false change (looks like another 1gb savings).

Replacing std::string FuncRecords sounds more involved, that can happen in a follow up patch if necessary.

llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
284–286

Please add a comment here explaining this this memory optimization works by shortening the lifetimes of CoverageMappingReader instances.

288

This should accept a CoverageMapping &, that clearly conveys that Coverage cannot be null.

292

Since this returns Error, the std::move(E) now triggers -Wpessimizing-move, so you can just drop it. I guess it was needed earlier so the implicit Expected(Error &&) constructor could be invoked.

294–295

Ditto.

353

Simply passing *Coverage works.

tunz updated this revision to Diff 332732.Mar 23 2021, 11:09 AM
tunz marked 4 inline comments as done.

Addressing comments

tunz marked an inline comment as done.Mar 23 2021, 11:09 AM

Thanks for the review :)

vsk accepted this revision.Mar 23 2021, 11:43 AM

Lgtm, thanks.

This revision is now accepted and ready to land.Mar 23 2021, 11:43 AM
tunz added a comment.Mar 23 2021, 3:01 PM

Thanks. I don't have permission to commit. Could you land this on behalf of me?

This revision was automatically updated to reflect the committed changes.
RKSimon added inline comments.
llvm/lib/ProfileData/Coverage/CoverageMapping.cpp
332

@tunz There is a integer/bool mismatch warning for the "FileSize" argument - in fact the signature for the method is:

ErrorOr<std::unique_ptr<MemoryBuffer>>
MemoryBuffer::getFileOrSTDIN(const Twine &Filename, bool IsText, bool RequiresNullTerminator);

so "FileSize" makes no sense - please can you take a look?

The api has changed in https://github.com/llvm/llvm-project/commit/c83cd8feef7eb8319131d968bb8c94fdc8dbb6a6 2 hours ago. now I think it should be /*IsText=*/false.
@abhina.sreeskantharajan Can you update this as well? or, I can update it today later or tomorrow. also I don't have permission to commit.

The api has changed in https://github.com/llvm/llvm-project/commit/c83cd8feef7eb8319131d968bb8c94fdc8dbb6a6 2 hours ago. now I think it should be /*IsText=*/false.
@abhina.sreeskantharajan Can you update this as well? or, I can update it today later or tomorrow. also I don't have permission to commit.

Sure, I'll create a patch for this.

I created a patch to reorder the args, sorry for missing this case! https://reviews.llvm.org/D99349