This is an archive of the discontinued LLVM Phabricator instance.

[clang][deps] Use MemoryBuffer in minimizing FS
ClosedPublic

Authored by jansvoboda11 on Dec 3 2021, 7:44 AM.

Details

Summary

This patch avoids unnecessarily copying contents of mmap-ed files into CachedFileSystemEntry by storing MemoryBuffer instead. The change leads to ~50% reduction of peak memory footprint when scanning LLVM+Clang via clang-scan-deps.

Depends on D115331.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.Dec 3 2021, 7:44 AM
jansvoboda11 created this revision.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 3 2021, 7:44 AM
jansvoboda11 added inline comments.Dec 3 2021, 7:48 AM
llvm/include/llvm/Support/SmallVectorMemoryBuffer.h
54 ↗(On Diff #391643)

I'm not happy with introducing new (hacky) constructor.

But, the best alternative I could come up with is to avoid using SmallVectorMemoryBuffer, store the SmallString with minimized contents somewhere in the filesystem and refer to it via regular MemoryBuffer. This seems needlessly complicated, so hacky constructor it is...

Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?

jansvoboda11 edited the summary of this revision. (Show Details)Dec 3 2021, 8:49 AM
jansvoboda11 edited the summary of this revision. (Show Details)
dexonsmith added inline comments.Dec 6 2021, 4:36 PM
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
59–62

This is a bit simpler:

Result.Contents = MinimizedFileContents.isSmall()
                    ? MemoryBuffer::getMemBufferCopy(...)
                    : std::make_unique<llvm::SmallVectorMemoryBuffer>();

and doesn't require changing SmallVectorMemoryBuffer.

Another option is:

{
  // Move to the heap and ensure null-terminated to ensure the MemoryBuffer works.
  SmallVector<char, 0> Heap(std::move(MinimizedFileContents));
  Heap.push_back(0);
  Heap.pop_back();
  Result.Contents = std::make_unique<llvm::SmallVectorMemoryBuffer>(std::move(Heap));
}

The latter seems like reasonable logic for the SmallVectorMemoryBuffer constructor though.

llvm/include/llvm/Support/SmallVectorMemoryBuffer.h
54 ↗(On Diff #391643)

It'd be better to add a RequiresNullTerminator argument to the constructor, matching the MemoryBuffer factory functions, which (if set) does the push_back/pop_back dance, and in either case passes it through to init().

SmallVectorMemoryBuffer(SmallVectorImpl<char> &&SV, bool RequiresNullTerminator = true)
    : SV(std::move(SV)), BufferName("<in-memory object>") {
  if (RequiresNullTerminator) {
    this->SV.push_back(0);
    this->SV.pop_back(0);
  }
  init(this->SV.begin(), this->SV.end(), RequiresNullTerminator);
}

If you do that, you should update the (few) existing callers to explicitly pass false. Note that RequiresNullTerminator defaults to true in all the other MemoryBuffer APIs so it'd be confusing to default it to false here.

Side question: is the hassle with implicit null-termination (expected by Clang's lexer) worth the complications? What are the benefits anyway?

Not sure whether it's a micro-optimization for getting better codegen or if it just predates StringRef, which made it much easier to reason about a byte range as a single unit (rather than tracking two separate pointers, which is more error-prone). I somewhat doubt it's worth it, which is why when I wrote the minimizer I used StringRef, but I don't have data and it'd be a lot of work to change. (The minimizer is fast enough as-is, but it does less work than the full Clang lexer. It's plausible that the optimizations matter there?)

Rebase on top of D115331.

jansvoboda11 edited the summary of this revision. (Show Details)Dec 8 2021, 4:50 AM
jansvoboda11 marked 2 inline comments as done.
jansvoboda11 added inline comments.
llvm/include/llvm/Support/SmallVectorMemoryBuffer.h
54 ↗(On Diff #391643)

You're right that exposing the RequiresNullTerminator parameter is much clearer than passing in a random lambda. I've extracted the change into D115331.

jansvoboda11 marked an inline comment as done.

Update comment

jansvoboda11 added inline comments.Dec 8 2021, 6:54 AM
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
49

In the implementation of this function we get std::unique_ptr<llvm::vfs::File> from the underlying filesystem. When that goes out of scope, the file descriptor will be closed immediately. How did copying contents into memory prevent running out of file descriptors? @arphaman

This revision is now accepted and ready to land.Dec 8 2021, 10:12 AM
This revision was automatically updated to reflect the committed changes.