This is an archive of the discontinued LLVM Phabricator instance.

[dsymutil] Introduce a new CachedBinaryHolder
ClosedPublic

Authored by JDevlieghere on Jun 22 2018, 11:41 AM.

Details

Summary

The original binary holder has an optimization where it caches a static
library (archive) between consecutive calls to GetObjects. However, the
actual memory buffer wasn't cached between calls.

This made sense when dsymutil was processing objects one after each
other, but when processing them in parallel, several binaries have to be
in memory at the same time. For this reason, every link context
contained a binary holder.

Having one binary holder per context is problematic, because the same
static archive was cached for every object file. Luckily, when the file
is mmap'ed, this was only costing us virtual memory.

This patch introduces a new BinaryHolder variant that is fully cached,
for all the object files it load, as well as the static archives. This
way, we don't have to give up on this optimization of bypassing the
file system. Currently I only updated the uses in the DwarfLinker, but
my goal is to phase out the exisiting BinaryHolder (let's call it the
LegacyBinaryHolder ;-) in favor of this one.

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Jun 22 2018, 11:41 AM

I want to do some more extensive testing before landing this, but I think it's at a point where it is ready to be reviewed.

aprantl added inline comments.Jun 22 2018, 2:47 PM
llvm/tools/dsymutil/BinaryHolder.h
30 ↗(On Diff #152526)

Could you add a Doxygen comment explaining what this class is for?

36 ↗(On Diff #152526)

comment?

43 ↗(On Diff #152526)

comment?

44 ↗(On Diff #152526)

Could this be a std::vector<object::ObjectFile> instead?

58 ↗(On Diff #152526)

As a casual reader it's no-obvious to me what a "derived version" could be.

87 ↗(On Diff #152526)

comment? :-)

friss added a comment.Jun 22 2018, 3:44 PM

While you're rewriting this part of the code, why not make the new cache actually thread-safe and share it between the threads handling different architectures. They will open the same files. This way you could also use it in loadClangModule.

In the initial design, the BinaryHolder would have only one file open at a time, now we're keeping them alive basically forever. Any concerns with this?

llvm/tools/dsymutil/DwarfLinker.cpp
3998–3999 ↗(On Diff #152526)

It's not clear from reading the comment why it's not thread-safe here. I *think* I know why (this is called by the cloning thread, which is not the thread updating the BinaryHolder), but it would be good to be explicit about it so that others can understand it too.
It would be good to document in another comment somewhere why the other CachedBinaryHolder is used in a thread-safe way. I suppose there's only ever one thread writing to it, but that's not stated anywhere. It would be even better to have a dynamic check that this is actually the case.

  • Shared CachedBinaryHolder between architectures as suggested by Fred.
  • Address other CR feedback
JDevlieghere marked 7 inline comments as done.Jun 23 2018, 7:27 AM
JDevlieghere added inline comments.
llvm/tools/dsymutil/BinaryHolder.h
44 ↗(On Diff #152526)

No, we move it into the unique pointer from the loading logic.

llvm/tools/dsymutil/DwarfLinker.cpp
3998–3999 ↗(On Diff #152526)

You are indeed correct. I didn't add the comment as I implemented your suggestion and made the cached binary holder thread safe.

JDevlieghere marked 2 inline comments as done.
  • Fallback to loading file as an object if loading of archive fails.
friss added a comment.Jun 27 2018, 3:33 PM

Why not transition completely to the new implementation and remove the old code in one step?

Should we have a limited number of files that we keep open? Some systems might run with a low number of allowed open files per-process.

This code is supposed to handle a lot of weird cases, can you make sure that all of this still works:

  • fat object files
  • thin archive containing fat object-files
  • fat archive containing thin objectfiles

(I think the current code handles all of them, but I might be wrong. Please double-check)

llvm/tools/dsymutil/BinaryHolder.cpp
254–259 ↗(On Diff #152869)

Do we need a clear() method as this object basically never dies?

Why not transition completely to the new implementation and remove the old code in one step?

I wanted to do that in a separate patch: https://reviews.llvm.org/D48770

Should we have a limited number of files that we keep open? Some systems might run with a low number of allowed open files per-process.

The files are closed immediately after being read by the MemoryBuffer. While we keep the files in memory the FD is already closed.

This code is supposed to handle a lot of weird cases, can you make sure that all of this still works:

  • fat object files
  • thin archive containing fat object-files
  • fat archive containing thin objectfiles

(I think the current code handles all of them, but I might be wrong. Please double-check)

This logic didn't change, all of them are still supported. (I paid special attention to this while doing the refactoring)

friss accepted this revision.Jun 29 2018, 9:04 AM

This looks good to me, modulo my question in D48770.

This revision is now accepted and ready to land.Jun 29 2018, 9:04 AM
  • Move lifetime fix into this patch.
This revision was automatically updated to reflect the committed changes.