Page MenuHomePhabricator

LTO: Reduce memory consumption by creating an in-memory symbol table for InputFiles. NFCI.
ClosedPublic

Authored by pcc on Mar 24 2017, 11:14 PM.

Details

Summary

Introduce symbol table data structures that can be potentially written to
disk, have the LTO library build those data structures using temporarily
constructed modules and redirect the LTO library implementation to go through
those data structures. This allows us to remove the LLVMContext and Modules
owned by InputFile.

With this change I measured a peak memory consumption decrease from 5.4GB to
2.8GB in a no-op incremental ThinLTO link of Chromium on Linux. The impact on
memory consumption is larger in COFF linkers where we are currently forced to
materialize all metadata in order to read linker options. Peak memory consumption
linking a large piece of Chromium for Windows with full LTO and debug info decreases
from >64GB (OOM) to 15GB.

Part of PR27551.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.Mar 24 2017, 11:14 PM
mehdi_amini edited edge metadata.Mar 24 2017, 11:16 PM

With this change I measured a peak memory consumption decrease from 5.4GB to 2.8GB in a no-op incremental ThinLTO link of Chromium on Linux, and total time elapsed decreases from ~61s to ~48s.

That seems to indicate an issue where we don't release ressources early enough in the process though. We could also look into this independently of the symbol table work.

pcc added a comment.Mar 24 2017, 11:52 PM

Just a couple of notes.

llvm/lib/LTO/LTO.cpp
515 ↗(On Diff #93032)

This may appear to not be NFC but it looks like this part has already been moved to the backend.

601 ↗(On Diff #93032)

This is not actually NFC but a fix for a bug found by a colleague. I'll split it out into a separate change later.

rafael added inline comments.Mar 27 2017, 8:38 AM
lld/COFF/InputFiles.cpp
347 ↗(On Diff #93032)

These small utility predicates are awesome. Would you mind committing them first with the current implementation? It would be a nice improvement and make this patch easier to read.

llvm/include/llvm/Object/IRSymtab.h
31 ↗(On Diff #93032)

why do you use this for Symbol? If we are storing something by value it is better to just use a uint32_t.

getting phab to send email.

pcc added a subscriber: inglorion.Mar 27 2017, 12:52 PM
pcc added inline comments.
lld/COFF/InputFiles.cpp
347 ↗(On Diff #93032)

Will do

llvm/include/llvm/Object/IRSymtab.h
31 ↗(On Diff #93032)

If we eventually want to write these data structures out to files I guess we will need to specify an endianness. The plan I have in mind is:

  1. define data structures that can be potentially written to files, but use them only as an in-memory format to start with (that is what this patch does)
  2. potentially improve them in-tree (e.g. by sharing the string table between the module and the symbol table)
  3. flip the switch and start writing them to files
llvm/lib/LTO/LTO.cpp
601 ↗(On Diff #93032)

Correction: this is NFC. The bug is elsewhere. I'll let @inglorion send out a fix in due course.

pcc updated this revision to Diff 93504.Mar 30 2017, 11:01 AM
  • Introduce a symbol reader API
pcc edited the summary of this revision. (Show Details)Mar 30 2017, 11:02 AM
pcc added a comment.Mar 30 2017, 11:08 AM

I removed the performance claims from the commit message because they were based on bad measurements (too much variance, insufficient runs). My most recent measurements do show a small perf improvement of about 3%, but it may not be statistically significant, and that isn't the most important thing about this change anyway.

This looks good to me. Since it is a big addition please make sure Mehdi or Teresa LGTM it too.

Thanks!

I'm fine with the approach. I didn't review in great details but I can trust @rafael.

llvm/include/llvm/Object/IRSymtab.h
15 ↗(On Diff #93032)

I think it should be mentioned that the table contains the information for *multiple* modules. This is non-intuitive when someone think about a "symbol table for IR".

tejohnson edited edge metadata.Mar 30 2017, 1:24 PM

It's a lot of code but I skimmed most of it and read a few parts in more detail. A few questions/comments.

llvm/include/llvm/Object/IRSymtab.h
123 ↗(On Diff #93504)

The name "write" here seems unexpected to me, since we aren't writing to disk e.g.. The client does a "write" which involves a Writer class, followed by a Reader, when together both are needed to essentially "read" the symbols from Modules. Maybe "buildSymbolTable" or something like that. The Writer is more like a Builder.

llvm/lib/LTO/LTO.cpp
506 ↗(On Diff #93504)

Before this was embedded within symbol_iterator. What is the impact of moving it here - we only want to skip these symbols in the regular LTO case?

It's a little confusing to me that we have two symbol iterations going on in the below loop, one over the InputFile::Symbols and one over the ModuleSymbolTable - why do we need two data structures of symbols now?

llvm/lib/Object/IRSymtab.cpp
24 ↗(On Diff #93504)

Needs some comments.

pcc updated this revision to Diff 93548.Mar 30 2017, 3:00 PM
pcc marked 2 inline comments as done.
  • Address review comments
pcc added inline comments.Mar 30 2017, 3:02 PM
llvm/include/llvm/Object/IRSymtab.h
123 ↗(On Diff #93504)

Renamed to "build" (likewise to "Builder").

llvm/lib/LTO/LTO.cpp
506 ↗(On Diff #93504)

Before this was embedded within symbol_iterator. What is the impact of moving it here - we only want to skip these symbols in the regular LTO case?

We need to skip the same set of symbols in the module as we do in the InputFile's symbol table. The logic in Skip() duplicates the condition on line 361 of LTO.cpp where we only add global/non-format-specific symbols to the InputFile. It may be worth eliminating some of the duplication here, but I'd have to think about it more carefully.

It's a little confusing to me that we have two symbol iterations going on in the below loop, one over the InputFile::Symbols and one over the ModuleSymbolTable - why do we need two data structures of symbols now?

This function needs some attributes from the symbol table as well as direct access to module symbols for the IRMover. Because symbol table symbols were previously implemented in terms of module symbols, the symbol table list also served the purpose of providing access to module symbols.

This change separated the symbol table from the module, so now we have three lists: resolution, symbol table symbols and module symbols. In principle the latter two should contain the same information, so strictly we only need resolution and module symbols, but sometimes it's more convenient to pull symbol attributes out of the symbol table.

I agree that this is a little confusing, so I've left a comment at the top of this block.

This revision is now accepted and ready to land.Mar 30 2017, 6:27 PM
This revision was automatically updated to reflect the committed changes.
pcc added inline comments.Jun 6 2017, 3:26 PM
llvm/include/llvm/Object/IRSymtab.h
123 ↗(On Diff #93504)

Every time I go into this code it bugs me a little that we have a reader and a builder (as opposed to a writer). I hate to bring up a bikeshedding topic, and it's not a big deal I guess, but would you mind if I rename this back to write/Writer? In support of my position I point to the many Writer classes we have in LLVM [1] that are not necessarily writing to disk.

[1] http://llvm-cs.pcc.me.uk/?q=writer

tejohnson added inline comments.Jun 7 2017, 2:18 PM
llvm/include/llvm/Object/IRSymtab.h
123 ↗(On Diff #93504)

Given where this is going (towards writing those directly to disk), I'm less concerned with the builder vs writer name so this seems ok to me.