This is an archive of the discontinued LLVM Phabricator instance.

Speed up the -Map option.
ClosedPublic

Authored by ruiu on Apr 27 2017, 8:19 PM.

Details

Summary

We found that some part of code for the -Map option takes O(m*n)
where m is the number of input sections in some file and n is
the number of symbols in the same file. If you do LTO, we usually
have only a few object files as inputs for the -Map option
feature, so this performance characteristic was worse than I
expected.

This patch rewrites the -Map option feature to speed it up.
I eliminated the O(m*n) bottleneck and also used multi-threading.

As a result, clang link time with the -Map option improved from
18.7 seconds to 11.2 seconds. Without -Map, it takes 7.7 seconds,
so the -Map option is now about 3x faster than before for this
test case (from 11.0 seconds to 3.5 seconds.) The generated output
file size was 223 MiB, and the file contains 1.2M lines.

Event Timeline

ruiu created this revision.Apr 27 2017, 8:19 PM
ruiu updated this revision to Diff 97039.Apr 27 2017, 8:27 PM
  • Fix indentation errors and add -strict-whitespace to FileCheck to catch that kind of errors.
dberris added inline comments.
lld/ELF/MapFile.cpp
50

No SmallVector<...> ?

ruiu added inline comments.Apr 27 2017, 8:56 PM
lld/ELF/MapFile.cpp
50

Does it make any noticeable difference? I usually use data structures in the standard library unless that is not appropriate because most people are more familiar with std::* than llvm:*.

dberris added inline comments.Apr 27 2017, 9:05 PM
lld/ELF/MapFile.cpp
50

I suspect it should, especially if you size the SmallVector with enough values to not require an additional indirection when reaching for the values. It should be easy to measure, given that you already have a test case. :)

ruiu added inline comments.Apr 27 2017, 9:06 PM
lld/ELF/MapFile.cpp
50

OK, let me do that tomorrow.

pcc edited edge metadata.Apr 27 2017, 9:14 PM

Another idea I had for eliminating the O(m * n) behaviour was to collect a list of symbols (externals from the global symbol table and internals from the input files), sort it by VA, and print it in order, emitting input and output section lines whenever either of them changes. That would still be O(n * log n) like your solution, but I think it would have a few other benefits (e.g. it would make it easier to print absolute and synthetic symbols). We can think about doing that separately, though.

lld/ELF/MapFile.cpp
50

I think the key type of this map could be const InputSection *.

92

I think you can enumerate Symtab<ELFT>::X->getObjectFiles() instead here.

117

Can you avoid calling this function twice?

ruiu updated this revision to Diff 97101.Apr 28 2017, 7:59 AM
  • Address review comments.
ruiu added a comment.Apr 28 2017, 8:03 AM

Yes, I think we do not have a multimap in LLVM. I could use std::unordered_multimap, but that's probably slower than DenseMap<T, std::vector<U>>.

lld/ELF/MapFile.cpp
50

It actually cannot.

50

Changing the type from std::vector to SmallVector sped it up by a few hundred milliseconds for the clang test case, so that actually mattered.

117

Done.

ruiu added a comment.Apr 28 2017, 8:16 AM

I implemented Peter's suggestion to collect all symbols first, and then print them out in the increasing VA order. It turned out that doesn't print out all output sections because some sections doesn't have symbols at all. For example, .eh_frame doesn't have any symbols that point to the section. There might be some way to handle this, but I'll leave it alone for now.

pcc accepted this revision.Apr 28 2017, 10:19 AM

LGTM

This revision is now accepted and ready to land.Apr 28 2017, 10:19 AM
This revision was automatically updated to reflect the committed changes.