This is an archive of the discontinued LLVM Phabricator instance.

Speed up --start-lib and --end-lib.
ClosedPublic

Authored by ruiu on May 21 2019, 5:09 AM.

Details

Summary

--{start,end}-lib give files grouped by the options the archive file
semantics. That is, each object file between them acts as if it were
in an archive file whose sole member is the file.

Therefore, files between --{start,end}-lib are linked to the final
output only if they are needed to resolve some undefined symbols.

Previously, the feature was implemented this way:

  1. We read a symbol table and insert defined symbols to the symbol table as lazy symbols.
  2. If an undefind symbol is resolved to a lazy symbol, that lazy symbol instantiate ObjFile class for that symbol, which re-insert all defined symbols to the symbol table.

So, if an ObjFile is instantiated, defined symbols are inserted to the
symbol table twice. Since inserting long symbol names is not cheap,
there's a room to optimize here.

This patch optimzies it. Now, LazyObjFile remembers symbol handles and
passed them over to a new ObjFile instance, so that the ObjFile
doesn't insert the same strings.

Here is a quick benchmark to link clang. "Original" is the original
lld with unmodified command line options. For "With" and "Without", I
extracted all files from archive files and replace .a's in a command
line with .o's wrapped with --{start,end}-lib.

    With this patch: 5.674s (+0.0%)
Withtout this patch: 6.087s (+7.2%)
           Original: 5.886s (+3.7%)

So, interestingly, --{start,end}-lib are now faster than the regular
linking scheme with archive files. That's perhaps not too surprising,
though, because for regular archive files, we look up the symbol table
with the same string twice.

Event Timeline

ruiu created this revision.May 21 2019, 5:09 AM
Herald added a project: Restricted Project. · View Herald Transcript
ruiu edited the summary of this revision. (Show Details)May 21 2019, 5:10 AM
ruiu edited the summary of this revision. (Show Details)
ruiu updated this revision to Diff 200464.May 21 2019, 5:15 AM
  • remove dead declaration

This is interesting! I'll take a closer look tomorrow

lld/ELF/InputFiles.cpp
996–997

Missing default:

ruiu marked an inline comment as done.May 21 2019, 9:46 PM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
996–997

Does this have to be exhaustive? We have a fatal after this switch. Anyways, I think I can rewrite this with if.

ruiu updated this revision to Diff 200638.May 21 2019, 9:48 PM
  • address review comment
ruiu updated this revision to Diff 200642.May 21 2019, 9:58 PM
  • rebase

In LazyObjFile::parse():

for (size_t I = FirstGlobal, End = ESyms.size(); I != End; ++I)
  if (ESyms[I].st_shndx != SHN_UNDEF)
    // resolveSymbol() called by addSymbol() checks if the symbol was originally undefined.
    // If that is the case, addSymbol() will transitively call SymbolTable::fetchLazy, which calls LazyObjFile::fetch().
    this->Symbols[I] = Symtab->addSymbol(
        LazyObject{*this, CHECK(ESyms[I].getName(Strtab), this)});
return;

In LazyObjFile::parse():

// Symbols[0,I) are inherited from LazyObjFile::parse(), other entries are nullptr.
// This has another problem: it increases memory usage.
// The copy assignment cannot be replaced with move assignment, because
// parse() will continue processing Symbols[I+1] Symbols[I+2] ...
File->Symbols = Symbols;

I think in most cases (that are not flagged by --warn-backrefs), this may not have a perceivable improvement in the current form. Actually,
For an (input: 4.2G; output: 1.6G) executable, I measure a performance regression, 1% or lower, but the memory usage increases by ~1.6%. Do you know if the problem can be fixed?

lld/ELF/InputFiles.cpp
952

typo. first object -> `other

ruiu added a comment.May 21 2019, 11:33 PM

In LazyObjFile::parse():

for (size_t I = FirstGlobal, End = ESyms.size(); I != End; ++I)
  if (ESyms[I].st_shndx != SHN_UNDEF)
    // resolveSymbol() called by addSymbol() checks if the symbol was originally undefined.
    // If that is the case, addSymbol() will transitively call SymbolTable::fetchLazy, which calls LazyObjFile::fetch().
    this->Symbols[I] = Symtab->addSymbol(
        LazyObject{*this, CHECK(ESyms[I].getName(Strtab), this)});
return;

Good point. I didn't think about that, but looks like that's a problem. I believe this can be fixed by inserting placeholder symbols first and then resolve them against LazyObject symbols. Let me try to do that.

In LazyObjFile::parse():

// Symbols[0,I) are inherited from LazyObjFile::parse(), other entries are nullptr.
// This has another problem: it increases memory usage.
// The copy assignment cannot be replaced with move assignment, because
// parse() will continue processing Symbols[I+1] Symbols[I+2] ...
File->Symbols = Symbols;

I think in most cases (that are not flagged by --warn-backrefs), this may not have a perceivable improvement in the current form. Actually,
For an (input: 4.2G; output: 1.6G) executable, I measure a performance regression, 1% or lower, but the memory usage increases by ~1.6%. Do you know if the problem can be fixed?

What program did you use for your testing?

MaskRay added inline comments.May 21 2019, 11:39 PM
lld/ELF/InputFiles.cpp
996–997

It doesn't have to be exhaustive. I thinkg -Wswitch-enum doesn't fire if the variable (Binding) is not of the enum type. I find the original switch easier to read...

Good point. I didn't think about that, but looks like that's a problem. I believe this can be fixed by inserting placeholder symbols first and then resolve them against LazyObject symbols. Let me try to do that.

In LazyObjFile::parse():

// Symbols[0,I) are inherited from LazyObjFile::parse(), other entries are nullptr.
// This has another problem: it increases memory usage.
// The copy assignment cannot be replaced with move assignment, because
// parse() will continue processing Symbols[I+1] Symbols[I+2] ...
File->Symbols = Symbols;

I think in most cases (that are not flagged by --warn-backrefs), this may not have a perceivable improvement in the current form. Actually,
For an (input: 4.2G; output: 1.6G) executable, I measure a performance regression, 1% or lower, but the memory usage increases by ~1.6%. Do you know if the problem can be fixed?

What program did you use for your testing?

An internal indexing program. It was compiled with -O3 -ffunction-sections -fdata-sections and many other flags that shouldn't cause a big difference to the linker. I think the patch may show improvement after it is parallelized.

ruiu updated this revision to Diff 200652.May 22 2019, 12:03 AM
  • add placeholder symbols first before adding LazyObjSymbol
  • reduce memory consumption using std::move.
ruiu added a comment.May 22 2019, 12:04 AM

It looks like this new patch runs faster than the previous ones. Can you confirm?

ruiu updated this revision to Diff 200656.May 22 2019, 12:14 AM

With this patch: 5.674s (+0.0%)
Withtout this patch: 6.087s (+7.2%)
Original: 5.886s (+3.7%)

Can it be that you are running debug LLD? I think linking clang was much faster in the past.

lld/ELF/InputFiles.cpp
919

Did you use this-> intentionally in the comments? I think just Symbols would look better.

927

What is wrong with calling resize without check? It should do nothing I think.

ruiu marked 2 inline comments as done.May 22 2019, 1:20 AM

I should've said that I used clang *with debug info* for testing.

lld/ELF/InputFiles.cpp
919

This is needed because this is a template derived class. You'll get a compile error if you don't write this-> in this case.

927

Good point, I'll do that.

grimar added inline comments.May 22 2019, 1:25 AM
lld/ELF/InputFiles.cpp
919

I was talking about comments :)

"Initialize this->Symbols. this->Symbols is a parallel array as" -> "Initialize Symbols. Symbols is a parallel array as"?

ruiu marked an inline comment as done.May 22 2019, 1:28 AM
ruiu added inline comments.
lld/ELF/InputFiles.cpp
919

Ah. This is for consistency with the code that this comment explains. Not a big deal, but I thought that this is better.

ruiu updated this revision to Diff 200874.May 22 2019, 11:18 PM
  • rebase
MaskRay accepted this revision.EditedMay 23 2019, 2:02 AM

For one program, I measured 3.0% time decrease with 0.25% memory increase. Looks great!

lld/ELF/InputFiles.cpp
1525

Nit:

if (ESyms[I].st_shndx != SHN_UNDEF)
  this->Symbols[I] = Symtab->insert(CHECK(ESyms[I].getName(Strtab), this));
This revision is now accepted and ready to land.May 23 2019, 2:02 AM
This revision was automatically updated to reflect the committed changes.