Page MenuHomePhabricator

[ELF] Parallelize initializeLocalSymbols
ClosedPublic

Authored by MaskRay on Feb 15 2022, 6:16 PM.

Details

Summary

ObjFile::parse combines symbol initialization and resolution. Many tasks
unrelated to symbol resolution can be postponed and parallelized. This patch
extracts local symbol initialization and parallelizes it.

Technically the new function initializeLocalSymbols can be merged into
ObjFile::postParse, but functions like getSrcMsg may access the
uninitialized (all nullptr) local part of InputFile::symbols.

Linking chrome: 1.02x as fast with glibc malloc, 1.04x as fast with mimalloc

Depends on D119908

Diff Detail

Event Timeline

MaskRay created this revision.Feb 15 2022, 6:16 PM
MaskRay requested review of this revision.Feb 15 2022, 6:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2022, 6:16 PM
MaskRay updated this revision to Diff 409314.Feb 16 2022, 10:17 AM
MaskRay edited the summary of this revision. (Show Details)

rebase

MaskRay updated this revision to Diff 410441.EditedFeb 21 2022, 9:02 PM

Replace getSpecificAllocSingleton<SymbolUnion>().Allocate(firstGlobal); with localSymStorage.

1.02x as fast when linking clang
1.04x as fast when linking chrome

I plan to commit the change separately to decrease diff, but for single threading there is no observable difference for performance.

Replace getSpecificAllocSingleton<SymbolUnion>().Allocate(firstGlobal); with localSymStorage.

1.02x as fast when linking clang
1.04x as fast when linking chrome

I plan to commit the change separately to decrease diff, but for single threading there is no observable difference for performance.

Do you know where the performance gain comes from? Can the same optimization be useful elsewhere? Can localSymStorage be std::unique_ptr<SymbolUnion> to avoid the low-level manipulations with the alignment?

lld/ELF/InputFiles.cpp
1159

initializeLocalSymbols() is very similar to postParseObjectFile() from D119908. I believe they should be defined in the same way: both in InputFiles.cpp after methods they call or both in Driver.cpp, as static functions before LinkerDriver::link().

Replace getSpecificAllocSingleton<SymbolUnion>().Allocate(firstGlobal); with localSymStorage.

1.02x as fast when linking clang
1.04x as fast when linking chrome

I plan to commit the change separately to decrease diff, but for single threading there is no observable difference for performance.

Do you know where the performance gain comes from? Can the same optimization be useful elsewhere? Can localSymStorage be std::unique_ptr<SymbolUnion> to avoid the low-level manipulations with the alignment?

NUMA aware allocation, memory reference locality, etc. Symbols.h includes InputFiles.h. SymbolUnion is not accessible in InputFiles.h, so localSymStorage cannot use std::unique_ptr<SymbolUnion[]> with incomplete type.

MaskRay retitled this revision from [ELF] Parallelize initializeLocalSymbols and postpone postParse to [ELF] Parallelize initializeLocalSymbols.Feb 22 2022, 10:16 AM
MaskRay updated this revision to Diff 411004.Feb 23 2022, 9:41 PM

Replace manual aligned alloc with std::make_unique<SymbolUnion[]>(firstGlobal);
now that Symbols.h no longer includes InputFiles.h

MaskRay marked an inline comment as done.Feb 23 2022, 9:42 PM
MaskRay added inline comments.
lld/ELF/InputFiles.cpp
1159

Technically the new function initializeLocalSymbols can be merged into ObjFile::postParse, but functions like getSrcMsg may access the uninitialized (all nullptr) local part of InputFile::symbols.

Combing will cause null pointer reference for some diagnostics.

MaskRay marked an inline comment as done.Feb 23 2022, 9:43 PM
ikudrin added inline comments.Feb 24 2022, 3:42 AM
lld/ELF/InputFiles.cpp
1159

My comment was not about combining. I only suggested moving initializeLocalSymbols() to Driver.cpp, place it after postParseObjectFile() and make static.

MaskRay updated this revision to Diff 411164.Feb 24 2022, 9:32 AM
MaskRay marked an inline comment as done.

Move static void initializeLocalSymbols to Driver.cpp

ikudrin accepted this revision.Feb 24 2022, 7:20 PM

Thanks for the update! LGTM.

This revision is now accepted and ready to land.Feb 24 2022, 7:20 PM
MaskRay edited the summary of this revision. (Show Details)Feb 24 2022, 8:05 PM
This revision was automatically updated to reflect the committed changes.
jgorbe added a subscriber: jgorbe.Mar 4 2022, 3:00 PM

We've found segfaults in lld when linking go binaries after this patch:

 #0 0x0000562d149b2b8a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x4bb2b8a)
 #1 0x0000562d149b0b78 llvm::sys::RunSignalHandlers() (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x4bb0b78)
 #2 0x0000562d149b31ec SignalHandler(int) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x4bb31ec)
 #3 0x00007fd976fac750 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x15750)
 #4 0x0000562d123eb029 void (anonymous namespace)::MarkLive<llvm::object::ELFType<(llvm::support::endianness)1, true> >::resolveReloc<llvm::object::Elf_Rel_Impl<llvm::object::ELFType<(llvm::support::endianness)1, true>, true> const>(lld::elf::InputSectionBase&, llvm::object::Elf_Rel_Impl<llvm::object::ELFType<(llvm::support::endianness)1, true>, true> const&, bool) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x25eb029)
 #5 0x0000562d123e75f9 void lld::elf::markLive<llvm::object::ELFType<(llvm::support::endianness)1, true> >() (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x25e75f9)
 #6 0x0000562d1235e86b lld::elf::LinkerDriver::link(llvm::opt::InputArgList&) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x255e86b)
 #7 0x0000562d12351137 lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x2551137)
 #8 0x0000562d1234f6a9 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x254f6a9)
 #9 0x0000562d122505d3 lldMain(int, char const**, llvm::raw_ostream&, llvm::raw_ostream&, bool) (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x24505d3)
#10 0x0000562d1224fd89 main (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x244fd89)
#11 0x00007fd976e3c8d3 __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d3)
#12 0x0000562d1224f96a _start (third_party/crosstool/v18/llvm_unstable/toolchain/bin/ld+0x244f96a)
Segmentation fault

I'm going to revert temporarily until @MaskRay can track down the cause.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 3:00 PM