Page MenuHomePhabricator

[PDB] Defer relocating .debug$S until commit time and parallelize it
ClosedPublic

Authored by rnk on Jan 7 2021, 2:19 PM.

Details

Summary

This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.

This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.

Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)

I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.

Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.

Diff Detail

Event Timeline

rnk created this revision.Jan 7 2021, 2:19 PM
rnk requested review of this revision.Jan 7 2021, 2:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2021, 2:19 PM
rnk updated this revision to Diff 315244.Jan 7 2021, 2:20 PM
  • formatting
aganea added a comment.Jan 7 2021, 2:53 PM

This is really cool. I'll take a look at this tomorrow!

aganea accepted this revision.Jan 11 2021, 1:03 PM

LTGM, thanks!

I've checked several of our binaries, compared the PDB symbol stream, and except for extra S_SKIP records, it's all the same.

Some figures:

Large target (400 MB EXE, 2 GB PDB), MSVC 16.8 .OBJs, Unity files:

Link timePeak commit
Before23 sec10.3 GB
After20.5 sec8.5 GB
                                    Summary
--------------------------------------------------------------------------------
           4848 Input OBJ files (expanded from all cmd-line inputs)
             61 PDB type server dependencies
             40 Precomp OBJ dependencies
      105084352 Input type records
     5672204074 Input type records bytes
        8275330 Merged TPI records
        2939923 Merged IPI records
          58997 Output PDB strings
        8218778 Global symbol records
       24183908 Module symbol records
        2075680 Public symbol records

Same target as above, but compiled with Clang 11 .OBJs:

Link timePeak commit
Before16.9 sec6.1 GB
After15.1 sec5 GB
                                    Summary
--------------------------------------------------------------------------------
           4844 Input OBJ files (expanded from all cmd-line inputs)
             61 PDB type server dependencies
              0 Precomp OBJ dependencies
       23827009 Input type records
     1366100202 Input type records bytes
        6548058 Merged TPI records
        2588948 Merged IPI records
          58643 Output PDB strings
        4454924 Global symbol records
       42985893 Module symbol records
        1906582 Public symbol records

For smaller targets (like a game retail target), this patch consistently saves 1.3 sec.

lld/COFF/PDB.cpp
125

s/moduleStreamSize/moduleSymOffset/ to match the definition.

221

It's a bit strange that '4' means to do nothing in finish() but I understand that it makes the logic in analyzeSymbolSubsection() less complicated.

473

Was this a divergence from MSVC link.exe, or was it handled somewhere else before your patch?

1524

I can't say I like poking plain memory without going through structured data, it makes the code less reader-friendly. It's a pity we don't have definitions for serialized structures :-( Nothing you can do now I guess.

This revision is now accepted and ready to land.Jan 11 2021, 1:03 PM

FWIW, it looks like we're spending more than half of the CPU time in the NT kernel (running on Windows 10 version 2004).

This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.

I won't have time this week, but I expect VirtualLock or PrefetchVirtualMemory APIs inserted very early, at proper places could probably save a few more seconds.

rnk added a comment.Jan 11 2021, 5:21 PM

This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.

Interesting, I've heard similar things about LLD ELF. I wonder how much of this is IO and how much of this is locks around modifying the process page directory.

I haven't profiled this out the way you have it here, but I've noticed that LLD's performance is really sensitive to the filesystem cache. If you can fit all input objs into RAM, then LLD runs slow on the first run as all these page faults load all the bits of the objects, and then on the second link, it runs really fast. Zach used to joke that LLD was an incremental linker, it just leverages the FS cache.

Similar to inserting prefetches, I was wondering if there were some APIs we can use to load the obj in phases:

  • reserve memory for the entire file
  • commit only the portions of the object used for symbol table, section table, and relocations
  • resolve symbols, run linker GC
  • commit section content memory for sections marked live, do not load memory for non-live sections

This would be much more explicit, similar to an explicit seeks and reads, explicitly getting the data from the FS when you need it.

aganea added a comment.EditedJan 12 2021, 2:02 PM
In D94267#2491787, @rnk wrote:

This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.

Interesting, I've heard similar things about LLD ELF. I wonder how much of this is IO and how much of this is locks around modifying the process page directory.

In my case at least, it's exclusively due to virtual page management. There's no disk IO, everything was already in cache. Only the two spikes at the end, which is the deferred System write of the PDB & the EXE.

After cleaning the Windows cache, I get this:
(the top graph is CPU usage, the bottom graph is disk IO throughput)

Since the "Input File Reading" & "GC" are single-threaded, the application itself is the bottleneck rather than the disk. The Raid array on the machine is able to sustain 6.2 GB/s read, measured. Even in the cases where it's multithreaded, the disk IO never reaches that value, the "PDB Emission" takes exactly the same time regardless of cache. I think on a HDD the IO could be an issue, but not on modern SSDs.

  Input File Reading:            9525 ms ( 26.5%)
  GC:                           13852 ms ( 38.6%)
  Code Layout:                    982 ms (  2.7%)
  Commit Output File:              38 ms (  0.1%)
  PDB Emission (Cumulative):    11030 ms ( 30.7%)
    Add Objects:                 6442 ms ( 17.9%)
      Global Type Hashing:        889 ms (  2.5%)
      GHash Type Merging:        1349 ms (  3.8%)
      Symbol Merging:            3754 ms ( 10.4%)
    Publics Stream Layout:        620 ms (  1.7%)
    TPI Stream Layout:             51 ms (  0.1%)
    Commit to Disk:              2595 ms (  7.2%)
--------------------------------------------------
Total Link Time:                35930 ms (100.0%)    <-- cold cache, was 17 sec with hot cache

Similar to inserting prefetches, I was wondering if there were some APIs we can use to load the obj in phases:

  • reserve memory for the entire file
  • commit only the portions of the object used for symbol table, section table, and relocations
  • resolve symbols, run linker GC
  • commit section content memory for sections marked live, do not load memory for non-live sections

This would be much more explicit, similar to an explicit seeks and reads, explicitly getting the data from the FS when you need it.

Yes, that's pretty much what PrefetchVirtualMemory does: you give it a bunch of memory ranges, and it would fetch them all in parallel for you, in the background. When a memory-mapped file is open, nothing is commited. Prefetching the memory-mapped pages would initiate the IO, and then bring the pages into the process space. Ideally, we should compute the file regions and explicitly prefetch them as early in the process as possible. That should solve both issues: the IO and the virtual pages faults.

rnk marked an inline comment as done.Jan 12 2021, 2:57 PM

Thanks, I'll push this after a few more tests.

lld/COFF/PDB.cpp
221

I went ahead and gave this a named constant so it's a bit more readable.

473

This is a necessary change because now this code runs before translateIdSymbols runs, so it has to include the *_ID procedure variants

1524

I'll factor out the reinterpret_cast from the scope stack management code above so that it can be shared. That hopefully makes it a bit more readable.

@rnk This appears to be causing / has unearthed an asan failure: http://lab.llvm.org:8011/#/builders/5/builds/3382

hctim added a subscriber: hctim.Jan 19 2021, 11:48 AM

Reverted in 5b7aef6eb4b2930971029b984cb2360f7682e5a5 to bring the ASan bots online. Repro instructions at https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild:

http://lab.llvm.org:8011/#/builders/99/builds/1567

==13225==ERROR: AddressSanitizer: container-overflow on address 0x614000010120 at pc 0x000004ff2e67 bp 0x7f3f530f4510 sp 0x7f3f530f4508
READ of size 2 at 0x614000010120 thread T2
    #0 0x4ff2e66 in read<unsigned short, 1> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:66:3
    #1 0x4ff2e66 in read<unsigned short, llvm::support::little, 1> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:77:10
    #2 0x4ff2e66 in operator unsigned short /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:216:12
    #3 0x4ff2e66 in read<unsigned short, llvm::support::little> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:357:10
    #4 0x4ff2e66 in read16<llvm::support::little> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:371:10
    #5 0x4ff2e66 in read16le /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:380:50
    #6 0x4ff2e66 in add16 /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:60:57
    #7 0x4ff2e66 in applySecIdx /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:95:5
    #8 0x4ff2e66 in lld::coff::SectionChunk::applyRelX64(unsigned char*, unsigned short, lld::coff::OutputSection*, unsigned long, unsigned long) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:112:34
    #9 0x4ffa9cd in lld::coff::SectionChunk::applyRelocation(unsigned char*, llvm::object::coff_relocation const&) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:402:5
    #10 0x4ffc53d in lld::coff::SectionChunk::writeAndRelocateSubsection(llvm::ArrayRef<unsigned char>, llvm::ArrayRef<unsigned char>, unsigned int&, unsigned char*) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:453:5
    #11 0x50fe998 in (anonymous namespace)::PDBLinker::writeSymbolRecord(lld::coff::SectionChunk*, llvm::ArrayRef<unsigned char>, llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, unsigned long, unsigned int&, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:565:15
    #12 0x50fc61a in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:685:15
    #13 0x50fc61a in forEachCodeViewRecord<llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:673:23)> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/DebugInfo/CodeView/CVRecord.h:85:19
    #14 0x50fc61a in writeAllModuleSymbolRecords /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:672:17
    #15 0x50fc61a in (anonymous namespace)::PDBLinker::commitSymbolsForObject(void*, void*, llvm::BinaryStreamWriter&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:712:41
    #16 0x5cb916f in llvm::pdb::DbiModuleDescriptorBuilder::commitSymbolStream(llvm::msf::MSFLayout const&, llvm::WritableBinaryStreamRef) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp:175:21
    #17 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp:405:23
    #18 0x5cc8e5b in operator()<std::unique_ptr<llvm::pdb::DbiModuleDescriptorBuilder> &> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:302:37
    #19 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:192:25
    #20 0x5cc8e5b in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #21 0x5cc8e5b in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #22 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #23 0x5cc8e5b in _ZNSt3__110__function6__funcIZN4llvm8parallel6detail25parallel_transform_reduceINS_11__wrap_iterIPNS_10unique_ptrINS2_3pdb26DbiModuleDescriptorBuilderENS_14default_deleteIS9_EEEEEEP15LLVMOpaqueErrorZNS2_20parallelForEachErrorIRNS_6vectorISC_NS_9allocatorISC_EEEEZNS8_16DbiStreamBuilder6commitERKNS2_3msf9MSFLayoutENS2_23WritableBinaryStreamRefEE3$_4EENS2_5ErrorEOT_T0_EUlSG_SG_E_ZNSH_ISM_ST_EESU_SW_SX_EUlSW_E_EESX_SV_SV_SX_T1_T2_EUlvE_NSJ_IS12_EEFvvEEclEv /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #24 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #25 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #26 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:160:7
    #27 0x4fecbd5 in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #28 0x4fecbd5 in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #29 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #30 0x4fecbd5 in std::__1::__function::__func<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0, std::__1::allocator<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0>, void ()>::operator()() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #31 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #32 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #33 0x4fe8b1d in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:108:7
    #34 0x4fe8f8c in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:36
    #35 0x4fe8f8c in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #36 0x4fe8f8c in __thread_execute<std::unique_ptr<std::__thread_struct>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:280:5
    #37 0x4fe8f8c in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:291:5
    #38 0x7f3f5a8e7fa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
    #39 0x7f3f5a7fe4ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)
0x614000010120 is located 224 bytes inside of 416-byte region [0x614000010040,0x6140000101e0)
allocated by thread T2 here:
    #0 0x4d34e08 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:99:3
    #1 0x5006b89 in __libcpp_operator_new<unsigned long> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/new:235:10
    #2 0x5006b89 in __libcpp_allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/new:261:10
    #3 0x5006b89 in allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/memory:840:38
    #4 0x5006b89 in allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__memory/allocator_traits.h:468:21
    #5 0x5006b89 in __split_buffer /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__split_buffer:314:29
    #6 0x5006b89 in std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >::__append(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/vector:1093:53
    #7 0x50fe8f8 in resize /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/vector:2024:15
    #8 0x50fe8f8 in (anonymous namespace)::PDBLinker::writeSymbolRecord(lld::coff::SectionChunk*, llvm::ArrayRef<unsigned char>, llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, unsigned long, unsigned int&, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:561:11
    #9 0x50fc61a in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:685:15
    #10 0x50fc61a in forEachCodeViewRecord<llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:673:23)> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/DebugInfo/CodeView/CVRecord.h:85:19
    #11 0x50fc61a in writeAllModuleSymbolRecords /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:672:17
    #12 0x50fc61a in (anonymous namespace)::PDBLinker::commitSymbolsForObject(void*, void*, llvm::BinaryStreamWriter&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:712:41
    #13 0x5cb916f in llvm::pdb::DbiModuleDescriptorBuilder::commitSymbolStream(llvm::msf::MSFLayout const&, llvm::WritableBinaryStreamRef) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp:175:21
    #14 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp:405:23
    #15 0x5cc8e5b in operator()<std::unique_ptr<llvm::pdb::DbiModuleDescriptorBuilder> &> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:302:37
    #16 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:192:25
    #17 0x5cc8e5b in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #18 0x5cc8e5b in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #19 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #20 0x5cc8e5b in _ZNSt3__110__function6__funcIZN4llvm8parallel6detail25parallel_transform_reduceINS_11__wrap_iterIPNS_10unique_ptrINS2_3pdb26DbiModuleDescriptorBuilderENS_14default_deleteIS9_EEEEEEP15LLVMOpaqueErrorZNS2_20parallelForEachErrorIRNS_6vectorISC_NS_9allocatorISC_EEEEZNS8_16DbiStreamBuilder6commitERKNS2_3msf9MSFLayoutENS2_23WritableBinaryStreamRefEE3$_4EENS2_5ErrorEOT_T0_EUlSG_SG_E_ZNSH_ISM_ST_EESU_SW_SX_EUlSW_E_EESX_SV_SV_SX_T1_T2_EUlvE_NSJ_IS12_EEFvvEEclEv /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #21 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #22 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #23 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:160:7
    #24 0x4fecbd5 in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #25 0x4fecbd5 in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #26 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #27 0x4fecbd5 in std::__1::__function::__func<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0, std::__1::allocator<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0>, void ()>::operator()() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #28 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #29 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #30 0x4fe8b1d in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:108:7
    #31 0x4fe8f8c in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:36
    #32 0x4fe8f8c in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #33 0x4fe8f8c in __thread_execute<std::unique_ptr<std::__thread_struct>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:280:5
    #34 0x4fe8f8c in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:291:5
    #35 0x7f3f5a8e7fa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
rnk added a comment.Jan 20 2021, 10:00 AM

I debugged this yesterday. The bug is from container overflow, which is only detectable with an ASan instrumented build of libc++, which I don't have. I made some local source changes to replace a std::vector with a temporary array, and I was able to observe the bug.

lld/COFF/Chunks.cpp
451

This is the buggy bounds check. At this point, we don't know how large the relocation is, so it's hard to check. We have an implicit assumption here that all relocations are entirely within or outside of the subsection that is being relocated. The assumption is correct for well-formed debug information.

However, we have what appear to be invalid input objects in the pdb-file-static test case. These input objects are yaml-ified object files from MSVC. It seems that obj2yaml/yambl2obj of MSVC object files does not round trip. The relocations, which are maintained separately from the symbol records, do not correspond to the symbol record offsets that yaml2obj generates. We know that MSVC does not align symbol records to 4 bytes, but LLVM does. This may be the source of the discrepancy.

This bug is actually already present in LLD. An object file with a relocation that points to the last byte of a section will cause LLD to do an OOB write. Since LLD applies relocations directly to the output file memory, it is hard to observe or exploit this bug. However, given this recent refactoring, I think it might be worth going back and fixing it.