This is an archive of the discontinued LLVM Phabricator instance.

Differential D94267

[PDB] Defer relocating .debug$S until commit time and parallelize it
Closed · Public

Authored by rnk on Jan 7 2021, 2:19 PM.

Details

Reviewers

aganea
thakis

Commits

rG6529d7c5a45b: [PDB] Defer relocating .debug$S until commit time and parallelize it

Summary

This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires deferring relocation until
much later, which accounts for most of the complexity in this patch.
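
For readers who want the shape of the measure-then-write idea, here is a minimal sketch. It is not the code from this patch: `Record`, `measure`, and `writeInPlace` are hypothetical names, and the 4-byte padding is an assumption based on the later remark in this thread that LLVM aligns symbol records to 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one symbol record that lives in input-file memory.
struct Record {
  const uint8_t *data;
  size_t size;
};

// Round up to a 4-byte boundary (assumed output alignment).
inline size_t alignTo4(size_t n) { return (n + 3) & ~size_t(3); }

// Pass 1: measure the output size without copying anything.
size_t measure(const std::vector<Record> &records) {
  size_t total = 0;
  for (const Record &r : records)
    total += alignTo4(r.size);
  return total;
}

// Pass 2: copy each record straight into its final position in `out`,
// which must have room for measure(records) bytes.
// Returns the number of bytes written.
size_t writeInPlace(const std::vector<Record> &records, uint8_t *out) {
  assert(out != nullptr || records.empty());
  size_t offset = 0;
  for (const Record &r : records) {
    std::memcpy(out + offset, r.data, r.size);
    size_t padded = alignTo4(r.size);
    std::memset(out + offset + r.size, 0, padded - r.size); // zero the padding
    offset += padded;
  }
  return offset;
}
```

The caller sizes the destination from `measure()` and then hands `writeInPlace()` a pointer into it, so no intermediate heap copy of the records is needed.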

This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.

Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)

I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.

Because of the new parallelism in the PDB commit phase, more cores make
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
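
Why measuring first makes the commit phase parallel-friendly can be sketched as follows. This is an illustration under assumptions, not LLD's implementation: `Module`, `layoutOffsets`, and `commitParallel` are hypothetical, and plain `std::thread` is used in place of LLVM's parallel helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical model of one module's symbol data; not LLD's real type.
struct Module {
  std::vector<uint8_t> symbols;
};

// Compute each module's start offset in the output (exclusive prefix sum).
std::vector<size_t> layoutOffsets(const std::vector<Module> &mods) {
  std::vector<size_t> offsets(mods.size());
  size_t next = 0;
  for (size_t i = 0; i < mods.size(); ++i) {
    offsets[i] = next;
    next += mods[i].symbols.size();
  }
  return offsets;
}

// Write every module into `out` using `numThreads` worker threads.
// `out` must already be large enough for all modules. Each module owns a
// disjoint byte range, so the workers need no locking.
void commitParallel(const std::vector<Module> &mods,
                    const std::vector<size_t> &offsets,
                    std::vector<uint8_t> &out, unsigned numThreads) {
  assert(numThreads > 0 && offsets.size() == mods.size());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      // Static striping: thread t handles modules t, t + numThreads, ...
      for (size_t i = t; i < mods.size(); i += numThreads)
        std::copy(mods[i].symbols.begin(), mods[i].symbols.end(),
                  out.data() + offsets[i]);
    });
  }
  for (std::thread &w : workers)
    w.join();
}
```

Once every module's offset is known up front, the per-module writes are independent, which is consistent with the -j8 numbers above improving more than the -j1 numbers.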

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rnk created this revision. Jan 7 2021, 2:19 PM

Herald added subscribers: mgrang, hiraditya. Jan 7 2021, 2:19 PM

rnk requested review of this revision. Jan 7 2021, 2:19 PM

Herald added a project: Restricted Project. Jan 7 2021, 2:19 PM

formatting

Harbormaster completed remote builds in B84390: Diff 315243. Jan 7 2021, 2:53 PM

This is really cool. I'll take a look at this tomorrow!

Harbormaster completed remote builds in B84391: Diff 315244. Jan 7 2021, 3:07 PM

LGTM, thanks!

I've checked several of our binaries, compared the PDB symbol stream, and except for extra S_SKIP records, it's all the same.

Some figures:

Large target (400 MB EXE, 2 GB PDB), MSVC 16.8 .OBJs, Unity files:

        Link time  Peak commit
Before  23 sec     10.3 GB
After   20.5 sec   8.5 GB

                                    Summary
--------------------------------------------------------------------------------
           4848 Input OBJ files (expanded from all cmd-line inputs)
             61 PDB type server dependencies
             40 Precomp OBJ dependencies
      105084352 Input type records
     5672204074 Input type records bytes
        8275330 Merged TPI records
        2939923 Merged IPI records
          58997 Output PDB strings
        8218778 Global symbol records
       24183908 Module symbol records
        2075680 Public symbol records

Same target as above, but compiled with Clang 11 .OBJs:

        Link time  Peak commit
Before  16.9 sec   6.1 GB
After   15.1 sec   5 GB

                                    Summary
--------------------------------------------------------------------------------
           4844 Input OBJ files (expanded from all cmd-line inputs)
             61 PDB type server dependencies
              0 Precomp OBJ dependencies
       23827009 Input type records
     1366100202 Input type records bytes
        6548058 Merged TPI records
        2588948 Merged IPI records
          58643 Output PDB strings
        4454924 Global symbol records
       42985893 Module symbol records
        1906582 Public symbol records

For smaller targets (like a game retail target), this patch consistently saves 1.3 sec.

lld/COFF/PDB.cpp
Line 125: s/moduleStreamSize/moduleSymOffset/ to match the definition.
Line 221: It's a bit strange that '4' means to do nothing in `finish()` but I understand that it makes the logic in `analyzeSymbolSubsection()` less complicated.
Line 473: Was this a divergence from MSVC link.exe, or was it handled somewhere else before your patch?
Line 1524: I can't say I like poking plain memory without going through structured data, it makes the code less reader-friendly. It's a pity we don't have definitions for serialized structures :-( Nothing you can do now I guess.

This revision is now accepted and ready to land. Jan 11 2021, 1:03 PM

FWIW, it looks like we're spending more than half of the CPU time in the NT kernel (running on Windows 10 version 2004).

This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.

I won't have time this week, but I expect that VirtualLock or PrefetchVirtualMemory calls, inserted very early and at the proper places, could save a few more seconds.

aganea mentioned this in D87805: [PDB] Merge types in parallel when using ghashing. Jan 11 2021, 4:03 PM

In D94267#2491643, @aganea wrote:

> This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.

Interesting, I've heard similar things about LLD ELF. I wonder how much of this is IO and how much of this is locks around modifying the process page directory.

I haven't profiled this out the way you have it here, but I've noticed that LLD's performance is really sensitive to the filesystem cache. If you can fit all input objs into RAM, then LLD runs slow on the first run as all these page faults load all the bits of the objects, and then on the second link, it runs really fast. Zach used to joke that LLD was an incremental linker, it just leverages the FS cache.

Similar to inserting prefetches, I was wondering if there were some APIs we can use to load the obj in phases:

1. reserve memory for the entire file
2. commit only the portions of the object used for symbol table, section table, and relocations
3. resolve symbols, run linker GC
4. commit section content memory for sections marked live, do not load memory for non-live sections

This would be much more explicit, similar to explicit seeks and reads, explicitly getting the data from the FS when you need it.

In D94267#2491787, @rnk wrote:

> In D94267#2491643, @aganea wrote:
>
>> This is mostly caused by contention when page faulting on either mmap'ed files or zero-pages. KeYieldProcessorEx is spinning while waiting for a lock for bringing pages into the 'working set'.
>
> Interesting, I've heard similar things about LLD ELF. I wonder how much of this is IO and how much of this is locks around modifying the process page directory.

In my case at least, it's exclusively due to virtual page management. There's no disk IO; everything was already in cache. The only exceptions are the two spikes at the end, which are the deferred System writes of the PDB and the EXE.

After cleaning the Windows cache, I get this:
(the top graph is CPU usage, the bottom graph is disk IO throughput)

Since "Input File Reading" and "GC" are single-threaded, the application itself is the bottleneck rather than the disk. The RAID array on the machine can sustain a measured 6.2 GB/s read. Even in the multithreaded phases, disk IO never reaches that value, and "PDB Emission" takes exactly the same time regardless of cache. I think IO could be an issue on an HDD, but not on modern SSDs.

  Input File Reading:            9525 ms ( 26.5%)
  GC:                           13852 ms ( 38.6%)
  Code Layout:                    982 ms (  2.7%)
  Commit Output File:              38 ms (  0.1%)
  PDB Emission (Cumulative):    11030 ms ( 30.7%)
    Add Objects:                 6442 ms ( 17.9%)
      Global Type Hashing:        889 ms (  2.5%)
      GHash Type Merging:        1349 ms (  3.8%)
      Symbol Merging:            3754 ms ( 10.4%)
    Publics Stream Layout:        620 ms (  1.7%)
    TPI Stream Layout:             51 ms (  0.1%)
    Commit to Disk:              2595 ms (  7.2%)
--------------------------------------------------
Total Link Time:                35930 ms (100.0%)    <-- cold cache, was 17 sec with hot cache

> Similar to inserting prefetches, I was wondering if there were some APIs we can use to load the obj in phases:
>
> 1. reserve memory for the entire file
> 2. commit only the portions of the object used for symbol table, section table, and relocations
> 3. resolve symbols, run linker GC
> 4. commit section content memory for sections marked live, do not load memory for non-live sections
>
> This would be much more explicit, similar to explicit seeks and reads, explicitly getting the data from the FS when you need it.

Yes, that's pretty much what PrefetchVirtualMemory does: you give it a bunch of memory ranges, and it fetches them all in parallel for you, in the background. When a memory-mapped file is opened, nothing is committed. Prefetching the memory-mapped pages initiates the IO and then brings the pages into the process space. Ideally, we should compute the file regions and explicitly prefetch them as early in the process as possible. That should solve both issues: the IO and the virtual page faults.
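
A rough sketch of what such a prefetch call could look like, purely for illustration. `Range`, `alignToPage`, and `prefetchRanges` are hypothetical names, nothing like this is part of the patch, and the `madvise(MADV_WILLNEED)` branch is only an assumed approximation so that the sketch also builds outside Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef _WIN32
#include <windows.h> // PrefetchVirtualMemory needs Windows 8 or later
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

// Hypothetical description of a byte range inside a mapped input file.
struct Range {
  void *addr;
  size_t size;
};

// Expand a range so that it starts on a page boundary (page = power of two).
Range alignToPage(Range r, uintptr_t page) {
  assert(page != 0 && (page & (page - 1)) == 0);
  uintptr_t begin = reinterpret_cast<uintptr_t>(r.addr);
  uintptr_t aligned = begin & ~(page - 1);
  return {reinterpret_cast<void *>(aligned), r.size + (begin - aligned)};
}

// Ask the OS to start paging in all ranges. Returns true on success.
// This is only a hint: the memory stays valid even if the call fails.
bool prefetchRanges(const std::vector<Range> &ranges) {
  if (ranges.empty())
    return true;
#ifdef _WIN32
  std::vector<WIN32_MEMORY_RANGE_ENTRY> entries;
  for (const Range &r : ranges)
    entries.push_back({r.addr, r.size});
  return PrefetchVirtualMemory(GetCurrentProcess(), entries.size(),
                               entries.data(), 0) != 0;
#else
  // Assumption: madvise(MADV_WILLNEED) as a rough POSIX analogue.
  const uintptr_t page = static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
  bool ok = true;
  for (const Range &r : ranges) {
    Range a = alignToPage(r, page); // madvise wants a page-aligned start
    ok &= madvise(a.addr, a.size, MADV_WILLNEED) == 0;
  }
  return ok;
#endif
}
```

In a linker, the ranges would presumably come from the section and relocation tables of each mapped object, computed as early as possible.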

Thanks, I'll push this after a few more tests.

lld/COFF/PDB.cpp
Line 221: I went ahead and gave this a named constant so it's a bit more readable.
Line 473: This is a necessary change because now this code runs before `translateIdSymbols` runs, so it has to include the `*_ID` procedure variants
Line 1524: I'll factor out the reinterpret_cast from the scope stack management code above so that it can be shared. That hopefully makes it a bit more readable.

Closed by commit rG6529d7c5a45b: [PDB] Defer relocating .debug$S until commit time and parallelize it (authored by rnk). Jan 12 2021, 5:47 PM

This revision was automatically updated to reflect the committed changes.

rnk added a commit: rG6529d7c5a45b: [PDB] Defer relocating .debug$S until commit time and parallelize it.

@rnk This appears to be causing / has unearthed an asan failure: http://lab.llvm.org:8011/#/builders/5/builds/3382

hctim added a reverting change: rG5b7aef6eb4b2: Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it". Jan 19 2021, 11:46 AM

Reverted in 5b7aef6eb4b2930971029b984cb2360f7682e5a5 to bring the ASan bots online. Repro instructions at https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild:

http://lab.llvm.org:8011/#/builders/99/builds/1567

==13225==ERROR: AddressSanitizer: container-overflow on address 0x614000010120 at pc 0x000004ff2e67 bp 0x7f3f530f4510 sp 0x7f3f530f4508
READ of size 2 at 0x614000010120 thread T2
    #0 0x4ff2e66 in read<unsigned short, 1> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:66:3
    #1 0x4ff2e66 in read<unsigned short, llvm::support::little, 1> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:77:10
    #2 0x4ff2e66 in operator unsigned short /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:216:12
    #3 0x4ff2e66 in read<unsigned short, llvm::support::little> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:357:10
    #4 0x4ff2e66 in read16<llvm::support::little> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:371:10
    #5 0x4ff2e66 in read16le /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Endian.h:380:50
    #6 0x4ff2e66 in add16 /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:60:57
    #7 0x4ff2e66 in applySecIdx /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:95:5
    #8 0x4ff2e66 in lld::coff::SectionChunk::applyRelX64(unsigned char*, unsigned short, lld::coff::OutputSection*, unsigned long, unsigned long) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:112:34
    #9 0x4ffa9cd in lld::coff::SectionChunk::applyRelocation(unsigned char*, llvm::object::coff_relocation const&) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:402:5
    #10 0x4ffc53d in lld::coff::SectionChunk::writeAndRelocateSubsection(llvm::ArrayRef<unsigned char>, llvm::ArrayRef<unsigned char>, unsigned int&, unsigned char*) const /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/Chunks.cpp:453:5
    #11 0x50fe998 in (anonymous namespace)::PDBLinker::writeSymbolRecord(lld::coff::SectionChunk*, llvm::ArrayRef<unsigned char>, llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, unsigned long, unsigned int&, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:565:15
    #12 0x50fc61a in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:685:15
    #13 0x50fc61a in forEachCodeViewRecord<llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:673:23)> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/DebugInfo/CodeView/CVRecord.h:85:19
    #14 0x50fc61a in writeAllModuleSymbolRecords /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:672:17
    #15 0x50fc61a in (anonymous namespace)::PDBLinker::commitSymbolsForObject(void*, void*, llvm::BinaryStreamWriter&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:712:41
    #16 0x5cb916f in llvm::pdb::DbiModuleDescriptorBuilder::commitSymbolStream(llvm::msf::MSFLayout const&, llvm::WritableBinaryStreamRef) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp:175:21
    #17 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp:405:23
    #18 0x5cc8e5b in operator()<std::unique_ptr<llvm::pdb::DbiModuleDescriptorBuilder> &> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:302:37
    #19 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:192:25
    #20 0x5cc8e5b in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #21 0x5cc8e5b in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #22 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #23 0x5cc8e5b in _ZNSt3__110__function6__funcIZN4llvm8parallel6detail25parallel_transform_reduceINS_11__wrap_iterIPNS_10unique_ptrINS2_3pdb26DbiModuleDescriptorBuilderENS_14default_deleteIS9_EEEEEEP15LLVMOpaqueErrorZNS2_20parallelForEachErrorIRNS_6vectorISC_NS_9allocatorISC_EEEEZNS8_16DbiStreamBuilder6commitERKNS2_3msf9MSFLayoutENS2_23WritableBinaryStreamRefEE3$_4EENS2_5ErrorEOT_T0_EUlSG_SG_E_ZNSH_ISM_ST_EESU_SW_SX_EUlSW_E_EESX_SV_SV_SX_T1_T2_EUlvE_NSJ_IS12_EEFvvEEclEv /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #24 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #25 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #26 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:160:7
    #27 0x4fecbd5 in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #28 0x4fecbd5 in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #29 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #30 0x4fecbd5 in std::__1::__function::__func<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0, std::__1::allocator<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0>, void ()>::operator()() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #31 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #32 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #33 0x4fe8b1d in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:108:7
    #34 0x4fe8f8c in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:36
    #35 0x4fe8f8c in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #36 0x4fe8f8c in __thread_execute<std::unique_ptr<std::__thread_struct>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:280:5
    #37 0x4fe8f8c in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:291:5
    #38 0x7f3f5a8e7fa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
    #39 0x7f3f5a7fe4ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)
0x614000010120 is located 224 bytes inside of 416-byte region [0x614000010040,0x6140000101e0)
allocated by thread T2 here:
    #0 0x4d34e08 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:99:3
    #1 0x5006b89 in __libcpp_operator_new<unsigned long> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/new:235:10
    #2 0x5006b89 in __libcpp_allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/new:261:10
    #3 0x5006b89 in allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/memory:840:38
    #4 0x5006b89 in allocate /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__memory/allocator_traits.h:468:21
    #5 0x5006b89 in __split_buffer /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__split_buffer:314:29
    #6 0x5006b89 in std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >::__append(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/vector:1093:53
    #7 0x50fe8f8 in resize /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/vector:2024:15
    #8 0x50fe8f8 in (anonymous namespace)::PDBLinker::writeSymbolRecord(lld::coff::SectionChunk*, llvm::ArrayRef<unsigned char>, llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, unsigned long, unsigned int&, std::__1::vector<unsigned char, std::__1::allocator<unsigned char> >&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:561:11
    #9 0x50fc61a in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:685:15
    #10 0x50fc61a in forEachCodeViewRecord<llvm::codeview::CVRecord<llvm::codeview::SymbolKind>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:673:23)> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/DebugInfo/CodeView/CVRecord.h:85:19
    #11 0x50fc61a in writeAllModuleSymbolRecords /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:672:17
    #12 0x50fc61a in (anonymous namespace)::PDBLinker::commitSymbolsForObject(void*, void*, llvm::BinaryStreamWriter&) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/lld/COFF/PDB.cpp:712:41
    #13 0x5cb916f in llvm::pdb::DbiModuleDescriptorBuilder::commitSymbolStream(llvm::msf::MSFLayout const&, llvm::WritableBinaryStreamRef) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp:175:21
    #14 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp:405:23
    #15 0x5cc8e5b in operator()<std::unique_ptr<llvm::pdb::DbiModuleDescriptorBuilder> &> /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:302:37
    #16 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:192:25
    #17 0x5cc8e5b in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #18 0x5cc8e5b in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/include/llvm/Support/Parallel.h:188:16) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #19 0x5cc8e5b in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #20 0x5cc8e5b in _ZNSt3__110__function6__funcIZN4llvm8parallel6detail25parallel_transform_reduceINS_11__wrap_iterIPNS_10unique_ptrINS2_3pdb26DbiModuleDescriptorBuilderENS_14default_deleteIS9_EEEEEEP15LLVMOpaqueErrorZNS2_20parallelForEachErrorIRNS_6vectorISC_NS_9allocatorISC_EEEEZNS8_16DbiStreamBuilder6commitERKNS2_3msf9MSFLayoutENS2_23WritableBinaryStreamRefEE3$_4EENS2_5ErrorEOT_T0_EUlSG_SG_E_ZNSH_ISM_ST_EESU_SW_SX_EUlSW_E_EESX_SV_SV_SX_T1_T2_EUlvE_NSJ_IS12_EEFvvEEclEv /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #21 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #22 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #23 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:160:7
    #24 0x4fecbd5 in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #25 0x4fecbd5 in __call<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:159:41) &> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/__functional_base:348:9
    #26 0x4fecbd5 in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1558:16
    #27 0x4fecbd5 in std::__1::__function::__func<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0, std::__1::allocator<llvm::parallel::detail::TaskGroup::spawn(std::__1::function<void ()>)::$_0>, void ()>::operator()() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1732:12
    #28 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:1885:16
    #29 0x4fe8b1d in operator() /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/functional:2560:12
    #30 0x4fe8b1d in llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::work(llvm::ThreadPoolStrategy, unsigned int) /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:108:7
    #31 0x4fe8f8c in operator() /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:36
    #32 0x4fe8f8c in __invoke<(lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/type_traits:3679:1
    #33 0x4fe8f8c in __thread_execute<std::unique_ptr<std::__thread_struct>, (lambda at /b/sanitizer-x86_64-linux-bootstrap/build/llvm-project/llvm/lib/Support/Parallel.cpp:52:30)> /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:280:5
    #34 0x4fe8f8c in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) /b/sanitizer-x86_64-linux-bootstrap/build/libcxx_build_asan/include/c++/v1/thread:291:5
    #35 0x7f3f5a8e7fa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)

I debugged this yesterday. The bug is from container overflow, which is only detectable with an ASan instrumented build of libc++, which I don't have. I made some local source changes to replace a std::vector with a temporary array, and I was able to observe the bug.

lld/COFF/Chunks.cpp
Line 451: This is the buggy bounds check. At this point, we don't know how large the relocation is, so it's hard to check. We have an implicit assumption here that every relocation lies entirely within or entirely outside of the subsection that is being relocated. The assumption is correct for well-formed debug information.

However, we have what appear to be invalid input objects in the pdb-file-statics test case. These input objects are yaml-ified object files from MSVC. It seems that MSVC object files do not round trip through obj2yaml/yaml2obj. The relocations, which are maintained separately from the symbol records, do not correspond to the symbol record offsets that yaml2obj generates. We know that MSVC does not align symbol records to 4 bytes, but LLVM does. This may be the source of the discrepancy.

This bug is actually already present in LLD. An object file with a relocation that points to the last byte of a section will cause LLD to do an OOB write. Since LLD applies relocations directly to the output file memory, it is hard to observe or exploit this bug. However, given this recent refactoring, I think it might be worth going back and fixing it.
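
To make the last-byte problem concrete, here is a sketch of a size-aware check. `relocFitsIn` is a hypothetical helper, not something in LLD. The point is that an offset-only test such as `rel.VirtualAddress >= inputSize` accepts a relocation at the final byte even when the relocation is 2 bytes wide, like the 16-bit access in the ASan report above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if a relocation of `relocSize` bytes at `offset`
// lies entirely inside the half-open range [begin, end).
inline bool relocFitsIn(uint32_t offset, uint32_t relocSize, uint32_t begin,
                        uint32_t end) {
  assert(begin <= end);
  if (offset < begin || offset > end)
    return false;
  // Compare against the remaining space to avoid overflow in offset + size.
  return relocSize <= end - offset;
}
```

For an 8-byte range, a 2-byte relocation at offset 6 fits, while the same relocation at offset 7 is rejected. The catch mentioned above still applies: the size is only known after the machine and relocation type have been inspected.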

rnk mentioned this in rG9e708ac6b992: [COFF] Fix relocation offsets in pdb-file-statics test input. Jan 20 2021, 11:45 AM

Revision Contents

Path                                                                   Size
lld/COFF/Chunks.h                                                      10 lines
lld/COFF/Chunks.cpp                                                    113 lines
lld/COFF/PDB.cpp                                                       643 lines
llvm/include/llvm/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.h    63 lines
llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp           81 lines
llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp                     10 lines

Diff 316290

lld/COFF/Chunks.h

[...] public:
   };

   SectionChunk(ObjFile *file, const coff_section *header);
   static bool classof(const Chunk *c) { return c->kind() == SectionKind; }
   size_t getSize() const { return header->SizeOfRawData; }
   ArrayRef<uint8_t> getContents() const;
   void writeTo(uint8_t *buf) const;

+  // Defend against unsorted relocations. This may be overly conservative.
+  void sortRelocations();
+
+  // Write and relocate a portion of the section. This is intended to be called
+  // in a loop. Relocations must be sorted first.
+  void writeAndRelocateSubsection(ArrayRef<uint8_t> sec,
+                                  ArrayRef<uint8_t> subsec,
+                                  uint32_t &nextRelocIndex, uint8_t *buf) const;
+
   uint32_t getOutputCharacteristics() const {
     return header->Characteristics & (permMask | typeMask);
   }
   StringRef getSectionName() const {
     return StringRef(sectionNameData, sectionNameSize);
   }
   void getBaserels(std::vector<Baserel> *res);
   bool isCOMDAT() const;
+  void applyRelocation(uint8_t *off, const coff_relocation &rel) const;
   void applyRelX64(uint8_t *off, uint16_t type, OutputSection *os, uint64_t s,
                    uint64_t p) const;
   void applyRelX86(uint8_t *off, uint16_t type, OutputSection *os, uint64_t s,
                    uint64_t p) const;
   void applyRelARM(uint8_t *off, uint16_t type, OutputSection *os, uint64_t s,
                    uint64_t p) const;
   void applyRelARM64(uint8_t *off, uint16_t type, OutputSection *os, uint64_t s,
                      uint64_t p) const;
[...]
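
The "called in a loop, relocations must be sorted first" contract in the header comment can be illustrated with a simplified model. This is not the real `writeAndRelocateSubsection`: `Reloc` and `writeAndRelocateRange` are hypothetical, and a relocation is reduced to adding a value to a single byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified, hypothetical relocation: add `addend` to the byte at `offset`
// (an offset from the start of the whole section).
struct Reloc {
  uint32_t offset;
  uint8_t addend;
};

// Copy section bytes [begin, end) into `out` and apply the relocations that
// fall in that range. `relocs` must be sorted by offset; `next` is a cursor
// that persists across calls so each relocation is visited only once.
void writeAndRelocateRange(const std::vector<uint8_t> &sec, uint32_t begin,
                           uint32_t end, const std::vector<Reloc> &relocs,
                           size_t &next, uint8_t *out) {
  assert(begin <= end && end <= sec.size());
  std::copy(sec.begin() + begin, sec.begin() + end, out);
  // Skip relocations that precede this range (for example, in skipped gaps).
  while (next < relocs.size() && relocs[next].offset < begin)
    ++next;
  // Apply relocations inside the range, leaving the cursor at the first
  // relocation that belongs to a later range.
  for (; next < relocs.size() && relocs[next].offset < end; ++next)
    out[relocs[next].offset - begin] += relocs[next].addend;
}
```

Because the cursor only moves forward, relocating all subsections of a section costs one pass over the relocation list, provided the ranges are visited in increasing order.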

lld/COFF/Chunks.cpp

Show First 20 Lines • Show All 363 Lines • ▼ Show 20 Lines	for (size_t i = 0, e = relocsSize; i < e; i++) {
// we don't have the relocation size, which is only known after checking the		// we don't have the relocation size, which is only known after checking the
// machine and relocation type. As a result, a relocation may overwrite the		// machine and relocation type. As a result, a relocation may overwrite the
// beginning of the following input section.		// beginning of the following input section.
if (rel.VirtualAddress >= inputSize) {		if (rel.VirtualAddress >= inputSize) {
error("relocation points beyond the end of its parent section");		error("relocation points beyond the end of its parent section");
continue;		continue;
}		}

uint8_t *off = buf + rel.VirtualAddress;		applyRelocation(buf + rel.VirtualAddress, rel);
		}
		}

auto *sym =		void SectionChunk::applyRelocation(uint8_t *off,
dyn_cast_or_null<Defined>(file->getSymbol(rel.SymbolTableIndex));		const coff_relocation &rel) const {
		auto *sym = dyn_cast_or_null<Defined>(file->getSymbol(rel.SymbolTableIndex));

// Get the output section of the symbol for this relocation. The output		// Get the output section of the symbol for this relocation. The output
// section is needed to compute SECREL and SECTION relocations used in debug		// section is needed to compute SECREL and SECTION relocations used in debug
// info.		// info.
Chunk *c = sym ? sym->getChunk() : nullptr;		Chunk *c = sym ? sym->getChunk() : nullptr;
OutputSection *os = c ? c->getOutputSection() : nullptr;		OutputSection *os = c ? c->getOutputSection() : nullptr;

// Skip the relocation if it refers to a discarded section, and diagnose it		// Skip the relocation if it refers to a discarded section, and diagnose it
// as an error if appropriate. If a symbol was discarded early, it may be		// as an error if appropriate. If a symbol was discarded early, it may be
// null. If it was discarded late, the output section will be null, unless		// null. If it was discarded late, the output section will be null, unless
// it was an absolute or synthetic symbol.		// it was an absolute or synthetic symbol.
if (!sym ||		if (!sym ||
(!os && !isa<DefinedAbsolute>(sym) && !isa<DefinedSynthetic>(sym))) {		(!os && !isa<DefinedAbsolute>(sym) && !isa<DefinedSynthetic>(sym))) {
maybeReportRelocationToDiscarded(this, sym, rel);		maybeReportRelocationToDiscarded(this, sym, rel);
continue;		return;
}		}

uint64_t s = sym->getRVA();		uint64_t s = sym->getRVA();

// Compute the RVA of the relocation for relative relocations.		// Compute the RVA of the relocation for relative relocations.
uint64_t p = rva + rel.VirtualAddress;		uint64_t p = rva + rel.VirtualAddress;
switch (config->machine) {		switch (config->machine) {
case AMD64:		case AMD64:
applyRelX64(off, rel.Type, os, s, p);		applyRelX64(off, rel.Type, os, s, p);
break;		break;
case I386:		case I386:
applyRelX86(off, rel.Type, os, s, p);		applyRelX86(off, rel.Type, os, s, p);
break;		break;
case ARMNT:		case ARMNT:
applyRelARM(off, rel.Type, os, s, p);		applyRelARM(off, rel.Type, os, s, p);
break;		break;
case ARM64:		case ARM64:
applyRelARM64(off, rel.Type, os, s, p);		applyRelARM64(off, rel.Type, os, s, p);
break;		break;
default:		default:
llvm_unreachable("unknown machine type");		llvm_unreachable("unknown machine type");
}		}
}		}

		// Defend against unsorted relocations. This may be overly conservative.
		void SectionChunk::sortRelocations() {
		auto cmpByVa = [](const coff_relocation &l, const coff_relocation &r) {
		return l.VirtualAddress < r.VirtualAddress;
		};
		if (llvm::is_sorted(getRelocs(), cmpByVa))
		return;
		warn("some relocations in " + file->getName() + " are not sorted");
		MutableArrayRef<coff_relocation> newRelocs(
		bAlloc.Allocate<coff_relocation>(relocsSize), relocsSize);
		memcpy(newRelocs.data(), relocsData, relocsSize * sizeof(coff_relocation));
		llvm::sort(newRelocs, cmpByVa);
		setRelocs(newRelocs);
		}
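Sorting matters because the subsection writer that follows advances a single relocation cursor and never rewinds it. A minimal standalone sketch of that forward-only sweep is shown here; `Reloc` and `relocsInRange` are hypothetical stand-ins for illustration, not LLD types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for coff_relocation; only the offset matters here.
struct Reloc {
  uint32_t virtualAddress;
};

// Collect the offsets of relocations that fall in [vaBegin, vaEnd), resuming
// from the cursor `next`. This assumes `relocs` is sorted by virtualAddress,
// which is what allows the cursor to move forward only.
std::vector<uint32_t> relocsInRange(const std::vector<Reloc> &relocs,
                                    std::size_t &next, uint32_t vaBegin,
                                    uint32_t vaEnd) {
  assert(vaBegin <= vaEnd && "range is inverted");
  std::vector<uint32_t> hits;
  for (; next < relocs.size(); ++next) {
    uint32_t va = relocs[next].virtualAddress;
    if (va < vaBegin)
      continue; // Belongs to an earlier range that was skipped.
    if (va >= vaEnd)
      break; // Belongs to a later range; leave the cursor here.
    hits.push_back(va);
  }
  return hits;
}
```

With unsorted input, a relocation that appears after a larger offset would be passed over by the cursor and never applied, which is the failure the sort guards against.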

		// Similar to writeTo, but suitable for relocating a subsection of the overall
		// section.
		void SectionChunk::writeAndRelocateSubsection(ArrayRef<uint8_t> sec,
		ArrayRef<uint8_t> subsec,
		uint32_t &nextRelocIndex,
		uint8_t *buf) const {
		assert(!subsec.empty() && !sec.empty());
		assert(sec.begin() <= subsec.begin() && subsec.end() <= sec.end() &&
		"subsection is not part of this section");
		size_t vaBegin = std::distance(sec.begin(), subsec.begin());
		size_t vaEnd = std::distance(sec.begin(), subsec.end());
		memcpy(buf, subsec.data(), subsec.size());
		for (; nextRelocIndex < relocsSize; ++nextRelocIndex) {
		const coff_relocation &rel = relocsData[nextRelocIndex];
		// Skip relocations applied before this subsection.
		if (rel.VirtualAddress < vaBegin)
		continue;
		// Stop if the relocation does not apply to this subsection.
		if (rel.VirtualAddress >= vaEnd)
		rnk (author, Done): This is the buggy bounds check. At this point, we don't know how large the relocation is, so it's hard to check. We have an implicit assumption here that all relocations are entirely within or outside of the subsection that is being relocated. The assumption is correct for well-formed debug information.
		However, we have what appear to be invalid input objects in the pdb-file-static test case. These input objects are yaml-ified object files from MSVC. It seems that obj2yaml/yaml2obj of MSVC object files does not round trip. The relocations, which are maintained separately from the symbol records, do not correspond to the symbol record offsets that yaml2obj generates. We know that MSVC does not align symbol records to 4 bytes, but LLVM does. This may be the source of the discrepancy.
		This bug is actually already present in LLD. An object file with a relocation that points to the last byte of a section will cause LLD to do an OOB write. Since LLD applies relocations directly to the output file memory, it is hard to observe or exploit this bug. However, given this recent refactoring, I think it might be worth going back and fixing it.
		break;
		applyRelocation(&buf[rel.VirtualAddress - vaBegin], rel);
		}
}		}
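The inline comment in this function points out that the check only looks at where a relocation starts, so a relocation beginning near the end of the range can still write past it. One possible hardening, sketched here under the assumption that the relocation's width is available, is to test the whole byte range. The `relocFitsInSubsection` helper and its `relocSize` parameter are hypothetical; in LLD the width depends on the machine and relocation type and is not known at this point in the code.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if a relocation of `relocSize` bytes starting at offset `va`
// lies entirely inside the half-open range [vaBegin, vaEnd).
bool relocFitsInSubsection(uint64_t va, uint64_t relocSize, uint64_t vaBegin,
                           uint64_t vaEnd) {
  assert(vaBegin <= vaEnd && "range is inverted");
  if (va < vaBegin || va >= vaEnd)
    return false;
  // Compare against the remaining room instead of computing va + relocSize,
  // so the check cannot overflow.
  return relocSize <= vaEnd - va;
}
```

For example, a 4-byte relocation at offset 11 in the range [8, 12) starts inside the range but would be rejected by this check, whereas the start-only comparison accepts it.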

void SectionChunk::addAssociative(SectionChunk *child) {		void SectionChunk::addAssociative(SectionChunk *child) {
// Insert this child at the head of the list.		// Insert this child at the head of the list.
assert(child->assocChildren == nullptr &&		assert(child->assocChildren == nullptr &&
"associated sections cannot have their own associated children");		"associated sections cannot have their own associated children");
child->assocChildren = assocChildren;		child->assocChildren = assocChildren;
assocChildren = child;		assocChildren = child;

lld/COFF/PDB.cpp

#include <memory>		#include <memory>

using namespace llvm;		using namespace llvm;
using namespace llvm::codeview;		using namespace llvm::codeview;
using namespace lld;		using namespace lld;
using namespace lld::coff;		using namespace lld::coff;

using llvm::object::coff_section;		using llvm::object::coff_section;
		using llvm::pdb::StringTableFixup;

static ExitOnError exitOnErr;		static ExitOnError exitOnErr;

static Timer totalPdbLinkTimer("PDB Emission (Cumulative)", Timer::root());		static Timer totalPdbLinkTimer("PDB Emission (Cumulative)", Timer::root());
static Timer addObjectsTimer("Add Objects", totalPdbLinkTimer);		static Timer addObjectsTimer("Add Objects", totalPdbLinkTimer);
Timer lld::coff::loadGHashTimer("Global Type Hashing", addObjectsTimer);		Timer lld::coff::loadGHashTimer("Global Type Hashing", addObjectsTimer);
Timer lld::coff::mergeGHashTimer("GHash Type Merging", addObjectsTimer);		Timer lld::coff::mergeGHashTimer("GHash Type Merging", addObjectsTimer);
static Timer typeMergingTimer("Type Merging", addObjectsTimer);		static Timer typeMergingTimer("Type Merging", addObjectsTimer);
[30 lines not shown]	public:
void addObjectsToPDB();		void addObjectsToPDB();

/// Add every live, defined public symbol to the PDB.		/// Add every live, defined public symbol to the PDB.
void addPublicsToPDB();		void addPublicsToPDB();

/// Link info for each import file in the symbol table into the PDB.		/// Link info for each import file in the symbol table into the PDB.
void addImportFilesToPDB(ArrayRef<OutputSection *> outputSections);		void addImportFilesToPDB(ArrayRef<OutputSection *> outputSections);

		void createModuleDBI(ObjFile *file);

/// Link CodeView from a single object file into the target (output) PDB.		/// Link CodeView from a single object file into the target (output) PDB.
/// When a precompiled headers object is linked, its TPI map might be provided		/// When a precompiled headers object is linked, its TPI map might be provided
/// externally.		/// externally.
void addDebug(TpiSource *source);		void addDebug(TpiSource *source);

void addDebugSymbols(TpiSource *source);		void addDebugSymbols(TpiSource *source);

void mergeSymbolRecords(TpiSource *source,		// Analyze the symbol records to separate module symbols from global symbols,
std::vector<ulittle32_t *> &stringTableRefs,		// find string references, and calculate how large the symbol stream will be
		// in the PDB.
		void analyzeSymbolSubsection(SectionChunk *debugChunk,
		uint32_t &moduleSymOffset,
		aganea (Done): s/moduleStreamSize/moduleSymOffset/ to match the definition.
		uint32_t &nextRelocIndex,
		std::vector<StringTableFixup> &stringTableFixups,
BinaryStreamRef symData);		BinaryStreamRef symData);

		// Write all module symbols from all live debug symbol subsections of the
		// given object file into the given stream writer.
		Error writeAllModuleSymbolRecords(ObjFile *file, BinaryStreamWriter &writer);

		// Callback to copy and relocate debug symbols during PDB file writing.
		static Error commitSymbolsForObject(void *ctx, void *obj,
		BinaryStreamWriter &writer);

		// Copy the symbol record, relocate it, and fix the alignment if necessary.
		// Rewrite type indices in the record. Replace unrecognized symbol records
		// with S_SKIP records.
		void writeSymbolRecord(SectionChunk *debugChunk,
		ArrayRef<uint8_t> sectionContents, CVSymbol sym,
		size_t alignedSize, uint32_t &nextRelocIndex,
		std::vector<uint8_t> &storage);

/// Add the section map and section contributions to the PDB.		/// Add the section map and section contributions to the PDB.
void addSections(ArrayRef<OutputSection *> outputSections,		void addSections(ArrayRef<OutputSection *> outputSections,
ArrayRef<uint8_t> sectionTable);		ArrayRef<uint8_t> sectionTable);

/// Write the PDB to disk and store the Guid generated for it in *Guid.		/// Write the PDB to disk and store the Guid generated for it in *Guid.
void commit(codeview::GUID *guid);		void commit(codeview::GUID *guid);

// Print statistics regarding the final PDB		// Print statistics regarding the final PDB
[15 lines not shown]	private:
// For statistics		// For statistics
uint64_t globalSymbols = 0;		uint64_t globalSymbols = 0;
uint64_t moduleSymbols = 0;		uint64_t moduleSymbols = 0;
uint64_t publicSymbols = 0;		uint64_t publicSymbols = 0;
uint64_t nbTypeRecords = 0;		uint64_t nbTypeRecords = 0;
uint64_t nbTypeRecordsBytes = 0;		uint64_t nbTypeRecordsBytes = 0;
};		};

		/// Represents an unrelocated DEBUG_S_FRAMEDATA subsection.
		struct UnrelocatedFpoData {
		SectionChunk *debugChunk = nullptr;
		ArrayRef<uint8_t> subsecData;
		uint32_t relocIndex = 0;
		};

		/// The size of the magic bytes at the beginning of a symbol section or stream.
		enum : uint32_t { kSymbolStreamMagicSize = 4 };

class DebugSHandler {		class DebugSHandler {
PDBLinker &linker;		PDBLinker &linker;

/// The object file whose .debug$S sections we're processing.		/// The object file whose .debug$S sections we're processing.
ObjFile &file;		ObjFile &file;

/// The result of merging type indices.		/// The result of merging type indices.
TpiSource *source;		TpiSource *source;
[10 lines not shown]	class DebugSHandler {
/// PDB.		/// PDB.
DebugChecksumsSubsectionRef checksums;		DebugChecksumsSubsectionRef checksums;

/// The DEBUG_S_FRAMEDATA subsection(s). There can be more than one of		/// The DEBUG_S_FRAMEDATA subsection(s). There can be more than one of
/// these and they need not appear in any specific order. However, they		/// these and they need not appear in any specific order. However, they
/// contain string table references which need to be re-written, so we		/// contain string table references which need to be re-written, so we
/// collect them all here and re-write them after all subsections have been		/// collect them all here and re-write them after all subsections have been
/// discovered and processed.		/// discovered and processed.
std::vector<DebugFrameDataSubsectionRef> newFpoFrames;		std::vector<UnrelocatedFpoData> frameDataSubsecs;

		/// List of string table references in symbol records. Later they will be
		/// applied to the symbols during PDB writing.
		std::vector<StringTableFixup> stringTableFixups;

		/// Sum of the size of all module symbol records across all .debug$S sections.
		/// Includes record realignment and the size of the symbol stream magic
		/// prefix.
		aganea (Done): It's a bit strange that '4' means to do nothing in `finish()` but I understand that it makes the logic in `analyzeSymbolSubsection()` less complicated.
		rnk (author, Done): I went ahead and gave this a named constant so it's a bit more readable.
		uint32_t moduleStreamSize = kSymbolStreamMagicSize;

		/// Next relocation index in the current .debug$S section. Resets every
		/// handleDebugS call.
		uint32_t nextRelocIndex = 0;

		void advanceRelocIndex(SectionChunk *debugChunk, ArrayRef<uint8_t> subsec);

/// Pointers to raw memory that we determine have string table references		void addUnrelocatedSubsection(SectionChunk *debugChunk,
/// that need to be re-written. We first process all .debug$S subsections		const DebugSubsectionRecord &ss);
/// to ensure that we can handle subsections written in any order, building
/// up this list as we go. At the end, we use the string table (which must
/// have been discovered by now else it is an error) to re-write these
/// references.
std::vector<ulittle32_t *> stringTableReferences;

void mergeInlineeLines(const DebugSubsectionRecord &inlineeLines);		void addFrameDataSubsection(SectionChunk *debugChunk,
		const DebugSubsectionRecord &ss);

		void recordStringTableReferences(CVSymbol sym, uint32_t symOffset);

public:		public:
DebugSHandler(PDBLinker &linker, ObjFile &file, TpiSource *source)		DebugSHandler(PDBLinker &linker, ObjFile &file, TpiSource *source)
: linker(linker), file(file), source(source) {}		: linker(linker), file(file), source(source) {}

void handleDebugS(ArrayRef<uint8_t> relocatedDebugContents);		void handleDebugS(SectionChunk *debugChunk);

void finish();		void finish();
};		};
}		}

// Visual Studio's debugger requires absolute paths in various places in the		// Visual Studio's debugger requires absolute paths in various places in the
// PDB to work without additional configuration:		// PDB to work without additional configuration:
// https://docs.microsoft.com/en-us/visualstudio/debugger/debug-source-files-common-properties-solution-property-pages-dialog-box		// https://docs.microsoft.com/en-us/visualstudio/debugger/debug-source-files-common-properties-solution-property-pages-dialog-box
[57 lines not shown]	builder.getTpiBuilder().addTypeRecords(source->mergedTpi.recs,
source->mergedTpi.recHashes);		source->mergedTpi.recHashes);
builder.getIpiBuilder().addTypeRecords(source->mergedIpi.recs,		builder.getIpiBuilder().addTypeRecords(source->mergedIpi.recs,
source->mergedIpi.recSizes,		source->mergedIpi.recSizes,
source->mergedIpi.recHashes);		source->mergedIpi.recHashes);
});		});
}		}

static void		static void
recordStringTableReferenceAtOffset(MutableArrayRef<uint8_t> contents,		recordStringTableReferences(CVSymbol sym, uint32_t symOffset,
uint32_t offset,		std::vector<StringTableFixup> &stringTableFixups) {
std::vector<ulittle32_t *> &strTableRefs) {
contents =
contents.drop_front(offset).take_front(sizeof(support::ulittle32_t));
ulittle32_t *index = reinterpret_cast<ulittle32_t *>(contents.data());
strTableRefs.push_back(index);
}

static void
recordStringTableReferences(SymbolKind kind, MutableArrayRef<uint8_t> contents,
std::vector<ulittle32_t *> &strTableRefs) {
// For now we only handle S_FILESTATIC, but we may need the same logic for		// For now we only handle S_FILESTATIC, but we may need the same logic for
// S_DEFRANGE and S_DEFRANGE_SUBFIELD. However, I cannot seem to generate any		// S_DEFRANGE and S_DEFRANGE_SUBFIELD. However, I cannot seem to generate any
// PDBs that contain these types of records, so because of the uncertainty		// PDBs that contain these types of records, so because of the uncertainty
// they are omitted here until we can prove that it's necessary.		// they are omitted here until we can prove that it's necessary.
switch (kind) {		switch (sym.kind()) {
case SymbolKind::S_FILESTATIC:		case SymbolKind::S_FILESTATIC: {
// FileStaticSym::ModFileOffset		// FileStaticSym::ModFileOffset
recordStringTableReferenceAtOffset(contents, 8, strTableRefs);		uint32_t ref = *reinterpret_cast<const ulittle32_t *>(&sym.data()[8]);
		stringTableFixups.push_back({ref, symOffset + 8});
break;		break;
		}
case SymbolKind::S_DEFRANGE:		case SymbolKind::S_DEFRANGE:
case SymbolKind::S_DEFRANGE_SUBFIELD:		case SymbolKind::S_DEFRANGE_SUBFIELD:
log("Not fixing up string table reference in S_DEFRANGE / "		log("Not fixing up string table reference in S_DEFRANGE / "
"S_DEFRANGE_SUBFIELD record");		"S_DEFRANGE_SUBFIELD record");
break;		break;
default:		default:
break;		break;
}		}
[56 lines not shown]	if (kind == SymbolKind::S_GPROC32_ID || kind == SymbolKind::S_LPROC32_ID) {
}		}

kind = (kind == SymbolKind::S_GPROC32_ID) ? SymbolKind::S_GPROC32		kind = (kind == SymbolKind::S_GPROC32_ID) ? SymbolKind::S_GPROC32
: SymbolKind::S_LPROC32;		: SymbolKind::S_LPROC32;
prefix->RecordKind = uint16_t(kind);		prefix->RecordKind = uint16_t(kind);
}		}
}		}

/// Copy the symbol record. In a PDB, symbol records must be 4 byte aligned.		namespace {
/// The object file may not be aligned.
static MutableArrayRef<uint8_t>
copyAndAlignSymbol(const CVSymbol &sym, MutableArrayRef<uint8_t> &alignedMem) {
size_t size = alignTo(sym.length(), alignOf(CodeViewContainer::Pdb));
assert(size >= 4 && "record too short");
assert(size <= MaxRecordLength && "record too long");
assert(alignedMem.size() >= size && "didn't preallocate enough");

// Copy the symbol record and zero out any padding bytes.
MutableArrayRef<uint8_t> newData = alignedMem.take_front(size);
alignedMem = alignedMem.drop_front(size);
memcpy(newData.data(), sym.data().data(), sym.length());
memset(newData.data() + sym.length(), 0, size - sym.length());

// Update the record prefix length. It should point to the beginning of the
// next record.
auto *prefix = reinterpret_cast<RecordPrefix *>(newData.data());
prefix->RecordLen = size - 2;
return newData;
}

struct ScopeRecord {		struct ScopeRecord {
ulittle32_t ptrParent;		ulittle32_t ptrParent;
ulittle32_t ptrEnd;		ulittle32_t ptrEnd;
};		};
		} // namespace

struct SymbolScope {		/// Given a pointer to a symbol record that opens a scope, return a pointer to
ScopeRecord *openingRecord;		/// the scope fields.
uint32_t scopeOffset;		static ScopeRecord *getSymbolScopeFields(void *sym) {
};		return reinterpret_cast<ScopeRecord *>(reinterpret_cast<char *>(sym) +
		sizeof(RecordPrefix));
static void scopeStackOpen(SmallVectorImpl<SymbolScope> &stack,		}
uint32_t curOffset, CVSymbol &sym) {
assert(symbolOpensScope(sym.kind()));		// To open a scope, push the offset of the current symbol record onto the
SymbolScope s;		// stack.
s.scopeOffset = curOffset;		static void scopeStackOpen(SmallVectorImpl<uint32_t> &stack,
s.openingRecord = const_cast<ScopeRecord *>(		std::vector<uint8_t> &storage) {
reinterpret_cast<const ScopeRecord *>(sym.content().data()));		stack.push_back(storage.size());
s.openingRecord->ptrParent = stack.empty() ? 0 : stack.back().scopeOffset;		}
stack.push_back(s);
}		// To close a scope, update the record that opened the scope.
		static void scopeStackClose(SmallVectorImpl<uint32_t> &stack,
static void scopeStackClose(SmallVectorImpl<SymbolScope> &stack,		std::vector<uint8_t> &storage,
uint32_t curOffset, InputFile *file) {		uint32_t storageBaseOffset, ObjFile *file) {
if (stack.empty()) {		if (stack.empty()) {
warn("symbol scopes are not balanced in " + file->getName());		warn("symbol scopes are not balanced in " + file->getName());
return;		return;
}		}
SymbolScope s = stack.pop_back_val();
s.openingRecord->ptrEnd = curOffset;		// Update ptrEnd of the record that opened the scope to point to the
		// current record, if we are writing into the module symbol stream.
		uint32_t offOpen = stack.pop_back_val();
		uint32_t offEnd = storageBaseOffset + storage.size();
		uint32_t offParent = stack.empty() ? 0 : (stack.back() + storageBaseOffset);
		ScopeRecord *scopeRec = getSymbolScopeFields(&(storage)[offOpen]);
		scopeRec->ptrParent = offParent;
		scopeRec->ptrEnd = offEnd;
}		}
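The new scope tracking keeps only record offsets on the stack and patches the opening record in the output buffer when its scope closes. A minimal standalone sketch of that idea follows. The names `openScope`, `closeScope`, `readU32`, and `writeU32` are hypothetical, and the record layout (a 4-byte prefix followed by 32-bit `ptrParent` and `ptrEnd` fields) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout for this sketch: a 4-byte record prefix (length + kind)
// followed by two 32-bit fields, ptrParent and ptrEnd.
constexpr std::size_t kPrefixSize = 4;

// Read or write a 32-bit value at a byte offset. This uses host byte order,
// which matches the on-disk layout only on little-endian hosts.
uint32_t readU32(const std::vector<uint8_t> &buf, std::size_t off) {
  assert(off + sizeof(uint32_t) <= buf.size());
  uint32_t v;
  std::memcpy(&v, &buf[off], sizeof(v));
  return v;
}

void writeU32(std::vector<uint8_t> &buf, std::size_t off, uint32_t v) {
  assert(off + sizeof(uint32_t) <= buf.size());
  std::memcpy(&buf[off], &v, sizeof(v));
}

// Call just before appending a scope-opening record to `storage`: the
// current size is the offset the record will occupy.
void openScope(std::vector<uint32_t> &stack,
               const std::vector<uint8_t> &storage) {
  stack.push_back(static_cast<uint32_t>(storage.size()));
}

// Call just before appending the matching end record. `baseOffset` is where
// `storage` begins in the final stream. Returns false if scopes are
// unbalanced.
bool closeScope(std::vector<uint32_t> &stack, std::vector<uint8_t> &storage,
                uint32_t baseOffset) {
  if (stack.empty())
    return false;
  uint32_t offOpen = stack.back();
  stack.pop_back();
  uint32_t offEnd = baseOffset + static_cast<uint32_t>(storage.size());
  uint32_t offParent = stack.empty() ? 0 : stack.back() + baseOffset;
  writeU32(storage, offOpen + kPrefixSize, offParent);
  writeU32(storage, offOpen + kPrefixSize + 4, offEnd);
  return true;
}
```

Storing offsets rather than pointers keeps the stack valid even when the buffer reallocates as records are appended.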

static bool symbolGoesInModuleStream(const CVSymbol &sym, bool isGlobalScope) {		static bool symbolGoesInModuleStream(const CVSymbol &sym,
		unsigned symbolScopeDepth) {
switch (sym.kind()) {		switch (sym.kind()) {
case SymbolKind::S_GDATA32:		case SymbolKind::S_GDATA32:
case SymbolKind::S_CONSTANT:		case SymbolKind::S_CONSTANT:
case SymbolKind::S_GTHREAD32:		case SymbolKind::S_GTHREAD32:
// We really should not be seeing S_PROCREF and S_LPROCREF in the first place		// We really should not be seeing S_PROCREF and S_LPROCREF in the first place
// since they are synthesized by the linker in response to S_GPROC32 and		// since they are synthesized by the linker in response to S_GPROC32 and
// S_LPROC32, but if we do see them, don't put them in the module stream I		// S_LPROC32, but if we do see them, don't put them in the module stream I
// guess.		// guess.
case SymbolKind::S_PROCREF:		case SymbolKind::S_PROCREF:
case SymbolKind::S_LPROCREF:		case SymbolKind::S_LPROCREF:
return false;		return false;
// S_UDT records go in the module stream if it is not a global S_UDT.		// S_UDT records go in the module stream if it is not a global S_UDT.
case SymbolKind::S_UDT:		case SymbolKind::S_UDT:
return !isGlobalScope;		return symbolScopeDepth > 0;
// S_GDATA32 does not go in the module stream, but S_LDATA32 does.		// S_GDATA32 does not go in the module stream, but S_LDATA32 does.
case SymbolKind::S_LDATA32:		case SymbolKind::S_LDATA32:
case SymbolKind::S_LTHREAD32:		case SymbolKind::S_LTHREAD32:
default:		default:
return true;		return true;
}		}
}		}

static bool symbolGoesInGlobalsStream(const CVSymbol &sym,		static bool symbolGoesInGlobalsStream(const CVSymbol &sym,
bool isFunctionScope) {		unsigned symbolScopeDepth) {
switch (sym.kind()) {		switch (sym.kind()) {
case SymbolKind::S_CONSTANT:		case SymbolKind::S_CONSTANT:
case SymbolKind::S_GDATA32:		case SymbolKind::S_GDATA32:
case SymbolKind::S_GTHREAD32:		case SymbolKind::S_GTHREAD32:
case SymbolKind::S_GPROC32:		case SymbolKind::S_GPROC32:
case SymbolKind::S_LPROC32:		case SymbolKind::S_LPROC32:
		case SymbolKind::S_GPROC32_ID:
		aganea (Not Done): Was this a divergence from MSVC link.exe, or was it handled somewhere else before your patch?
		rnk (author, Done): This is a necessary change because now this code runs before `translateIdSymbols` runs, so it has to include the `_ID` procedure variants.
		case SymbolKind::S_LPROC32_ID:
// We really should not be seeing S_PROCREF and S_LPROCREF in the first place		// We really should not be seeing S_PROCREF and S_LPROCREF in the first place
// since they are synthesized by the linker in response to S_GPROC32 and		// since they are synthesized by the linker in response to S_GPROC32 and
// S_LPROC32, but if we do see them, copy them straight through.		// S_LPROC32, but if we do see them, copy them straight through.
case SymbolKind::S_PROCREF:		case SymbolKind::S_PROCREF:
case SymbolKind::S_LPROCREF:		case SymbolKind::S_LPROCREF:
return true;		return true;
// Records that go in the globals stream, unless they are function-local.		// Records that go in the globals stream, unless they are function-local.
case SymbolKind::S_UDT:		case SymbolKind::S_UDT:
case SymbolKind::S_LDATA32:		case SymbolKind::S_LDATA32:
case SymbolKind::S_LTHREAD32:		case SymbolKind::S_LTHREAD32:
return !isFunctionScope;		return symbolScopeDepth == 0;
default:		default:
return false;		return false;
}		}
}		}

static void addGlobalSymbol(pdb::GSIStreamBuilder &builder, uint16_t modIndex,		static void addGlobalSymbol(pdb::GSIStreamBuilder &builder, uint16_t modIndex,
unsigned symOffset, const CVSymbol &sym) {		unsigned symOffset,
		std::vector<uint8_t> &symStorage) {
		CVSymbol sym(makeArrayRef(symStorage));
switch (sym.kind()) {		switch (sym.kind()) {
case SymbolKind::S_CONSTANT:		case SymbolKind::S_CONSTANT:
case SymbolKind::S_UDT:		case SymbolKind::S_UDT:
case SymbolKind::S_GDATA32:		case SymbolKind::S_GDATA32:
case SymbolKind::S_GTHREAD32:		case SymbolKind::S_GTHREAD32:
case SymbolKind::S_LTHREAD32:		case SymbolKind::S_LTHREAD32:
case SymbolKind::S_LDATA32:		case SymbolKind::S_LDATA32:
case SymbolKind::S_PROCREF:		case SymbolKind::S_PROCREF:
case SymbolKind::S_LPROCREF:		case SymbolKind::S_LPROCREF: {
builder.addGlobalSymbol(sym);		// sym is a temporary object, so we have to copy and reallocate the record
		// to stabilize it.
		uint8_t *mem = bAlloc.Allocate<uint8_t>(sym.length());
		memcpy(mem, sym.data().data(), sym.length());
		builder.addGlobalSymbol(CVSymbol(makeArrayRef(mem, sym.length())));
break;		break;
		}
case SymbolKind::S_GPROC32:		case SymbolKind::S_GPROC32:
case SymbolKind::S_LPROC32: {		case SymbolKind::S_LPROC32: {
SymbolRecordKind k = SymbolRecordKind::ProcRefSym;		SymbolRecordKind k = SymbolRecordKind::ProcRefSym;
if (sym.kind() == SymbolKind::S_LPROC32)		if (sym.kind() == SymbolKind::S_LPROC32)
k = SymbolRecordKind::LocalProcRef;		k = SymbolRecordKind::LocalProcRef;
ProcRefSym ps(k);		ProcRefSym ps(k);
ps.Module = modIndex;		ps.Module = modIndex;
// For some reason, MSVC seems to add one to this value.		// For some reason, MSVC seems to add one to this value.
++ps.Module;		++ps.Module;
ps.Name = getSymbolName(sym);		ps.Name = getSymbolName(sym);
ps.SumName = 0;		ps.SumName = 0;
ps.SymOffset = symOffset;		ps.SymOffset = symOffset;
builder.addGlobalSymbol(ps);		builder.addGlobalSymbol(ps);
break;		break;
}		}
default:		default:
llvm_unreachable("Invalid symbol kind!");		llvm_unreachable("Invalid symbol kind!");
}		}
}		}

void PDBLinker::mergeSymbolRecords(TpiSource *source,		// Check if the given symbol record was padded for alignment. If so, zero out
std::vector<ulittle32_t *> &stringTableRefs,		// the padding bytes and update the record prefix with the new size.
		static void fixRecordAlignment(MutableArrayRef<uint8_t> recordBytes,
		size_t oldSize) {
		size_t alignedSize = recordBytes.size();
		if (oldSize == alignedSize)
		return;
		reinterpret_cast<RecordPrefix *>(recordBytes.data())->RecordLen =
		alignedSize - 2;
		memset(recordBytes.data() + oldSize, 0, alignedSize - oldSize);
		}

		// Replace any record with a skip record of the same size. This is useful when
		// we have reserved size for a symbol record, but type index remapping fails.
		static void replaceWithSkipRecord(MutableArrayRef<uint8_t> recordBytes) {
		memset(recordBytes.data(), 0, recordBytes.size());
		auto *prefix = reinterpret_cast<RecordPrefix *>(recordBytes.data());
		prefix->RecordKind = SymbolKind::S_SKIP;
		prefix->RecordLen = recordBytes.size() - 2;
		}
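The alignment fix-up can be illustrated in isolation. The sketch below pads a record to a 4-byte boundary, zeroes the padding, and rewrites the leading 16-bit length field, which does not count its own two bytes (hence the `- 2`). The `alignTo4` and `padRecord` helpers are hypothetical and operate on a copy, whereas the patch pads in place inside preallocated storage and only touches records whose size actually changed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round n up to the next multiple of 4.
constexpr std::size_t alignTo4(std::size_t n) {
  return (n + 3) & ~std::size_t(3);
}

// Return a copy of `rec` padded to a 4-byte boundary. The first two bytes
// are treated as a little-endian length field that excludes itself.
std::vector<uint8_t> padRecord(const std::vector<uint8_t> &rec) {
  assert(rec.size() >= 4 && "record too short");
  std::vector<uint8_t> out(alignTo4(rec.size()), 0); // Padding starts zeroed.
  std::memcpy(out.data(), rec.data(), rec.size());
  uint16_t len = static_cast<uint16_t>(out.size() - 2);
  out[0] = static_cast<uint8_t>(len & 0xff);
  out[1] = static_cast<uint8_t>(len >> 8);
  return out;
}
```

A 6-byte record becomes 8 bytes with a length field of 6; a record that is already a multiple of 4 bytes keeps its size.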

		// Copy the symbol record, relocate it, and fix the alignment if necessary.
		// Rewrite type indices in the record. Replace unrecognized symbol records with
		// S_SKIP records.
		void PDBLinker::writeSymbolRecord(SectionChunk *debugChunk,
		ArrayRef<uint8_t> sectionContents,
		CVSymbol sym, size_t alignedSize,
		uint32_t &nextRelocIndex,
		std::vector<uint8_t> &storage) {
		// Allocate space for the new record at the end of the storage.
		storage.resize(storage.size() + alignedSize);
		auto recordBytes = MutableArrayRef<uint8_t>(storage).take_back(alignedSize);

		// Copy the symbol record and relocate it.
		debugChunk->writeAndRelocateSubsection(sectionContents, sym.data(),
		nextRelocIndex, recordBytes.data());
		fixRecordAlignment(recordBytes, sym.length());

		// Re-map all the type index references.
		TpiSource *source = debugChunk->file->debugTypesObj;
		if (!source->remapTypesInSymbolRecord(recordBytes)) {
		log("ignoring unknown symbol record with kind 0x" + utohexstr(sym.kind()));
		replaceWithSkipRecord(recordBytes);
		}

		// An object file may have S_xxx_ID symbols, but these get converted to
		// "real" symbols in a PDB.
		translateIdSymbols(recordBytes, tMerger, source);
		}

		void PDBLinker::analyzeSymbolSubsection(
		SectionChunk *debugChunk, uint32_t &moduleSymOffset,
		uint32_t &nextRelocIndex, std::vector<StringTableFixup> &stringTableFixups,
BinaryStreamRef symData) {		BinaryStreamRef symData) {
ObjFile *file = source->file;		ObjFile *file = debugChunk->file;
		uint32_t moduleSymStart = moduleSymOffset;

		uint32_t scopeLevel = 0;
		std::vector<uint8_t> storage;
		ArrayRef<uint8_t> sectionContents = debugChunk->getContents();

ArrayRef<uint8_t> symsBuffer;		ArrayRef<uint8_t> symsBuffer;
cantFail(symData.readBytes(0, symData.getLength(), symsBuffer));		cantFail(symData.readBytes(0, symData.getLength(), symsBuffer));
SmallVector<SymbolScope, 4> scopes;

if (symsBuffer.empty())		if (symsBuffer.empty())
warn("empty symbols subsection in " + file->getName());		warn("empty symbols subsection in " + file->getName());

// Iterate every symbol to check if any need to be realigned, and if so, how		Error ec = forEachCodeViewRecord<CVSymbol>(
// much space we need to allocate for them.
bool needsRealignment = false;
unsigned totalRealignedSize = 0;
auto ec = forEachCodeViewRecord<CVSymbol>(
symsBuffer, [&](CVSymbol sym) -> llvm::Error {		symsBuffer, [&](CVSymbol sym) -> llvm::Error {
unsigned realignedSize =		// Track the current scope.
		if (symbolOpensScope(sym.kind()))
		++scopeLevel;
		else if (symbolEndsScope(sym.kind()))
		--scopeLevel;

		uint32_t alignedSize =
alignTo(sym.length(), alignOf(CodeViewContainer::Pdb));		alignTo(sym.length(), alignOf(CodeViewContainer::Pdb));
needsRealignment |= realignedSize != sym.length();
totalRealignedSize += realignedSize;		// Copy global records. Some global records (mainly procedures)
		// reference the current offset into the module stream.
		if (symbolGoesInGlobalsStream(sym, scopeLevel)) {
		storage.clear();
		writeSymbolRecord(debugChunk, sectionContents, sym, alignedSize,
		nextRelocIndex, storage);
		addGlobalSymbol(builder.getGsiBuilder(),
		file->moduleDBI->getModuleIndex(), moduleSymOffset,
		storage);
		++globalSymbols;
		}

		// Update the module stream offset and record any string table index
		// references. There are very few of these and they will be rewritten
		// later during PDB writing.
		if (symbolGoesInModuleStream(sym, scopeLevel)) {
		recordStringTableReferences(sym, moduleSymOffset, stringTableFixups);
		moduleSymOffset += alignedSize;
		++moduleSymbols;
		}

return Error::success();		return Error::success();
});		});

// If any of the symbol record lengths was corrupt, ignore them all, warn		// If we encountered corrupt records, ignore the whole subsection. If we wrote
// about it, and move on.		// any partial records, undo that. For globals, we just keep what we have and
		// continue.
if (ec) {		if (ec) {
warn("corrupt symbol records in " + file->getName());		warn("corrupt symbol records in " + file->getName());
		moduleSymOffset = moduleSymStart;
consumeError(std::move(ec));		consumeError(std::move(ec));
return;
}

// If any symbol needed realignment, allocate enough contiguous memory for
// them all. Typically symbol subsections are small enough that this will not
// cause fragmentation.
MutableArrayRef<uint8_t> alignedSymbolMem;
if (needsRealignment) {
void *alignedData =
bAlloc.Allocate(totalRealignedSize, alignOf(CodeViewContainer::Pdb));
alignedSymbolMem = makeMutableArrayRef(
reinterpret_cast<uint8_t *>(alignedData), totalRealignedSize);
}		}

// Iterate again, this time doing the real work.
unsigned curSymOffset = file->moduleDBI->getNextSymbolOffset();
ArrayRef<uint8_t> bulkSymbols;
cantFail(forEachCodeViewRecord<CVSymbol>(
symsBuffer, [&](CVSymbol sym) -> llvm::Error {
// Align the record if required.
MutableArrayRef<uint8_t> recordBytes;
if (needsRealignment) {
recordBytes = copyAndAlignSymbol(sym, alignedSymbolMem);
sym = CVSymbol(recordBytes);
} else {
// Otherwise, we can actually mutate the symbol directly, since we
// copied it to apply relocations.
recordBytes = makeMutableArrayRef(
const_cast<uint8_t *>(sym.data().data()), sym.length());
}		}

// Re-map all the type index references.		Error PDBLinker::writeAllModuleSymbolRecords(ObjFile *file,
if (!source->remapTypesInSymbolRecord(recordBytes)) {		BinaryStreamWriter &writer) {
log("error remapping types in symbol of kind 0x" +		std::vector<uint8_t> storage;
utohexstr(sym.kind()) + ", ignoring");		SmallVector<uint32_t, 4> scopes;
return Error::success();
}		// Visit all live .debug$S sections a second time, and write them to the PDB.
		for (SectionChunk *debugChunk : file->getDebugChunks()) {
		if (!debugChunk->live || debugChunk->getSize() == 0 ||
		debugChunk->getSectionName() != ".debug$S")
		continue;

// An object file may have S_xxx_ID symbols, but these get converted to		ArrayRef<uint8_t> sectionContents = debugChunk->getContents();
// "real" symbols in a PDB.		auto contents =
translateIdSymbols(recordBytes, tMerger, source);		SectionChunk::consumeDebugMagic(sectionContents, ".debug$S");
sym = CVSymbol(recordBytes);		DebugSubsectionArray subsections;
		BinaryStreamReader reader(contents, support::little);
		exitOnErr(reader.readArray(subsections, contents.size()));

// If this record refers to an offset in the object file's string table,		uint32_t nextRelocIndex = 0;
// add that item to the global PDB string table and re-write the index.		for (const DebugSubsectionRecord &ss : subsections) {
recordStringTableReferences(sym.kind(), recordBytes, stringTableRefs);		if (ss.kind() != DebugSubsectionKind::Symbols)
		continue;

// Fill in "Parent" and "End" fields by maintaining a stack of scopes.		uint32_t moduleSymStart = writer.getOffset();
		scopes.clear();
		storage.clear();
		ArrayRef<uint8_t> symsBuffer;
		BinaryStreamRef sr = ss.getRecordData();
		cantFail(sr.readBytes(0, sr.getLength(), symsBuffer));
		auto ec = forEachCodeViewRecord<CVSymbol>(
		symsBuffer, [&](CVSymbol sym) -> llvm::Error {
		// Track the current scope. Only update records in the postmerge
		// pass.
if (symbolOpensScope(sym.kind()))		if (symbolOpensScope(sym.kind()))
scopeStackOpen(scopes, curSymOffset, sym);		scopeStackOpen(scopes, storage);
else if (symbolEndsScope(sym.kind()))		else if (symbolEndsScope(sym.kind()))
scopeStackClose(scopes, curSymOffset, file);		scopeStackClose(scopes, storage, moduleSymStart, file);

// Add the symbol to the globals stream if necessary. Do this before		// Copy, relocate, and rewrite each module symbol.
// adding the symbol to the module since we may need to get the next		if (symbolGoesInModuleStream(sym, scopes.size())) {
// symbol offset, and writing to the module's symbol stream will update		uint32_t alignedSize =
// that offset.		alignTo(sym.length(), alignOf(CodeViewContainer::Pdb));
if (symbolGoesInGlobalsStream(sym, !scopes.empty())) {		writeSymbolRecord(debugChunk, sectionContents, sym, alignedSize,
addGlobalSymbol(builder.getGsiBuilder(),		nextRelocIndex, storage);
file->moduleDBI->getModuleIndex(), curSymOffset, sym);
++globalSymbols;
}		}
		return Error::success();
		});

if (symbolGoesInModuleStream(sym, scopes.empty())) {		// If we encounter corrupt records in the second pass, ignore them. We
// Add symbols to the module in bulk. If this symbol is contiguous		// already warned about them in the first analysis pass.
// with the previous run of symbols to add, combine the ranges. If		if (ec) {
// not, close the previous range of symbols and start a new one.		consumeError(std::move(ec));
if (sym.data().data() == bulkSymbols.end()) {		storage.clear();
bulkSymbols = makeArrayRef(bulkSymbols.data(),		}
bulkSymbols.size() + sym.length());
} else {		// Writing bytes has a very high overhead, so write the entire subsection
file->moduleDBI->addSymbolsInBulk(bulkSymbols);		// at once.
bulkSymbols = recordBytes;		// TODO: Consider buffering symbols for the entire object file to reduce
		// overhead even further.
		if (Error e = writer.writeBytes(storage))
		return e;
}		}
curSymOffset += sym.length();
++moduleSymbols;
}		}

return Error::success();		return Error::success();
}));		}

// Add any remaining symbols we've accumulated.		Error PDBLinker::commitSymbolsForObject(void *ctx, void *obj,
file->moduleDBI->addSymbolsInBulk(bulkSymbols);		BinaryStreamWriter &writer) {
		return static_cast<PDBLinker *>(ctx)->writeAllModuleSymbolRecords(
		static_cast<ObjFile *>(obj), writer);
}		}

static pdb::SectionContrib createSectionContrib(const Chunk *c, uint32_t modi) {		static pdb::SectionContrib createSectionContrib(const Chunk *c, uint32_t modi) {
OutputSection *os = c ? c->getOutputSection() : nullptr;		OutputSection *os = c ? c->getOutputSection() : nullptr;
pdb::SectionContrib sc;		pdb::SectionContrib sc;
memset(&sc, 0, sizeof(sc));		memset(&sc, 0, sizeof(sc));
sc.ISect = os ? os->sectionIndex : llvm::pdb::kInvalidStreamIndex;		sc.ISect = os ? os->sectionIndex : llvm::pdb::kInvalidStreamIndex;
sc.Off = c && os ? c->getRVA() - os->getRVA() : 0;		sc.Off = c && os ? c->getRVA() - os->getRVA() : 0;
[23 lines not shown]	if (!expectedString) {
warn("Invalid string table reference");		warn("Invalid string table reference");
consumeError(expectedString.takeError());		consumeError(expectedString.takeError());
return 0;		return 0;
}		}

return pdbStrTable.insert(*expectedString);		return pdbStrTable.insert(*expectedString);
}		}

void DebugSHandler::handleDebugS(ArrayRef<uint8_t> relocatedDebugContents) {		void DebugSHandler::handleDebugS(SectionChunk *debugChunk) {
relocatedDebugContents =		// Note that we are processing the unrelocated section contents. They will
SectionChunk::consumeDebugMagic(relocatedDebugContents, ".debug$S");		// be relocated later during PDB writing.
		ArrayRef<uint8_t> contents = debugChunk->getContents();
		contents = SectionChunk::consumeDebugMagic(contents, ".debug$S");
DebugSubsectionArray subsections;		DebugSubsectionArray subsections;
BinaryStreamReader reader(relocatedDebugContents, support::little);		BinaryStreamReader reader(contents, support::little);
exitOnErr(reader.readArray(subsections, relocatedDebugContents.size()));		exitOnErr(reader.readArray(subsections, contents.size()));
		debugChunk->sortRelocations();

		// Reset the relocation index, since this is a new section.
		nextRelocIndex = 0;

for (const DebugSubsectionRecord &ss : subsections) {		for (const DebugSubsectionRecord &ss : subsections) {
// Ignore subsections with the 'ignore' bit. Some versions of the Visual C++		// Ignore subsections with the 'ignore' bit. Some versions of the Visual C++
// runtime have subsections with this bit set.		// runtime have subsections with this bit set.
if (uint32_t(ss.kind()) & codeview::SubsectionIgnoreFlag)		if (uint32_t(ss.kind()) & codeview::SubsectionIgnoreFlag)
continue;		continue;

switch (ss.kind()) {		switch (ss.kind()) {
case DebugSubsectionKind::StringTable: {		case DebugSubsectionKind::StringTable: {
assert(!cvStrTab.valid() &&		assert(!cvStrTab.valid() &&
"Encountered multiple string table subsections!");		"Encountered multiple string table subsections!");
exitOnErr(cvStrTab.initialize(ss.getRecordData()));		exitOnErr(cvStrTab.initialize(ss.getRecordData()));
break;		break;
}		}
case DebugSubsectionKind::FileChecksums:		case DebugSubsectionKind::FileChecksums:
assert(!checksums.valid() &&		assert(!checksums.valid() &&
"Encountered multiple checksum subsections!");		"Encountered multiple checksum subsections!");
exitOnErr(checksums.initialize(ss.getRecordData()));		exitOnErr(checksums.initialize(ss.getRecordData()));
break;		break;
case DebugSubsectionKind::Lines:		case DebugSubsectionKind::Lines:
// We can add the relocated line table directly to the PDB without
// modification because the file checksum offsets will stay the same.
file.moduleDBI->addDebugSubsection(ss);
break;
case DebugSubsectionKind::InlineeLines:		case DebugSubsectionKind::InlineeLines:
// The inlinee lines subsection also has file checksum table references		addUnrelocatedSubsection(debugChunk, ss);
// that can be used directly, but it contains function id references that
// must be remapped.
mergeInlineeLines(ss);
break;		break;
case DebugSubsectionKind::FrameData: {		case DebugSubsectionKind::FrameData:
// We need to re-write string table indices here, so save off all		addFrameDataSubsection(debugChunk, ss);
// frame data subsections until we've processed the entire list of
// subsections so that we can be sure we have the string table.
DebugFrameDataSubsectionRef fds;
exitOnErr(fds.initialize(ss.getRecordData()));
newFpoFrames.push_back(std::move(fds));
break;		break;
}		case DebugSubsectionKind::Symbols:
case DebugSubsectionKind::Symbols: {		linker.analyzeSymbolSubsection(debugChunk, moduleStreamSize,
linker.mergeSymbolRecords(source, stringTableReferences,		nextRelocIndex, stringTableFixups,
ss.getRecordData());		ss.getRecordData());
break;		break;
}

case DebugSubsectionKind::CrossScopeImports:		case DebugSubsectionKind::CrossScopeImports:
case DebugSubsectionKind::CrossScopeExports:		case DebugSubsectionKind::CrossScopeExports:
// These appear to relate to cross-module optimization, so we might use		// These appear to relate to cross-module optimization, so we might use
// these for ThinLTO.		// these for ThinLTO.
break;		break;

case DebugSubsectionKind::ILLines:		case DebugSubsectionKind::ILLines:
[10 lines not shown]	for (const DebugSubsectionRecord &ss : subsections) {
default:		default:
warn("ignoring unknown debug$S subsection kind 0x" +		warn("ignoring unknown debug$S subsection kind 0x" +
utohexstr(uint32_t(ss.kind())) + " in file " + toString(&file));		utohexstr(uint32_t(ss.kind())) + " in file " + toString(&file));
break;		break;
}		}
}		}
}		}

static Expected<StringRef>		void DebugSHandler::advanceRelocIndex(SectionChunk *sc,
getFileName(const DebugStringTableSubsectionRef &strings,		ArrayRef<uint8_t> subsec) {
const DebugChecksumsSubsectionRef &checksums, uint32_t fileID) {		ptrdiff_t vaBegin = subsec.data() - sc->getContents().data();
auto iter = checksums.getArray().at(fileID);		assert(vaBegin > 0);
if (iter == checksums.getArray().end())		auto relocs = sc->getRelocs();
return make_error<CodeViewError>(cv_error_code::no_records);		for (; nextRelocIndex < relocs.size(); ++nextRelocIndex) {
uint32_t offset = iter->FileNameOffset;		if (relocs[nextRelocIndex].VirtualAddress >= vaBegin)
return strings.getString(offset);		break;
}		}

void DebugSHandler::mergeInlineeLines(
const DebugSubsectionRecord &inlineeSubsection) {
DebugInlineeLinesSubsectionRef inlineeLines;
exitOnErr(inlineeLines.initialize(inlineeSubsection.getRecordData()));
if (!source) {
warn("ignoring inlinee lines section in file that lacks type information");
return;
}		}

// Remap type indices in inlinee line records in place.		namespace {
		/// Wrapper class for unrelocated line and inlinee line subsections, which
		/// require only relocation and type index remapping to add to the PDB.
		class UnrelocatedDebugSubsection : public DebugSubsection {
		public:
		UnrelocatedDebugSubsection(DebugSubsectionKind k, SectionChunk *debugChunk,
		ArrayRef<uint8_t> subsec, uint32_t relocIndex)
		: DebugSubsection(k), debugChunk(debugChunk), subsec(subsec),
		relocIndex(relocIndex) {}

		Error commit(BinaryStreamWriter &writer) const override;
		uint32_t calculateSerializedSize() const override { return subsec.size(); }

		SectionChunk *debugChunk;
		ArrayRef<uint8_t> subsec;
		uint32_t relocIndex;
		};
		} // namespace

		Error UnrelocatedDebugSubsection::commit(BinaryStreamWriter &writer) const {
		std::vector<uint8_t> relocatedBytes(subsec.size());
		uint32_t tmpRelocIndex = relocIndex;
		debugChunk->writeAndRelocateSubsection(debugChunk->getContents(), subsec,
		tmpRelocIndex, relocatedBytes.data());

		// Remap type indices in inlinee line records in place. Skip the remapping if
		// there is no type source info.
		if (kind() == DebugSubsectionKind::InlineeLines &&
		debugChunk->file->debugTypesObj) {
		TpiSource *source = debugChunk->file->debugTypesObj;
		DebugInlineeLinesSubsectionRef inlineeLines;
		BinaryStreamReader storageReader(relocatedBytes, support::little);
		exitOnErr(inlineeLines.initialize(storageReader));
for (const InlineeSourceLine &line : inlineeLines) {		for (const InlineeSourceLine &line : inlineeLines) {
TypeIndex &inlinee = *const_cast<TypeIndex *>(&line.Header->Inlinee);		TypeIndex &inlinee = *const_cast<TypeIndex *>(&line.Header->Inlinee);
if (!source->remapTypeIndex(inlinee, TiRefKind::IndexRef)) {		if (!source->remapTypeIndex(inlinee, TiRefKind::IndexRef)) {
log("bad inlinee line record in " + file.getName() +		log("bad inlinee line record in " + debugChunk->file->getName() +
" with bad inlinee index 0x" + utohexstr(inlinee.getIndex()));		" with bad inlinee index 0x" + utohexstr(inlinee.getIndex()));
}		}
}		}
		}

		return writer.writeBytes(relocatedBytes);
		}

		void DebugSHandler::addUnrelocatedSubsection(SectionChunk *debugChunk,
		const DebugSubsectionRecord &ss) {
		ArrayRef<uint8_t> subsec;
		BinaryStreamRef sr = ss.getRecordData();
		cantFail(sr.readBytes(0, sr.getLength(), subsec));
		advanceRelocIndex(debugChunk, subsec);
		file.moduleDBI->addDebugSubsection(
		std::make_shared<UnrelocatedDebugSubsection>(ss.kind(), debugChunk,
		subsec, nextRelocIndex));
		}

		void DebugSHandler::addFrameDataSubsection(SectionChunk *debugChunk,
		const DebugSubsectionRecord &ss) {
		// We need to re-write string table indices here, so save off all
		// frame data subsections until we've processed the entire list of
		// subsections so that we can be sure we have the string table.
		ArrayRef<uint8_t> subsec;
		BinaryStreamRef sr = ss.getRecordData();
		cantFail(sr.readBytes(0, sr.getLength(), subsec));
		advanceRelocIndex(debugChunk, subsec);
		frameDataSubsecs.push_back({debugChunk, subsec, nextRelocIndex});
		}

// Add the modified inlinee line subsection directly.		static Expected<StringRef>
file.moduleDBI->addDebugSubsection(inlineeSubsection);		getFileName(const DebugStringTableSubsectionRef &strings,
		const DebugChecksumsSubsectionRef &checksums, uint32_t fileID) {
		auto iter = checksums.getArray().at(fileID);
		if (iter == checksums.getArray().end())
		return make_error<CodeViewError>(cv_error_code::no_records);
		uint32_t offset = iter->FileNameOffset;
		return strings.getString(offset);
}		}

void DebugSHandler::finish() {		void DebugSHandler::finish() {
pdb::DbiStreamBuilder &dbiBuilder = linker.builder.getDbiBuilder();		pdb::DbiStreamBuilder &dbiBuilder = linker.builder.getDbiBuilder();

		// If we found any symbol records for the module symbol stream, defer them.
		if (moduleStreamSize > kSymbolStreamMagicSize)
		file.moduleDBI->addUnmergedSymbols(&file, moduleStreamSize -
		kSymbolStreamMagicSize);

// We should have seen all debug subsections across the entire object file now		// We should have seen all debug subsections across the entire object file now
// which means that if a StringTable subsection and Checksums subsection were		// which means that if a StringTable subsection and Checksums subsection were
// present, now is the time to handle them.		// present, now is the time to handle them.
if (!cvStrTab.valid()) {		if (!cvStrTab.valid()) {
if (checksums.valid())		if (checksums.valid())
fatal(".debug$S sections with a checksums subsection must also contain a "		fatal(".debug$S sections with a checksums subsection must also contain a "
"string table subsection");		"string table subsection");

if (!stringTableReferences.empty())		if (!stringTableFixups.empty())
warn("No StringTable subsection was encountered, but there are string "		warn("No StringTable subsection was encountered, but there are string "
"table references");		"table references");
return;		return;
}		}

// Rewrite string table indices in the Fpo Data and symbol records to refer to		// Handle FPO data. Each subsection begins with a single image base
// the global PDB string table instead of the object file string table.		// relocation, which is then added to the RvaStart of each frame data record
for (DebugFrameDataSubsectionRef &fds : newFpoFrames) {		// when it is added to the PDB. The string table indices for the FPO program
const ulittle32_t *reloc = fds.getRelocPtr();		// must also be rewritten to use the PDB string table.
		for (const UnrelocatedFpoData &subsec : frameDataSubsecs) {
		// Relocate the first four bytes of the subsection and reinterpret them as a
		// 32 bit integer.
		SectionChunk *debugChunk = subsec.debugChunk;
		ArrayRef<uint8_t> subsecData = subsec.subsecData;
		uint32_t relocIndex = subsec.relocIndex;
		auto unrelocatedRvaStart = subsecData.take_front(sizeof(uint32_t));
		uint8_t relocatedRvaStart[sizeof(uint32_t)];
		debugChunk->writeAndRelocateSubsection(debugChunk->getContents(),
		unrelocatedRvaStart, relocIndex,
		&relocatedRvaStart[0]);
		uint32_t rvaStart;
		memcpy(&rvaStart, &relocatedRvaStart[0], sizeof(uint32_t));

		// Copy each frame data record, add in rvaStart, translate string table
		// indices, and add the record to the PDB.
		DebugFrameDataSubsectionRef fds;
		BinaryStreamReader reader(subsecData, support::little);
		exitOnErr(fds.initialize(reader));
for (codeview::FrameData fd : fds) {		for (codeview::FrameData fd : fds) {
fd.RvaStart += *reloc;		fd.RvaStart += rvaStart;
fd.FrameFunc =		fd.FrameFunc =
translateStringTableIndex(fd.FrameFunc, cvStrTab, linker.pdbStrTab);		translateStringTableIndex(fd.FrameFunc, cvStrTab, linker.pdbStrTab);
dbiBuilder.addNewFpoData(fd);		dbiBuilder.addNewFpoData(fd);
}		}
}		}

for (ulittle32_t *ref : stringTableReferences)		// Translate the fixups and pass them off to the module builder so they will
*ref = translateStringTableIndex(*ref, cvStrTab, linker.pdbStrTab);		// be applied during writing.
		for (StringTableFixup &ref : stringTableFixups) {
		ref.StrTabOffset =
		translateStringTableIndex(ref.StrTabOffset, cvStrTab, linker.pdbStrTab);
		}
		file.moduleDBI->setStringTableFixups(std::move(stringTableFixups));

// Make a new file checksum table that refers to offsets in the PDB-wide		// Make a new file checksum table that refers to offsets in the PDB-wide
// string table. Generally the string table subsection appears after the		// string table. Generally the string table subsection appears after the
// checksum table, so we have to do this after looping over all the		// checksum table, so we have to do this after looping over all the
// subsections. The new checksum table must have the exact same layout and		// subsections. The new checksum table must have the exact same layout and
// size as the original. Otherwise, the file references in the line and		// size as the original. Otherwise, the file references in the line and
// inlinee line tables will be incorrect.		// inlinee line tables will be incorrect.
auto newChecksums = std::make_unique<DebugChecksumsSubsection>(linker.pdbStrTab);		auto newChecksums = std::make_unique<DebugChecksumsSubsection>(linker.pdbStrTab);
[46 lines not shown]	for (SectionChunk *debugChunk : source->file->getDebugChunks()) {
if (!debugChunk->live || debugChunk->getSize() == 0)		if (!debugChunk->live || debugChunk->getSize() == 0)
continue;		continue;

bool isDebugS = debugChunk->getSectionName() == ".debug$S";		bool isDebugS = debugChunk->getSectionName() == ".debug$S";
bool isDebugF = debugChunk->getSectionName() == ".debug$F";		bool isDebugF = debugChunk->getSectionName() == ".debug$F";
if (!isDebugS && !isDebugF)		if (!isDebugS && !isDebugF)
continue;		continue;

ArrayRef<uint8_t> relocatedDebugContents = relocateDebugChunk(*debugChunk);

if (isDebugS) {		if (isDebugS) {
dsh.handleDebugS(relocatedDebugContents);		dsh.handleDebugS(debugChunk);
} else if (isDebugF) {		} else if (isDebugF) {
		// Handle old FPO data .debug$F sections. These are relatively rare.
		ArrayRef<uint8_t> relocatedDebugContents =
		relocateDebugChunk(*debugChunk);
FixedStreamArray<object::FpoData> fpoRecords;		FixedStreamArray<object::FpoData> fpoRecords;
BinaryStreamReader reader(relocatedDebugContents, support::little);		BinaryStreamReader reader(relocatedDebugContents, support::little);
uint32_t count = relocatedDebugContents.size() / sizeof(object::FpoData);		uint32_t count = relocatedDebugContents.size() / sizeof(object::FpoData);
exitOnErr(reader.readArray(fpoRecords, count));		exitOnErr(reader.readArray(fpoRecords, count));

// These are already relocated and don't refer to the string table, so we		// These are already relocated and don't refer to the string table, so we
// can just copy it.		// can just copy it.
for (const object::FpoData &fd : fpoRecords)		for (const object::FpoData &fd : fpoRecords)
dbiBuilder.addOldFpoData(fd);		dbiBuilder.addOldFpoData(fd);
}		}
}		}

// Do any post-processing now that all .debug$S sections have been processed.		// Do any post-processing now that all .debug$S sections have been processed.
dsh.finish();		dsh.finish();
}		}

// Add a module descriptor for every object file. We need to put an absolute		// Add a module descriptor for every object file. We need to put an absolute
// path to the object into the PDB. If this is a plain object, we make its		// path to the object into the PDB. If this is a plain object, we make its
// path absolute. If it's an object in an archive, we make the archive path		// path absolute. If it's an object in an archive, we make the archive path
// absolute.		// absolute.
static void createModuleDBI(pdb::PDBFileBuilder &builder, ObjFile *file) {		void PDBLinker::createModuleDBI(ObjFile *file) {
pdb::DbiStreamBuilder &dbiBuilder = builder.getDbiBuilder();		pdb::DbiStreamBuilder &dbiBuilder = builder.getDbiBuilder();
SmallString<128> objName;		SmallString<128> objName;

bool inArchive = !file->parentName.empty();		bool inArchive = !file->parentName.empty();
objName = inArchive ? file->parentName : file->getName();		objName = inArchive ? file->parentName : file->getName();
pdbMakeAbsolute(objName);		pdbMakeAbsolute(objName);
StringRef modName = inArchive ? file->getName() : StringRef(objName);		StringRef modName = inArchive ? file->getName() : StringRef(objName);

file->moduleDBI = &exitOnErr(dbiBuilder.addModuleInfo(modName));		file->moduleDBI = &exitOnErr(dbiBuilder.addModuleInfo(modName));
file->moduleDBI->setObjFileName(objName);		file->moduleDBI->setObjFileName(objName);
		file->moduleDBI->setMergeSymbolsCallback(this, &commitSymbolsForObject);

ArrayRef<Chunk *> chunks = file->getChunks();		ArrayRef<Chunk *> chunks = file->getChunks();
uint32_t modi = file->moduleDBI->getModuleIndex();		uint32_t modi = file->moduleDBI->getModuleIndex();

for (Chunk *c : chunks) {		for (Chunk *c : chunks) {
auto *secChunk = dyn_cast<SectionChunk>(c);		auto *secChunk = dyn_cast<SectionChunk>(c);
if (!secChunk || !secChunk->live)		if (!secChunk || !secChunk->live)
continue;		continue;
[50 lines not shown]
}		}

// Add all object files to the PDB. Merge .debug$T sections into IpiData and		// Add all object files to the PDB. Merge .debug$T sections into IpiData and
// TpiData.		// TpiData.
void PDBLinker::addObjectsToPDB() {		void PDBLinker::addObjectsToPDB() {
ScopedTimer t1(addObjectsTimer);		ScopedTimer t1(addObjectsTimer);

// Create module descriptors		// Create module descriptors
for_each(ObjFile::instances,		for_each(ObjFile::instances, [&](ObjFile *obj) { createModuleDBI(obj); });
[&](ObjFile *obj) { createModuleDBI(builder, obj); });

// Reorder dependency type sources to come first.		// Reorder dependency type sources to come first.
TpiSource::sortDependencies();		TpiSource::sortDependencies();

// Merge type information from input files using global type hashing.		// Merge type information from input files using global type hashing.
if (config->debugGHashes)		if (config->debugGHashes)
tMerger.mergeTypesWithGHash();		tMerger.mergeTypesWithGHash();

[366 lines not shown]	for (ImportFile *file : ImportFile::instances) {
ts.Segment = thunkOS->sectionIndex;		ts.Segment = thunkOS->sectionIndex;
ts.Offset = thunkChunk->getRVA() - thunkOS->getRVA();		ts.Offset = thunkChunk->getRVA() - thunkOS->getRVA();

mod->addSymbol(codeview::SymbolSerializer::writeOneSymbol(		mod->addSymbol(codeview::SymbolSerializer::writeOneSymbol(
ons, bAlloc, CodeViewContainer::Pdb));		ons, bAlloc, CodeViewContainer::Pdb));
mod->addSymbol(codeview::SymbolSerializer::writeOneSymbol(		mod->addSymbol(codeview::SymbolSerializer::writeOneSymbol(
cs, bAlloc, CodeViewContainer::Pdb));		cs, bAlloc, CodeViewContainer::Pdb));

SmallVector<SymbolScope, 4> scopes;
CVSymbol newSym = codeview::SymbolSerializer::writeOneSymbol(		CVSymbol newSym = codeview::SymbolSerializer::writeOneSymbol(
ts, bAlloc, CodeViewContainer::Pdb);		ts, bAlloc, CodeViewContainer::Pdb);
scopeStackOpen(scopes, mod->getNextSymbolOffset(), newSym);
		// Write ptrEnd for the S_THUNK32.
		ScopeRecord *thunkSymScope =
		aganea: I can't say I like poking plain memory without going through structured data, it makes the code less reader-friendly. It's a pity we don't have definitions for serialized structures :-( Nothing you can do now I guess.
		rnk: I'll factor out the reinterpret_cast from the scope stack management code above so that it can be shared. That hopefully makes it a bit more readable.
		getSymbolScopeFields(const_cast<uint8_t *>(newSym.data().data()));

mod->addSymbol(newSym);		mod->addSymbol(newSym);

newSym = codeview::SymbolSerializer::writeOneSymbol(es, bAlloc,		newSym = codeview::SymbolSerializer::writeOneSymbol(es, bAlloc,
CodeViewContainer::Pdb);		CodeViewContainer::Pdb);
scopeStackClose(scopes, mod->getNextSymbolOffset(), file);		thunkSymScope->ptrEnd = mod->getNextSymbolOffset();

mod->addSymbol(newSym);		mod->addSymbol(newSym);

pdb::SectionContrib sc =		pdb::SectionContrib sc =
createSectionContrib(thunk->getChunk(), mod->getModuleIndex());		createSectionContrib(thunk->getChunk(), mod->getModuleIndex());
mod->setFirstSectionContrib(sc);		mod->setFirstSectionContrib(sc);
}		}
}		}
[239 lines not shown]
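
Note on the relocation bookkeeping in PDB.cpp: advanceRelocIndex depends on sortRelocations() having been called first, so a single index can be walked forward across subsections instead of searching the relocation list from the start each time. Below is a minimal standalone sketch of that idea. The Reloc struct is a simplified stand-in, and the helper names advanceRelocCursor and countRelocsInRange are hypothetical; none of this is lld's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a COFF relocation; only the offset matters here.
struct Reloc {
  uint32_t VirtualAddress;
};

// Move Cursor forward to the first relocation at or after Begin.
// Precondition (assumed): Relocs is sorted by VirtualAddress.
inline void advanceRelocCursor(const std::vector<Reloc> &Relocs, uint32_t Begin,
                               std::size_t &Cursor) {
  while (Cursor < Relocs.size() && Relocs[Cursor].VirtualAddress < Begin)
    ++Cursor;
}

// Count the relocations that fall in [Begin, End) and leave Cursor just past
// them, ready for the next subsection.
inline std::size_t countRelocsInRange(const std::vector<Reloc> &Relocs,
                                      uint32_t Begin, uint32_t End,
                                      std::size_t &Cursor) {
  assert(Begin <= End && "inverted range");
  advanceRelocCursor(Relocs, Begin, Cursor);
  std::size_t Count = 0;
  while (Cursor < Relocs.size() && Relocs[Cursor].VirtualAddress < End) {
    ++Count;
    ++Cursor;
  }
  return Count;
}
```

Because the cursor only moves forward, visiting every subsection of a section costs one pass over the relocation list in total, provided the subsections are visited in address order.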

llvm/include/llvm/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.h

[28 lines not shown]
}		}

namespace msf {		namespace msf {
class MSFBuilder;		class MSFBuilder;
struct MSFLayout;		struct MSFLayout;
}		}
namespace pdb {		namespace pdb {

		// Represents merged or unmerged symbols. Merged symbols can be written to the
		// output file as is, but unmerged symbols must be rewritten first. In either
		// case, the size must be known up front.
		struct SymbolListWrapper {
		explicit SymbolListWrapper(ArrayRef<uint8_t> Syms)
		: SymPtr(const_cast<uint8_t *>(Syms.data())), SymSize(Syms.size()),
		NeedsToBeMerged(false) {}
		explicit SymbolListWrapper(void *SymSrc, uint32_t Length)
		: SymPtr(SymSrc), SymSize(Length), NeedsToBeMerged(true) {}

		ArrayRef<uint8_t> asArray() const {
		return ArrayRef<uint8_t>(static_cast<const uint8_t *>(SymPtr), SymSize);
		}

		uint32_t size() const { return SymSize; }

		void *SymPtr = nullptr;
		uint32_t SymSize = 0;
		bool NeedsToBeMerged = false;
		};

		/// Represents a string table reference at some offset in the module symbol
		/// stream.
		struct StringTableFixup {
		uint32_t StrTabOffset = 0;
		uint32_t SymOffsetOfReference = 0;
		};

class DbiModuleDescriptorBuilder {		class DbiModuleDescriptorBuilder {
friend class DbiStreamBuilder;		friend class DbiStreamBuilder;

public:		public:
DbiModuleDescriptorBuilder(StringRef ModuleName, uint32_t ModIndex,		DbiModuleDescriptorBuilder(StringRef ModuleName, uint32_t ModIndex,
msf::MSFBuilder &Msf);		msf::MSFBuilder &Msf);
~DbiModuleDescriptorBuilder();		~DbiModuleDescriptorBuilder();

DbiModuleDescriptorBuilder(const DbiModuleDescriptorBuilder &) = delete;		DbiModuleDescriptorBuilder(const DbiModuleDescriptorBuilder &) = delete;
DbiModuleDescriptorBuilder &		DbiModuleDescriptorBuilder &
operator=(const DbiModuleDescriptorBuilder &) = delete;		operator=(const DbiModuleDescriptorBuilder &) = delete;

void setPdbFilePathNI(uint32_t NI);		void setPdbFilePathNI(uint32_t NI);
void setObjFileName(StringRef Name);		void setObjFileName(StringRef Name);

		// Callback to merge one source of unmerged symbols.
		using MergeSymbolsCallback = Error (*)(void *Ctx, void *Symbols,
		BinaryStreamWriter &Writer);

		void setMergeSymbolsCallback(void *Ctx, MergeSymbolsCallback Callback) {
		MergeSymsCtx = Ctx;
		MergeSymsCallback = Callback;
		}

		void setStringTableFixups(std::vector<StringTableFixup> &&Fixups) {
		StringTableFixups = std::move(Fixups);
		}

void setFirstSectionContrib(const SectionContrib &SC);		void setFirstSectionContrib(const SectionContrib &SC);
void addSymbol(codeview::CVSymbol Symbol);		void addSymbol(codeview::CVSymbol Symbol);
void addSymbolsInBulk(ArrayRef<uint8_t> BulkSymbols);		void addSymbolsInBulk(ArrayRef<uint8_t> BulkSymbols);

		// Add symbols of known size which will be merged (rewritten) when committing
		// the PDB to disk.
		void addUnmergedSymbols(void *SymSrc, uint32_t SymLength);

void		void
addDebugSubsection(std::shared_ptr<codeview::DebugSubsection> Subsection);		addDebugSubsection(std::shared_ptr<codeview::DebugSubsection> Subsection);

void		void
addDebugSubsection(const codeview::DebugSubsectionRecord &SubsectionContents);		addDebugSubsection(const codeview::DebugSubsectionRecord &SubsectionContents);

uint16_t getStreamIndex() const;		uint16_t getStreamIndex() const;
StringRef getModuleName() const { return ModuleName; }		StringRef getModuleName() const { return ModuleName; }
[9 lines not shown]	public:

/// Return the offset within the module symbol stream of the next symbol		/// Return the offset within the module symbol stream of the next symbol
/// record passed to addSymbol. Add four to account for the signature.		/// record passed to addSymbol. Add four to account for the signature.
uint32_t getNextSymbolOffset() const { return SymbolByteSize + 4; }		uint32_t getNextSymbolOffset() const { return SymbolByteSize + 4; }

void finalize();		void finalize();
Error finalizeMsfLayout();		Error finalizeMsfLayout();

Error commit(BinaryStreamWriter &ModiWriter, const msf::MSFLayout &MsfLayout,		/// Commit the DBI descriptor to the DBI stream.
		Error commit(BinaryStreamWriter &ModiWriter);

		/// Commit the accumulated symbols to the module symbol stream. Safe to call
		/// in parallel on different DbiModuleDescriptorBuilder objects. Only modifies
		/// the pre-allocated stream in question.
		Error commitSymbolStream(const msf::MSFLayout &MsfLayout,
WritableBinaryStreamRef MsfBuffer);		WritableBinaryStreamRef MsfBuffer);

private:		private:
uint32_t calculateC13DebugInfoSize() const;		uint32_t calculateC13DebugInfoSize() const;

void addSourceFile(StringRef Path);		void addSourceFile(StringRef Path);
msf::MSFBuilder &MSF;		msf::MSFBuilder &MSF;

uint32_t SymbolByteSize = 0;		uint32_t SymbolByteSize = 0;
uint32_t PdbFilePathNI = 0;		uint32_t PdbFilePathNI = 0;
std::string ModuleName;		std::string ModuleName;
std::string ObjFileName;		std::string ObjFileName;
std::vector<std::string> SourceFiles;		std::vector<std::string> SourceFiles;
std::vector<ArrayRef<uint8_t>> Symbols;		std::vector<SymbolListWrapper> Symbols;

		void *MergeSymsCtx = nullptr;
		MergeSymbolsCallback MergeSymsCallback = nullptr;

		std::vector<StringTableFixup> StringTableFixups;

std::vector<codeview::DebugSubsectionRecordBuilder> C13Builders;		std::vector<codeview::DebugSubsectionRecordBuilder> C13Builders;

ModuleInfoHeader Layout;		ModuleInfoHeader Layout;
};		};

} // end namespace pdb		} // end namespace pdb

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_PDB_RAW_DBIMODULEDESCRIPTORBUILDER_H		#endif // LLVM_DEBUGINFO_PDB_RAW_DBIMODULEDESCRIPTORBUILDER_H
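
Note on the callback plumbing in this header: setMergeSymbolsCallback stores an opaque context pointer plus a plain function pointer, and addUnmergedSymbols records an opaque symbol source with a known size, so the PDB library never needs to know about lld's PDBLinker or ObjFile types. The sketch below shows that shape in isolation. ByteWriter, FakeObj, FakeLinker and FakeModuleBuilder are hypothetical stand-ins, and bool replaces llvm::Error to keep it self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::BinaryStreamWriter: appends to a buffer.
struct ByteWriter {
  std::vector<uint8_t> Out;
  void writeBytes(const std::vector<uint8_t> &Bytes) {
    Out.insert(Out.end(), Bytes.begin(), Bytes.end());
  }
};

// Same shape as MergeSymbolsCallback, with bool standing in for llvm::Error.
using MergeCallback = bool (*)(void *Ctx, void *Symbols, ByteWriter &Writer);

// Hypothetical stand-in for ObjFile.
struct FakeObj {
  std::vector<uint8_t> Records;
};

// Hypothetical stand-in for PDBLinker.
struct FakeLinker {
  unsigned Commits = 0;
  bool writeAll(FakeObj *Obj, ByteWriter &W) {
    ++Commits;
    W.writeBytes(Obj->Records);
    return true;
  }
  // Static thunk: recovers the real types from the opaque pointers.
  static bool commitThunk(void *Ctx, void *Obj, ByteWriter &W) {
    return static_cast<FakeLinker *>(Ctx)->writeAll(
        static_cast<FakeObj *>(Obj), W);
  }
};

// Hypothetical stand-in for DbiModuleDescriptorBuilder.
struct FakeModuleBuilder {
  void *Ctx = nullptr;
  MergeCallback Callback = nullptr;
  void *Pending = nullptr;
  uint32_t PendingSize = 0;

  void setMergeSymbolsCallback(void *C, MergeCallback CB) {
    Ctx = C;
    Callback = CB;
  }
  // The size must be known up front, even though the bytes are produced later.
  void addUnmergedSymbols(void *Src, uint32_t Len) {
    assert(Len > 0 && Len % 4 == 0 && "PDB symbols are 4 byte aligned");
    Pending = Src;
    PendingSize = Len;
  }
  // At commit time, hand the opaque source back to its owner for writing.
  bool commit(ByteWriter &W) {
    if (!Pending)
      return true;
    return Callback(Ctx, Pending, W);
  }
};
```

A caller would register &FakeLinker::commitThunk with a FakeLinker instance as the context, queue a FakeObj with addUnmergedSymbols, and call commit; the thunk then casts both pointers back and does the actual writing.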

llvm/lib/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.cpp

(68 lines not shown)
}		}

void DbiModuleDescriptorBuilder::addSymbolsInBulk(		void DbiModuleDescriptorBuilder::addSymbolsInBulk(
ArrayRef<uint8_t> BulkSymbols) {		ArrayRef<uint8_t> BulkSymbols) {
// Do nothing for empty runs of symbols.		// Do nothing for empty runs of symbols.
if (BulkSymbols.empty())		if (BulkSymbols.empty())
return;		return;

Symbols.push_back(BulkSymbols);		Symbols.push_back(SymbolListWrapper(BulkSymbols));
// Symbols written to a PDB file are required to be 4 byte aligned. The same		// Symbols written to a PDB file are required to be 4 byte aligned. The same
// is not true of object files.		// is not true of object files.
assert(BulkSymbols.size() % alignOf(CodeViewContainer::Pdb) == 0 &&		assert(BulkSymbols.size() % alignOf(CodeViewContainer::Pdb) == 0 &&
"Invalid Symbol alignment!");		"Invalid Symbol alignment!");
SymbolByteSize += BulkSymbols.size();		SymbolByteSize += BulkSymbols.size();
}		}

		void DbiModuleDescriptorBuilder::addUnmergedSymbols(void *SymSrc,
		uint32_t SymLength) {
		assert(SymLength > 0);
		Symbols.push_back(SymbolListWrapper(SymSrc, SymLength));

		// Symbols written to a PDB file are required to be 4 byte aligned. The same
		// is not true of object files.
		assert(SymLength % alignOf(CodeViewContainer::Pdb) == 0 &&
		"Invalid Symbol alignment!");
		SymbolByteSize += SymLength;
		}

void DbiModuleDescriptorBuilder::addSourceFile(StringRef Path) {		void DbiModuleDescriptorBuilder::addSourceFile(StringRef Path) {
SourceFiles.push_back(std::string(Path));		SourceFiles.push_back(std::string(Path));
}		}

uint32_t DbiModuleDescriptorBuilder::calculateC13DebugInfoSize() const {		uint32_t DbiModuleDescriptorBuilder::calculateC13DebugInfoSize() const {
uint32_t Result = 0;		uint32_t Result = 0;
for (const auto &Builder : C13Builders) {		for (const auto &Builder : C13Builders) {
Result += Builder.calculateSerializedLength();		Result += Builder.calculateSerializedLength();
(33 lines not shown)	Error DbiModuleDescriptorBuilder::finalizeMsfLayout() {
auto ExpectedSN =		auto ExpectedSN =
MSF.addStream(calculateDiSymbolStreamSize(SymbolByteSize, C13Size));		MSF.addStream(calculateDiSymbolStreamSize(SymbolByteSize, C13Size));
if (!ExpectedSN)		if (!ExpectedSN)
return ExpectedSN.takeError();		return ExpectedSN.takeError();
Layout.ModDiStream = *ExpectedSN;		Layout.ModDiStream = *ExpectedSN;
return Error::success();		return Error::success();
}		}

Error DbiModuleDescriptorBuilder::commit(BinaryStreamWriter &ModiWriter,		Error DbiModuleDescriptorBuilder::commit(BinaryStreamWriter &ModiWriter) {
const msf::MSFLayout &MsfLayout,
WritableBinaryStreamRef MsfBuffer) {
// We write the Modi record to the `ModiWriter`, but we additionally write its		// We write the Modi record to the `ModiWriter`, but we additionally write its
// symbol stream to a brand new stream.		// symbol stream to a brand new stream.
if (auto EC = ModiWriter.writeObject(Layout))		if (auto EC = ModiWriter.writeObject(Layout))
return EC;		return EC;
if (auto EC = ModiWriter.writeCString(ModuleName))		if (auto EC = ModiWriter.writeCString(ModuleName))
return EC;		return EC;
if (auto EC = ModiWriter.writeCString(ObjFileName))		if (auto EC = ModiWriter.writeCString(ObjFileName))
return EC;		return EC;
if (auto EC = ModiWriter.padToAlignment(sizeof(uint32_t)))		if (auto EC = ModiWriter.padToAlignment(sizeof(uint32_t)))
return EC;		return EC;
		return Error::success();
		}

		Error DbiModuleDescriptorBuilder::commitSymbolStream(
		const msf::MSFLayout &MsfLayout, WritableBinaryStreamRef MsfBuffer) {
		if (Layout.ModDiStream == kInvalidStreamIndex)
		return Error::success();

if (Layout.ModDiStream != kInvalidStreamIndex) {
auto NS = WritableMappedBlockStream::createIndexedStream(		auto NS = WritableMappedBlockStream::createIndexedStream(
MsfLayout, MsfBuffer, Layout.ModDiStream, MSF.getAllocator());		MsfLayout, MsfBuffer, Layout.ModDiStream, MSF.getAllocator());
WritableBinaryStreamRef Ref(*NS);		WritableBinaryStreamRef Ref(*NS);
BinaryStreamWriter SymbolWriter(Ref);		BinaryStreamWriter SymbolWriter(Ref);
// Write the symbols.		// Write the symbols.
if (auto EC =		if (auto EC = SymbolWriter.writeInteger<uint32_t>(COFF::DEBUG_SECTION_MAGIC))
SymbolWriter.writeInteger<uint32_t>(COFF::DEBUG_SECTION_MAGIC))		return EC;
		for (const SymbolListWrapper &Sym : Symbols) {
		if (Sym.NeedsToBeMerged) {
		assert(MergeSymsCallback);
		if (auto EC = MergeSymsCallback(MergeSymsCtx, Sym.SymPtr, SymbolWriter))
return EC;		return EC;
for (ArrayRef<uint8_t> Syms : Symbols) {		} else {
if (auto EC = SymbolWriter.writeBytes(Syms))		if (auto EC = SymbolWriter.writeBytes(Sym.asArray()))
return EC;		return EC;
}		}
		}

		// Apply the string table fixups.
		auto SavedOffset = SymbolWriter.getOffset();
		for (const StringTableFixup &Fixup : StringTableFixups) {
		SymbolWriter.setOffset(Fixup.SymOffsetOfReference);
		if (auto E = SymbolWriter.writeInteger<uint32_t>(Fixup.StrTabOffset))
		return E;
		}
		SymbolWriter.setOffset(SavedOffset);

assert(SymbolWriter.getOffset() % alignOf(CodeViewContainer::Pdb) == 0 &&		assert(SymbolWriter.getOffset() % alignOf(CodeViewContainer::Pdb) == 0 &&
"Invalid debug section alignment!");		"Invalid debug section alignment!");
// TODO: Write C11 Line data		// TODO: Write C11 Line data
for (const auto &Builder : C13Builders) {		for (const auto &Builder : C13Builders) {
if (auto EC = Builder.commit(SymbolWriter, CodeViewContainer::Pdb))		if (auto EC = Builder.commit(SymbolWriter, CodeViewContainer::Pdb))
return EC;		return EC;
}		}

// TODO: Figure out what GlobalRefs substream actually is and populate it.		// TODO: Figure out what GlobalRefs substream actually is and populate it.
if (auto EC = SymbolWriter.writeInteger<uint32_t>(0))		if (auto EC = SymbolWriter.writeInteger<uint32_t>(0))
return EC;		return EC;
if (SymbolWriter.bytesRemaining() > 0)		if (SymbolWriter.bytesRemaining() > 0)
return make_error<RawError>(raw_error_code::stream_too_long);		return make_error<RawError>(raw_error_code::stream_too_long);
}
return Error::success();		return Error::success();
}		}

void DbiModuleDescriptorBuilder::addDebugSubsection(		void DbiModuleDescriptorBuilder::addDebugSubsection(
std::shared_ptr<DebugSubsection> Subsection) {		std::shared_ptr<DebugSubsection> Subsection) {
assert(Subsection);		assert(Subsection);
C13Builders.push_back(DebugSubsectionRecordBuilder(std::move(Subsection)));		C13Builders.push_back(DebugSubsectionRecordBuilder(std::move(Subsection)));
}		}

void DbiModuleDescriptorBuilder::addDebugSubsection(		void DbiModuleDescriptorBuilder::addDebugSubsection(
const DebugSubsectionRecord &SubsectionContents) {		const DebugSubsectionRecord &SubsectionContents) {
C13Builders.push_back(DebugSubsectionRecordBuilder(SubsectionContents));		C13Builders.push_back(DebugSubsectionRecordBuilder(SubsectionContents));
}		}
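
The commitSymbolStream change above follows a measure-then-write pattern: the stream size is known up front, record runs are copied straight into place, and the string table fixups are applied afterwards by revisiting the recorded offsets. A minimal standalone sketch of that idea follows. It uses a plain byte vector rather than LLVM's BinaryStreamWriter, and the names Fixup, measureStream, and commitStream are hypothetical illustrations, not part of the patch. Fixup offsets here are assumed to be absolute offsets into the output buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a string table fixup: the byte offset of a
// 32-bit field in the output buffer and the value to store there.
struct Fixup {
  uint32_t OffsetOfReference;
  uint32_t NewValue;
};

// Pass 1: measure. Sums the sizes of all record runs plus a 4-byte signature.
inline uint32_t measureStream(const std::vector<std::vector<uint8_t>> &Runs) {
  uint32_t Size = sizeof(uint32_t);
  for (const auto &Run : Runs)
    Size += static_cast<uint32_t>(Run.size());
  return Size;
}

// Pass 2: commit. Writes the signature and every run into a buffer that was
// sized by the measuring pass, then patches each fixup location in place.
inline std::vector<uint8_t>
commitStream(const std::vector<std::vector<uint8_t>> &Runs,
             const std::vector<Fixup> &Fixups, uint32_t Signature) {
  std::vector<uint8_t> Out(measureStream(Runs));
  std::size_t Offset = 0;
  std::memcpy(Out.data(), &Signature, sizeof(Signature));
  Offset += sizeof(Signature);
  for (const auto &Run : Runs) {
    if (!Run.empty())
      std::memcpy(Out.data() + Offset, Run.data(), Run.size());
    Offset += Run.size();
  }
  // The measuring pass and the writing pass must agree on the size.
  assert(Offset == Out.size());

  // Apply fixups only after all records are in their final position.
  for (const Fixup &F : Fixups) {
    assert(F.OffsetOfReference + sizeof(uint32_t) <= Out.size());
    std::memcpy(Out.data() + F.OffsetOfReference, &F.NewValue,
                sizeof(uint32_t));
  }
  return Out;
}
```

Deferring the fixups until after the copy means the source records never have to be modified or duplicated, which is the memory saving the summary describes.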

llvm/lib/DebugInfo/PDB/Native/DbiStreamBuilder.cpp

(12 lines not shown)
#include "llvm/DebugInfo/CodeView/DebugFrameDataSubsection.h"		#include "llvm/DebugInfo/CodeView/DebugFrameDataSubsection.h"
#include "llvm/DebugInfo/MSF/MSFBuilder.h"		#include "llvm/DebugInfo/MSF/MSFBuilder.h"
#include "llvm/DebugInfo/MSF/MappedBlockStream.h"		#include "llvm/DebugInfo/MSF/MappedBlockStream.h"
#include "llvm/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.h"		#include "llvm/DebugInfo/PDB/Native/DbiModuleDescriptorBuilder.h"
#include "llvm/DebugInfo/PDB/Native/DbiStream.h"		#include "llvm/DebugInfo/PDB/Native/DbiStream.h"
#include "llvm/DebugInfo/PDB/Native/RawError.h"		#include "llvm/DebugInfo/PDB/Native/RawError.h"
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include "llvm/Support/BinaryStreamWriter.h"		#include "llvm/Support/BinaryStreamWriter.h"
		#include "llvm/Support/Parallel.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::codeview;		using namespace llvm::codeview;
using namespace llvm::msf;		using namespace llvm::msf;
using namespace llvm::pdb;		using namespace llvm::pdb;

DbiStreamBuilder::DbiStreamBuilder(msf::MSFBuilder &Msf)		DbiStreamBuilder::DbiStreamBuilder(msf::MSFBuilder &Msf)
: Msf(Msf), Allocator(Msf.getAllocator()), Age(1), BuildNumber(0),		: Msf(Msf), Allocator(Msf.getAllocator()), Age(1), BuildNumber(0),
(360 lines not shown)	Error DbiStreamBuilder::commit(const msf::MSFLayout &Layout,
auto DbiS = WritableMappedBlockStream::createIndexedStream(		auto DbiS = WritableMappedBlockStream::createIndexedStream(
Layout, MsfBuffer, StreamDBI, Allocator);		Layout, MsfBuffer, StreamDBI, Allocator);

BinaryStreamWriter Writer(*DbiS);		BinaryStreamWriter Writer(*DbiS);
if (auto EC = Writer.writeObject(*Header))		if (auto EC = Writer.writeObject(*Header))
return EC;		return EC;

for (auto &M : ModiList) {		for (auto &M : ModiList) {
if (auto EC = M->commit(Writer, Layout, MsfBuffer))		if (auto EC = M->commit(Writer))
return EC;		return EC;
}		}

		// Commit symbol streams. This is a lot of data, so do it in parallel.
		if (auto EC = parallelForEachError(
		ModiList, [&](std::unique_ptr<DbiModuleDescriptorBuilder> &M) {
		return M->commitSymbolStream(Layout, MsfBuffer);
		}))
		return EC;

if (!SectionContribs.empty()) {		if (!SectionContribs.empty()) {
if (auto EC = Writer.writeEnum(DbiSecContribVer60))		if (auto EC = Writer.writeEnum(DbiSecContribVer60))
return EC;		return EC;
if (auto EC = Writer.writeArray(makeArrayRef(SectionContribs)))		if (auto EC = Writer.writeArray(makeArrayRef(SectionContribs)))
return EC;		return EC;
}		}

if (!SectionMap.empty()) {		if (!SectionMap.empty()) {
(40 lines not shown)
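
The parallel commit in DbiStreamBuilder::commit relies on each module writing only to its own pre-allocated stream, so no locking is needed between workers. The sketch below illustrates that property with standard threads instead of LLVM's parallelForEachError. Module, layoutModules, and commitInParallel are hypothetical names, and spawning one thread per module is a simplification for brevity; a real implementation would presumably use a thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

// Hypothetical module: owns its payload and a pre-assigned region
// [Offset, Offset + Payload.size()) of the shared output buffer.
struct Module {
  std::vector<uint8_t> Payload;
  std::size_t Offset = 0;
};

// Serial layout step: gives every module a disjoint region and returns the
// total number of bytes required.
inline std::size_t layoutModules(std::vector<Module> &Mods) {
  std::size_t Total = 0;
  for (Module &M : Mods) {
    M.Offset = Total;
    Total += M.Payload.size();
  }
  return Total;
}

// Parallel commit step: each worker touches only its own region of Out and
// its own error slot, so the workers share no mutable state. Returns the
// first error message in module order, or an empty string on success.
inline std::string commitInParallel(const std::vector<Module> &Mods,
                                    std::vector<uint8_t> &Out) {
  std::vector<std::string> Errors(Mods.size());
  std::vector<std::thread> Workers;
  for (std::size_t I = 0; I < Mods.size(); ++I) {
    Workers.emplace_back([&, I] {
      const Module &M = Mods[I];
      if (M.Offset + M.Payload.size() > Out.size()) {
        Errors[I] = "region out of bounds";
        return;
      }
      if (!M.Payload.empty())
        std::memcpy(Out.data() + M.Offset, M.Payload.data(),
                    M.Payload.size());
    });
  }
  assert(Workers.size() == Mods.size());
  for (std::thread &W : Workers)
    W.join();
  for (const std::string &E : Errors)
    if (!E.empty())
      return E;
  return {};
}
```

The layout step stays serial because it decides where everything goes; only the bulk copying, which dominates the cost, is spread across cores.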