This patch is extracted from D96035. It adds support for the existing
DWARFLinker functionality. What is not supported yet:
- Type deduplication (--odr mode).
- Module deduplication.
- Generation of index tables.
Run-time performance and memory requirements for the clang binary, --num-threads 16:

---------------------------------------------------------------------------
                                                    | time, sec | mem, GB |
---------------------------------------------------------------------------
dsymutil --no-odr --accelerator None --linker llvm  |        44 |    18.0 |
---------------------------------------------------------------------------
dsymutil --no-odr --accelerator None --linker apple |       248 |    22.2 |
---------------------------------------------------------------------------
Run-time performance and memory requirements for the clang binary, --num-threads 1:

---------------------------------------------------------------------------
                                                    | time, sec | mem, GB |
---------------------------------------------------------------------------
dsymutil --no-odr --accelerator None --linker llvm  |       242 |    17.2 |
---------------------------------------------------------------------------
dsymutil --no-odr --accelerator None --linker apple |       260 |    19.4 |
---------------------------------------------------------------------------
The overall linking process looks like this:

parallel_for_each(ObjectFile) {
  for_each(Compile Unit) {
    1. Load Clang modules.
  }

  parallel_for_each(Compile Unit) {
    1. Load input DWARF for Compile Unit.
    2. Report warnings for Clang modules.
    3. Analyze live DIEs.
    4. Clone DIEs (generate output DIEs and resulting DWARF tables).
       The result is a set of sections corresponding to the current
       compile unit.
    5. Clean up input and output DIEs.
  }

  Deallocate loaded Object file.
}

for_each(ObjectFile) {
  for_each(Compile Unit) {
    1. Set offsets to Compile Unit DWARF sections.
    2. Sort offsets/attributes/patches to have a predictable result.
    3. Patch size/offset fields.
    4. Generate index tables.
    5. Move DWARF sections of compile units into the resulting file.
  }
}

Every compile unit is processed separately and visited only once
(except when inter-CU references exist), and the data it used is freed
after the compile unit is processed. The resulting file is glued together
from the generated debug tables, which correspond to separate compile units.
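
The gluing step can be illustrated with a minimal sketch. It assumes each compile unit has already produced its own section bytes in isolation. `UnitOutput` and `glueUnits` are hypothetical names used only for illustration; they are not the types or functions used by this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-unit result: the bytes one compile unit produced for a
// single output section while it was processed independently of other units.
struct UnitOutput {
  std::string SectionBytes;
  std::uint64_t StartOffset = 0; // Assigned by the sequential pass below.
};

// Sequential pass: give every unit its offset in the final section and
// append its bytes. Units are visited in input order, so the result does not
// depend on the order in which parallel workers finished.
std::string glueUnits(std::vector<UnitOutput> &Units) {
  std::string Result;
  for (UnitOutput &Unit : Units) {
    Unit.StartOffset = Result.size();
    Result += Unit.SectionBytes;
    // Invariant: the unit's bytes end exactly at the current end of output.
    assert(Unit.StartOffset + Unit.SectionBytes.size() == Result.size());
  }
  return Result;
}
```

Once `StartOffset` is known, unit-relative offsets recorded during cloning could be turned into section-relative ones, which is what the "patch size/offset fields" step needs.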
Handling inter-CU references: inter-CU references are hard to process
in a single pass. For example, if CU1 references CU100 and CU100 references
CU1, we cannot finish handling CU1 until we have finished CU100.
Thus we need to either load all CUs into memory or load CUs several
times. This implementation loads inter-connected CUs into memory during the
first pass and processes them during the second pass.
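
A rough sketch of that split follows. It assumes the set of cross-unit references is already known for each unit, whereas the real linker discovers them while analyzing DIEs. `UnitInfo`, `PassPlan` and `planPasses` are hypothetical names, not part of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one compile unit: indices of the other units that
// its DIEs refer to.
struct UnitInfo {
  std::vector<std::size_t> RefersTo;
};

struct PassPlan {
  std::vector<std::size_t> FirstPass;  // Processed and freed immediately.
  std::vector<std::size_t> SecondPass; // Kept loaded, processed afterwards.
};

// A unit is inter-connected if it references another unit or is referenced
// by one. Such units are deferred to the second pass; all others can be
// finished in the first pass.
PassPlan planPasses(const std::vector<UnitInfo> &Units) {
  std::vector<bool> Connected(Units.size(), false);
  for (std::size_t I = 0; I < Units.size(); ++I) {
    for (std::size_t Target : Units[I].RefersTo) {
      assert(Target < Units.size() && "reference to unknown unit");
      if (Target == I)
        continue; // A reference inside the same unit is not inter-CU.
      Connected[I] = true;
      Connected[Target] = true;
    }
  }

  PassPlan Plan;
  for (std::size_t I = 0; I < Units.size(); ++I)
    (Connected[I] ? Plan.SecondPass : Plan.FirstPass).push_back(I);
  return Plan;
}
```

With four units where unit 0 and unit 3 reference each other, units 1 and 2 land in `FirstPass`, and units 0 and 3 land in `SecondPass`.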
Changes from the current implementation (these make the output of DWARFLinkerParallel binary incompatible with that of the current DWARFLinker):
a) No common abbreviation table. Each compile unit has
its own abbreviation table. Generating a common abbreviation table slows down parallel execution (it is a resource that is accessed many times from many threads). An abbreviation table does not take much space, so having separate abbreviation tables looks cheap. Later, this might be optimized a bit (by removing equal abbreviation tables).
b) .debug_frame. Already generated CIE records are not reused between object files.
c) Location expressions containing type references use a fixed-length ULEB128 format, as they need to be patched after the referenced body is generated.
d) The live tracking algorithm does not depend on the order of DW_TAG_imported_module nodes and in some cases keeps more DIEs.
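
Item (c) relies on the fact that a ULEB128 value may be padded with continuation bytes, so a field of known width can be reserved first and overwritten later without shifting the surrounding bytes. The sketch below is a standalone illustration of such an encoding. `encodePaddedULEB128` and `decodeULEB128` are hypothetical helpers written for this example, and the 4-byte width in the usage note is an assumption, not the width chosen by this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode Value as ULEB128 in exactly Width bytes. Every byte except the last
// carries the continuation bit (0x80), so unused high groups are emitted as
// 0x80 padding and the final byte terminates the number.
std::vector<std::uint8_t> encodePaddedULEB128(std::uint64_t Value,
                                              std::size_t Width) {
  assert(Width > 0 && Width <= 10 && "a 64-bit value needs at most 10 bytes");
  std::vector<std::uint8_t> Out(Width);
  for (std::size_t I = 0; I < Width; ++I) {
    std::uint8_t Byte = static_cast<std::uint8_t>(Value & 0x7f);
    Value >>= 7;
    if (I + 1 != Width)
      Byte |= 0x80; // More bytes follow.
    Out[I] = Byte;
  }
  assert(Value == 0 && "value does not fit into the requested width");
  return Out;
}

// Standard ULEB128 decoding; padded encodings decode to the same value.
std::uint64_t decodeULEB128(const std::vector<std::uint8_t> &Bytes) {
  std::uint64_t Result = 0;
  unsigned Shift = 0;
  for (std::uint8_t Byte : Bytes) {
    if (Shift >= 64)
      break; // Guard against over-long input.
    Result |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break; // Last byte of the number.
  }
  return Result;
}
```

For example, `encodePaddedULEB128(5, 4)` yields the bytes `0x85 0x80 0x80 0x00`, which still decode to 5. A placeholder written this way can later be replaced by the real type offset using the same width.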