This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Shrink binding and type in Symbol
Closed · Public

Authored by smeenai on Apr 19 2022, 1:52 PM.

Details

Summary

STB_HIPROC and STT_HIPROC are both 15, so the symbol binding and type
each fit in 4 bits. Packing the two into a single byte gives us an
additional byte to use for Symbol flags (without increasing the type's
size), which I'll be making use of in the next diff.
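
As a rough illustration of the size argument above, here is a minimal sketch. It is not lld's actual Symbol class: the struct and helper names (UnpackedSymbolBits, PackedSymbolBits, makePacked) are hypothetical, and bit-field packing is implementation-defined, so the static_assert only reflects what mainstream compilers such as GCC and Clang are expected to do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "before" layout: binding and type each take a full byte.
struct UnpackedSymbolBits {
  uint8_t binding;
  uint8_t type;
};

// Hypothetical "after" layout: binding and type share one byte, so the
// second byte is available for additional flags.
struct PackedSymbolBits {
  uint8_t binding : 4; // STB_* values go up to 15 (STB_HIPROC)
  uint8_t type : 4;    // STT_* values go up to 15 (STT_HIPROC)
  uint8_t flags;       // the byte gained by packing
};

// Assumption: the compiler packs both 4-bit fields into a single byte.
static_assert(sizeof(PackedSymbolBits) == sizeof(UnpackedSymbolBits),
              "packing should not change the size");

// Illustrative helper: builds a packed value and checks the 4-bit range.
inline PackedSymbolBits makePacked(uint8_t binding, uint8_t type) {
  assert(binding <= 15 && type <= 15 && "value does not fit in 4 bits");
  PackedSymbolBits s{};
  s.binding = binding;
  s.type = type;
  return s;
}
```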

Reorder type and binding based on a suggestion from @MaskRay, to
optimize st_info computation on little-endian systems (see
https://godbolt.org/z/nMn8Yar43).
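
To illustrate the reordering, here is a minimal sketch with hypothetical names (TypeFirst, stInfo, rawByte); it is not the code in this diff. On common little-endian ABIs the first-declared bit-field occupies the low-order bits, so with type declared first the storage byte already has the st_info layout, and a compiler may be able to replace the shift-and-mask with a single byte load. That layout is implementation-defined, so only stInfo below is portable.

```cpp
#include <cstdint>
#include <cstring>

// Values from the ELF specification.
constexpr uint8_t STB_GLOBAL = 1;
constexpr uint8_t STT_FUNC = 2;

// Hypothetical layout with type declared before binding.
struct TypeFirst {
  uint8_t type : 4;    // expected to land in the low nibble on little-endian ABIs
  uint8_t binding : 4; // expected to land in the high nibble on little-endian ABIs
};

// Portable computation, equivalent to ELF32_ST_INFO(bind, type).
inline uint8_t stInfo(const TypeFirst &s) {
  return static_cast<uint8_t>((s.binding << 4) | (s.type & 0xf));
}

// Reads the underlying storage byte. Whether this equals stInfo(s) depends
// on the ABI's bit-field layout, so treat it as an illustration only.
inline uint8_t rawByte(const TypeFirst &s) {
  uint8_t b;
  std::memcpy(&b, &s, 1);
  return b;
}
```

With the opposite declaration order, the storage byte would hold the two nibbles swapped on such ABIs, so producing st_info would need extra shifting or rotation.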

Event Timeline

smeenai created this revision. Apr 19 2022, 1:52 PM
Herald added a project: Restricted Project. Apr 19 2022, 1:52 PM
smeenai requested review of this revision. Apr 19 2022, 1:52 PM

#define ELF32_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))

It may be worth swapping type and bind to optimize for little-endian systems.

smeenai updated this revision to Diff 423820. Apr 19 2022, 10:51 PM

Reorder type and binding

smeenai edited the summary of this revision. Apr 19 2022, 10:52 PM

Good idea, thanks.

smeenai updated this revision to Diff 423821. Apr 19 2022, 10:54 PM

Mention optimization is specific to little-endian systems

MaskRay accepted this revision. Apr 19 2022, 11:20 PM

Thanks for the update. It seems that more functions become smaller than become larger.
lld::elf::SymbolTable::insert(llvm::StringRef) and SymbolTableSection::writeTo are examples of functions that become smaller.

This revision is now accepted and ready to land. Apr 19 2022, 11:20 PM
MaskRay added a comment. (Edited) Apr 19 2022, 11:56 PM

BTW: the new lld (/tmp/c/1) may be slightly faster than the old one (/tmp/c/0) when linking Chromium:

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 "{/tmp/c/0,/tmp/c/1}" -flavor gnu @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      5.568 s ±  0.022 s    [User: 9.089 s, System: 2.293 s]
  Range (min … max):    5.529 s …  5.605 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      5.543 s ±  0.026 s    [User: 9.038 s, System: 2.287 s]
  Range (min … max):    5.518 s …  5.605 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8' ran
    1.00 ± 0.01 times faster than 'numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8'

% hyperfine --warmup 2 --min-runs 10 "numactl -C 20-27 "{/tmp/c/0,/tmp/c/1}" -flavor gnu @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      5.596 s ±  0.055 s    [User: 9.052 s, System: 2.322 s]
  Range (min … max):    5.510 s …  5.699 s    10 runs
 
Benchmark 2: numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8
  Time (mean ± σ):      5.548 s ±  0.041 s    [User: 9.018 s, System: 2.320 s]
  Range (min … max):    5.485 s …  5.628 s    10 runs
 
Summary
  'numactl -C 20-27 /tmp/c/1 -flavor gnu @response.txt --threads=8' ran
    1.01 ± 0.01 times faster than 'numactl -C 20-27 /tmp/c/0 -flavor gnu @response.txt --threads=8'

Ah, neat :) Thanks for checking.

This revision was automatically updated to reflect the committed changes.