This commit moves the parsing of linker optimization hints into
ARM64::applyOptimizationHints. This lets us avoid allocating memory to
hold the parsed information, and it moves work out of ObjFile::parse,
which is not parallelized at the moment.
This change reduces the overhead of processing LOHs to 25-30 ms when
linking Chromium Framework on my M1 machine; previously it took close to
100 ms.
There's no statistically significant change in runtime for a --threads=1
link.
Performance figures with all 8 cores utilized:
         N        Min        Max     Median        Avg      Stddev
x       20  3.8027232  3.8760762  3.8505335  3.8454145 0.026352574
+       20  3.7019017  3.8660538  3.7546209  3.7620371 0.032680043
Difference at 95.0% confidence
        -0.0833775 +/- 0.019
        -2.16823% +/- 0.494094%
        (Student's t, pooled s = 0.0296854)
nit: does this have to be a lambda? Can't it be a simple static function?
Each spelling of a lambda is a unique type; multiplied by the different instantiations of forEachHint's template, we may end up with quite a handful of types.
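The reviewer's concern can be demonstrated in a few lines. This is an illustrative sketch: `apply` stands in for the forEachHint template under discussion, and all names are hypothetical.

```cpp
#include <type_traits>

// Stand-in for the callback-taking template: each distinct F type
// produces a separate instantiation of apply<F>.
template <typename F>
int apply(F f) { return f(1); }

// Two textually identical lambdas still have distinct closure types,
// so passing each one instantiates a separate apply<...>.
auto lambdaA = [](int x) { return x + 1; };
auto lambdaB = [](int x) { return x + 1; };
static_assert(!std::is_same_v<decltype(lambdaA), decltype(lambdaB)>,
              "every lambda expression has a unique closure type");

// A named static function decays to int (*)(int), so every call site
// that passes it shares one apply<int (*)(int)> instantiation.
static int addOne(int x) { return x + 1; }
```

Replacing per-call-site lambdas with a single named function is one way to keep the number of template instantiations, and thus code size, down.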