This is an archive of the discontinued LLVM Phabricator instance.

[BOLT] stale profile matching [part 2 out of 2]
Closed · Public

Authored by spupyrev on Mar 22 2023, 2:26 PM.

Details

Summary

This is the first "serious" version of stale profile matching in BOLT. This diff
extends the hash computation for basic blocks so that we can apply fuzzy
hash-based matching. The idea is to compute several "versions" of a hash value
for each basic block. A loose version of the hash (computed by ignoring instruction
operands) allows matching blocks in functions whose content has changed,
while stricter hash values (computed over instruction opcodes and operands, and
even over the hashes of a block's successors/predecessors) help resolve
collisions. To save space and build time, the individual hash components
are blended into a single uint64_t.
There are likely numerous ways to improve the hash computation, but even this
simple variant provides significant performance benefits.
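The multi-level hashing idea from the summary can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not the code from this diff: the `Instr`/`Block` types, the 16-bit field widths of the hypothetical `BlendedHash`, the `combine` mixing function, and the scoring rule in the hypothetical `matchBlocks` are all made up for illustration. Candidate blocks must agree on the loose (opcode-only) hash; among them, the block that agrees on the most stricter components wins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model: opcode name plus operand text.
struct Instr {
  std::string Opcode;
  std::string Operands;
};

// Hypothetical basic-block model: instructions plus indices of neighbor blocks.
struct Block {
  std::vector<Instr> Instrs;
  std::vector<std::size_t> Preds;
  std::vector<std::size_t> Succs;
};

// Mix a value into a running hash (boost::hash_combine-style constants).
static uint64_t combine(uint64_t Seed, uint64_t V) {
  return Seed ^ (V + 0x9e3779b97f4a7c15ULL + (Seed << 6) + (Seed >> 2));
}

// Loose hash: opcodes only, so changed operands (offsets, registers) keep it stable.
static uint64_t looseHash(const Block &B) {
  uint64_t H = 0;
  for (const Instr &I : B.Instrs)
    H = combine(H, std::hash<std::string>{}(I.Opcode));
  return H;
}

// Strict hash: opcodes and operands.
static uint64_t strictHash(const Block &B) {
  uint64_t H = 0;
  for (const Instr &I : B.Instrs) {
    H = combine(H, std::hash<std::string>{}(I.Opcode));
    H = combine(H, std::hash<std::string>{}(I.Operands));
  }
  return H;
}

// Four 16-bit hash components blended into a single uint64_t.
struct BlendedHash {
  uint16_t OpcodeHash = 0; // loose hash of the block itself
  uint16_t InstrHash = 0;  // strict hash of the block itself
  uint16_t PredHash = 0;   // loose hashes of predecessors
  uint16_t SuccHash = 0;   // loose hashes of successors

  uint64_t pack() const {
    return uint64_t(OpcodeHash) | (uint64_t(InstrHash) << 16) |
           (uint64_t(PredHash) << 32) | (uint64_t(SuccHash) << 48);
  }
  static BlendedHash unpack(uint64_t V) {
    return {uint16_t(V), uint16_t(V >> 16), uint16_t(V >> 32), uint16_t(V >> 48)};
  }
};

// One blended hash per block; neighbor components sum loose hashes (edge-order independent).
static std::vector<BlendedHash> computeHashes(const std::vector<Block> &F) {
  std::vector<uint64_t> Loose(F.size());
  for (std::size_t I = 0; I < F.size(); ++I)
    Loose[I] = looseHash(F[I]);
  std::vector<BlendedHash> Out(F.size());
  for (std::size_t I = 0; I < F.size(); ++I) {
    uint64_t P = 0, S = 0;
    for (std::size_t J : F[I].Preds) { assert(J < F.size()); P += Loose[J]; }
    for (std::size_t J : F[I].Succs) { assert(J < F.size()); S += Loose[J]; }
    Out[I] = {uint16_t(Loose[I]), uint16_t(strictHash(F[I])), uint16_t(P), uint16_t(S)};
  }
  return Out;
}

// For each new block, return the index of the best old (profiled) block, or -1.
// Candidates must share the loose opcode hash; stricter components break ties.
static std::vector<int> matchBlocks(const std::vector<BlendedHash> &Old,
                                    const std::vector<BlendedHash> &New) {
  std::multimap<uint16_t, std::size_t> ByOpcode;
  for (std::size_t I = 0; I < Old.size(); ++I)
    ByOpcode.emplace(Old[I].OpcodeHash, I);
  std::vector<int> Match(New.size(), -1);
  for (std::size_t I = 0; I < New.size(); ++I) {
    int Best = -1;
    auto Range = ByOpcode.equal_range(New[I].OpcodeHash);
    for (auto It = Range.first; It != Range.second; ++It) {
      const BlendedHash &C = Old[It->second];
      int Score = int(C.InstrHash == New[I].InstrHash) +
                  int(C.PredHash == New[I].PredHash) + int(C.SuccHash == New[I].SuccHash);
      if (Score > Best) {
        Best = Score;
        Match[I] = int(It->second);
      }
    }
  }
  return Match;
}
```

In this sketch, a block whose operands changed (say, a new immediate or jump target) keeps its loose hash and is still matched, while the stricter components only decide between colliding candidates. Note that it lets several new blocks map to the same old block and leaves unmatched blocks at -1; how to treat those is a separate design decision.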

Performance testing on the clang binary: we collect data on clang-10 and use it
to optimize clang-11 (with ~1 year of commits in between). We then compare:

  • stale_clang (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=0)
  • opt_clang (clang-11 optimized with profile collected on clang-11)
  • infer_clang (clang-11 optimized with profile collected on clang-10 with infer-stale-profile=1)

LTO-only mode:
stale_clang vs opt_clang: task-clock [delta(%): 9.4252 ± 1.6582, p-value: 0.000002]
(That is, there is a ~9.4% performance regression.)
infer_clang vs opt_clang: task-clock [delta(%): 2.1834 ± 1.8158, p-value: 0.040702]
(That is, the regression is reduced to ~2%.)
Related BOLT logs:

BOLT-INFO: identified 2114 (18.61%) stale functions responsible for 30.96% samples
BOLT-INFO: inferred profile for 2101 (18.52% of all profiled) functions responsible for 30.95% samples

LTO+AutoFDO mode:
stale_clang vs opt_clang: task-clock [delta(%): 19.1293 ± 1.4131, p-value: 0.000002]
(That is, there is a ~19% performance regression.)
infer_clang vs opt_clang: task-clock [delta(%): 7.4364 ± 1.3343, p-value: 0.000002]
(That is, the regression is reduced to ~7.4%.)
Related BOLT logs:

BOLT-INFO: identified 5452 (50.27%) stale functions responsible for 85.34% samples
BOLT-INFO: inferred profile for 5442 (50.23% of all profiled) functions responsible for 85.33% samples
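To put these numbers in perspective, one can compute what fraction of the stale-profile regression inference recovers, and for what fraction of the stale functions a profile was inferred. The small C++ helpers below are illustrative only (the function names are hypothetical) and operate on the point estimates reported above.

```cpp
#include <cassert>

// Share of the stale-profile regression (relative to opt_clang) that inference
// recovers, given task-clock deltas in percent (point estimates only).
double recoveredFraction(double StaleDelta, double InferDelta) {
  assert(StaleDelta != 0.0);
  return (StaleDelta - InferDelta) / StaleDelta;
}

// Fraction of stale functions for which BOLT reported an inferred profile.
double inferredShare(unsigned Inferred, unsigned Stale) {
  assert(Stale > 0);
  return static_cast<double>(Inferred) / Stale;
}
```

With the numbers above, `recoveredFraction(9.4252, 2.1834)` is about 0.768 (LTO-only) and `recoveredFraction(19.1293, 7.4364)` is about 0.611 (LTO+AutoFDO), so inference recovers roughly 77% and 61% of the regression, respectively; `inferredShare(2101, 2114)` is about 0.994 and `inferredShare(5442, 5452)` about 0.998. These figures ignore the ± intervals, which are wide for the LTO-only inferred run (2.1834 ± 1.8158).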

Event Timeline

spupyrev created this revision. Mar 22 2023, 2:26 PM
Herald added a reviewer: Amir.
Herald added a reviewer: maksfb.
Herald added a project: Restricted Project.
spupyrev edited the summary of this revision. Mar 22 2023, 2:43 PM
spupyrev published this revision for review. Mar 22 2023, 2:45 PM
Herald added a project: Restricted Project. Mar 22 2023, 2:45 PM
spupyrev retitled this revision from [BOLT] v1 stale profile matching to [BOLT] stale profile matching [part 2 out of 2]. May 24 2023, 2:26 PM
Amir added a comment. May 24 2023, 10:42 PM

Please address a couple of nits. Will test internally, otherwise LG.
@maksfb – can you please take a look?

bolt/lib/Profile/StaleProfileMatching.cpp
Line 208: Remove or replace?

Lines 282–284: nit

Lines 384–386

spupyrev updated this revision to Diff 525732. May 25 2023, 11:55 AM

comments + rebase

Amir added a comment. Jun 6 2023, 8:07 PM

X86/reader-stale-yaml.test is failing in testing; please update it.

spupyrev updated this revision to Diff 529688. Jun 8 2023, 11:45 AM
spupyrev marked 3 inline comments as done.

rebasing & fixing the test & adding debug logging

Amir accepted this revision. Jun 8 2023, 12:18 PM
This revision is now accepted and ready to land. Jun 8 2023, 12:18 PM
This revision was automatically updated to reflect the committed changes.