This is an archive of the discontinued LLVM Phabricator instance.

[mlgo][regalloc] Add score calculation for training
ClosedPublic

Authored by mtrofin on Dec 6 2021, 3:28 PM.

Details

Summary

Add the calculation of a score, which will be used during ML training. The
score quantifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction) and of the regalloc algorithm
itself. We can then use scores to guide training (which happens offline),
by formulating a reward based on score variation - the goal being lower
scores (currently, that reward is the percentage reduction relative to
Greedy's heuristic)

Currently, we compute the score by factoring different instruction
counts (loads, stores, etc.) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or have been introduced previously. This is different
from RAGreedy::reportStats, which accumulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.

In either case, we want to decouple score calculation from the allocation
algorithm, as we currently evaluate it a few passes after allocation
(also, because score calculation should be reusable regardless of
allocation algorithm).

We intentionally accumulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts independently, and then cross-reference
with perf counter measurements.

This is based on Maruf Zaber's 2020 internship with our team.

Diff Detail

Event Timeline

mtrofin created this revision.Dec 6 2021, 3:28 PM
mtrofin requested review of this revision.Dec 6 2021, 3:28 PM
Herald added a project: Restricted Project.Dec 6 2021, 3:28 PM
yundiqian accepted this revision.Dec 6 2021, 3:56 PM
yundiqian added inline comments.
llvm/lib/CodeGen/RegAllocScore.h
35–40

total instructions?

This revision is now accepted and ready to land.Dec 6 2021, 3:56 PM
mtrofin marked an inline comment as done.Dec 6 2021, 4:00 PM
mtrofin added inline comments.
llvm/lib/CodeGen/RegAllocScore.h
35–40

we're not using those currently, so decided against collecting those just yet.

yundiqian added inline comments.Dec 6 2021, 4:02 PM
llvm/lib/CodeGen/RegAllocScore.cpp
31

0.2

34

0.2

mtrofin updated this revision to Diff 392215.Dec 6 2021, 4:04 PM
mtrofin marked 3 inline comments as done.

updated weights

wenlei added a subscriber: wenlei.Dec 7 2021, 12:08 AM
This revision was automatically updated to reflect the committed changes.
mtrofin edited the summary of this revision. (Show Details)Dec 7 2021, 2:22 PM

I know this review comes pretty late, but FWIW: It seems unfortunate to me that we now have one code path producing optimization remarks in RAGreedy::computeStats, while the separate counting formula here is independent of regalloc algorithm but cannot produce optimization remarks for analysis / comparison with non-ML approaches...

llvm/lib/CodeGen/RegAllocScore.cpp
33–34

What's the reasoning for loads having weight 4.0 compared to stores being just 1.0? This feels like way too much emphasis on loads.

Herald added a project: Restricted Project.Aug 15 2022, 2:16 PM

I know this review comes pretty late, but FWIW: It seems unfortunate to me that we now have one code path producing optimization remarks in RAGreedy::computeStats, while the separate counting formula here is independent of regalloc algorithm but cannot produce optimization remarks for analysis / comparison with non-ML approaches...

Ack - I thought I left a FIXME somewhere about this, probably forgot to. In any case, the intent is to unify them somehow.

llvm/lib/CodeGen/RegAllocScore.cpp
33–34

Loads would tend to block the execution pipeline, while stores wouldn't. More importantly, experimentally, this weighting got us a reward that moved the needle best (we were surprised, too - we expected the load/store ratio to be a bit lower).