This is an archive of the discontinued LLVM Phabricator instance.

Phabricator review: blockfreq: Rewrite block frequency analysis
AbandonedPublic

Authored by dexonsmith on Mar 19 2014, 1:59 PM.

Details

Reviewers
None
Summary

This is patches 0003 through 0006 squashed together from the review on
llvm-commits. I've copied the original email content (thread title was
"[PATCH] blockfreq: Rewrite block frequency analysis") here as reference.

This patch has not been updated to reflect comments received so far.


This patch series rewrites the shared implementation of BlockFrequencyInfo
and MachineBlockFrequencyInfo entirely.

  • Patches 0001 and 0002 are simple cleanups.
  • Patch 0003 is a class rename (and file move). Let me know if I should just merge it with 0006.
  • Patches 0004 and 0005 introduce some supporting classes.
  • Patch 0006 rewrites BlockFrequencyInfoImpl entirely.
  • Patch 0007 is *not* included (it’s headed separately to llvmdev).

The old implementation had a fundamental flaw: precision losses from
nested loops (and very wide branches) compounded past loop exits (and
convergence points).

The @nested_loops testcase at the end of
test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This
function has three nested loops, with branch weights in the loop headers
of 1:4000 (exit:continue). The old analysis gives nonsense:

Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
---- Block Freqs ----
 entry = 1.0
 for.cond1.preheader = 1.00103
 for.cond4.preheader = 5.5222
 for.body6 = 18095.19995
 for.inc8 = 4.52264
 for.inc11 = 0.00109
 for.end13 = 0.0

The new analysis gives correct results:

Printing analysis 'Block Frequency Analysis' for function 'nested_loops':
block-frequency-info: nested_loops
 - entry: float = 1.0, int = 8
 - for.cond1.preheader: float = 4001.0, int = 32007
 - for.cond4.preheader: float = 16008001.0, int = 128064007
 - for.body6: float = 64048012001.0, int = 512384096007
 - for.inc8: float = 16008001.0, int = 128064007
 - for.inc11: float = 4001.0, int = 32007
 - for.end13: float = 1.0, int = 8

The new algorithm leverages BlockMass and PositiveFloat to maintain
precision, separates "probability mass distribution" from "loop
scaling", and uses dithering to eliminate probability mass loss.

Additionally, the new algorithm has a complexity advantage over the old.
The previous algorithm was quadratic in the worst case. The new
algorithm is still worst-case quadratic in the presence of irreducible
control flow, but it's linear otherwise.

The key difference between the old algorithm and the new is that control
flow within a loop is evaluated separately from control flow outside,
limiting propagation of precision problems and allowing loop scale to be
calculated independently of mass distribution. Loops are visited
bottom-up, their loop scales are calculated, and they are replaced by
pseudo-nodes. Mass is then distributed through the function, which is
now a DAG. Finally, loops are revisited top-down to multiply through
the loop scales and the masses distributed to pseudo nodes.

There are some remaining flaws.

  • Irreducible control flow isn't ignored, but it also isn't modelled correctly. Nevertheless, the new algorithm should behave better than the old algorithm (at least it evaluates irreducible edges), but mileage may vary.
  • Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. If downstream users of BlockFrequencyInfo are updated to use PositiveFloat (instead of BlockFrequency), this limitation can be removed. It's not clear if that's practical.
  • BlockFrequencyInfo does *not* leverage LoopInfo/MachineLoopInfo and Loop/MachineLoop. These are currently unsuitable because they use quadratic storage and don't have the API needed to make this algorithm efficient. I looked into updating them, but downstream users rely on the current API.
  • The "bias" calculation proposed recently on llvmdev is *not* incorporated here. A separate patch (0007) takes a stab at that; I’ll post it to llvmdev soon.

I ran through the LNT benchmarks; there was some movement in both
directions.

Diff Detail

Event Timeline

dexonsmith abandoned this revision.Apr 19 2017, 7:08 PM

This was committed in some form a long time ago.