This is a modification of the count inference algorithm for inferring missing
counts of basic blocks and jumps based on the sampling-based weights (e.g.,
collected via pseudo-probes). The goal is to distribute counts evenly among
the control-flow graph when there are multiple equally likely options. This is
in contrast with the earlier version that created a valid flow in CFG using an
arbitrarily chosen set of blocks/jumps.
Algorithmically, the change is a modification of the min-cost max flow approach
that works by iteratively increasing flow along augmenting paths. In the new
version, we instead find an augmenting DAG (directed acyclic graph) and push the
flow evenly along all edges of the DAG. A a result, the counts are distributed evenly
not only among dangling (w/o sample weights) blocks but also among regular
basic blocks.
The algorithm is controlled by a new flag sample-profile-even-count-distribution
(default ON).
Evaluation:
This has been tested on several benchmarks (with CSSPGO):
- For smallish SPEC17 binaries, I do not see a perf difference;
- For clang-10 (17MB of hot code), there is a 0.5% - 1.5% improvement;
- For gcc-8 (8MB of hot code), the win is in the range 0% - 0.5%.
In addition, added a few tests to verify correctness on simple instances.
Build-time impact:
With the change, the inference algorithm is ~2x-4x slower than the previous
version for (very) large binaries. However, the runtime is still close-to-linear
and I did not see any build-time regression.
nit: name it computeAugmentingPathCapacity?