BlockPlacement: Consider hotness of blocks relative to a loop iteration rather than relative to the loop as a whole
Needs ReviewPublic

Authored by rsmith on Aug 4 2017, 1:56 PM.

Details

Summary

This patch tweaks the coldness metric for blocks in loops introduced in https://reviews.llvm.org/D11662 by considering the relative hotness of a block compared to a loop iteration rather than the relative hotness of a block compared to the loop as a whole.

The threshold for coldness is changed from <= 1/5 executions of the block per execution of the enclosing loop to <= 1/160 executions of the block per iteration of the enclosing loop. This preserves the existing threshold for a loop with a backedge probability of 31/32, which is the value that BranchProbabilityInfo invents in the absence of profiling data.

Here are a couple of illustrative special cases to demonstrate why I think the threshold should be hotness relative to a loop iteration rather than relative to executions of the enclosing loop:

  1. If loop trip count is very high (say, 1 million) and loop contains a cold block (probability is say 1 in 100,000), we will force that block into the loop, potentially imposing a cost on almost all loop iterations for minimal gain.
  1. If loop trip count is very low (say, we expect there to be 1 trip) and loop contains a block with probability 0.1 relative to the loop iteration, we will force that block out of the loop (and potentially end up placing it a very long way away, if the loop contains many blocks) despite it not being particularly cold.

If we instead considered blocks to be cold based on their probability relative to an iteration of the loop, we would do the right thing in both of the above extrema. But perhaps there's some reason that coldness relative to the loop as a whole is a better metric that would justify the current behavior?

(Part of the added testcase also depends on D36296.)

Diff Detail

Repository
rL LLVM
rsmith created this revision.Aug 4 2017, 1:56 PM

Should we adjust the ratio if it's not specified explicitly and no profile data is available?

This looks fine to me. Is there any performance data with PGO?

rsmith added a comment.Aug 4 2017, 3:30 PM

Should we adjust the ratio if it's not specified explicitly and no profile data is available?

Some tuning seems like a good idea as a next step, but I'd like to keep the behavior in the no-profile-data case the same for this change.

This looks fine to me. Is there any performance data with PGO?

I've done some measurements, and found that while some benchmarks improve significantly, others regress significantly. I'd like to take a look at the regressions before pushing forward more with this.