This is an archive of the discontinued LLVM Phabricator instance.

Add heuristics for irreducible loop metadata under PGO
ClosedPublic

Authored by hjyamauchi on Nov 13 2017, 2:01 PM.

Details

Summary

Add the following heuristics for irreducible loop metadata:

  • When an irreducible loop header is missing the loop header weight metadata, give it the minimum weight seen among other headers.
  • Annotate indirectbr targets with the loop header weight metadata (as they are likely to become irreducible loop headers after indirectbr tail duplication.)

These greatly improve the accuracy of the block frequency info of the Python
interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python
performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to
better register allocation under PGO.

Event Timeline

hjyamauchi created this revision.Nov 13 2017, 2:01 PM
hjyamauchi edited the summary of this revision. (Show Details)Nov 13 2017, 2:12 PM
davidxl added inline comments.Nov 14 2017, 9:42 AM
include/llvm/Analysis/BlockFrequencyInfoImpl.h
1170

what is the root cause of missing header weight?

1184

What is the intuition behind this heuristic?

1196

Is this check needed? MinHeaderWeight is always set.

test/Analysis/BlockFrequencyInfo/irreducible_pgo.ll
83

The test simplification change can be committed as a separate patch.

test/Transforms/PGOProfile/irreducible.ll
72

Split out the test simplification change into another patch.

hjyamauchi marked 5 inline comments as done.

Rebased and addressed comments.

include/llvm/Analysis/BlockFrequencyInfoImpl.h
1170

General transformations/optimizations that happen after the PGO annotation pass and manipulate the CFG like tail dup, block splitting, etc.

1184

The idea is to not to disrupt the existing trends too much by using a weight that's in the general range of the other headers' weights (eg. don't want to use a weight of 1000000 when the others have weights in the range of 1-10). The average weight, as opposed to the minimum, would also work in that regard. But based on the performance data, the minimum seems to work better. *Maybe* because the average could be off-base if there are major outliers (a few very hot headers) and the cold headers may be more common than warm/hot headers. Added more to the comment.

1196

Done. Still need to check against the zero value which addLocal() has an assert against though.

hjyamauchi edited the summary of this revision. (Show Details)Nov 14 2017, 3:17 PM
davidxl added inline comments.Nov 14 2017, 4:02 PM
include/llvm/Analysis/BlockFrequencyInfoImpl.h
1184

Perhaps add a // FIXME to for better update in the passes that drop the header weight

hjyamauchi marked an inline comment as done.
hjyamauchi edited the summary of this revision. (Show Details)

Updated comment.

davidxl accepted this revision.Nov 15 2017, 9:29 AM

lgtm

This revision is now accepted and ready to land.Nov 15 2017, 9:29 AM
hjyamauchi closed this revision.Nov 20 2017, 1:03 PM