This is an archive of the discontinued LLVM Phabricator instance.

[For Discussion Only] PGO in LoopSimplify
AbandonedPublic

Authored by reames on Jan 7 2015, 5:02 PM.

Details

Reviewers
None
Summary

This is not a finalized patch. I want to get input before taking this too much further. Please direct most feedback to the llvmdev thread; if we choose to go forward with this, I will submit a separate code review at a later date.

The basic idea of this patch is to use profiling information about the frequency of a backedges to separate a loop with multiple latches into a loop nest with the rare backedge being the outer loop. We already use a set of static heuristics based on non-varying arguments to PHI nodes to do the same type of thing.

The motivation is that doing this pulls rarely executed code out of the innermost loop and tends to greatly simplify analysis and optimization of that loop. In particular, it can enable substantial LICM gains when there are clobbering calls down rare paths. The original motivating case for this was the slow path of a safepoint poll, but I believe the possible benefit is more general as well.

Points for discussion:

  • Is using profile information for this purpose even a reasonable thing to do?
  • I chose to implement this without relying on the existing block frequency analysis. My reasoning was that a) this is a rarely taken case and adding an expensive analysis dependency probably wasn't worthwhile and b) that block frequency analysis was more expensive/precise than I really needed. Is this reasonable?
  • If so, is the notion of 'rareness' of a loop block something that's worth extracting out on it's own and reusing? Are there other similar uses anyone can think of?
  • Currently, I'm only supporting a fairly small set of controlling conditions. Are there important cases I'm not considering?
  • Since the rarest latch is often deep in a loop - with other "if (X) continue;" (i.e. latches) before it - this tends to create loops with multiple exiting blocks. Some of the existing passes might not deal with this well, is that a major concern? Suggestions for how to analysis and validate?
  • Currently, I've structured this as pulling off the rarest latch as an outer iteration. I could also pull off the most popular latch as an inner iteration. This might give different tradeoffs; thoughts?

Generally, any thoughts anyone have on the problem or approach are welcome. I'm not particular attached to the particular approach laid out here and if there's a more advantageous approach, all the better.

Diff Detail

Event Timeline

reames updated this revision to Diff 17879.Jan 7 2015, 5:02 PM
reames retitled this revision from to [For Discussion Only] PGO in LoopSimplify.
reames updated this object.
reames edited the test plan for this revision. (Show Details)
reames abandoned this revision.Jul 22 2015, 4:05 PM

Closing an old review. If I'm remembering correctly, the conclusion from on list discussion was that the general idea was reasonable, but should be implemented in terms of the existing block frequency interfaces and that the heuristics would need careful tuning. We've figured out an alternate approach to the original problem which motivated this work, so I'm unlikely to return to this any time soon.