Page MenuHomePhabricator

Add a flag to experiment with outlining optional branches.
ClosedPublic

Authored by djasper on Feb 18 2015, 1:47 AM.

Details

Reviewers
chandlerc
Summary

In a CFG with the edges A->B->C and A->C, B is an optional branch.

LLVM's default behavior is to lay the blocks out naturally, i.e. A, B, C, in order to improve code locality and fall throughs. However, if a function contains many of those optional branches only a few of which are taken, this leads to a lot of unnecessary icache misses. Moving B out of line can work around this.

Diff Detail

Event Timeline

djasper updated this revision to Diff 20153.Feb 18 2015, 1:47 AM
djasper retitled this revision from to Add a flag to experiment with outlining optional branches..
djasper updated this object.
djasper edited the test plan for this revision. (Show Details)
djasper added a reviewer: chandlerc.
djasper added a subscriber: Unknown Object (MLST).
djasper updated this revision to Diff 20155.Feb 18 2015, 1:50 AM

Removed left-over attempt to use llvm.assume to modify branch probabilities.

djasper updated this revision to Diff 20156.Feb 18 2015, 1:55 AM

Move if statement into the right scope. I think, I might have messed this up when merging.

reames added a subscriber: reames.Feb 18 2015, 10:04 AM

Codewise, I have some concern about adding the post dominator dependency. Any idea how much running time that adds? Otherwise, LGTM.

Approach wise, I think you may want to invert your reasoning. Your currently inferring coldness of the conditional block from the hotness of the successor. It would be more clear and predictable to consider the probability of the skipped successors directly.

Overall, I'm really excited about the direction of this patch. This might be very material on my workloads. Once this is landed, I'll do a run and see what differences I can report.

djasper updated this revision to Diff 20204.Feb 18 2015, 10:41 AM

As previously discussed on IRC, using the post dominator tree is probably not an option for performance reasons. I'll try to come up with an implementation that doesn't need it. If we had it, the implementation can actually be simplified as in this version of the patch (a single lookup in the dominator tree instead of many). Just adding this for documentation of the progress.

As previously discussed on IRC, using the post dominator tree is probably not an option for performance reasons.

I highly recommend measuring the performance of the post-dom-tree version before concluding it is too expensive. True, we've traditionally avoided them. But we also added use of a post-dom tree to the MachineSink pass, because it was the right way to solve the problem, and the compile time cost increase turned out to be relatively small.

I'll try to come up with an implementation that doesn't need it. If we had it, the implementation can actually be simplified as in this version of the patch (a single lookup in the dominator tree instead of many). Just adding this for documentation of the progress.

djasper updated this revision to Diff 21129.Mar 3 2015, 12:35 PM

Updated to not use a post-dominator tree.

djasper updated this revision to Diff 21173.Mar 3 2015, 10:22 PM

Don't do needless computations if optional branches aren't outlined.

djasper updated this revision to Diff 21174.Mar 3 2015, 10:46 PM

Re-add the test.

chandlerc accepted this revision.Mar 4 2015, 3:04 AM
chandlerc edited edge metadata.

I'm not sure this is exactly the right *heuristic*, but the code is fine and I really want to get it into the tree because it highlights fundamental flaws in the existing MBP algorithm that you've found. The flag is off by default and only requires a domtree, so I think it is entirely reasonable to commit. We can and should experiment with this off-by-default heuristic, learn what we can, and either fix and enable it or replace it with a better construct.

This revision is now accepted and ready to land.Mar 4 2015, 3:04 AM
djasper closed this revision.Mar 4 2015, 3:07 AM

Submitted as r231230.