
[WIP][Shrink-wrap] Split restore point
Needs Review · Public

Authored by junbuml on Jan 26 2018, 2:03 PM.

Details

Summary

This change splits the restore point so that it post-dominates only the blocks reachable from a use or def of CSRs/FI.

This is a WIP; I'm posting it to continue the discussion about:

Bugzilla : https://bugs.llvm.org/show_bug.cgi?id=33868

I will be happy to hear any high-level comments or suggestions.

This change by itself yields 15% more shrink-wrapping in the spec2000/2006/2017 benchmarks. I observed 160% more shrink-wrapping across spec2000/2006/2017 if the copy forwarding (D41835), PostRASink (D41463), and this change are applied all together.
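
For readers without the bug open, here is a hypothetical source-level illustration (my own, not taken from the patch or PR33868) of the kind of CFG this targets: the join block is the only post-dominator of the CSR-using block, so without splitting it, the save/restore pair gets hoisted to cover the clean path as well.

```cpp
// Hypothetical example: `heavy` is assumed to clobber callee-saved
// registers / use the frame; the else-path is assumed not to.
int heavy(int x);

int f(int x) {
  int r;
  if (x > 0)
    r = heavy(x); // "dirty" block: uses CSRs/FI
  else
    r = x + 1;    // "clean" block: no CSR or frame access
  // The return block is the only post-dominator of the dirty block, but it
  // is also reached via the clean path. Splitting the restore point inserts
  // a new block on the dirty edge only, so the prologue/epilogue can be
  // sunk into the `x > 0` path instead of wrapping the whole function.
  return r;
}
```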

Diff Detail

Event Timeline

junbuml created this revision. Jan 26 2018, 2:03 PM

Thanks for working on this! I think the idea is worth pursuing. This kinda reminds me of "Post Register Allocation Spill Code Optimization" by Christopher Lupo and Kent D. Wilken; a similar issue is what they describe in §4.8. I think if you extract the blocks C, D, E, F from their example, that's your function.

I tried looking into implementing that a while ago, and I mostly gave up because our SESE region infrastructure is not used or tested much, especially during CodeGen. I think your implementation has a big advantage in that it doesn't require building those (expensive, IIRC) structures, and it might be much faster, since SESE regions are usually not available while dominators are.

A few thoughts (please correct me if I'm wrong):

  • We don't want to add extra branches. By that I mean we want to guarantee that we always have a fall-through from NMBB to MBB (or only give it up when we know the extra branches are worth it). In your example this is perfectly fine, and I see that NMBB is always inserted before MBB, which I think should always be fine (see the sketch after this list).
  • We want to be sure that, in the end, if the points are not interesting, we don't insert NMBB.
  • I would run a verifier or add some statistics / remarks to check whether either of the previous points is being violated and whether that causes any regressions.
  • I think doing this kind of simulation on the post-dominator tree itself, as @qcolombet suggested, sounds interesting. Correct me if I'm wrong: here we're looking to introduce a common post-dominator of all the "dirty" blocks, right?
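
To make the first bullet concrete, here is a minimal sketch (mine, not the patch itself) of inserting NMBB immediately before MBB so the fall-through is preserved; `DirtyPreds` is a hypothetical set of predecessors reachable from a CSR/FI use or def, while the rest is stock MachineIR API:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

// Sketch only: split the restore point MBB by giving its "dirty"
// predecessors a new block NMBB that falls through into MBB.
static MachineBasicBlock *
splitRestorePoint(MachineFunction &MF, MachineBasicBlock *MBB,
                  ArrayRef<MachineBasicBlock *> DirtyPreds) {
  MachineBasicBlock *NMBB = MF.CreateMachineBasicBlock(MBB->getBasicBlock());
  // Lay NMBB out right before MBB so no extra branch is needed.
  MF.insert(MBB->getIterator(), NMBB);
  NMBB->addSuccessor(MBB);
  for (MachineBasicBlock *Pred : DirtyPreds)
    // Rewrites Pred's successor list and branch targets from MBB to NMBB.
    // A real implementation would also have to handle predecessors that
    // previously fell through into MBB and now need an explicit branch.
    Pred->ReplaceUsesOfBlockWith(MBB, NMBB);
  return NMBB;
}
```

For the last bullet: IIRC ShrinkWrap already picks restore points with MachinePostDominatorTree::findNearestCommonDominator, so the simulation on the post-dominator tree could amount to querying the nearest common post-dominator of just the dirty blocks.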
> I observed 160% more shrink-wrapping across spec2000/2006/2017 if the copy forwarding (D41835), PostRASink (D41463), and this change are applied all together.

Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Since you marked this as [WIP] I'll skip any comments on the code itself for now.

Just two remarks:

  • So far shrink-wrapping was solely an analysis pass, i.e., it didn't modify the input code.
  • Ideally I'd like us to postpone creating new blocks until we've decided where we're going to insert the code; otherwise we'll end up with blocks to clean up later on.

Regarding the SESE approach, I think it wouldn't be enough, because what you want, and what SESE doesn't give you, is a *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

> Regarding the SESE approach, I think it wouldn't be enough, because what you want, and what SESE doesn't give you, is a *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

Yes, but if I understand the paper correctly, in Figure 4 (left) they basically create an SESE region out of what they call "Set 1" in Figure 3 (depending on what they call the "execution count cost model"). I read that part a while ago, but I think at that point they are deciding whether to use "Region 1" as a save/restore boundary or to create a new SESE region for it.
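
As a plain-C++ toy (no LLVM types; `Block` and `mergeThroughCommon` are made-up names for this sketch), the quoted edge merge looks like this:

```cpp
#include <memory>
#include <string>
#include <vector>

struct Block {
  std::string Name;
  std::vector<Block *> Succs;
};

// Create a new block that becomes the unique hop between the given
// predecessors and Target: (A->C) (B->C) becomes
// (A->common) (B->common) (common->C).
std::unique_ptr<Block> mergeThroughCommon(std::vector<Block *> Preds,
                                          Block *Target) {
  auto Common = std::make_unique<Block>(Block{"common", {Target}});
  for (Block *Pred : Preds)
    for (Block *&Succ : Pred->Succs)
      if (Succ == Target)
        Succ = Common.get(); // retarget the edge
  return Common;
}
```

In LLVM terms, `common` is exactly the NMBB this patch creates; it is a block that exists in no SESE region of the original CFG, which is why the SESE machinery alone can't hand it to you.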

junbuml updated this revision to Diff 134496. Feb 15 2018, 1:50 PM

Based on the comments from Francis and Quentin, I tried to split the restore block only when we know the split block is used as the restore point. After performing the current shrink-wrapping, I run a post-shrinking step to see if we can shrink the save point further by splitting the restore point. This update is still WIP because I do this inside the shrink-wrapping pass; if this approach is generally reasonable and we need to keep shrink-wrap as an analysis pass, I will move it into a new pass between shrink-wrap and PEI.
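
If it helps the discussion, here is a rough skeleton of the post-shrinking step as I read the description above (my sketch, not the diff itself); `canSplitRestore`, `dirtyPreds`, `splitRestorePoint`, and `recomputeSavePoint` are all hypothetical names:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"

using namespace llvm;

// Hypothetical helpers, declared only so the skeleton reads end-to-end.
bool canSplitRestore(const MachineBasicBlock &Restore);
SmallVector<MachineBasicBlock *, 4> dirtyPreds(MachineBasicBlock &Restore);
MachineBasicBlock *splitRestorePoint(MachineFunction &MF,
                                     MachineBasicBlock *MBB,
                                     ArrayRef<MachineBasicBlock *> Preds);
MachineBasicBlock *recomputeSavePoint(MachineFunction &MF,
                                      MachineBasicBlock *Restore);

// Runs after regular shrink-wrapping has picked Save/Restore.
void postShrink(MachineFunction &MF, MachineBasicBlock *&Save,
                MachineBasicBlock *&Restore) {
  // Only split when the restore block is also reachable from "clean"
  // paths; otherwise there is nothing to gain from a split.
  if (!canSplitRestore(*Restore))
    return;
  SmallVector<MachineBasicBlock *, 4> Preds = dirtyPreds(*Restore);
  Restore = splitRestorePoint(MF, Restore, Preds);
  // With the restore point now only on dirty paths, the save point may
  // legally sink further down the dominator tree.
  Save = recomputeSavePoint(MF, Restore);
}
```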

> I observed 160% more shrink-wrapping across spec2000/2006/2017 if the copy forwarding (D41835), PostRASink (D41463), and this change are applied all together.

> Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Yes, it was on AArch64. I applied this change on top of PostRASink (D41463) and ran performance tests for SPEC:

spec2000/mesa         -1.09 %
spec2017/perlbench    +1.33 %
spec2017/xalancbmk    +3.04 %
spec2017/povray       +3.59 %
spec2006/povray       +4.61 %
junbuml added a comment (edited). Feb 21 2018, 2:40 PM

There are ups and downs in the scores of these benchmarks, but in povray a lot of extra shrink-wrapping happened in functions that show up in the profile, and the dynamic instruction count decreased with this change: -4.80% (2006/povray) and -4.91% (2017/povray). Further shrink-wrapping also happened in one of the hot functions in xalancbmk, but the dynamic instruction count didn't decrease there.

Hi Francis and Quentin,
Do either of you have any comments? I will be happy to hear any high-level feedback.
Thanks,
Jun

junbuml updated this revision to Diff 138433. Mar 14 2018, 12:25 PM

Rebased. In this change, I perform post-shrinking, where we try to further shrink the save point by splitting the restore point after finishing the regular shrink-wrapping. This change is WIP for now because I do this inside the shrink-wrapping pass. If the overall approach is acceptable and we want to keep shrink-wrap as an analysis pass, I will move it into a new pass. Please take a look and let me know your comments.

junbuml updated this revision to Diff 140125. Mar 28 2018, 12:53 PM

Rebased, plus minor changes in comments. Please let me know if the approach I'm using here makes sense.