This is an archive of the discontinued LLVM Phabricator instance.

Add a PreRASplit pass to enable more shrinkwrap
Abandoned Public

Authored by wmi on Dec 8 2016, 3:41 PM.

Details

Reviewers
qcolombet
MatzeB
Summary

This patch addresses the problem described in PR29154.

This pass tries to split the live ranges of parameters that are passed in hard registers and are live across calls inside the function. The live range of such a parameter can be split into two parts. The first part contains the copy from the parameter-passing register and never lives across a call, so it is free to be allocated to any non-CSR (Callee-Saved Register). The second part lives across calls and will only be allocated to a CSR.

The benefit of the split is a better chance to shrink-wrap. Another benefit is that the parameter-passing copy may be sunk from the entry block to a colder branch. (The machine sinking pass cannot sink a copy from a hard register, which makes this patch more meaningful.)
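As an illustration, here is a hypothetical function of the shape this pass targets. It is not taken from PR29154 or from the patch's tests, and the names `slow_path` and `use_param` are made up. The sketch assumes `slow_path` is not inlined, so the call really exists in the generated code.

```cpp
// Hypothetical out-of-line callee; assume the compiler does not inline it.
int slow_path(int x) { return x * 2 + 1; }

// 'p' arrives in an argument-passing hard register.
int use_param(int p, bool rare) {
  // Hot path: no call, so 'p' can stay in a non-CSR register.
  if (!rare)
    return p + 1;

  // Cold path: 'p' is live across the call, so this part of its live
  // range needs a CSR (or a spill). If the whole live range is kept in
  // one CSR, the CSR save must dominate the entry-block copy, which
  // prevents shrink-wrapping the prologue into this branch.
  int r = slow_path(p);
  return r + p;
}
```

With the live range split, only the cold branch would need the CSR, so the save/restore and the copy into the CSR could both be placed there.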

We get a 2% performance improvement on an internal protobuf benchmark (https://github.com/google/protobuf) on Sandy Bridge. I am collecting more performance numbers using SPEC2000.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 80835. Dec 8 2016, 3:41 PM
wmi retitled this revision from to Add a PreRASplit pass to enable more shrinkwrap.
wmi updated this object.
wmi added reviewers: qcolombet, MatzeB.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: llvm-commits, davidxl.
qcolombet edited edge metadata. Dec 9 2016, 2:36 PM

Hi Wei,

This goes against the paradigm used in LLVM, which is aggressive coalescing then split on demand.
If we start having pre-splitting heuristics on top of the existing heuristics I am afraid we are making our future life worse.

Instead, I would rather improve the splitting heuristics. For instance, I would encourage you to pursue the approach taken in https://reviews.llvm.org/D27366.

Cheers,
-Quentin

wmi added a comment. Dec 9 2016, 3:56 PM

Ah, raising the CSR cost high enough and relying on the existing splitting is indeed a simpler approach. I didn't think of it even after you hinted at it in the PR. I just tried https://reviews.llvm.org/D27366 and it covers the simple testcases I have. I can help evaluate it on x86 and see how the overall performance looks. Thanks!

wmi added a comment. Feb 10 2017, 2:54 PM

Hi Nemanja,

I tried your testcase with my experimental patch (taken from here: http://lists.llvm.org/pipermail/llvm-dev/2017-February/109977.html) and saw that the testcase was not shrink-wrapped (the command I used: clang -O2 -target powerpc64le-grtev4-linux-gnu -S 1.c).

The existing region splitting for live ranges across function calls took effect. After splitting, the sub-vreg that lives across the call was assigned a CSR, and the other sub-vreg got a non-CSR register. All of this works as we expect. However, during tryHintRecoloring, the two sub-vregs are coalesced again.

I added a simple check to tryHintRecoloring: if we are going to switch from a non-CSR register to a CSR, we do the recoloring only when the recoloring cost difference is larger than CSRCost. In other words, to justify the planned recoloring, the benefit must be larger than the potential negative impact on shrink-wrapping.
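A minimal sketch of that check follows. It is a standalone model, not LLVM's actual code: the function name `shouldRecolor`, its parameters, and the assumption that a recoloring is otherwise accepted whenever it does not increase cost are all hypothetical.

```cpp
// Hypothetical standalone model of the extra check; not the real
// tryHintRecoloring code or signature.
bool shouldRecolor(unsigned OldCost, unsigned NewCost,
                   bool CurIsCSR, bool NewIsCSR, unsigned CSRCost) {
  // Assumption: recoloring that increases the cost is never taken.
  if (NewCost > OldCost)
    return false;

  unsigned Benefit = OldCost - NewCost;

  // Moving from a non-CSR register to a CSR may block shrink-wrapping,
  // so require the benefit to exceed CSRCost.
  if (!CurIsCSR && NewIsCSR)
    return Benefit > CSRCost;

  // All other transitions keep the (assumed) existing behaviour.
  return true;
}
```

For example, with CSRCost set to 5, a non-CSR to CSR recoloring that saves 6 units is accepted, while one that saves only 4 is rejected and the split is preserved.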

With the change, the testcase is shrink-wrapped. I have attached the updated experimental patch.

Thanks,
Wei.

wmi abandoned this revision. Feb 10 2017, 2:58 PM

Sorry, wrong reply. I am dropping this patch and will try to push https://reviews.llvm.org/D27366 instead.