PowerPC is a target that can benefit greatly from shrink-wrapping due to the high overhead of non-leaf functions (spilling SPRs and CSRs). However, the register allocator currently favours allocating a CSR over splitting a region. In effect, parameter registers are copied into CSRs in the entry block whenever the parameter is live across any call in the function. This disables shrink-wrapping, because the save point must then be the entry block.
Just providing a cost in TargetRegisterInfo::getCSRFirstUseCost() is not very effective in alleviating this issue, because it is a global setting and there are situations where allocating a CSR is the better choice (i.e. it results in less spilling around calls).
As the title mentions, this is a work in progress. The cost function probably needs a fair bit of tuning, both in the actual cost values and in the conditions under which a non-zero cost is returned. The current value is arbitrary (it just seems to work well) and the condition is simple:
- The live range of the value spans any blocks that have no calls
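
To make the condition concrete, here is a minimal standalone sketch of the check. It is not the code in the patch: `BlockInfo`, `csrFirstUseCost` and `kIllustrativeCSRCost` are hypothetical names, the live range is modelled simply as a list of block indices, and the cost value is a placeholder rather than the value the patch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a machine basic block.
struct BlockInfo {
  bool HasCall; // true if the block contains at least one call
};

// Placeholder value for illustration only; not the value used in the patch.
constexpr unsigned kIllustrativeCSRCost = 1;

// Returns a non-zero first-use cost when the live range (given as the indices
// of the blocks it covers) includes at least one block without a call.
// Returns zero otherwise, i.e. allocating a CSR is not penalised.
unsigned csrFirstUseCost(const std::vector<BlockInfo> &Blocks,
                         const std::vector<std::size_t> &LiveBlocks) {
  for (std::size_t Idx : LiveBlocks) {
    assert(Idx < Blocks.size() && "live range refers to an unknown block");
    if (!Blocks[Idx].HasCall)
      return kIllustrativeCSRCost;
  }
  return 0;
}
```

In this model, a value that is live only in blocks containing calls gets a cost of zero, while a value whose live range also reaches a call-free block gets the non-zero cost, which is what discourages the allocator from taking a CSR for it.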
However, even with this very coarse cost function, we see a doubling in the number of shrink-wrapped functions as well as a significant performance improvement. Four SPEC INT benchmarks degrade by 0.07% to 1.07%. Everything else improves by 2.7% to 4.7% (INT) and 0.16% to 15.0% (FP).
Hopefully this patch can start a productive discussion of how to proceed with a more fine-grained cost model for allocating CSRs.