Split functions to create shrink wrapping opportunities
Abandoned, Public

Authored by tjablin on Dec 6 2015, 4:47 PM.

Details

Reviewers
kbarton
hfinkel
Summary

Shrink wrapping can improve the performance of functions that do
not use the stack on every path. However, any function in which a
value's live-range contains a function call is ineligible for
shrink wrapping. Why?

  1. To avoid saving and restoring values across call sites, LLVM
     allocates values whose live-ranges cross call sites to
     callee-saved registers.
  2. If callee-saved registers are used, they must be saved
     (usually in the function prolog).
  3. ShrinkWrapping scans from the beginning of the function prolog
     to the first use of the stack. If a callee-saved register is
     saved to the stack in the function prolog, ShrinkWrapping will
     give up early.

To increase the applicability of ShrinkWrapping, this pass
identifies instances where a live-range straddling a call-site
prevents ShrinkWrapping, and splits the original function into a
stackless ShrinkWrappable stub that potentially makes a tail call
to the remainder of the function.
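
As a rough source-level illustration of the split described above, consider the sketch below. It is a hand-written approximation, not output of the pass, which runs inside the compiler rather than on source code. All names (`helper`, `original`, `original_remainder`, `original_split`) are hypothetical and do not come from the patch.

```cpp
// Hypothetical callee; stands in for any call that a live range must cross.
int helper(int x) { return x * 2; }

// Before: `a` is live across the call to helper(), so it is likely to be
// assigned to a callee-saved register. That register is saved in the
// prolog, even when the early-exit path is taken.
int original(int a, int b) {
  if (b == 0)
    return a;           // fast path: needs no stack
  int t = helper(b);    // `a` is live across this call
  return a + t;
}

// After: the remainder holds the live range that crosses the call.
static int original_remainder(int a, int b) {
  int t = helper(b);
  return a + t;
}

// The stub has no live range crossing a non-tail call, so it needs no
// callee-saved registers; the call below is a candidate for a tail call.
int original_split(int a, int b) {
  if (b == 0)
    return a;
  return original_remainder(a, b);
}
```

Both versions compute the same result; the intended difference is only where the callee-saved register spill ends up.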

This transformation harms debuggability and consequently is not
suitable for lower -O levels.

Event Timeline

tjablin updated this revision to Diff 42022. Dec 6 2015, 4:47 PM
tjablin retitled this revision to Split functions to create shrink wrapping opportunities.
tjablin updated this object.
tjablin added reviewers: hfinkel, kbarton.
tjablin added a subscriber: llvm-commits.
hfinkel edited edge metadata. Dec 10 2015, 6:29 AM

Before we move forward with this, an alternative strategy presented itself yesterday that we should investigate: saving callee-saved registers by copying.

Please see http://reviews.llvm.org/D15340 (and http://reviews.llvm.org/D15341). Can we build on this so that, instead of splitting functions early, we avoid saving the registers in the prologue and can thus shrink wrap the original function?

lib/Target/PowerPC/PPCEnableShrinkWrap.cpp
85

Debug intrinsics also need skipping.

103

There are many other things that use the stack, at least on P7 and earlier: int <-> fp conversions, for example, and vector inserts/extracts (except with QPX).

lib/Target/PowerPC/PPCTargetMachine.cpp
306

Please don't reformat this whole section of code; please make a minimal change.

gberry added a subscriber: gberry. Dec 10 2015, 1:14 PM

It doesn't seem like there is anything target specific in this transform pass. On the chance that this moves forward, would it make sense to move the transform into lib/Transforms and just have ppc add it for now? That way other targets could potentially take advantage of this pass too.

tjablin updated this revision to Diff 45216. Jan 18 2016, 3:49 PM
tjablin edited edge metadata.

Per Hal's comments, use a more conservative behavior in the case of Extract and Cast Instructions.

I explored saving callee-saved registers by copying, as in http://reviews.llvm.org/D15340 and http://reviews.llvm.org/D15341, but I think this solution is less invasive and less likely to harm performance when tail call optimization is enabled. In particular, I am worried that saving callee-saved registers by copying could harm performance in tight loops.

majnemer added inline comments.
lib/Target/PowerPC/PPCEnableShrinkWrap.cpp
36

Shouldn't this be sorted lower?

69
76

llvm:: should be unnecessary.

tjablin updated this revision to Diff 45259. Jan 19 2016, 7:33 AM

Fix style errors pointed out by David.

tjablin abandoned this revision. Mar 30 2016, 7:49 AM

Replaced with D17948, D17533, and D16984.