This patch adds the implementation for -msave-toc-indirect. This allows the TOC register to be saved once in the prologue, if no dynamic stack allocations are in the function, rather than multiple times before each indirect call.
Diff Detail
Event Timeline
I want to take a step back and discuss what problem this is trying to solve and how to best solve it. I understand that GCC has a flag for this (added in 4.7), but it is not clear to me that this is the best approach. Two thoughts:
First, we have low-hanging fruit to take care of before I'd want to consider something like this. If you look at the code generated for foo2 in this program:
typedef void (*fp)(int); void foo2(fp f) { f(5); f(10); }
you'll see:
mflr 0 std 31, -8(1) std 0, 16(1) stdu 1, -48(1) mr 31, 1 std 2, 24(1) std 30, 32(31) # 8-byte Folded Spill mr 30, 3 li 3, 5 mr 12, 30 mtctr 30 bctrl ld 2, 24(1) li 3, 10 mr 12, 30 std 2, 24(1) mtctr 30 bctrl ld 2, 24(1) ld 30, 32(31) # 8-byte Folded Reload addi 1, 1, 48 ld 0, 16(1) ld 31, -8(1) mtlr 0 blr .long 0 .quad 0
note specifically the part:
... bctrl ld 2, 24(1) li 3, 10 mr 12, 30 std 2, 24(1) ; <-- This can go away because of the load above (no need to store again). mtctr 30 ...
I'd like to think that we already have something that would clean this up, but apparently we don't. We should.
Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?
I'd like to think that we already have something that would clean this up, but apparently we don't. We should.
I was under the impression that this patch was supposed to be the something that cleans it up. What I thought this would be is a patch that does 2 things:
- Save the TOC in the prologue if the option is set and there's an indirect call in the function
- Fix up the call lowering sequence to not do the TOC save prior to the call
If it accomplishes the stated goals, it would achieve what you're after, right @hfinkel? It certainly removes the two TOC saves from your test case. Of course, the prologue is just chosen because it is guaranteed to dominate all the indirect calls. Otherwise we may not always be able to do this without some kind of dataflow analysis + code motion + possibly creation of new basic blocks.
For example, where's the right place to save the TOC on something like this:
void foo2(void (*f)(int), int A) { if (A > 10) f(5); f(15); }
Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?
If we're saving the TOC in the prologue, won't shrink-wrapping handle this anyway? When it moves the prologue, it moves the whole prologue. Of course, as a result of this patch, we'll always save the TOC pointer in any function that has indirect calls. This is of course worse if the indirect call happens only on the cold path. It might be worthwhile to not perform this transformation on functions that have only 1 indirect call (as that call might be on a cold path). But of course, then we'd still miss the opportunities where multiple indirect calls are all on cold paths. But, can't win 'em all.
lib/Target/PowerPC/PPCFrameLowering.cpp | ||
---|---|---|
1718 | Can you just fold these conditions (negated as required) into the original if statement so that we don't have the weird empty-body conditional? | |
test/CodeGen/PowerPC/save_toc_indirect2.ll | ||
9 | If this test case checks that we still do the TOC saves in a function that has dynamic allocations, that should be clearly noted in the test case through the name of the function and/or comments. |
Yes, but only in some special mode. I see no reason this needs to happen in some special mode.
What I thought this would be is a patch that does 2 things:
- Save the TOC in the prologue if the option is set and there's an indirect call in the function
- Fix up the call lowering sequence to not do the TOC save prior to the call
If it accomplishes the stated goals, it would achieve what you're after, right @hfinkel? It certainly removes the two TOC saves from your test case. Of course, the prologue is just chosen because it is guaranteed to dominate all the indirect calls. Otherwise we may not always be able to do this without some kind of dataflow analysis + code motion + possibly creation of new basic blocks.
For example, where's the right place to save the TOC on something like this:void foo2(void (*f)(int), int A) { if (A > 10) f(5); f(15); }Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?
If we're saving the TOC in the prologue, won't shrink-wrapping handle this anyway?
I'd hope so.
When it moves the prologue, it moves the whole prologue. Of course, as a result of this patch, we'll always save the TOC pointer in any function that has indirect calls. This is of course worse if the indirect call happens only on the cold path. It might be worthwhile to not perform this transformation on functions that have only 1 indirect call (as that call might be on a cold path). But of course, then we'd still miss the opportunities where multiple indirect calls are all on cold paths.
Right.
But, can't win 'em all.
This is why I recommend looking at cleaning up the existing generated code for saving/restoring r2, using some data-flow analysis to remove redundant saves, because that would be a pure win. Doing things with tradeoffs can come next.
Can you just fold these conditions (negated as required) into the original if statement so that we don't have the weird empty-body conditional?