This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Add implementation for -msave-toc-indirect option - llvm portion
Needs RevisionPublic

Authored by syzaara on Oct 26 2017, 1:28 PM.

Details

Summary

This patch adds the implementation for -msave-toc-indirect. This allows the TOC register to be saved once in the prologue, if no dynamic stack allocations are in the function, rather than multiple times before each indirect call.

Diff Detail

Event Timeline

syzaara created this revision.Oct 26 2017, 1:28 PM
hfinkel requested changes to this revision.Oct 27 2017, 10:43 AM

I want to take a step back and discuss what problem this is trying to solve and how to best solve it. I understand that GCC has a flag for this (added in 4.7), but it is not clear to me that this is the best approach. Two thoughts:

First, we have low-hanging fruit to take care of before I'd want to consider something like this. If you look at the code generated for foo2 in this program:

typedef void (*fp)(int);

void foo2(fp f) {
  f(5);
  f(10);
}

you'll see:

mflr 0
std 31, -8(1) 
std 0, 16(1) 
stdu 1, -48(1)
mr 31, 1 
std 2, 24(1) 
std 30, 32(31)                  # 8-byte Folded Spill 
mr 30, 3 
li 3, 5
mr 12, 30
mtctr 30
bctrl 
ld 2, 24(1) 
li 3, 10 
mr 12, 30
std 2, 24(1) 
mtctr 30
bctrl 
ld 2, 24(1) 
ld 30, 32(31)                   # 8-byte Folded Reload
addi 1, 1, 48 
ld 0, 16(1) 
ld 31, -8(1) 
mtlr 0
blr   
.long 0
.quad 0

note specifically the part:

...
bctrl
ld 2, 24(1)
li 3, 10
mr 12, 30
std 2, 24(1) ; <-- This can go away because of the load above (no need to store again).
mtctr 30
...

I'd like to think that we already have something that would clean this up, but apparently we don't. We should.

Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?

This revision now requires changes to proceed.Oct 27 2017, 10:43 AM
nemanjai edited edge metadata.Oct 28 2017, 3:00 AM

I'd like to think that we already have something that would clean this up, but apparently we don't. We should.

I was under the impression that this patch was supposed to be the something that cleans it up. What I thought this would be is a patch that does 2 things:

  1. Save the TOC in the prologue if the option is set and there's an indirect call in the function
  2. Fix up the call lowering sequence to not do the TOC save prior to the call

If it accomplishes the stated goals, it would achieve what you're after, right @hfinkel? It certainly removes the two TOC saves from your test case. Of course, the prologue is just chosen because it is guaranteed to dominate all the indirect calls. Otherwise we may not always be able to do this without some kind of dataflow analysis + code motion + possibly creation of new basic blocks.
For example, where's the right place to save the TOC on something like this:

void foo2(void (*f)(int), int A) {
  if (A > 10)
    f(5);
  f(15);
}

Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?

If we're saving the TOC in the prologue, won't shrink-wrapping handle this anyway? When it moves the prologue, it moves the whole prologue. Of course, as a result of this patch, we'll always save the TOC pointer in any function that has indirect calls. This is of course worse if the indirect call happens only on the cold path. It might be worthwhile to not perform this transformation on functions that have only 1 indirect call (as that call might be on a cold path). But of course, then we'd still miss the opportunities where multiple indirect calls are all on cold paths. But, can't win 'em all.

lib/Target/PowerPC/PPCFrameLowering.cpp
1718

Can you just fold these conditions (negated as required) into the original if statement so that we don't have the weird empty-body conditional?

test/CodeGen/PowerPC/save_toc_indirect2.ll
9

If this test case checks that we still do the TOC saves in a function that has dynamic allocations, that should be clearly noted in the test case through the name of the function and/or comments.

I'd like to think that we already have something that would clean this up, but apparently we don't. We should.

I was under the impression that this patch was supposed to be the something that cleans it up.

Yes, but only in some special mode. I see no reason this needs to happen in some special mode.

What I thought this would be is a patch that does 2 things:

  1. Save the TOC in the prologue if the option is set and there's an indirect call in the function
  2. Fix up the call lowering sequence to not do the TOC save prior to the call

If it accomplishes the stated goals, it would achieve what you're after, right @hfinkel? It certainly removes the two TOC saves from your test case. Of course, the prologue is just chosen because it is guaranteed to dominate all the indirect calls. Otherwise we may not always be able to do this without some kind of dataflow analysis + code motion + possibly creation of new basic blocks.
For example, where's the right place to save the TOC on something like this:

void foo2(void (*f)(int), int A) {
  if (A > 10)
    f(5);
  f(15);
}

Second, is another option to always do this and then use shrink wrapping to account for cold-path-only calls?

If we're saving the TOC in the prologue, won't shrink-wrapping handle this anyway?

I'd hope so.

When it moves the prologue, it moves the whole prologue. Of course, as a result of this patch, we'll always save the TOC pointer in any function that has indirect calls. This is of course worse if the indirect call happens only on the cold path. It might be worthwhile to not perform this transformation on functions that have only 1 indirect call (as that call might be on a cold path). But of course, then we'd still miss the opportunities where multiple indirect calls are all on cold paths.

Right.

But, can't win 'em all.

This is why I recommend looking at cleaning up the existing generated code for saving/restoring r2, using some data-flow analysis to remove redundant saves, because that would be a pure win. Doing things with tradeoffs can come next.