This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Eliminate stack frame in non-leaf function based on shrink wrapping
AbandonedPublic

Authored by inouehrs on Mar 13 2017, 9:41 AM.

Details

Summary

We can expand the scope of shrink wrapping by not creating a stack frame in non-leaf functions.
For example, in the call sequence of A->B->C , we can help shrink wrapping while compiling B by allowing C directly return the control to A when exiting from C. I call it direct return.
In this case, we do not need to create stack frame in B and it increases the opportunity of shrink wrapping.
To apply direct return, we need to confirm some conditions including:

  1. invocation B->C must be a tail call (i.e. no instruction between call and return in B).
  2. invocation B->C must be with the internal linkage since direct return does not comply with ABI.
  3. other ABI specific checks.

In this patch, this optimization is enabled only for PowerPC64 with ELFv2 ABI, but it might be applicable for other platforms by implementing a platform-specific part.

The original motivation of this patch is to optimize the hot method of tcmalloc. In which GCC applies shrink wrapping with the direct return, but LLVM cannot. By applying this patch and basic block deduplication ( https://reviews.llvm.org/D30774 ), LLVM can do shrink wrapping for this hot method.

Diff Detail

Event Timeline

inouehrs created this revision.Mar 13 2017, 9:41 AM
qcolombet edited edge metadata.Mar 16 2017, 6:23 PM

Hi,

I haven't looked at the patch proper yet.
Inlined a couple of stylistic comment.

One high level comment, I am not super confortable to change the "status" of the shrink-wrapping pass from a pure analysis to an optimization, i.e., with this patch we actually change the code in the machine function. That might be the right thing to do, but I have to have a closer look to give an informed opinion.

Cheers,
-Quentin

include/llvm/Target/TargetInstrInfo.h
162

Use /// (doxygen style comment)

162

supported

lib/CodeGen/ShrinkWrap.cpp
87

applied

137

can directly return

146

Move the comment of what this method is doing here for consistency.

146

Lower case for the first letter of the method name.

450

Iterate over the basic block etc.

450

What do you mean by target?
Candidate?

452

to directly

452

You mean the current MF as callee, right?

472

Add a message in the assert, eg. By construction we must have one successor here

502

Period

625

Proper sentence please (Capital letter at the beginning and period at the end).
See http://llvm.org/docs/CodingStandards.html#commenting

inouehrs updated this revision to Diff 92321.Mar 20 2017, 8:23 AM

I updated the comments and assert based on the suggestion.
I did rebase to the latest code base.

Quentin,

Thank you so much for your suggestions.
I am thinking how I can avoid modifying the code in this 'analysis' pass. I appreciate your further advice.
Best regards.

sfertile edited edge metadata.May 23 2017, 12:09 PM

@inouehrs Should this patch be abandoned and we focus on when we can refine the tail-call checks during lowering?

nemanjai edited edge metadata.Jun 29 2017, 1:22 AM

@inouehrs Should this patch be abandoned and we focus on when we can refine the tail-call checks during lowering?

Yes, I think we should try to show whether this approach provides any benefit over simply using the tail call optimization and abandon it if it doesn't.

Thanks for the suggestions. Let me thimk the differences between TCO and this approach.
(@sfertile Sorry seems to have missed your comment above.)

The intention of this patch is avoid creating a stack frame if all method calls in a function is optimized with tailcallopt.
But I think I should do such optimization without bothering shrink wrap analysis; x86 backend seems already doing such opt.
So, I will abandon this patch. Thank you.

inouehrs abandoned this revision.Jun 30 2017, 12:21 AM

To seek another approach as discussed above.