This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable tail call opt for variadic function
ClosedPublic

Authored by Jim on Aug 15 2019, 1:28 AM.

Details

Summary

Tail call opt can treat variadic function call the same as normal function call

Diff Detail

Event Timeline

Jim created this revision.Aug 15 2019, 1:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2019, 1:28 AM

Hi Jim,

Could you add a test case to illustrate that if the variable argument function using caller stack, it could be guarded by "getNextStackOffset() != 0" ?
It might be worth to preserve the original test case so that we could observe the case didn't pass arguments by the stack.

Jim updated this revision to Diff 216762.Aug 22 2019, 8:22 PM

Function with varargs passed by stack cannot be optimized to tail call.
If the varargs are all passed by registers, it can do tail call opt.

Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?

I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.

luismarques added inline comments.
lib/Target/RISCV/RISCVISelLowering.cpp
1966 ↗(On Diff #216762)

@Jim you say in a review comment that "If the varargs are all passed by registers, it can do tail call opt." but this comment just says that functions with varargs cannot be tail call optimized.

Jim marked an inline comment as done.Aug 28 2019, 7:20 PM
Jim added inline comments.
lib/Target/RISCV/RISCVISelLowering.cpp
1966 ↗(On Diff #216762)

@luismarques I think this comment is incorrect. If no arguments are passed via the stack, it should be allowed to be tail-call-optimised. Whether the function is with varargs or not isn't the condition to decide to do tail-call-opt. We can treat the function with varargs the same as the function without varargs

Jim added a comment.Aug 28 2019, 8:30 PM

@lenary
If any arguments are passed by the stack, it is not allowed to do tail-call-opt.
Because the caller would allocate the stack for passing the arguments, and need to
free the stack after the call finished (the call must return back for free the stack).

The only difference on passing the varargs is that 2xXLen argument need to
be assigned an 'even' or 'aligned' register (8-byte alignment for RV32 or 16-byte alignment for RV64).
So the function with varargs is allowed to be tail-call-optimised if no arguments are passed via the stack.

brucehoult added a comment.EditedAug 29 2019, 3:59 AM

Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?

I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.

Because the caller's stack frame (if any) can be completely deallocated before the tail call, leaving the only state as the return address (in ra) and the arguments being passed to the tail-called function (in a0-a7).

__attribute__((noinline))
int print_scaled(unsigned long n, int scale){
  return printf("%lu.%lu", n/scale, n%scale);
}

x86_64 gcc tail calls printf at -O2 or -Os. So does RISC-V gcc. Here -Os:

00000000000101b0 <print_scaled>:

101b0:       02b57633                remu    a2,a0,a1
101b4:       02b555b3                divu    a1,a0,a1
101b8:       0001a537                lui     a0,0x1a
101bc:       93050513                addi    a0,a0,-1744 # 19930 <__clzdi2+0x36>
101c0:       19c0006f                j       1035c <printf>

Changing the function to...

int print_scaled(unsigned long n, int scale){
  return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ",
                n/scale, n%scale, n/scale, n%scale, n/scale, n%scale, n/scale, n%scale);
}

... prevents tail calling on RISC-V because with nine arguments the last n%scale goes on the stack. On x86_64 the last four arguments are pushed.

Eliminating one pair enables tail calling on RISC-V, but x86_64 still spills to the stack. The x86 tail-calls with two copies (five arguments total), which is its maximum in registers.

It gets trickier if the calling function creates a stack frame, for example because it calls some other function(s) as well, or simply has too many live local variables. In this case the arguments for the printf need to be set up, then ra and any s registers reloaded and the stack popped, before the tail call.

__attribute__((noinline))
int power10(int n){
  return n == 0 ? 1 : 10 * power10(n - 1);
}

__attribute__((noinline))
int print_scaled(unsigned long n, int digits){
  int scale = power10(digits);
  return printf("%lu.%lu", n/scale, n%scale);
}

00000000000101c0 <print_scaled>:

101c0:       1141                    addi    sp,sp,-16
101c2:       e022                    sd      s0,0(sp)
101c4:       842a                    mv      s0,a0
101c6:       852e                    mv      a0,a1
101c8:       e406                    sd      ra,8(sp)
101ca:       fe5ff0ef                jal     ra,101ae <power10>
101ce:       02a47633                remu    a2,s0,a0
101d2:       60a2                    ld      ra,8(sp)
101d4:       02a455b3                divu    a1,s0,a0
101d8:       6402                    ld      s0,0(sp)
101da:       0001a537                lui     a0,0x1a
101de:       95050513                addi    a0,a0,-1712 # 19950 <__clzdi2+0x32>
101e2:       0141                    addi    sp,sp,16
101e4:       19c0006f                j       10380 <printf>

ra was saved to the stack so power10 could be called and s0 was saved to the stack so there is somewhere to save n over the call to power10. Both ra and s0 can be restored and the stack frame deallocated before tail-calling printf. This is only possible because none of the arguments to printf needed to be stored in the stack frame -- the arguments require only a0, a1 and a2 in this case.

Jim added a comment.Aug 29 2019, 7:24 PM

It can focus on whether the stack frame is created and not yet freed before the function call
for saving saved register or passing parameters or others.

In D66278#1652021, @Jim wrote:

It can focus on whether the stack frame is created and not yet freed before the function call
for saving saved register or passing parameters or others.

When you have a function in tail position with arguments that fit in registers the stack frame can *always* be freed first, because by definition the onward argument values and return address and address of the function to be called are the ONLY values in the function that are live.

lenary accepted this revision.Sep 3 2019, 2:40 AM

Ok, I'm happy with this patch and your explanation.

This revision is now accepted and ready to land.Sep 3 2019, 2:40 AM
This revision was automatically updated to reflect the committed changes.