Tail call opt can treat variadic function call the same as normal function call
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Jim,
Could you add a test case to illustrate that if the variable argument function using caller stack, it could be guarded by "getNextStackOffset() != 0" ?
It might be worth to preserve the original test case so that we could observe the case didn't pass arguments by the stack.
Function with varargs passed by stack cannot be optimized to tail call.
If the varargs are all passed by registers, it can do tail call opt.
Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?
I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
1966 | @luismarques I think this comment is incorrect. If no arguments are passed via the stack, it should be allowed to be tail-call-optimised. Whether the function is with varargs or not isn't the condition to decide to do tail-call-opt. We can treat the function with varargs the same as the function without varargs |
@lenary
If any arguments are passed by the stack, it is not allowed to do tail-call-opt.
Because the caller would allocate the stack for passing the arguments, and need to
free the stack after the call finished (the call must return back for free the stack).
The only difference on passing the varargs is that 2xXLen argument need to
be assigned an 'even' or 'aligned' register (8-byte alignment for RV32 or 16-byte alignment for RV64).
So the function with varargs is allowed to be tail-call-optimised if no arguments are passed via the stack.
Because the caller's stack frame (if any) can be completely deallocated before the tail call, leaving the only state as the return address (in ra) and the arguments being passed to the tail-called function (in a0-a7).
__attribute__((noinline)) int print_scaled(unsigned long n, int scale){ return printf("%lu.%lu", n/scale, n%scale); }
x86_64 gcc tail calls printf at -O2 or -Os. So does RISC-V gcc. Here -Os:
00000000000101b0 <print_scaled>:
101b0: 02b57633 remu a2,a0,a1 101b4: 02b555b3 divu a1,a0,a1 101b8: 0001a537 lui a0,0x1a 101bc: 93050513 addi a0,a0,-1744 # 19930 <__clzdi2+0x36> 101c0: 19c0006f j 1035c <printf>
Changing the function to...
int print_scaled(unsigned long n, int scale){ return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ", n/scale, n%scale, n/scale, n%scale, n/scale, n%scale, n/scale, n%scale); }
... prevents tail calling on RISC-V because with nine arguments the last n%scale goes on the stack. On x86_64 the last four arguments are pushed.
Eliminating one pair enables tail calling on RISC-V, but x86_64 still spills to the stack. The x86 tail-calls with two copies (five arguments total), which is its maximum in registers.
It gets trickier if the calling function creates a stack frame, for example because it calls some other function(s) as well, or simply has too many live local variables. In this case the arguments for the printf need to be set up, then ra and any s registers reloaded and the stack popped, before the tail call.
__attribute__((noinline)) int power10(int n){ return n == 0 ? 1 : 10 * power10(n - 1); } __attribute__((noinline)) int print_scaled(unsigned long n, int digits){ int scale = power10(digits); return printf("%lu.%lu", n/scale, n%scale); }
00000000000101c0 <print_scaled>:
101c0: 1141 addi sp,sp,-16 101c2: e022 sd s0,0(sp) 101c4: 842a mv s0,a0 101c6: 852e mv a0,a1 101c8: e406 sd ra,8(sp) 101ca: fe5ff0ef jal ra,101ae <power10> 101ce: 02a47633 remu a2,s0,a0 101d2: 60a2 ld ra,8(sp) 101d4: 02a455b3 divu a1,s0,a0 101d8: 6402 ld s0,0(sp) 101da: 0001a537 lui a0,0x1a 101de: 95050513 addi a0,a0,-1712 # 19950 <__clzdi2+0x32> 101e2: 0141 addi sp,sp,16 101e4: 19c0006f j 10380 <printf>
ra was saved to the stack so power10 could be called and s0 was saved to the stack so there is somewhere to save n over the call to power10. Both ra and s0 can be restored and the stack frame deallocated before tail-calling printf. This is only possible because none of the arguments to printf needed to be stored in the stack frame -- the arguments require only a0, a1 and a2 in this case.
It can focus on whether the stack frame is created and not yet freed before the function call
for saving saved register or passing parameters or others.
When you have a function in tail position with arguments that fit in registers the stack frame can *always* be freed first, because by definition the onward argument values and return address and address of the function to be called are the ONLY values in the function that are live.
@Jim you say in a review comment that "If the varargs are all passed by registers, it can do tail call opt." but this comment just says that functions with varargs cannot be tail call optimized.