This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable tail call opt for variadic function
ClosedPublic

Authored by Jim on Aug 15 2019, 1:28 AM.

Download Raw Diff

Details

Reviewers

mgrang
asb
lenary
lewis-revill

Commits

rL370835: [RISCV] Enable tail call opt for variadic function
rGb77aa1d2486f: [RISCV] Enable tail call opt for variadic function

Summary

Tail call opt can treat variadic function call the same as normal function call

Diff Detail

Repository: rL LLVM

Event Timeline

Jim created this revision.Aug 15 2019, 1:28 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2019, 1:28 AM

Herald added subscribers: llvm-commits, s.egerton, benna and 20 others. · View Herald Transcript

Hi Jim,

Could you add a test case to illustrate that if the variable argument function using caller stack, it could be guarded by "getNextStackOffset() != 0" ?
It might be worth to preserve the original test case so that we could observe the case didn't pass arguments by the stack.

Herald added a subscriber: pzheng. · View Herald TranscriptAug 20 2019, 7:54 PM

Function with varargs passed by stack cannot be optimized to tail call.
If the varargs are all passed by registers, it can do tail call opt.

Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?

I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.

luismarques added a subscriber: luismarques.Aug 28 2019, 6:58 AM

luismarques added inline comments.

lib/Target/RISCV/RISCVISelLowering.cpp
1966	@Jim you say in a review comment that "If the varargs are all passed by registers, it can do tail call opt." but this comment just says that functions with varargs cannot be tail call optimized.

Jim marked an inline comment as done.Aug 28 2019, 7:20 PM

Jim added inline comments.

lib/Target/RISCV/RISCVISelLowering.cpp
1966	@luismarques I think this comment is incorrect. If no arguments are passed via the stack, it should be allowed to be tail-call-optimised. Whether the function is with varargs or not isn't the condition to decide to do tail-call-opt. We can treat the function with varargs the same as the function without varargs

@lenary
If any arguments are passed by the stack, it is not allowed to do tail-call-opt.
Because the caller would allocate the stack for passing the arguments, and need to
free the stack after the call finished (the call must return back for free the stack).

The only difference on passing the varargs is that 2xXLen argument need to
be assigned an 'even' or 'aligned' register (8-byte alignment for RV32 or 16-byte alignment for RV64).
So the function with varargs is allowed to be tail-call-optimised if no arguments are passed via the stack.

In D66278#1646758, @lenary wrote:

Please may you explain a bit further why calls using varargs (when not passed by the stack) are allowed to be tail-call-optimised?

I feel the justification would be good documentation to go with the patch, and will help out me and other reviewers.

Because the caller's stack frame (if any) can be completely deallocated before the tail call, leaving the only state as the return address (in ra) and the arguments being passed to the tail-called function (in a0-a7).

__attribute__((noinline))
int print_scaled(unsigned long n, int scale){
  return printf("%lu.%lu", n/scale, n%scale);
}

x86_64 gcc tail calls printf at -O2 or -Os. So does RISC-V gcc. Here -Os:

00000000000101b0 <print_scaled>:

101b0:       02b57633                remu    a2,a0,a1
101b4:       02b555b3                divu    a1,a0,a1
101b8:       0001a537                lui     a0,0x1a
101bc:       93050513                addi    a0,a0,-1744 # 19930 <__clzdi2+0x36>
101c0:       19c0006f                j       1035c <printf>

Changing the function to...

int print_scaled(unsigned long n, int scale){
  return printf("%lu.%lu %lu.%lu %lu.%lu %lu.%lu ",
                n/scale, n%scale, n/scale, n%scale, n/scale, n%scale, n/scale, n%scale);
}

... prevents tail calling on RISC-V because with nine arguments the last n%scale goes on the stack. On x86_64 the last four arguments are pushed.

Eliminating one pair enables tail calling on RISC-V, but x86_64 still spills to the stack. The x86 tail-calls with two copies (five arguments total), which is its maximum in registers.

It gets trickier if the calling function creates a stack frame, for example because it calls some other function(s) as well, or simply has too many live local variables. In this case the arguments for the printf need to be set up, then ra and any s registers reloaded and the stack popped, before the tail call.

__attribute__((noinline))
int power10(int n){
  return n == 0 ? 1 : 10 * power10(n - 1);
}

__attribute__((noinline))
int print_scaled(unsigned long n, int digits){
  int scale = power10(digits);
  return printf("%lu.%lu", n/scale, n%scale);
}

00000000000101c0 <print_scaled>:

101c0:       1141                    addi    sp,sp,-16
101c2:       e022                    sd      s0,0(sp)
101c4:       842a                    mv      s0,a0
101c6:       852e                    mv      a0,a1
101c8:       e406                    sd      ra,8(sp)
101ca:       fe5ff0ef                jal     ra,101ae <power10>
101ce:       02a47633                remu    a2,s0,a0
101d2:       60a2                    ld      ra,8(sp)
101d4:       02a455b3                divu    a1,s0,a0
101d8:       6402                    ld      s0,0(sp)
101da:       0001a537                lui     a0,0x1a
101de:       95050513                addi    a0,a0,-1712 # 19950 <__clzdi2+0x32>
101e2:       0141                    addi    sp,sp,16
101e4:       19c0006f                j       10380 <printf>

ra was saved to the stack so power10 could be called and s0 was saved to the stack so there is somewhere to save n over the call to power10. Both ra and s0 can be restored and the stack frame deallocated before tail-calling printf. This is only possible because none of the arguments to printf needed to be stored in the stack frame -- the arguments require only a0, a1 and a2 in this case.

It can focus on whether the stack frame is created and not yet freed before the function call
for saving saved register or passing parameters or others.

In D66278#1652021, @Jim wrote:

It can focus on whether the stack frame is created and not yet freed before the function call
for saving saved register or passing parameters or others.

When you have a function in tail position with arguments that fit in registers the stack frame can *always* be freed first, because by definition the onward argument values and return address and address of the function to be called are the ONLY values in the function that are live.

Ok, I'm happy with this patch and your explanation.

This revision is now accepted and ready to land.Sep 3 2019, 2:40 AM

Closed by commit rGb77aa1d2486f: [RISCV] Enable tail call opt for variadic function (authored by Jim). · Explain WhySep 3 2019, 7:04 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. · View Herald TranscriptSep 3 2019, 7:04 PM

Revision Contents

Path

Size

lib/

Target/

RISCV/

RISCVISelLowering.cpp

5 lines

test/

CodeGen/

RISCV/

tail-calls.ll

11 lines

Diff 215337

lib/Target/RISCV/RISCVISelLowering.cpp

	Show First 20 Lines • Show All 1,940 Lines • ▼ Show 20 Lines
	/// for tail call optimization.			/// for tail call optimization.
	/// Note: This is modelled after ARM's IsEligibleForTailCallOptimization.			/// Note: This is modelled after ARM's IsEligibleForTailCallOptimization.
	bool RISCVTargetLowering::isEligibleForTailCallOptimization(			bool RISCVTargetLowering::isEligibleForTailCallOptimization(
	CCState &CCInfo, CallLoweringInfo &CLI, MachineFunction &MF,			CCState &CCInfo, CallLoweringInfo &CLI, MachineFunction &MF,
	const SmallVector<CCValAssign, 16> &ArgLocs) const {			const SmallVector<CCValAssign, 16> &ArgLocs) const {

	auto &Callee = CLI.Callee;			auto &Callee = CLI.Callee;
	auto CalleeCC = CLI.CallConv;			auto CalleeCC = CLI.CallConv;
	auto IsVarArg = CLI.IsVarArg;
	auto &Outs = CLI.Outs;			auto &Outs = CLI.Outs;
	auto &Caller = MF.getFunction();			auto &Caller = MF.getFunction();
	auto CallerCC = Caller.getCallingConv();			auto CallerCC = Caller.getCallingConv();

	// Do not tail call opt functions with "disable-tail-calls" attribute.			// Do not tail call opt functions with "disable-tail-calls" attribute.
	if (Caller.getFnAttribute("disable-tail-calls").getValueAsString() == "true")			if (Caller.getFnAttribute("disable-tail-calls").getValueAsString() == "true")
	return false;			return false;

	// Exception-handling functions need a special set of instructions to			// Exception-handling functions need a special set of instructions to
	// indicate a return to the hardware. Tail-calling another function would			// indicate a return to the hardware. Tail-calling another function would
	// probably break this.			// probably break this.
	// TODO: The "interrupt" attribute isn't currently defined by RISC-V. This			// TODO: The "interrupt" attribute isn't currently defined by RISC-V. This
	// should be expanded as new function attributes are introduced.			// should be expanded as new function attributes are introduced.
	if (Caller.hasFnAttribute("interrupt"))			if (Caller.hasFnAttribute("interrupt"))
	return false;			return false;

	// Do not tail call opt functions with varargs.
	luismarquesUnsubmitted Not Done Reply Inline Actions @Jim you say in a review comment that "If the varargs are all passed by registers, it can do tail call opt." but this comment just says that functions with varargs cannot be tail call optimized. luismarques: @Jim you say in a review comment that "If the varargs are all passed by registers, it can do…
	JimAuthorUnsubmitted Done Reply Inline Actions @luismarques I think this comment is incorrect. If no arguments are passed via the stack, it should be allowed to be tail-call-optimised. Whether the function is with varargs or not isn't the condition to decide to do tail-call-opt. We can treat the function with varargs the same as the function without varargs Jim: @luismarques I think this comment is incorrect. If no arguments are passed via the stack, it…
	if (IsVarArg)
	return false;

	// Do not tail call opt if the stack is used to pass parameters.			// Do not tail call opt if the stack is used to pass parameters.
	if (CCInfo.getNextStackOffset() != 0)			if (CCInfo.getNextStackOffset() != 0)
	return false;			return false;

	// Do not tail call opt if any parameters need to be passed indirectly.			// Do not tail call opt if any parameters need to be passed indirectly.
	// Since long doubles (fp128) and i128 are larger than 2*XLEN, they are			// Since long doubles (fp128) and i128 are larger than 2*XLEN, they are
	// passed indirectly. So the address of the value will be passed in a			// passed indirectly. So the address of the value will be passed in a
	// register, or if not available, then the address is put on the stack. In			// register, or if not available, then the address is put on the stack. In
	▲ Show 20 Lines • Show All 785 Lines • Show Last 20 Lines

test/CodeGen/RISCV/tail-calls.ll

	Show First 20 Lines • Show All 41 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: jr a5			; CHECK-NEXT: jr a5
	entry:			entry:
	%tobool = icmp eq i32 %a, 0			%tobool = icmp eq i32 %a, 0
	%callee = select i1 %tobool, void ()* @callee_indirect1, void ()* @callee_indirect2			%callee = select i1 %tobool, void ()* @callee_indirect1, void ()* @callee_indirect2
	tail call void %callee()			tail call void %callee()
	ret void			ret void
	}			}

	; Do not tail call optimize functions with varargs.
	declare i32 @callee_varargs(i32, ...)
	define void @caller_varargs(i32 %a, i32 %b) nounwind {
	; CHECK-LABEL: caller_varargs
	; CHECK-NOT: tail callee_varargs
	; CHECK: call callee_varargs
	entry:
	%call = tail call i32 (i32, ...) @callee_varargs(i32 %a, i32 %b, i32 %b, i32 %a)
	ret void
	}

	; Do not tail call optimize if stack is used to pass parameters.			; Do not tail call optimize if stack is used to pass parameters.
	declare i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)			declare i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)
	define i32 @caller_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n) nounwind {			define i32 @caller_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n) nounwind {
	; CHECK-LABEL: caller_args			; CHECK-LABEL: caller_args
	; CHECK-NOT: tail callee_args			; CHECK-NOT: tail callee_args
	; CHECK: call callee_args			; CHECK: call callee_args
	entry:			entry:
	%r = tail call i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)			%r = tail call i32 @callee_args(i32 %a, i32 %b, i32 %c, i32 %dd, i32 %e, i32 %ff, i32 %g, i32 %h, i32 %i, i32 %j, i32 %k, i32 %l, i32 %m, i32 %n)
	▲ Show 20 Lines • Show All 80 Lines • Show Last 20 Lines