This is an attempt to fix PR28417.
Currently clang doesn't do sibling call optimization when a function returns a struct by value:
mytype f();
mytype g()
{
    return f();
}
Generated IR (-O2):
define void @g()(%struct.mytype* noalias sret) local_unnamed_addr #0 !dbg !7 {
  tail call void @f()(%struct.mytype* sret %0), !dbg !21
  ret void, !dbg !22
}
Generated code:
g(): # @g()
        push    rbx
        mov     rbx, rdi
        call    f()
        mov     rax, rbx
        pop     rbx
        ret
On the other hand, clang can do sibling call optimization when the struct is passed by pointer:
struct mytype
{
    char const *a, *b, *c, *d;
};
void f(mytype*);
void g(mytype* a)
{
    return f(a);
}
Generated IR (-O2):
define void @g(mytype*)(%struct.mytype*) local_unnamed_addr #0 !dbg !7 {
  call void @llvm.dbg.value(metadata %struct.mytype* %0, metadata !13, metadata !DIExpression()), !dbg !14
  tail call void @f(mytype*)(%struct.mytype* %0), !dbg !15
  ret void, !dbg !16
}
Generated code:
g(mytype*): # @g(mytype*)
        jmp     f(mytype*) # TAILCALL
The difference between these two IRs is the presence of the sret attribute. I believe tail call optimization is possible in the first case too. It is not performed because X86TargetLowering::IsEligibleForTailCallOptimization contains the following check:
  // Also avoid sibcall optimization if either caller or callee uses struct
  // return semantics.
  if (isCalleeStructRet || isCallerStructRet)
    return false;
Unfortunately, when this check was added, no explanation was given for why it is needed. The corresponding test case is marked rdar://7726868, but I cannot see what that is about.
As far as I understand, a sibling call can be performed when both caller and callee are marked sret. The case where caller and callee have mismatched sret specifications is trickier.
As I understand it, the sret attribute serves two purposes:
1. (from documentation) This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This is not a valid attribute for return values.
2. (from Itanium ABI) The callee function should return its sret argument in RAX.
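It follows from (2) that an sret caller cannot sibling-call a non-sret callee: the caller has to leave its own sret pointer in RAX on return, and a non-sret callee gives no such guarantee. The reverse is allowed. Based on this I updated X86TargetLowering::IsEligibleForTailCallOptimization.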
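The rule above can be sketched as a standalone predicate. This is only an illustration of the reasoning, not the code in the patch: the function name sretAllowsSibcall is hypothetical, and the real check in X86TargetLowering::IsEligibleForTailCallOptimization considers many other conditions besides sret.

```cpp
// Hypothetical helper (not part of LLVM): decides whether the sret
// attributes of caller and callee, taken alone, permit a sibling call.
bool sretAllowsSibcall(bool isCallerStructRet, bool isCalleeStructRet) {
  // An sret caller must return its sret pointer in RAX. Only an sret
  // callee is assumed to leave that pointer in RAX, so jumping to a
  // non-sret callee could return with an arbitrary value in RAX.
  if (isCallerStructRet && !isCalleeStructRet)
    return false;

  // Remaining combinations: both sret, neither sret, or a non-sret
  // caller with an sret callee (the "reverse" case described above).
  return true;
}
```

Under this sketch, only the combination of an sret caller with a non-sret callee is rejected; the other three combinations are left to the remaining eligibility checks.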
Base patch by https://reviews.llvm.org/D45653
Not sure why the CallerF.arg_size() != Outs.size() check is necessary. In any case, it needs a comment to explain it.