This is an attempt to fix PR28417.
Currently clang doesn't perform sibling call optimization when a function returns a struct by value:
mytype f();
mytype g() { return f(); }
Generated IR (-O2):
define void @g()(%struct.mytype* noalias sret) local_unnamed_addr #0 !dbg !7 {
  tail call void @f()(%struct.mytype* sret %0), !dbg !21
  ret void, !dbg !22
}
Generated code:
g():                                    # @g()
        push    rbx
        mov     rbx, rdi
        call    f()
        mov     rax, rbx
        pop     rbx
        ret
On the other hand, clang does perform sibling call optimization when the struct is passed by pointer:
struct mytype { char const *a, *b, *c, *d; };
void f(mytype*);
void g(mytype* a) { return f(a); }
Generated IR (-O2):
define void @g(mytype*)(%struct.mytype*) local_unnamed_addr #0 !dbg !7 {
  call void @llvm.dbg.value(metadata %struct.mytype* %0, metadata !13, metadata !DIExpression()), !dbg !14
  tail call void @f(mytype*)(%struct.mytype* %0), !dbg !15
  ret void, !dbg !16
}
Generated code:
g(mytype*):                             # @g(mytype*)
        jmp     f(mytype*)              # TAILCALL
The difference between these two IRs is the presence of the sret attribute. I believe tail call optimization is possible in the first case too. It is not performed because X86TargetLowering::IsEligibleForTailCallOptimization contains the following check:
// Also avoid sibcall optimization if either caller or callee uses struct
// return semantics.
if (isCalleeStructRet || isCallerStructRet)
  return false;
Unfortunately, when this check was added, no explanation was given for why it is needed. The corresponding test case is marked rdar://7726868, but I cannot see what that report is about.
As far as I understand, a sibling call can be performed when both caller and callee are marked sret. The case where caller and callee have mismatched sret specifications is trickier.
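To make the matching case concrete, here is a hand-lowered C++ model of the example above, assuming the x86-64 convention in which the hidden result pointer is the first argument and is also handed back as the return value. The names f_lowered, g_lowered and returnsCallerSlot are hypothetical and only illustrate the shape of the lowered code; this is not what clang emits verbatim.

```cpp
#include <cassert>

struct mytype { char const *a, *b, *c, *d; };

// Hypothetical hand-lowered form of `mytype f();`: the caller supplies the
// result slot and the callee returns that same pointer.
mytype *f_lowered(mytype *result) {
  static char const text[] = "abcd";
  *result = mytype{text, text + 1, text + 2, text + 3};
  return result;
}

// Hypothetical hand-lowered form of `mytype g() { return f(); }`.
// The slot is forwarded unchanged and the callee's return value is exactly
// what g must return, so no work remains after the call.
mytype *g_lowered(mytype *result) {
  return f_lowered(result);
}

// Small self-check: g_lowered fills the caller's slot and returns its address.
bool returnsCallerSlot() {
  mytype slot{};
  mytype *returned = g_lowered(&slot);
  assert(returned == &slot);
  return returned == &slot && *slot.a == 'a';
}
```

Because nothing happens in g_lowered after the call, the call is in tail position, which is the situation where a jmp could replace the call/ret pair.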
As I understand it, the sret attribute serves two purposes:
- (from documentation) This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This is not a valid attribute for return values.
- (from the System V x86-64 ABI) the callee must return its sret argument in RAX.
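It follows from the second point that an sret caller cannot sibling-call a non-sret callee, because nothing guarantees that the callee leaves the caller's sret pointer in RAX. The reverse is allowed. Based on this, I updated X86TargetLowering::IsEligibleForTailCallOptimization.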
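The rule described above can be sketched as a small standalone predicate. This is only a model of the sret part of the decision, not the actual LLVM change: sretAllowsSibcall and checkSRetRule are hypothetical names, and the real IsEligibleForTailCallOptimization takes many more conditions into account.

```cpp
#include <cassert>

// Hypothetical helper modelling only the sret condition described above.
constexpr bool sretAllowsSibcall(bool isCallerStructRet,
                                 bool isCalleeStructRet) {
  // An sret caller must return its hidden pointer in RAX. A non-sret callee
  // gives no such guarantee, so jumping to it would leave RAX unspecified.
  if (isCallerStructRet && !isCalleeStructRet)
    return false;
  // Both sret: the callee returns the forwarded pointer in RAX.
  // Neither sret, or only the callee sret: the caller has no obligation
  // regarding an sret pointer in RAX.
  return true;
}

// Walks the four caller/callee combinations.
void checkSRetRule() {
  assert(sretAllowsSibcall(false, false));
  assert(sretAllowsSibcall(true, true));
  assert(sretAllowsSibcall(false, true));
  assert(!sretAllowsSibcall(true, false));
}
```

Compared with the existing check, which rejects every combination involving sret, only the sret-caller/non-sret-callee combination is rejected here.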
This is very minor, but shouldn't the prefixes really be X86 (instead of X32), X64 and X32 (instead of X32ABI)?