This is an archive of the discontinued LLVM Phabricator instance.

Enable sibling-call optimization for functions returning structs
AbandonedPublic

Authored by xbolva00 on Apr 30 2018, 4:33 AM.

Details

Reviewers
efriedma
Summary

This is a attempt for fix issue PR28417.

Currently clang doesn't do sibling call optimization when function returns a struct by value:

mytype f();

mytype g()
{

return f();

}
Generated IR (-O2):

define void @g()(%struct.mytype* noalias sret) local_unnamed_addr #0 !dbg !7 {

tail call void @f()(%struct.mytype* sret %0), !dbg !21
ret void, !dbg !22

}
Generated code:

g(): # @g()

push rbx
mov rbx, rdi
call f()
mov rax, rbx
pop rbx
ret

On the other hand clang can do sibling call optimization when struct is passed by pointer:

struct mytype
{

char const *a, *b, *c, *d;

};

void f(mytype*);

void g(mytype* a)
{

return f(a);

}
Generated IR (-O2):

define void @g(mytype*)(%struct.mytype*) local_unnamed_addr #0 !dbg !7 {

call void @llvm.dbg.value(metadata %struct.mytype* %0, metadata !13, metadata !DIExpression()), !dbg !14
tail call void @f(mytype*)(%struct.mytype* %0), !dbg !15
ret void, !dbg !16

}
Generated code:

g(mytype*): # @g(mytype*)

jmp f(mytype*) # TAILCALL

The difference between these two IRs are the presence of sret attribute. I believe tail call optimization is possible in the first case too. The reason why tail call optimization is not performed is that X86TargetLowering::IsEligibleForTailCallOptimization has the following if:

Also avoid sibcall optimization if either caller or callee uses struct
return semantics.
if (isCalleeStructRet || isCallerStructRet)

return false;

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

As far as I understand sibling call can be performed when both caller and callee are marked sret. The case when caller and callee have mismatched sret specification is trickier.

As I understand attribute sret serves two purposes:

(from documentation) This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This is not a valid attribute for return values.
(from Itanium ABI) callee function should return its sret argument in RAX.
From (2) sret caller can not sibling-call non-sret callee. The reverse is allowed. Based on this I updated X86TargetLowering::IsEligibleForTailCallOptimization.

Base patch by https://reviews.llvm.org/D45653

Diff Detail

Event Timeline

xbolva00 created this revision.Apr 30 2018, 4:33 AM
xbolva00 edited the summary of this revision. (Show Details)
xbolva00 edited the summary of this revision. (Show Details)
xbolva00 added a subscriber: spatel.

Please abandon this. The changes should be made in the original patch by the original author. If the original author has agreed to let you commandeer D45653, then that should be stated there, and the review should continue there.

xbolva00 abandoned this revision.Apr 30 2018, 6:41 AM

Please abandon this. The changes should be made in the original patch by the original author. If the original author has agreed to let you commandeer D45653, then that should be stated there, and the review should continue there.

Ok, I understand.

xbolva00 updated this revision to Diff 204568.Jun 13 2019, 9:52 AM
xbolva00 edited the summary of this revision. (Show Details)
xbolva00 added a reviewer: efriedma.

Reopened, since original D45653 was abandoned.

xbolva00 updated this revision to Diff 204569.Jun 13 2019, 9:56 AM

Revert deleted testcase.

xbolva00 updated this revision to Diff 204621.Jun 13 2019, 1:39 PM

One more test added.

efriedma requested changes to this revision.Jun 13 2019, 2:54 PM

Please fix the "summary" to include the full expected commit message.

I had the following comment on the original revision:

If both the caller and callee are sret, do you need to check that the sret argument is the same? Consider something like tail call void @f(%struct.foo* noalias sret @aa) nounwind.

Did this ever get fixed?

This revision now requires changes to proceed.Jun 13 2019, 2:54 PM

Also, it looks like you never cc'ed llvm-commits; please abandon this and post a new revision with llvm-commits properly cc'ed

xbolva00 added a comment.EditedJun 13 2019, 3:39 PM

Please fix the "summary" to include the full expected commit message.

I had the following comment on the original revision:

If both the caller and callee are sret, do you need to check that the sret argument is the same? Consider something like tail call void @f(%struct.foo* noalias sret @aa) nounwind.

Did this ever get fixed?

t21_sret_to_sret_second_arg_sret or maybe t21_sret_to_sret_alloca ?

How do you propose to create test case like you want

define fastcc void @t21_sret_to_sret_more_args2(%struct.foo* noalias sret %agg.result) nounwind  {
   tail call fastcc void @t21_f_sret(%struct.foo* noalias sret %aa) nounwind
   ret void
}

?

xbolva00 set the repository for this revision to rL LLVM.Jun 13 2019, 3:40 PM
xbolva00 added a subscriber: llvm-commits.
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2019, 3:40 PM
xbolva00 edited the summary of this revision. (Show Details)Jun 13 2019, 3:40 PM
arsenm added a subscriber: arsenm.Jun 19 2019, 5:09 PM
arsenm added inline comments.
lib/Target/X86/X86ISelLowering.cpp
4164–4166

No else after return

xbolva00 updated this revision to Diff 205869.Jun 20 2019, 11:57 AM

Addressed review notes, thanks.

The testcase I actually wanted, which incorrectly forms a tail call:

define fastcc void @t21_sret_to_sret_arg_mismatch(%struct.foo* noalias sret %agg.result, %struct.foo* noalias %a) nounwind  {
  %b = call fastcc %struct.foo* @ret_struct()
  tail call fastcc void @t21_f_sret(%struct.foo* noalias sret %a, %struct.foo* noalias %b) nounwind
  ret void
}
declare ccc %struct.foo* @ret_struct() nounwind
declare fastcc void @t21_f_sret(%struct.foo* noalias sret, %struct.foo* noalias) nounwind
%struct.foo = type { [4 x i32] }

Apparently other, more obvious constructs don't actually trigger the issue; there was a check introduced in https://reviews.llvm.org/D8510 which suppresses tail calls in some, but not all, of the relevant cases.

lib/Target/X86/X86ISelLowering.cpp
4161

Not sure why the CallerF.arg_size() != Outs.size() check is necessary? In any case, needs a comment to explain.

xbolva00 marked an inline comment as done.Jun 20 2019, 12:28 PM
xbolva00 added inline comments.
lib/Target/X86/X86ISelLowering.cpp
4161

Some "args_mismatch" test was miscompiled without it , I think.

xbolva00 abandoned this revision.Sep 18 2019, 12:31 PM