
Details

Reviewers

craig.topper
RKSimon
qcolombet
Gerolf
escha
alexr
sorokin
t.p.northover
efriedma
rsmith
xbolva00
spatel

Summary

This is an attempt to fix issue PR28417.

Currently clang doesn't do sibling call optimization when a function returns a struct by value:

mytype f();

mytype g()
{
  return f();
}

Generated IR (-O2):

define void @g()(%struct.mytype* noalias sret) local_unnamed_addr #0 !dbg !7 {
  tail call void @f()(%struct.mytype* sret %0), !dbg !21
  ret void, !dbg !22
}

Generated code:

g(): # @g()
  push rbx
  mov rbx, rdi
  call f()
  mov rax, rbx
  pop rbx
  ret

On the other hand, clang can do sibling call optimization when the struct is passed by pointer:

struct mytype
{
    char const *a, *b, *c, *d;
};

void f(mytype*);

void g(mytype* a)
{
    return f(a);
}

Generated IR (-O2):

define void @g(mytype*)(%struct.mytype*) local_unnamed_addr #0 !dbg !7 {
  call void @llvm.dbg.value(metadata %struct.mytype* %0, metadata !13, metadata !DIExpression()), !dbg !14
  tail call void @f(mytype*)(%struct.mytype* %0), !dbg !15
  ret void, !dbg !16
}

Generated code:

g(mytype*): # @g(mytype*)
  jmp f(mytype*) # TAILCALL

The difference between these two IRs is the presence of the sret attribute. I believe tail call optimization is possible in the first case too. The reason tail call optimization is not performed is that X86TargetLowering::IsEligibleForTailCallOptimization has the following if:

// Also avoid sibcall optimization if either caller or callee uses struct
// return semantics.
if (isCalleeStructRet || isCallerStructRet)
    return false;

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

As far as I understand, a sibling call can be performed when both caller and callee are marked sret. The case where caller and callee have mismatched sret specifications is trickier.

As I understand it, the sret attribute serves two purposes:

1. (from documentation) This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This is not a valid attribute for return values.
2. (from Itanium ABI) the callee function should return its sret argument in RAX.

From (2), an sret caller cannot sibling-call a non-sret callee. The reverse is allowed. Based on this I updated X86TargetLowering::IsEligibleForTailCallOptimization.
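
The rule described above can be written down as a small truth table. The following is a simplified C++ sketch of the struct-return check only, not the LLVM source: `Rule` and `allowsSibcall` are hypothetical names, and the real IsEligibleForTailCallOptimization tests many other conditions as well.

```cpp
// Simplified model of the struct-return check only; all names are hypothetical.
enum class Rule {
  Original, // if (isCalleeStructRet || isCallerStructRet) return false;
  Patched   // if (!isCalleeStructRet && isCallerStructRet) return false;
};

// Returns true when the sret check alone does not reject the sibling call.
constexpr bool allowsSibcall(Rule rule, bool isCallerStructRet,
                             bool isCalleeStructRet) {
  return rule == Rule::Original
             ? !(isCalleeStructRet || isCallerStructRet)
             : !(!isCalleeStructRet && isCallerStructRet);
}

// Both rules agree when neither side uses sret.
static_assert(allowsSibcall(Rule::Original, false, false), "");
static_assert(allowsSibcall(Rule::Patched, false, false), "");
// The patch newly admits sret -> sret and non-sret -> sret.
static_assert(allowsSibcall(Rule::Patched, true, true), "");
static_assert(allowsSibcall(Rule::Patched, false, true), "");
// An sret caller still may not sibcall a non-sret callee.
static_assert(!allowsSibcall(Rule::Patched, true, false), "");
```

The model only restates the two conditions; it says nothing about whether the admitted cases are safe on every target.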

Diff Detail

Event Timeline

sorokin created this revision. Apr 14 2018, 7:39 AM

bkramer added reviewers: craig.topper, RKSimon. Apr 14 2018, 7:42 AM

xbolva00 accepted this revision. Apr 14 2018, 7:45 AM

This revision is now accepted and ready to land. Apr 14 2018, 7:45 AM

Please update with the context.

xbolva00 requested changes to this revision. Apr 14 2018, 7:48 AM

This revision now requires changes to proceed. Apr 14 2018, 7:48 AM

I updated the diff with the context.

xbolva00 accepted this revision. Apr 14 2018, 9:59 AM

This revision is now accepted and ready to land. Apr 14 2018, 9:59 AM

xbolva00 added a reviewer: spatel. Apr 16 2018, 11:32 AM

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

I don't know enough about this to see the possible logic holes, but we should have someone with Radar access comment on the patch (adding some reviewer candidates).

As a preliminary clean-up, you might want to run update_llc_test_checks.py on this test file, so we can see the exact diffs in codegen.

As a preliminary clean-up, you might want to run update_llc_test_checks.py on this test file, so we can see the exact diffs in codegen.

I ran

$ python2 update_llc_test_checks.py --llc-binary <directory-of-llc>/llc llvm/test/CodeGen/X86/sibcall.ll

and it modified sibcall.ll. I'm not familiar with the utility, so I'm afraid I don't know how to use it properly or what the diff means. I attached 2.patch, which contains the changes this tool makes to sibcall.ll.

2.patch (9 KB)

spatel mentioned this in rL330445: [x86] auto-generate checks; NFC. Apr 20 2018, 9:50 AM

In D45653#1073189, @sorokin wrote:
$ python2 update_llc_test_checks.py --llc-binary <directory-of-llc>/llc llvm/test/CodeGen/X86/sibcall.ll
and it modified sibcall.ll. I'm not familiar with the utility, so I'm afraid I don't know how to use it properly or what the diff means. I attached 2.patch, which contains the changes this tool makes to sibcall.ll. {F5984860}

The '-asm-verbose=false' params screwed up the script. I removed that and updated here:
rL330445

You may want to add your new tests to the file now, so we have a baseline on those too. Then rebase this patch.

xbolva00 requested changes to this revision. Apr 25 2018, 1:38 PM

This revision now requires changes to proceed. Apr 25 2018, 1:38 PM

@xbolva00 - If you request changes to a patch, you should specify what those changes are.

Also, why have you posted an identical patch at D46262?

In D45653#1082734, @spatel wrote:

@xbolva00 - If you request changes to a patch, you should specify what those changes are.

Also, why have you posted an identical patch at D46262?

I added @sorokin's tests and regenerated the test file, in case this patch is not updated by @sorokin, so that the idea is not lost.

spatel mentioned this in D46262: Enable sibling-call optimization for functions returning structs. Apr 30 2018, 6:39 AM

xbolva00 added inline comments. Apr 30 2018, 6:40 AM

test/CodeGen/X86/sibcall.ll
436–437	Regenerate test file please

Ping

xbolva00 resigned from this revision. Jul 1 2018, 6:04 AM

aleksandr.urakov added a subscriber: aleksandr.urakov. Aug 8 2018, 5:51 AM

A colleague of mine updated the patch. The patch is rebased against the llvm trunk. He also ran update_llc_test_checks.py on sibcall.ll.

@xbolva00 I'm perfectly OK if you (or any other person) champion this patch. Also feel free to make whatever changes you consider to be necessary.

Ping!

Can you review this, please?

lebedev.ri retitled this revision from Enable sibling-call optimization for functions returning structs to [X86] Enable sibling-call optimization for functions returning structs. Aug 15 2018, 1:16 AM

While i'm unqualified to review this, i'm pretty sure you can safely land the tests themselves,
and update the diff to show the actual test change.

Thanks for the reply!

Do you mean that it's a good idea to land the newly added test cases (@t21_sret_to_sret and so on) in a separate commit, so we can see how the current patch changes these test cases? Is it preferable to create a different review for that commit?

In D45653#1200286, @aleksandr.urakov wrote:

Thanks for the reply!

In D45653#1200286, @aleksandr.urakov wrote:

Do you mean that it's a good idea to land the newly added test cases (@t21_sret_to_sret and so on) in a separate commit, so we can see how the current patch changes these test cases?

Correct.

In D45653#1200286, @aleksandr.urakov wrote:

Is it preferable to create a different review for the commit?

I don't see the point, just directly commit those tests and rebase this diff afterwards.

Ok, thanks again, I'll do it later today.

aleksandr.urakov mentioned this in rL339760: [X86] Add sibling-call test cases. Aug 15 2018, 3:54 AM

Update patch to see actual test cases changes.

lebedev.ri added inline comments. Aug 15 2018, 4:40 AM

test/CodeGen/X86/sibcall.ll
650–651	Hm, this looks wrong. There are no such check prefixes on the RUN lines (at the beginning of the file). Did you use `utils/update_llc_test_checks.py`?

aleksandr.urakov marked an inline comment as done. Aug 15 2018, 4:50 AM

aleksandr.urakov added inline comments.

test/CodeGen/X86/sibcall.ll
650–651	Yes, I used this utility, but it doesn't delete these lines. They must have been left here since the first version of the patch and are treated as plain comments. I'll update this.

Remove meaningless comment lines from test.

spatel resigned from this revision. Aug 15 2018, 8:38 AM

@spatel Excuse me, can you please explain what's wrong with this patch now?

Ivan is my colleague; he asked me a week ago to take over this review. He also mentioned here that he doesn't mind if someone commandeers the review...

There was a very long delay with a reply, and I understand that this is not good (I'm really very sorry for that). But now I want to resolve this in the near future.

I really hope for your understanding.

In D45653#1200806, @aleksandr.urakov wrote:

@spatel Excuse me, can you please explain what's wrong with this patch now?

Ivan is my colleague; he asked me a week ago to take over this review. He also mentioned here that he doesn't mind if someone commandeers the review...

There was a very long delay with a reply, and I understand that this is not good (I'm really very sorry for that). But now I want to resolve this in the near future.

I really hope for your understanding.

Sorry - I didn't mean to imply that the patch was wrong in any way by resigning as reviewer. I did that only because I don't know enough to review the patch! The test changes certainly look like wins.
As I said earlier, I'm hoping that someone with access to the Radar bug that motivated the existing code can comment. If not, then someone who knows more about the calling conventions should have a look (but I'm not sure who else to add as a potential reviewer).

@spatel Thanks, now I understand you! Sorry for so much fuss on this. Hopefully someone with Radar access will review the patch.

In D45653#1200944, @aleksandr.urakov wrote:

@spatel Thanks, now I understand you! Sorry for so much fuss on this. Hopefully someone with Radar access will review the patch.

I guess two straight-forward questions are:

What do other compilers do? Can you come up with the similar tests (@t21_*) in C, and post godbolt links?
Have you tried using this patch? Does stage-2 of llvm build fine, and tests pass? testsuite?

Unfortunately I'm OOO today. But I ran the LLVM LIT tests with this patch, and there were no new failures. I'll try to build the whole of LLVM with the patched LLVM version (and run the tests with it) and will prepare godbolt links within a week.

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.
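
For readers without access to the link, here is one possible way to write the four combinations in C++. This is a hedged sketch, not the contents of the linked example: the function names are made up, the callees are defined here only so that the snippet is self-contained, and whether the compiler really forwards the sret pointer in each case depends on NRVO and (for the last function) on C++17 copy elision.

```cpp
#include <new> // placement new

struct mytype {
  char const *a, *b, *c, *d; // 32 bytes on x86-64, so returned in memory (sret)
};

// Stand-in callees; a codegen test would only declare them so nothing is inlined.
mytype f_sret() { return mytype{"a", "b", "c", "d"}; }
void f_ptr(mytype *out) { *out = mytype{"w", "x", "y", "z"}; }

// sret caller -> sret callee
mytype sret_to_sret() { return f_sret(); }

// ptr caller -> ptr callee
void ptr_to_ptr(mytype *p) { f_ptr(p); }

// sret caller -> ptr callee: with NRVO, `r` typically lives in the caller's
// return slot, so &r is usually the incoming sret pointer.
mytype sret_to_ptr() {
  mytype r;
  f_ptr(&r);
  return r;
}

// ptr caller -> sret callee: since C++17 the returned prvalue initializes *p
// directly, so p may be passed as the hidden sret argument.
void ptr_to_sret(mytype *p) { new (p) mytype(f_sret()); }
```

The last function relies on placement new, which is why this combination has no direct counterpart in plain C.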

In D45653#1200945, @lebedev.ri wrote:

In D45653#1200944, @aleksandr.urakov wrote:

Have you tried using this patch? Does stage-2 of llvm build fine, and tests pass? testsuite?

I built LLVM before submitting this patch. Back then LLVM successfully rebuilt itself and rebuilt version passed all tests. I used that version as my main compiler for a while.
I haven't done more rigorous testing like rebuilding a linux distribution though.

In D45653#1202370, @sorokin wrote:

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.

By C i meant anything that is not LLVM IR, so other compilers can be compared.
But, the results are not what i have hoped to see.
I kinda thought others were already doing this, and clang wasn't.
It does not look like it.
Or the C++ testcases are different from llvm ir, and we simply can't do this comparison.

So it seems we will need an ABI person after all :)

aleksandr.urakov added a reviewer: t.p.northover. Aug 17 2018, 1:19 AM

xbolva00 added a reviewer: efriedma. Aug 17 2018, 1:56 AM

xbolva00 added a reviewer: rsmith. Aug 17 2018, 1:59 AM

In D45653#1202396, @lebedev.ri wrote:

In D45653#1202370, @sorokin wrote:

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.

By C i meant anything that is not LLVM IR, so other compilers can be compared.
But, the results are not what i have hoped to see.
I kinda thought others were already doing this, and clang wasn't.
It does not look like it.
Or the C++ testcases are different from llvm ir, and we simply can't do this comparison.

So it seems we will need an ABI person after all :)

Ping @rsmith ? :)

Ping!

Can anyone review this, please?

For the C calling convention, there should probably be a testcase where the callee is sret and the caller is not. (I think we correctly reject the callee-pop cases that can't be handled, though.)

If both the caller and callee are sret, do you need to check that the sret argument is the same? Consider something like tail call void @f(%struct.foo* noalias sret @aa) nounwind.

RKSimon added inline comments. Aug 25 2018, 4:39 AM

test/CodeGen/X86/sibcall.ll
4	This is very minor, but shouldn't the prefixes really be X86 (instead of X32), X64 and X32 (instead of X32ABI)?
615	Old prefixes - this can just be removed as an NFC cleanup
654	Old prefixes - this can just be removed as an NFC cleanup
691	Old prefixes - this can just be removed as an NFC cleanup

aleksandr.urakov mentioned this in rL340735: [NFC][X86] Fix `sibcall.ll` formatting. Aug 27 2018, 4:27 AM

Fixed with rL340735, thanks!

aleksandr.urakov mentioned this in rL340737: [X86] Adding the test pointing to the fail case of D45653. Aug 27 2018, 4:58 AM

aleksandr.urakov updated this revision to Diff 162659. Aug 27 2018, 5:02 AM

This patch fails in the @t22_non_sret_to_sret X86 case (as we can see, the stack becomes corrupted after the patch in this case). Thanks to @efriedma for pointing that out. So I am abandoning the revision. Thanks all for your help!

In D45653#1214298, @aleksandr.urakov wrote:

This patch fails in the @t22_non_sret_to_sret X86 case (as we can see, the stack becomes corrupted after the patch in this case). Thanks to @efriedma for pointing that out. So I am abandoning the revision. Thanks all for your help!

Is there no way to somehow actually

In D45653#1213059, @efriedma wrote:

check that the sret argument is the same?

If I understand correctly, the failure happens not because the sret argument differs, but because in the C calling convention an sret argument is cleaned up by the callee, while a non-sret pointer to a struct is cleaned up by the caller. Or maybe I am missing something?

but because in the C calling convention an sret argument is cleaned up by the callee, while a non-sret pointer to a struct is cleaned up by the caller

This only applies to 32-bit. (And even on 32-bit, we could still handle some cases where both the caller and callee are sret, I think.)
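
The stack problem discussed above comes down to a little arithmetic. The following is a hedged C++ sketch for 32-bit x86 with the C calling convention only, where an sret function pops the hidden struct pointer on return (the `retl $4` in the X32 output of @t15). The function names are hypothetical, and the model ignores every other eligibility condition.

```cpp
// Simplified model for i686 with the C calling convention; names are hypothetical.
// Bytes the callee removes from the stack on return: an sret function ends in
// `retl $4` (it pops the hidden struct pointer), any other function in `retl`.
constexpr int bytesPoppedOnReturn(bool isStructRet) {
  return isStructRet ? 4 : 0;
}

// After `jmp callee` the callee returns directly to the caller's caller, which
// still expects the stack adjustment promised by the caller's own signature.
constexpr bool sibcallKeepsStackBalanced(bool isCallerStructRet,
                                         bool isCalleeStructRet) {
  return bytesPoppedOnReturn(isCallerStructRet) ==
         bytesPoppedOnReturn(isCalleeStructRet);
}

// sret -> sret: both sides pop the same amount.
static_assert(sibcallKeepsStackBalanced(true, true), "");
// non-sret -> sret: the callee pops 4 bytes nobody expects to be popped.
static_assert(!sibcallKeepsStackBalanced(false, true), "");
```

Under this model the non-sret to sret case is exactly the one that unbalances the stack, which matches the failure reported for @t22_non_sret_to_sret.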

Yes, I meant 32-bit of course.

I'm not sure about this fix; it no longer seems so simple to me... What do you think about the following check?

if (IsCalleeWin64 != IsCallerWin64)
  return false;

Do you see any pitfalls here?

As for different sret arguments, I think that it doesn't matter, because they will be processed in the same way for the same calling convention, and it shouldn't break the stack. So we can replace a call with a jump in this case, right?

In D45653#1215433, @aleksandr.urakov wrote:
Yes, I meant 32-bit of course.

I'm not sure about this fix; it no longer seems so simple to me... What do you think about the following check?
if (IsCalleeWin64 != IsCallerWin64)
  return false;
Do you see any pitfalls here?

I would rather not mess with this... mixing win64 and non-win64 calling conventions is going to be extremely rare in practice, and I don't really want to spend time combing through the ABI documents trying to figure out if anything can go wrong.

As for different sret arguments, I think that it doesn't matter, because they will be processed in the same way for the same calling convention, and it shouldn't break the stack. So we can replace a call with a jump in this case, right?

The problem would just be that EAX/RAX is set to the wrong value.

In D45653#1216359, @efriedma wrote:

I would rather not mess with this... mixing win64 and non-win64 calling conventions is going to be extremely rare in practice, and I don't really want to spend time combing through the ABI documents trying to figure out if anything can go wrong.

Oh, sorry, I accidentally changed the wrong condition. I wanted to ask you about the following check:

if (isCalleeStructRet != isCallerStructRet)
   return false;

The problem would just be that EAX/RAX is set to the wrong value.

Yes, you are right, thanks. So the additional check of sret arguments equality is needed.

We should not abandon it. It seems we can optimize this case:

struct MyStruct {
  int arr[64];
};

void struct_by_value(MyStruct s);

void call_struct_by_value(int i, MyStruct s) {
  struct_by_value(s);
}

https://godbolt.org/z/lxOq6B (GCC does it)

But as @efriedma noted, your motivating case can be handled as well.

If I understand correctly, the case you have shown is not related to this patch? This patch is about returning structures, not about passing them by value. I think it would be better to create a separate patch for that.

As for this one, it seems that it is possible to optimize the sret-to-sret case if the sret arguments are the same, but I'm not sure if that is enough (@efriedma what do you think, is it ok?). I'm going to work on this, but somewhat later, because I'm working on another part now and I'll be on vacation soon. Or maybe @sorokin can do it earlier?

I'm not sure if it's enough (@efriedma how do you think, is it ok?).

Should be enough, I think.
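
The reason for comparing the sret arguments can be illustrated with a tiny model of the return register. This is a hedged sketch rather than LLVM code: `SRetCall` and both functions are hypothetical names, and pointers are represented as plain integers.

```cpp
#include <cstdint>

// Simplified model; the struct and function names are hypothetical.
struct SRetCall {
  std::uintptr_t callerSRetArg; // pointer the caller received and must return
  std::uintptr_t calleeSRetArg; // pointer the caller passes on to the callee
};

// After `jmp callee` the callee's return value reaches the caller's caller
// unchanged, and an sret callee returns its own sret argument in EAX/RAX.
constexpr std::uintptr_t returnRegAfterSibcall(const SRetCall &call) {
  return call.calleeSRetArg;
}

// The sibcall is only transparent if that value is what the caller owed.
constexpr bool sibcallReturnsExpectedPointer(const SRetCall &call) {
  return returnRegAfterSibcall(call) == call.callerSRetArg;
}

// Forwarding the incoming sret pointer (as in @t21_sret_to_sret) is fine.
static_assert(sibcallReturnsExpectedPointer(SRetCall{0x1000, 0x1000}), "");
// Passing a different buffer, such as a global @aa, is not.
static_assert(!sibcallReturnsExpectedPointer(SRetCall{0x1000, 0x2000}), "");
```

In the real lowering code the comparison would have to be made on the actual argument values of the call, which this sketch does not attempt to show.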

@aleksandr.urakov are you still looking at this?

Unfortunately, I'm not looking at this right now because of some higher-priority work. But I'm keeping it in mind.

Since this patch was abandoned, I will try to continue it here: https://reviews.llvm.org/D46262

Diff 160777

lib/Target/X86/X86ISelLowering.cpp

[4,177 lines not shown; context: bool X86TargetLowering::IsEligibleForTailCallOptimization(]
   // require ABI changes. This is what gcc calls sibcall.

   // Can't do sibcall if stack needs to be dynamically re-aligned. PEI needs to
   // emit a special epilogue.
   const X86RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
   if (RegInfo->needsStackRealignment(MF))
     return false;

-  // Also avoid sibcall optimization if either caller or callee uses struct
-  // return semantics.
-  if (isCalleeStructRet || isCallerStructRet)
+  // Struct-return functions need to return its argument in RAX, so they can not
+  // sibcall non-struct-return functions.
+  if (!isCalleeStructRet && isCallerStructRet)
     return false;

   // Do not sibcall optimize vararg calls unless all arguments are passed via
   // registers.
   LLVMContext &C = *DAG.getContext();
   if (isVarArg && !Outs.empty()) {
     // Optimizing for varargs on Win64 is unlikely to be safe without
     // additional testing.
     if (IsCalleeWin64 || IsCallerWin64)
[36,969 lines not shown]

test/CodeGen/X86/sibcall.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=i686-linux -mcpu=core2 -mattr=+sse2 | FileCheck %s --check-prefix=X32
 ; RUN: llc < %s -mtriple=x86_64-linux -mcpu=core2 -mattr=+sse2 | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -mtriple=x86_64-linux-gnux32 -mcpu=core2 -mattr=+sse2 | FileCheck %s --check-prefix=X32ABI

 define void @t1(i32 %x) nounwind ssp {
 ; X32-LABEL: t1:
 ; X32: # %bb.0:
 ; X32-NEXT: jmp foo # TAILCALL
 ;
 ; X64-LABEL: t1:
 ; X64: # %bb.0:
[415 lines not shown]
 entry:
   %6 = bitcast void ()* %1 to i8* ; <i8*> [#uses=1]
   tail call void %5(i8* %6) nounwind
   ret void
 }

 ; rdar://7726868
 %struct.foo = type { [4 x i32] }

 define void @t15(%struct.foo* noalias sret %agg.result) nounwind {
 ; X32-LABEL: t15:
 ; X32: # %bb.0:
 ; X32-NEXT: pushl %esi
 ; X32-NEXT: subl $8, %esp
 ; X32-NEXT: movl {{[0-9]+}}(%esp), %esi
 ; X32-NEXT: movl %esi, %ecx
 ; X32-NEXT: calll f
 ; X32-NEXT: movl %esi, %eax
 ; X32-NEXT: addl $8, %esp
 ; X32-NEXT: popl %esi
 ; X32-NEXT: retl $4
 ;
 ; X64-LABEL: t15:
 ; X64: # %bb.0:
-; X64-NEXT: pushq %rbx
-; X64-NEXT: movq %rdi, %rbx
-; X64-NEXT: callq f
-; X64-NEXT: movq %rbx, %rax
-; X64-NEXT: popq %rbx
-; X64-NEXT: retq
+; X64-NEXT: jmp f # TAILCALL
 ;
 ; X32ABI-LABEL: t15:
 ; X32ABI: # %bb.0:
-; X32ABI-NEXT: pushq %rbx
-; X32ABI-NEXT: movl %edi, %ebx
-; X32ABI-NEXT: callq f
-; X32ABI-NEXT: movl %ebx, %eax
-; X32ABI-NEXT: popq %rbx
-; X32ABI-NEXT: retq
+; X32ABI-NEXT: jmp f # TAILCALL
   tail call fastcc void @f(%struct.foo* noalias sret %agg.result) nounwind
   ret void
 }

 declare void @f(%struct.foo* noalias sret) nounwind

 define void @t16() nounwind ssp {
 ; X32-LABEL: t16:
[130 lines not shown]
 entry:
   %0 = tail call fastcc double @foo20(double %x) nounwind
   ret double %0
 }

 declare fastcc double @foo20(double) nounwind

 ; bug 28417
 define fastcc void @t21_sret_to_sret(%struct.foo* noalias sret %agg.result) nounwind {
-; 32-LABEL: t21_sret_to_sret:
-; 32: jmp {{_?}}t21_f_sret
-; 64-LABEL: t21_sret_to_sret:
-; 64: jmp {{_?}}t21_f_sret
 ; X32-LABEL: t21_sret_to_sret:
 ; X32: # %bb.0:
-; X32-NEXT: pushl %esi
-; X32-NEXT: subl $8, %esp
-; X32-NEXT: movl %ecx, %esi
-; X32-NEXT: calll t21_f_sret
-; X32-NEXT: movl %esi, %eax
-; X32-NEXT: addl $8, %esp
-; X32-NEXT: popl %esi
-; X32-NEXT: retl
+; X32-NEXT: jmp t21_f_sret # TAILCALL
 ;
 ; X64-LABEL: t21_sret_to_sret:
 ; X64: # %bb.0:
-; X64-NEXT: pushq %rbx
-; X64-NEXT: movq %rdi, %rbx
-; X64-NEXT: callq t21_f_sret
-; X64-NEXT: movq %rbx, %rax
-; X64-NEXT: popq %rbx
-; X64-NEXT: retq
+; X64-NEXT: jmp t21_f_sret # TAILCALL
 ;
 ; X32ABI-LABEL: t21_sret_to_sret:
 ; X32ABI: # %bb.0:
-; X32ABI-NEXT: pushq %rbx
-; X32ABI-NEXT: movl %edi, %ebx
-; X32ABI-NEXT: callq t21_f_sret
-; X32ABI-NEXT: movl %ebx, %eax
-; X32ABI-NEXT: popq %rbx
-; X32ABI-NEXT: retq
+; X32ABI-NEXT: jmp t21_f_sret # TAILCALL
   tail call fastcc void @t21_f_sret(%struct.foo* noalias sret %agg.result) nounwind
   ret void
 }

 define fastcc void @t21_sret_to_non_sret(%struct.foo* noalias sret %agg.result) nounwind {
-; 32-LABEL: t21_sret_to_non_sret:
-; 32: calll {{_?}}t21_f_non_sret
-; 32: retl
-; 64-LABEL: t21_sret_to_non_sret:
-; 64: callq {{_?}}t21_f_non_sret
-; 64: retq
 ; X32-LABEL: t21_sret_to_non_sret:
 ; X32: # %bb.0:
 ; X32-NEXT: pushl %esi
 ; X32-NEXT: subl $8, %esp
 ; X32-NEXT: movl %ecx, %esi
 ; X32-NEXT: calll t21_f_non_sret
 ; X32-NEXT: movl %esi, %eax
 ; X32-NEXT: addl $8, %esp
[16 lines not shown]
 ; X32ABI-NEXT: callq t21_f_non_sret
 ; X32ABI-NEXT: movl %ebx, %eax
 ; X32ABI-NEXT: popq %rbx
 ; X32ABI-NEXT: retq
   tail call fastcc void @t21_f_non_sret(%struct.foo* %agg.result) nounwind
   ret void
 }

 define fastcc void @t21_non_sret_to_sret(%struct.foo* %agg.result) nounwind {
-; 32-LABEL: t21_non_sret_to_sret:
-; 32: jmp {{_?}}t21_f_sret
-; 64-LABEL: t21_non_sret_to_sret:
-; 64: jmp {{_?}}t21_f_sret
 ; X32-LABEL: t21_non_sret_to_sret:
 ; X32: # %bb.0:
-; X32-NEXT: subl $12, %esp
-; X32-NEXT: calll t21_f_sret
-; X32-NEXT: addl $12, %esp
-; X32-NEXT: retl
+; X32-NEXT: jmp t21_f_sret # TAILCALL
 ;
 ; X64-LABEL: t21_non_sret_to_sret:
 ; X64: # %bb.0:
-; X64-NEXT: pushq %rax
-; X64-NEXT: callq t21_f_sret
-; X64-NEXT: popq %rax
-; X64-NEXT: retq
+; X64-NEXT: jmp t21_f_sret # TAILCALL
 ;
 ; X32ABI-LABEL: t21_non_sret_to_sret:
 ; X32ABI: # %bb.0:
-; X32ABI-NEXT: pushq %rax
-; X32ABI-NEXT: callq t21_f_sret
-; X32ABI-NEXT: popq %rax
-; X32ABI-NEXT: retq
+; X32ABI-NEXT: jmp t21_f_sret # TAILCALL
   tail call fastcc void @t21_f_sret(%struct.foo* noalias sret %agg.result) nounwind
   ret void
 }

 declare fastcc void @t21_f_sret(%struct.foo* noalias sret) nounwind
 declare fastcc void @t21_f_non_sret(%struct.foo*) nounwind

This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable sibling-call optimization for functions returning structs
AbandonedPublic
