This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable sibling-call optimization for functions returning structs
AbandonedPublic

Authored by aleksandr.urakov on Apr 14 2018, 7:39 AM.

Details

Summary

This is a attempt for fix issue PR28417.

Currently clang doesn't do sibling call optimization when function returns a struct by value:

mytype f();

mytype g()
{
  return f();
}

Generated IR (-O2):

define void @g()(%struct.mytype* noalias sret) local_unnamed_addr #0 !dbg !7 {
  tail call void @f()(%struct.mytype* sret %0), !dbg !21
  ret void, !dbg !22
}

Generated code:

g(): # @g()
  push rbx
  mov rbx, rdi
  call f()
  mov rax, rbx
  pop rbx
  ret

On the other hand clang can do sibling call optimization when struct is passed by pointer:

struct mytype
{
    char const *a, *b, *c, *d;
};

void f(mytype*);

void g(mytype* a)
{
    return f(a);
}

Generated IR (-O2):

define void @g(mytype*)(%struct.mytype*) local_unnamed_addr #0 !dbg !7 {
  call void @llvm.dbg.value(metadata %struct.mytype* %0, metadata !13, metadata !DIExpression()), !dbg !14
  tail call void @f(mytype*)(%struct.mytype* %0), !dbg !15
  ret void, !dbg !16
}

Generated code:

g(mytype*): # @g(mytype*)
  jmp f(mytype*) # TAILCALL

The difference between these two IRs are the presence of sret attribute. I believe tail call optimization is possible in the first case too. The reason why tail call optimization is not performed is that X86TargetLowering::IsEligibleForTailCallOptimization has the following if:

// Also avoid sibcall optimization if either caller or callee uses struct
// return semantics.
if (isCalleeStructRet || isCallerStructRet)
    return false;

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

As far as I understand sibling call can be performed when both caller and callee are marked sret. The case when caller and callee have mismatched sret specification is trickier.

As I understand attribute sret serves two purposes:

  1. (from documentation) This indicates that the pointer parameter specifies the address of a structure that is the return value of the function in the source program. This pointer must be guaranteed by the caller to be valid: loads and stores to the structure may be assumed by the callee not to trap and to be properly aligned. This is not a valid attribute for return values.
  2. (from Itanium ABI) callee function should return its sret argument in RAX.

From (2) sret caller can not sibling-call non-sret callee. The reverse is allowed. Based on this I updated X86TargetLowering::IsEligibleForTailCallOptimization.

Diff Detail

Repository
rL LLVM

Event Timeline

sorokin created this revision.Apr 14 2018, 7:39 AM
xbolva00 accepted this revision.Apr 14 2018, 7:45 AM
This revision is now accepted and ready to land.Apr 14 2018, 7:45 AM
xbolva00 added a comment.EditedApr 14 2018, 7:48 AM

Please update with the context.

xbolva00 requested changes to this revision.Apr 14 2018, 7:48 AM
This revision now requires changes to proceed.Apr 14 2018, 7:48 AM
sorokin updated this revision to Diff 142518.EditedApr 14 2018, 8:44 AM

I updated the diff with the context.

xbolva00 accepted this revision.Apr 14 2018, 9:59 AM
This revision is now accepted and ready to land.Apr 14 2018, 9:59 AM

Unfortunately when this if was added, no explanation was given why it is needed here. The corresponding test case is marked rdar://7726868, but I can not see what it is about.

I don't know enough about this to see the possible logic holes, but we should have someone with Radar access comment on the patch (adding some reviewer candidates).

As a preliminary clean-up, you might want to run update_llc_test_checks.py on this test file, so we can see the exact diffs in codegen.

sorokin added a comment.EditedApr 20 2018, 12:43 AM

As a preliminary clean-up, you might want to run update_llc_test_checks.py on this test file, so we can see the exact diffs in codegen.

I ran

$python2 update_llc_test_checks.py --llc-binary <directory-of-llc>/llc llvm/test/CodeGen/X86/sibcall.ll

and it modified the sibcall.ll. I'm not familiar with with the utility. I'm afraid I don't know how to use it properly and what is the meaning of the diff. I attached 2.patch. It is the changes this tool does to sibcall.ll.

$python2 update_llc_test_checks.py --llc-binary <directory-of-llc>/llc llvm/test/CodeGen/X86/sibcall.ll

and it modified the sibcall.ll. I'm not familiar with with the utility. I'm afraid I don't know how to use it properly and what is the meaning of the diff. I attached 2.patch. It is the changes this tool does to sibcall.ll. {F5984860}

The '-asm-verbose=false' params screwed up the script. I removed that and updated here:
rL330445

You may want to add your new tests to the file now, so we have a baseline on those too. Then rebase this patch.

xbolva00 requested changes to this revision.Apr 25 2018, 1:38 PM
This revision now requires changes to proceed.Apr 25 2018, 1:38 PM

@xbolva00 - If you request changes to a patch, you should specify what those changes are.

Also, why have you posted an identical patch at D46262?

@xbolva00 - If you request changes to a patch, you should specify what those changes are.

Also, why have you posted an identical patch at D46262?

I added @sorokin 's tests and regenerated test file, just in case this patch will not be updated by @sorokin so the idea will not lost.

xbolva00 added inline comments.Apr 30 2018, 6:40 AM
test/CodeGen/X86/sibcall.ll
310

Regenerate test file please

xbolva00 resigned from this revision.Jul 1 2018, 6:04 AM
sorokin updated this revision to Diff 159693.Aug 8 2018, 5:55 AM

A colleague of mine updated the patch. The patch is rebased against the llvm trunk. He also ran update_llc_test_checks.py on sibcall.ll.

@xbolva00 I'm perfectly OK if you (or any other person) champion this patch. Also feel free to make whatever changes you consider to be necessary.

Ping!

Can you review this, please?

lebedev.ri retitled this revision from Enable sibling-call optimization for functions returning structs to [X86] Enable sibling-call optimization for functions returning structs.Aug 15 2018, 1:16 AM

While i'm unqualified to review this, i'm pretty sure you can safely land the tests themselves,
and update the diff to show the actual test change.

Thanks for the reply!

Dou you mean that it's a good idea to land newly added test cases (@t21_sret_to_sret and so on) with a separate commit, so we could see how current patch will change these test cases? Is it preferable to create a different review for the commit?

Thanks for the reply!

Dou you mean that it's a good idea to land newly added test cases (@t21_sret_to_sret and so on) with a separate commit, so we could see how current patch will change these test cases?

Correct.

Is it preferable to create a different review for the commit?

I don't see the point, just directly commit those tests and rebase this diff afterwards.

Ok, thanks again, I'll do it some later today.

aleksandr.urakov commandeered this revision.Aug 15 2018, 4:34 AM
aleksandr.urakov updated this revision to Diff 160772.
aleksandr.urakov added a reviewer: sorokin.

Update patch to see actual test cases changes.

lebedev.ri added inline comments.Aug 15 2018, 4:40 AM
test/CodeGen/X86/sibcall.ll
408–454

Hm, this looks wrong. There is no such check prefixes on the run lines (at the beginning of the file)
Did you use utils/update_llc_test_checks.py?

aleksandr.urakov marked an inline comment as done.Aug 15 2018, 4:50 AM
aleksandr.urakov added inline comments.
test/CodeGen/X86/sibcall.ll
408–454

Yes, I use this utility, but it doesn't delete these lines. They must've be left here since the first version of the patch and are treated as simple comments. I'll update this.

aleksandr.urakov marked an inline comment as done.

Remove meaningless comment lines from test.

spatel resigned from this revision.Aug 15 2018, 8:38 AM

@spatel Excuse me, can you explain, please, what's wrong with this patch now?

Ivan is my colleague, he have asked me to solve this review a week ago. And he have mentioned here, that he doesn't mind if someone will comandeer the review...

There was the very long delay with a reply, and I understand that it's not good (and I'm very sorry for that really). But now I want to solve this in the nearest future.

I really hope for your understanding.

@spatel Excuse me, can you explain, please, what's wrong with this patch now?

Ivan is my colleague, he have asked me to solve this review a week ago. And he have mentioned here, that he doesn't mind if someone will comandeer the review...

There was the very long delay with a reply, and I understand that it's not good (and I'm very sorry for that really). But now I want to solve this in the nearest future.

I really hope for your understanding.

Sorry - I didn't mean to imply that the patch was wrong in any way by resigning as reviewer. I did that only because I don't know enough to review the patch! The test changes certainly look like wins.
As I said earlier, I'm hoping that someone with access to the Radar bug that motivated the existing code can comment. If not, then someone who knows more about the calling conventions should have a look (but I'm not sure who else to add as a potential reviewer).

@spatel Thanks, now I understand you! Sorry for so much fuss on this. Hopefully someone with Radar access will review the patch.

@spatel Thanks, now I understand you! Sorry for so much fuss on this. Hopefully someone with Radar access will review the patch.

I guess two straight-forward questions are:

  • What do other compilers do? Can you come up with the similar tests (@t21_*) in C, and post godbolt links?
  • Have you tried using this patch? Does stage-2 of llvm build fine, and tests pass? testsuite?

Unfortunately I'm OOO today. But I ran LLVM LIT tests with this patch, and there was no new fails with it. I'll try to build whole LLVM with patched LLVM version (and will run tests within it) and will prepare gotbolt links during a week.

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.

  • Have you tried using this patch? Does stage-2 of llvm build fine, and tests pass? testsuite?

I built LLVM before submitting this patch. Back then LLVM successfully rebuilt itself and rebuilt version passed all tests. I used that version as my main compiler for a while.
I haven't done more rigorous testing like rebuilding a linux distribution though.

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.

By C i meant anything that is not LLVM IR, so other compilers can be compared.
But, the results are not what i have hoped to see.
I kinda thought others were already doing this, and clang wasn't.
It does not look like it.
Or the C++ testcases are different from llvm ir, and we simply can't do this comparison.

So it seems we will need an ABI person after all :)

I made a C++ example for all 4 cases: sret/ptr function calls sret/ptr function.

Here is the link: https://godbolt.org/g/FxUYvv

I don't know if it is possible to make a ptr to sret function in pure C. I did this only in C++.

By C i meant anything that is not LLVM IR, so other compilers can be compared.
But, the results are not what i have hoped to see.
I kinda thought others were already doing this, and clang wasn't.
It does not look like it.
Or the C++ testcases are different from llvm ir, and we simply can't do this comparison.

So it seems we will need an ABI person after all :)

Ping @rsmith ? :)

Ping!

Can anyone review this, please?

For the C calling convention, there should probably be a testcase where the callee is sret and the caller is not. (I think we correctly reject the callee-pop cases that can't be handled, though.)

If both the caller and callee are sret, do you need to check that the sret argument is the same? Consider something like tail call void @f(%struct.foo* noalias sret @aa) nounwind.

RKSimon added inline comments.Aug 25 2018, 4:39 AM
test/CodeGen/X86/sibcall.ll
1–3

This is very minor, but shouldn't the prefixes really be X86 (instead of X32), X64 and X32 (instead of X32ABI)?

410

Old prefixes - this can just be removed as an NFC cleanup

410

Old prefixes - this can just be removed as an NFC cleanup

410

Old prefixes - this can just be removed as an NFC cleanup

aleksandr.urakov marked 4 inline comments as done.Aug 27 2018, 4:28 AM

Fixed with rL340735, thanks!

aleksandr.urakov abandoned this revision.Aug 27 2018, 5:09 AM

This patch fails in the @t22_non_sret_to_sret X86 case (as we can see, a stack becomes corrupted after the patch in this case). Thanks to @efriedma for pointing that. So I just abandon the revision. Thanks all for your help!

This patch fails in the @t22_non_sret_to_sret X86 case (as we can see, a stack becomes corrupted after the patch in this case). Thanks to @efriedma for pointing that. So I just abandon the revision. Thanks all for your help!

Is there no way to somehow actually

check that the sret argument is the same?

If I understand right, the fail becomes not because of the sret argument is not the same, but because in the C calling convention sret argument is cleaned up by a callee, while a non-sret pointer to a struct is cleaned up by a caller. Or maybe I am missing something?..

but because in the C calling convention sret argument is cleaned up by a callee, while a non-sret pointer to a struct is cleaned up by a caller

This only applies to 32-bit. (And even on 32-bit, we could still handle some cases where both the caller and callee are sret, I think.)

Yes, I meant 32-bit of course.

I'm not sure about this fix, now it doesn't seem so simple to me... What do you think about the next check?

if (IsCalleeWin64 != IsCallerWin64)
  return false;

Do you see any pitfalls here?

As for different sret arguments, I think that it doesn't matter, because they will be processed in the same way for the same calling convention, and it shouldn't break the stack. So we can replace a call with a jump in this case, right?

Yes, I meant 32-bit of course.

I'm not sure about this fix, now it doesn't seem so simple to me... What do you think about the next check?

if (IsCalleeWin64 != IsCallerWin64)
  return false;

Do you see any pitfalls here?

I would rather not mess with this... mixing win64 and non-win64 calling conventions is going to be extremely rare in practice, and I don't really want to spend time combing through the ABI documents trying to figure out if anything can go wrong.

As for different sret arguments, I think that it doesn't matter, because they will be processed in the same way for the same calling convention, and it shouldn't break the stack. So we can replace a call with a jump in this case, right?

The problem would just be that EAX/RAX is set to the wrong value.

aleksandr.urakov added a comment.EditedAug 28 2018, 12:32 PM

I would rather not mess with this... mixing win64 and non-win64 calling conventions is going to be extremely rare in practice, and I don't really want to spend time combing through the ABI documents trying to figure out if anything can go wrong.

Oh, sorry, I've changed another condition accidentally. I wanted to ask you about the next check:

if (isCalleeStructRet != isCallerStructRet)
   return false;

The problem would just be that EAX/RAX is set to the wrong value.

Yes, you are right, thanks. So the additional check of sret arguments equality is needed.

xbolva00 added a comment.EditedSep 6 2018, 7:05 AM

We should not abandon it. It seems we can optimize this case

struct MyStruct {

int arr[64];

};

void struct_by_value(MyStruct s);

void call_struct_by_value(int i, MyStruct s) {

struct_by_value(s);

}

https://godbolt.org/z/lxOq6B (GGC does it)

But as @efriedma noted, your motivating case is also possible to be handled as well.

If I understand correctly, you have shown the case that is not relating to this patch? This patch is about returning structures, not about passing them by value. I think that it would be better to create separate patch for fixing that?

As for this one, it seems that it is possible to optimize the sret-to-sret case if the sret arguments are the same, but I'm not sure if it's enough (@efriedma how do you think, is it ok?). I'm going to work on this, but I'll do it some later because I'm working on another part now and I'll have a vacation soon. Or may be @sorokin can do it earlier?

I'm not sure if it's enough (@efriedma how do you think, is it ok?).

Should be enough, I think.

@aleksandr.urakov are you still looking at this?

Unfortunately, I'm not looking at this exactly now due to some work with a higher priority. But I'm keeping it in mind.

Since this patch was abandoned, I will try to continue it here: https://reviews.llvm.org/D46262