This is an archive of the discontinued LLVM Phabricator instance.

[X86] [pr51000] sret pointer tailcalling
Changes PlannedPublic

Authored by urnathan on Aug 27 2021, 11:12 AM.

Details

Reviewers
rnk
Summary

This patch implements tailcalling from an sret-pointer-returning function. Such functions can only tailcall sret-pointer-returning functions that we pass our own sret pointer to. When lowering the formal parameters we record the SDValue holding our incoming sret. When tail calling we check if we're passing that DAG value as the outgoing sret SDValue. We still have to copy the incoming sret to a virtual register when lowering parms, as (at least with FastISel), the DAG is no longer valid at the point we generate the move to the physical register on an actual return.

We need to #include SelectionDAGNodes.h into Target/X86/X86MachineFunctionInfo.h in order to hold the SDValue. An alternative would be to extract the SDNode from the value and record it as a void *, but that seems less clean.

sibcall.ll contains one test where the incoming sret parameter is no the first, which we then pass to a tail call as its sret argument. This patch will tail call that now, which is correct, but I can't help feeling that sibcall test is invalid -- are there real cases where the sret argument is not first?

While this patch does not contain an integration test from C++ to assembler, to make sure we actually resolve pr51000, I do have one available. It would be unpleasant to regress the bug report at some point.

Diff Detail

Event Timeline

urnathan created this revision.Aug 27 2021, 11:12 AM
urnathan requested review of this revision.Aug 27 2021, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2021, 11:12 AM
rnk added inline comments.Aug 30 2021, 12:36 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
4884

I don't think this will work in general, because IIRC all SDNode objects are destroyed and reused after selecting instructions in each basic block. You can use an ASan build with a multi-BB test case if you want to really confirm this theory.

I think you want to use the virtual register of the argument and compare that.

My memory is that the virtual SRet register doesn't compare equal because we insert an extra COPY instruction in the prologue:

bb.0.entry:
  liveins: $rdi, $rsi
  %1:gr64 = COPY $rsi
  %0:gr64 = COPY $rdi
  %2:gr64 = COPY %0:gr64  # EXTRA
...
  $rdi = COPY %0:gr64
  $rsi = COPY %1:gr64
  CALL64pcrel32 @_Z4copyR3Foo, ... , implicit $rsp, implicit $ssp, implicit $rdi, implicit $rsi, implicit-def $rsp, implicit-def $ssp
...
  $rax = COPY %2:gr64
  RET 0, $rax

So, the virtual registers being compared are %0 and %2.

I believe the extra copy is created because SelectionDAGBuilder.cpp is responsible for making the "standard" argument virtual register copies, and that happens later, outside X86TargetLowering::LowerFormalArguments. It is also only done if the argument is used outside the entry block.


So, I'm wondering if we can do a big refactoring. Maybe the thing to do is to change SDISel to save the sret argument virtual register when it updates the ValueMap, and have it do it for all targets. We will have to pretend that sret arguments are always used outside the current BB, since they are on many targets.

The target will decide later if it needs to insert an extra COPY from this virtual register when lowering returns. This should result in less duplicate code overall, and shorter machine IR in most cases. Interested in attempting that approach?

llvm/test/CodeGen/X86/sibcall.ll
454

This seems like it should optimize after your change. Is there a good reason it can't?

701

Looks like we already have good test cases for when TCO should fail due to sret arg mismatch.

urnathan added inline comments.Sep 3 2021, 4:29 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
4884

Ah, it seems the single-bb nature of the test cases misled me about sdnode lifetime.

The target will decide later if it needs to insert an extra COPY from this virtual
register when lowering returns. This should result in less duplicate code overall,
and shorter machine IR in most cases. Interested in attempting that approach?

right, that was what I made an attempt at to avoid needing the virtual reg /and/ the sdnode in the machine function structure. As I think I mentioned, that failed on fastisel due to sdnode lifetime -- and I thought that was some quirk of fastisel, not the bb thing you mention.

I need to understand more about the isel dag

thank you for your comments

llvm/test/CodeGen/X86/sibcall.ll
454

IIRC it was to do with the caller and callee arg popping mismatch (which looked to be an incorrect determination to my cursory investigation). I wanted to make progress though, rather than let perfection get in the way.

urnathan planned changes to this revision.Apr 6 2022, 8:13 AM

I don't want to lose track of your valuable comments, maybe sometime I'll have a chance to look again.

Herald added a project: Restricted Project. · View Herald TranscriptApr 6 2022, 8:13 AM