This is an archive of the discontinued LLVM Phabricator instance.

CodeGenObjCXX: handle inalloca appropriately for msgSend variant
ClosedPublic

Authored by compnerd on Feb 27 2018, 4:42 PM.

Details

Reviewers
rjmccall
rnk
Summary

objc_msgSend_stret takes a hidden parameter for the returned structure's
address for the construction. When the function signature is rewritten
for the inalloca passing, the return type is no longer marked as
indirect but rather inalloca stret. This enhances the test for the
indirect return to check for that case as well. This fixes the
incorrect return classification for Windows x86.

Diff Detail

Repository
rC Clang

Event Timeline

compnerd created this revision.Feb 27 2018, 4:42 PM

Ugh, I hate inalloca *so much*.

It's still an indirect return, right? It's just that the return-slot pointer has to get stored to the inalloca allocation like any other argument?

Ugh, I hate inalloca *so much*.

It's still an indirect return, right? It's just that the return-slot pointer has to get stored to the inalloca allocation like any other argument?

Correct.

Okay. In that case, this seems correct, although it seems to me that perhaps inalloca is not actually orthogonal to anything else. In fact, it seems to me that maybe inalloca ought to just be a bit on the CGFunctionInfo and the individual ABIInfos should be left alone.

Yeah, this is still an indirect return. I can see your point about the representation, nfortunately, I think that change is way out of scope for this. That would be a pretty large and invasive change to wire that through.

rjmccall accepted this revision.Feb 28 2018, 12:04 PM

Alright, yeah, we can fix that later. LGTM.

This revision is now accepted and ready to land.Feb 28 2018, 12:04 PM
compnerd closed this revision.Feb 28 2018, 12:18 PM

SVN r326362

rnk added a comment.Feb 28 2018, 1:50 PM

Okay. In that case, this seems correct, although it seems to me that perhaps inalloca is not actually orthogonal to anything else. In fact, it seems to me that maybe inalloca ought to just be a bit on the CGFunctionInfo and the individual ABIInfos should be left alone.

I don't think this is a good idea, since when inalloca mixes with __thiscall and __fastcall, register parameters are still passed directly. Consider:

struct S {
  S();
  S(const S &);
  int x;
  void thiscall_byval(S o);
};
void S::thiscall_byval(S o) {}
int __fastcall fastcall_byval(int x, int y, S o) { return x + y; }

$ clang -S t.cpp -emit-llvm -m32 -o - | grep define
define dso_local x86_thiscallcc void @"\01?thiscall_byval@S@@QAEXU1@@Z"(%struct.S* %this, <{ %struct.S }>* inalloca) #0 align 2 {
define dso_local x86_fastcallcc i32 @"\01?fastcall_byval@@YIHHHUS@@@Z"(i32 inreg %x, i32 inreg %y, <{ %struct.S }>* inalloca) #0 {

So sometimes the sret parameter is not in the inalloca pack:

S __fastcall fastcall_sret(int x, int y, S o) { return o; }
define dso_local x86_fastcallcc void @"\01?fastcall_sret@@YI?AUS@@HHU1@@Z"(%struct.S* inreg noalias sret %agg.result, i32 inreg %x, <{ i32, %struct.S }>* inalloca)

I think the long term way to clean this up is to use separate allocations for each argument, and use tokens to prevent the middle end from messing up the SESE region between the argument allocation point and the call that consumes the allocations.

To call foo, the IR would look like:

void foo(int x, S y, int z, S w);
..
foo(1, S(2), 3, S(4));
...
; setup w
%w.token = call token @llvm.allocarg(i32 4)
%w.addr = call %S* @llvm.argaddr(token %w.token)
call void @S_ctor(%S* %w.addr, i32 4)
; setup y
%y.token = call token @llvm.allocarg(i32 4)
%y.addr = call %S* @llvm.argaddr(token %y.token)
call void @S_ctor(%S* %y.addr, i32 2)
; do the call, consume the tokens
call void @foo(i32 1, %S* inalloca %y.addr, i32 3, %S* inalloca %w.addr) "allocargs"[token %w.token, token %y.token]

The tokens would make it illegal to tail merge two calls to foo in an if/else diamond with PHIs.

It would also mean that all of this Obj-C IRGen code can go back to assuming that pointers are passed directly, because now the presence of one non-trivially copyable struct doesn't affect the ABIInfo of every other argument.

I am concerned about what happens in evil code like:

  foo(({ goto abort_call; 0 }), S(2), 3, S(4));
abort_call:

In this case, control flow breaks out of expression evaluation, and we never deallocate the stack memory for the call. We'd need some way to clean that up, perhaps with a cleanup.

This would all be a lot easier if LLVM IR had scopes and it wasn't all just basic block soup. Just saying.

That's a fair point. I agree that separate allocas would make this a lot cleaner, in both IR and frontend implementation. We could also set an inalloca bit (+ field index? or maybe keep the arg index -> field index mapping on the CGFunctionInfo) on each arg info *in addition* to the standard ABI info; then most of the cases could just do a final store after their normal transformations. Are there really no cases where you have to e.g. sign-extend before doing inalloca, or do you just special-case all that in actual argument emission?

Yeah, the block-soup basis of LLVM IR is really annoying for a lot of things that otherwise ought to be simple tasks. I spent some time trying to think about how I could support (scoped) dynamic allocas in coroutines for Swift and eventually just gave up and introduced a bunch of intrinsics based around a token. Trying to optimize static allocas based on lifetime intrinsics isn't going to be a picnic, either.