objc_msgSend_stret takes a hidden parameter carrying the address at which the returned structure is constructed. When the function signature is rewritten for inalloca argument passing, the return type is no longer marked as indirect but rather as inalloca with the sret flag set. This patch extends the indirect-return check to cover that case as well, fixing the incorrect return classification on Windows x86.
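For reference, a minimal sketch of the enhanced check, assuming Clang's ABIArgInfo API (the InAlloca kind carries an sret flag): an inalloca return with that flag set is still an indirect return, it just travels through the argument pack.

    #include "clang/CodeGen/CGFunctionInfo.h"

    // Minimal sketch, assuming Clang's ABIArgInfo API: treat an inalloca
    // return with the sret flag set as an indirect return as well.
    static bool returnUsesSRet(const clang::CodeGen::CGFunctionInfo &FI) {
      const clang::CodeGen::ABIArgInfo &RI = FI.getReturnInfo();
      return RI.isIndirect() || (RI.isInAlloca() && RI.getInAllocaSRet());
    }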
Diff Detail
- Repository: rC Clang
Event Timeline
Ugh, I hate inalloca *so much*.
It's still an indirect return, right? It's just that the return-slot pointer has to get stored to the inalloca allocation like any other argument?
Okay. In that case, this seems correct, although it seems to me that perhaps inalloca is not actually orthogonal to everything else. In fact, maybe inalloca ought to just be a bit on the CGFunctionInfo, with the individual ABIInfos left alone.
Yeah, this is still an indirect return. I can see your point about the representation; unfortunately, I think that change is way out of scope for this patch. It would be a pretty large and invasive change to wire that through.
I don't think this is a good idea, since when inalloca mixes with __thiscall and __fastcall, register parameters are still passed directly. Consider:
    struct S {
      S();
      S(const S &);
      int x;
      void thiscall_byval(S o);
    };
    void S::thiscall_byval(S o) {}
    int __fastcall fastcall_byval(int x, int y, S o) { return x + y; }

    $ clang -S t.cpp -emit-llvm -m32 -o - | grep define
    define dso_local x86_thiscallcc void @"\01?thiscall_byval@S@@QAEXU1@@Z"(%struct.S* %this, <{ %struct.S }>* inalloca) #0 align 2 {
    define dso_local x86_fastcallcc i32 @"\01?fastcall_byval@@YIHHHUS@@@Z"(i32 inreg %x, i32 inreg %y, <{ %struct.S }>* inalloca) #0 {
So sometimes the sret parameter is not in the inalloca pack:
    S __fastcall fastcall_sret(int x, int y, S o) { return o; }

    define dso_local x86_fastcallcc void @"\01?fastcall_sret@@YI?AUS@@HHU1@@Z"(%struct.S* inreg noalias sret %agg.result, i32 inreg %x, <{ i32, %struct.S }>* inalloca)
I think the long term way to clean this up is to use separate allocations for each argument, and use tokens to prevent the middle end from messing up the SESE region between the argument allocation point and the call that consumes the allocations.
To call foo, the IR would look like:
    void foo(int x, S y, int z, S w);
    ...
    foo(1, S(2), 3, S(4));
    ...

    ; setup w
    %w.token = call token @llvm.allocarg(i32 4)
    %w.addr = call %S* @llvm.argaddr(token %w.token)
    call void @S_ctor(%S* %w.addr, i32 4)
    ; setup y
    %y.token = call token @llvm.allocarg(i32 4)
    %y.addr = call %S* @llvm.argaddr(token %y.token)
    call void @S_ctor(%S* %y.addr, i32 2)
    ; do the call, consume the tokens
    call void @foo(i32 1, %S* inalloca %y.addr, i32 3, %S* inalloca %w.addr) [ "allocargs"(token %w.token, token %y.token) ]
The tokens would make it illegal to tail merge two calls to foo in an if/else diamond with PHIs.
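A hedged illustration of that hazard, with hypothetical declarations of S and foo matching the sketch above:

    // Hypothetical S/foo declarations, matching the sketch above.
    struct S { S(int); S(const S &); int x; };
    void foo(int, S, int, S);

    void diamond(bool c) {
      // Without the tokens, nothing pairs each argument allocation with the
      // call that consumes it, so a pass could tail-merge the two calls into
      // one call fed by PHIs, splitting allocation from consumption.
      if (c)
        foo(1, S(2), 3, S(4));
      else
        foo(5, S(6), 7, S(8));
    }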
It would also mean that all of this Obj-C IRGen code can go back to assuming that pointers are passed directly, because now the presence of one non-trivially copyable struct doesn't affect the ABIInfo of every other argument.
I am concerned about what happens in evil code like:
    foo(({ goto abort_call; 0; }), S(2), 3, S(4));
    abort_call:
In this case, control flow breaks out of the expression evaluation, and we never deallocate the stack memory for the call. We'd need some way to release that memory, perhaps with a cleanup.
This would all be a lot easier if LLVM IR had scopes and it wasn't all just basic block soup. Just saying.
That's a fair point. I agree that separate allocas would make this a lot cleaner, in both IR and frontend implementation. We could also set an inalloca bit (+ field index? or maybe keep the arg index -> field index mapping on the CGFunctionInfo) on each arg info *in addition* to the standard ABI info; then most of the cases could just do a final store after their normal transformations. Are there really no cases where you have to e.g. sign-extend before doing inalloca, or do you just special-case all that in actual argument emission?
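To make the "final store after the normal transformation" idea concrete, here is a deliberately simplified sketch; every name in it is invented for illustration and none of it is Clang API:

    #include <cstdint>
    #include <optional>

    // Invented types for illustration only -- not Clang API. Each argument
    // keeps its ordinary ABI kind plus an optional inalloca field index.
    enum class Kind { Direct, Extend };
    struct ArgInfo {
      Kind K;
      std::optional<unsigned> InAllocaField; // set iff the arg lands in the pack
    };

    void emitArg(const ArgInfo &AI, int16_t V, uint32_t *Pack) {
      // Normal transformation first (sign-extension for Extend)...
      uint32_t Coerced = (AI.K == Kind::Extend)
                             ? static_cast<uint32_t>(static_cast<int32_t>(V))
                             : static_cast<uint32_t>(static_cast<uint16_t>(V));
      // ...then, if the argument lives in the pack, a final store; otherwise
      // it would be passed directly in a register or stack slot as usual.
      if (AI.InAllocaField)
        Pack[*AI.InAllocaField] = Coerced;
    }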
Yeah, the block-soup basis of LLVM IR is really annoying for a lot of things that otherwise ought to be simple tasks. I spent some time trying to think about how I could support (scoped) dynamic allocas in coroutines for Swift and eventually just gave up and introduced a bunch of intrinsics based around a token. Trying to optimize static allocas based on lifetime intrinsics isn't going to be a picnic, either.