This is an archive of the discontinued LLVM Phabricator instance.

Implement lambdas with inalloca parameters by forwarding to function without inalloca calling convention.
ClosedPublic

Authored by akhuang on Nov 11 2022, 3:19 PM.

Details

Summary

Lambda functions with inalloca parameters (which are used in 32-bit Windows) currently aren't supported because the call operator delegates to another function and forwards its arguments, and inalloca parameters can't be forwarded. (The same issue exists for variadic arguments.)

This patch emits the lambda call operator again with a different calling convention without inallocas. Then the original call operator and static invoker delegate to this new function. We also modify arrangeLLVMFunctionInfo to be able to generate a different CGFunctionInfo for delegate calls that doesn't use inalloca.

Fixes https://github.com/llvm/llvm-project/issues/28673.

Diff Detail

Event Timeline

akhuang created this revision.Nov 11 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 3:19 PM
akhuang requested review of this revision.Nov 11 2022, 3:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2022, 3:19 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I'm not quite sure I understand what's happening here. Does this actually avoid generating two copies of the function body if both the call operator and the conversion are used?

akhuang updated this revision to Diff 476633.Nov 18 2022, 4:29 PM

Clean up existing code and add code to make the call operator also call the new function.

I'm not quite sure I understand what's happening here. Does this actually avoid generating two copies of the function body if both the call operator and the conversion are used?

That was intended but missing; patch is now updated to make both the invoker and the call operator call the new function.

rnk added inline comments.Nov 21 2022, 11:52 AM
clang/include/clang/CodeGen/CGFunctionInfo.h
571 ↗(On Diff #476634)

This is an amusing name, but we should try to find something better. :)

Can we build on this flag to support re-writing a varargs prototype to receive a va_list? If so, we should plan for that, and name this to cover both use cases. Elsewhere in clang we use the terminology "delegate call" see EmitDelegateCallArg. Could this be something like ForDelegation or ForDelegateCall or IsDelegateCall or IsDelegateTarget or something like that? Or Thunk? I think I like ForDelegateCall, but I'm not sure.


Does anyone know what a "chain call" is? It wasn't clear to me immediately.

clang/lib/CodeGen/CGCall.cpp
194

This still requires boolean parameters in order. I think LLVM tends to use flag enums for this, something like:

enum class FnInfoOptions {
  None = 0,
  IsInstanceMethod = 1 << 0,
  IsChainCall = 1 << 1,
  IsDelegateCall = 1 << 2,
};

Call sites look like:

arrangeLLVMFunctionInfo(RetTy, FnInfoOptions::None, None, FTNP->getExtInfo()...)
arrangeLLVMFunctionInfo(RetTy, FnInfoOptions::IsInstanceMethod, None, FTNP->getExtInfo()...)
clang/lib/CodeGen/CGClass.cpp
3012

I believe this is only necessary for x86 platforms, so we should check the architecture too to avoid the extra work on x64.

3156

We should use a name which demangles correctly, as we do for __invoke. This is a good first draft, though.

clang/test/CodeGenCXX/inalloca-lambda.cpp
4 ↗(On Diff #476634)

Just to make the IR slightly nicer, give this an int member.

8 ↗(On Diff #476634)

Please expand the testing further to include a case where the same lambda is called directly and goes through function pointer decay, something like this:
https://gcc.godbolt.org/z/q7x3ov5e1

This is to ensure that we don't end up double-emitting the implementation of the lambda.

akhuang updated this revision to Diff 477289.Nov 22 2022, 1:52 PM
akhuang marked 4 inline comments as done.

add to test case, modify name mangling, change fn info opts enum type

rnk added inline comments.Nov 22 2022, 3:55 PM
clang/lib/CodeGen/CGCall.cpp
735

We could avoid static_cast by defining the standard flag operators, similar to what we do in this macro:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/DebugInfo/CodeView/CodeView.h#L52

Alternatively, just define operator| and operator&.

clang/lib/CodeGen/TargetInfo.cpp
1190 ↗(On Diff #477289)

Please update the old parameter name

1841 ↗(On Diff #477289)

As a simplification, you can put || isDelegateCall here, we're basically pretending that this returns RAA_Indirect.

akhuang updated this revision to Diff 478324.Nov 28 2022, 12:20 PM
akhuang marked 3 inline comments as done.

Address comments

akhuang retitled this revision from Try to implement lambdas with inalloca parameters by forwarding without use of inallocas. to Implement lambdas with inalloca parameters by forwarding to function without inalloca calling convention..Dec 5 2022, 2:47 PM
akhuang edited the summary of this revision. (Show Details)

ping @efriedma, do you mind looking at this again? thanks!

clang/include/clang/CodeGen/CGFunctionInfo.h
571 ↗(On Diff #476634)

Yeah, that naming makes sense.

chain call: https://reviews.llvm.org/D6332?

clang/lib/CodeGen/CGClass.cpp
3156

Fixed to replace different part of the mangled name with __impl. Don't know if it would be better to use the actual mangling functions, but the string replace here seems simple enough.

whoops, I've left this here for a while.. @efriedma, are you able to review it?

I'm having a bit of trouble following how exactly the thunk creation is working here... do we generate different code depending on whether the call operator and/or the static invoker are referenced? Why is the function in EmitLambdaInAllocaCallOpFn not getting defined using the normal CodeGenModule machinery?

clang/lib/CodeGen/CGCall.cpp
770

I think you need to pass isDelegateCall to CGFunctionInfo::Profile.

I'm having a bit of trouble following how exactly the thunk creation is working here... do we generate different code depending on whether the call operator and/or the static invoker are referenced?

So I think it used to be that the static invoker calls the call operator which contains the body of the lambda? And now both the static invoker and the call operator are delegating to this new __impl function. In EmitLambdaStaticInvokeBody it first calls EmitLambdaInAllocaCallOpFn to make the new call operator (which is populated using EmitLambdaDelegatingInvokeBody). And then inside EmitLambdaDelegatingInvokeBody it generates the new impl function and calls it in the body. Not sure if that answers the question, I agree the code is a bit roundabout.

Why is the function in EmitLambdaInAllocaCallOpFn not getting defined using the normal CodeGenModule machinery?

Do you mean why it's not using CodeGenModule machinery to generate the new call op body or why we're changing the original call op body?

I'm having a bit of trouble following how exactly the thunk creation is working here... do we generate different code depending on whether the call operator and/or the static invoker are referenced?

So I think it used to be that the static invoker calls the call operator which contains the body of the lambda? And now both the static invoker and the call operator are delegating to this new __impl function. In EmitLambdaStaticInvokeBody it first calls EmitLambdaInAllocaCallOpFn to make the new call operator (which is populated using EmitLambdaDelegatingInvokeBody). And then inside EmitLambdaDelegatingInvokeBody it generates the new impl function and calls it in the body. Not sure if that answers the question, I agree the code is a bit roundabout.

Why is the function in EmitLambdaInAllocaCallOpFn not getting defined using the normal CodeGenModule machinery?

Do you mean why it's not using CodeGenModule machinery to generate the new call op body or why we're changing the original call op body?

I mean why it's not using the CodeGenModule machinery. I understand there are three function bodies involved. But I would expect that when you "emit" the lambda call operator, it emits the entry point for the call and for the call impl, and when you emit the static invoker, it just emits a call to the call impl.

Or... hmm. Is this actually erasing the existing definition of the call operator, if it was already emitted? That probably doesn't actually work in general; once a function is emitted, you can't re-emit it. Emitting a function emits other related stuff like static variables; you can't EmitFunctionBody() more than once. Either you need to decide when it's first emitted, or you need to rewrite the signature of the definition and splice the existing body in.

akhuang updated this revision to Diff 529747.Jun 8 2023, 3:32 PM
  • move call operator emission into the path in GenerateCode
  • formatting
akhuang updated this revision to Diff 529750.Jun 8 2023, 3:35 PM

remove print statements

I'm having a bit of trouble following how exactly the thunk creation is working here... do we generate different code depending on whether the call operator and/or the static invoker are referenced?

So I think it used to be that the static invoker calls the call operator which contains the body of the lambda? And now both the static invoker and the call operator are delegating to this new __impl function. In EmitLambdaStaticInvokeBody it first calls EmitLambdaInAllocaCallOpFn to make the new call operator (which is populated using EmitLambdaDelegatingInvokeBody). And then inside EmitLambdaDelegatingInvokeBody it generates the new impl function and calls it in the body. Not sure if that answers the question, I agree the code is a bit roundabout.

Why is the function in EmitLambdaInAllocaCallOpFn not getting defined using the normal CodeGenModule machinery?

Do you mean why it's not using CodeGenModule machinery to generate the new call op body or why we're changing the original call op body?

I mean why it's not using the CodeGenModule machinery. I understand there are three function bodies involved. But I would expect that when you "emit" the lambda call operator, it emits the entry point for the call and for the call impl, and when you emit the static invoker, it just emits a call to the call impl.

Or... hmm. Is this actually erasing the existing definition of the call operator, if it was already emitted? That probably doesn't actually work in general; once a function is emitted, you can't re-emit it. Emitting a function emits other related stuff like static variables; you can't EmitFunctionBody() more than once. Either you need to decide when it's first emitted, or you need to rewrite the signature of the definition and splice the existing body in.

good point, I realized I can just put the call operator emission where the call op is normally emitted, which removes the thing where I was trying to create a new call op while the static invoker body is being emitted.

akhuang updated this revision to Diff 529755.Jun 8 2023, 3:40 PM

more cleanup

akhuang updated this revision to Diff 529780.Jun 8 2023, 4:44 PM

Fix ordering in the test case

efriedma added inline comments.Jun 10 2023, 2:45 PM
clang/lib/CodeGen/CGClass.cpp
3174

Is there any way we can use GenerateCode as the entrypoint here, instead of calling EmitFunctionBody directly? Without this patch, EmitFunctionBody has exactly one caller, so this makes the control flow and special cases more complicated to reason about. For example, there's some special coroutine handling in GenerateCode.

clang/lib/CodeGen/CodeGenFunction.cpp
1472 ↗(On Diff #529780)

Does this pass the correct value of "this"? EmitLambdaStaticInvokeBody creates a new alloca to represent "this", but it's already an argument to the function.

Granted, it only matters in really obscure cases, but still.

akhuang added inline comments.Jun 14 2023, 11:21 AM
clang/lib/CodeGen/CGClass.cpp
3174

Changed to using GenerateCode -- had to add another case to the path in GenerateCode to make sure we emit different things for the call op body inside __impl and the call op body inside the call op function.

clang/lib/CodeGen/CodeGenFunction.cpp
1472 ↗(On Diff #529780)

That's true, the "this" won't be passed correctly.

Actually, would it be fine to just emit the original call op body? (so that the same function body is emitted twice -- once in the call op and once in __impl).

akhuang updated this revision to Diff 531435.Jun 14 2023, 11:21 AM

Call GenerateCode to emit __impl function body

efriedma added inline comments.Jun 14 2023, 12:00 PM
clang/lib/CodeGen/CodeGenFunction.cpp
1472 ↗(On Diff #529780)

Not completely sure what you're asking... but as I've mentioned, we can't EmitFunctionBody() the body of a function more than once.

akhuang updated this revision to Diff 531569.Jun 14 2023, 4:37 PM

Emit call op which forwards %this argument

clang/lib/CodeGen/CodeGenFunction.cpp
1472 ↗(On Diff #529780)

Oh, nevermind then. I think this version passes "this" from the call op.

akhuang updated this revision to Diff 531571.Jun 14 2023, 4:47 PM

update test case

efriedma added inline comments.Jun 15 2023, 5:04 PM
clang/lib/CodeGen/CodeGenFunction.cpp
1470 ↗(On Diff #531571)

I'm not sure __impl is unique enough that user code will never use it. (I mean, it's reserved, but we don't actually forbid using it.)

Can we use the isDelegateCall() bit?

akhuang updated this revision to Diff 532291.Jun 16 2023, 2:11 PM

Use isDelegateCall to check whether function is actual call op fn or not

clang/lib/CodeGen/CodeGenFunction.cpp
1470 ↗(On Diff #531571)

yep, that makes more sense.

This revision is now accepted and ready to land.Jun 16 2023, 4:33 PM
rnk added a comment.Jun 20 2023, 1:29 PM

I'm really happy to see this limitation finally addressed! It's quite a bit of embarrassing tech debt that affects folks making Windows x86 builds.

akhuang updated this revision to Diff 533079.Jun 20 2023, 4:18 PM

rebase and clang format

This revision was landed with ongoing or failed builds.Jun 20 2023, 5:31 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Aug 7 2023, 1:43 AM

For those following along, this was relanded in https://reviews.llvm.org/D154007