This is an archive of the discontinued LLVM Phabricator instance.

Try to implement lambdas with inalloca parameters by inlining the call operator function.
Needs ReviewPublic

Authored by akhuang on Oct 28 2022, 4:08 PM.

Details

Reviewers
rnk

Diff Detail

Event Timeline

akhuang created this revision.Oct 28 2022, 4:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 4:08 PM
Herald added a subscriber: nlopes. · View Herald Transcript
akhuang requested review of this revision.Oct 28 2022, 4:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 4:08 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
nlopes added inline comments.Oct 29 2022, 1:29 AM
clang/lib/CodeGen/CGClass.cpp
3080

Please use PoisonValue instead whenever possible, as we are trying to remove undef from LLVM.
Thank you!

akhuang added a reviewer: rnk.Nov 2 2022, 2:24 PM
rnk added inline comments.Nov 2 2022, 3:00 PM
clang/lib/CodeGen/CGClass.cpp
3064

Let's use getNullValue instead, just to avoid any complications for msan or ubsan, which may check for defined-ness.

clang/lib/CodeGen/CodeGenModule.cpp
3595

Can you move most of the details of this out of line to try to keep this as high level as possible?

3597–3600

For simplicity, what if we always emitted the call operator for all lambda static invokers into the IR first? So, this logic would then become almost exactly the same as the emitCXXStructor logic above.

Later, in EmitLambdaStaticInvokeBody, we can detect the inalloca case and start the cloning.

akhuang added inline comments.Nov 2 2022, 3:30 PM
clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

For simplicity, what if we always emitted the call operator for all lambda static invokers into the IR first? So, this logic would then become almost exactly the same as the emitCXXStructor logic above.

Do you know where this should happen? I couldn't really figure out a place other than here for emitting the call operator.

If I do the cloning inside the normal EmitLambdaStaticInvokeBody path it's a bit annoying because StartFunction/EndFunction get called before/after cloning.

rnk added inline comments.Nov 2 2022, 3:33 PM
clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

Do you know where this should happen? I couldn't really figure out a place other than here for emitting the call operator.

Yes, I think you should do that here, just like we do for constructors. If it's a hack, it's one we already have. The main impact is that we emit the call operator in the IR first. That may require updating some tests, but it's nice to do the same thing on all platforms, and we'd need to do it to handle forwarding varargs lambdas anyway, which are present on all platforms.

I also think it's OK to delete all the IR from StartFunction after its been generated, that doesn't seem like a big deal. How does the varargs cloning logic handle this situation?

akhuang added inline comments.Nov 2 2022, 3:54 PM
clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

Oh, ok, I see what you mean.

I'll try to upload a version with the cloning function inside EmitLambdaStaticInvokeBody. I think there's some stuff in Start/End Function that prevent you from simply deleting the code. (I don't think it's an issue for the varargs cloning because that isn't called within EmitGlobalFunctionDefinition. )

akhuang added inline comments.Nov 3 2022, 4:31 PM
clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

Actually, hm, trying to do the function cloning inside EmitLambdaStaticInvokeBody/GenerateCode might not work because FinishFunction does various things like emit the return statement, which won't work if we just cloned the function.

akhuang updated this revision to Diff 473082.Nov 3 2022, 4:33 PM

moved some stuff around

rnk added a subscriber: efriedma.Nov 3 2022, 4:35 PM

+@efriedma, can you review this as a Clang CodeGen owner, according to the recently updated CodeOwners.rst file?
https://github.com/llvm/llvm-project/blob/main/clang/CodeOwners.rst#clang-llvm-ir-generation

clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

If it doesn't work at all, I'm OK with doing this here, but I would like to try to move as much logic as possible out of EmitGlobalDefinition.

akhuang added inline comments.Nov 3 2022, 4:56 PM
clang/lib/CodeGen/CodeGenModule.cpp
3597–3600

Yeah, it doesn't seem ideal.. I guess the logic can also be moved into EmitGlobalFunctionDefinition (or a new CodeGenModule wrapper function) but that doesn't seem much better.

I don't think cloning the function in GenerateCode between StartFunction and FinishFunction will work though. Otherwise maybe can go back to trying a different method like changing the CXXMethodDecl from the call op and emitting it through the normal codegen path.

Should we try to use this codepath for variadic lambdas as well?

Do we want to try to unify our cloning code? CodeGenFunction::GenerateVarArgsThunk has code doing something similar. (It's at least worth comparing to see if you're doing something significantly different...)

clang/lib/CodeGen/CGClass.cpp
3063

Checking the name doesn't work; -fdiscard-value-names is on by default in Release builds

Might also be worth considering if we can avoid cloning here. It should be possible to emit the lambda body into a separate function with a calling convention of your choice, and make both the call operator and the static invoker call it.

akhuang updated this revision to Diff 474889.Nov 11 2022, 4:26 PM
akhuang marked an inline comment as done.

Move cloning code into a function.

Might also be worth considering if we can avoid cloning here. It should be possible to emit the lambda body into a separate function with a calling convention of your choice, and make both the call operator and the static invoker call it.

Thanks for reviewing! I started a patch to emit a new function with a different calling convention here https://reviews.llvm.org/D137872. It probably has some issues - I'm not sure if the argument arranging for the call actually works generally, and it has a bunch of copied code from EmitCall and EmitLambdaForwardingCall. But, it doesn't involve changing code in EmitGlobalDefinition, which seems better.

akhuang updated this revision to Diff 475269.Nov 14 2022, 1:58 PM

Fix calling convention of cloned function.

rnk added a comment.Nov 14 2022, 4:00 PM

Should we try to use this codepath for variadic lambdas as well?

Yes!

Do we want to try to unify our cloning code? CodeGenFunction::GenerateVarArgsThunk has code doing something similar. (It's at least worth comparing to see if you're doing something significantly different...)

Good idea

Might also be worth considering if we can avoid cloning here. It should be possible to emit the lambda body into a separate function with a calling convention of your choice, and make both the call operator and the static invoker call it.

This would be nice. I wasn't able to provide guidance on how to do that, and I think Amy was struggling to synthesize a new method (__invoke, __impl) at the AST layer.

Should we try to use this codepath for variadic lambdas as well?

Yes!

Do we want to try to unify our cloning code? CodeGenFunction::GenerateVarArgsThunk has code doing something similar. (It's at least worth comparing to see if you're doing something significantly different...)

Good idea

Might also be worth considering if we can avoid cloning here. It should be possible to emit the lambda body into a separate function with a calling convention of your choice, and make both the call operator and the static invoker call it.

This would be nice. I wasn't able to provide guidance on how to do that, and I think Amy was struggling to synthesize a new method (__invoke, __impl) at the AST layer.

I think you wouldn't actually synthesize it at all in the AST.

Basically, the follow code changes:

  1. Change the mangling of the function that contains the actual lambda body.
  2. Make that function use an alternate calling convention that doesn't involve inalloca etc.
  3. Synthesize thunks to represent the actual call operator and static invoker.

This is similar to the way virtual overriding works: there's an actual function, but there are also alternate entry points that end up in the vtables.

Should we try to use this codepath for variadic lambdas as well?

Yes!

Do we want to try to unify our cloning code? CodeGenFunction::GenerateVarArgsThunk has code doing something similar. (It's at least worth comparing to see if you're doing something significantly different...)

Good idea

Might also be worth considering if we can avoid cloning here. It should be possible to emit the lambda body into a separate function with a calling convention of your choice, and make both the call operator and the static invoker call it.

This would be nice. I wasn't able to provide guidance on how to do that, and I think Amy was struggling to synthesize a new method (__invoke, __impl) at the AST layer.

I think you wouldn't actually synthesize it at all in the AST.

Basically, the follow code changes:

  1. Change the mangling of the function that contains the actual lambda body.
  2. Make that function use an alternate calling convention that doesn't involve inalloca etc.
  3. Synthesize thunks to represent the actual call operator and static invoker.

This is similar to the way virtual overriding works: there's an actual function, but there are also alternate entry points that end up in the vtables.

Thanks, that makes sense. I'm not very familiar with thunks but I'll look into it. Would we make a thunk that contains the call to the new lambda body function, and then the invoker and call op functions return that thunk?