"this" parameter of lambda if undef, notnull and differentiable.
So we need to pass something consistent.
Any alloca will work. It will be eliminated as unused later by optimizer.
Otherwise we generate code which Msan is expected to catch.
Differential D132275
[clang] Create alloca to pass into static lambda vitalybuka on Aug 19 2022, 9:13 PM. Authored by
Details "this" parameter of lambda if undef, notnull and differentiable. Any alloca will work. It will be eliminated as unused later by optimizer. Otherwise we generate code which Msan is expected to catch.
Diff Detail
Event TimelineComment Actions Given that we mark up lambdas with the requirement that the pointer is nonnull/noundef/dereferenceable, we can't fix that by just messing with the attributes. Among other things, we could use a definition from a different module. Either we relax the requirements globally for all lambdas, or we fix EmitLambdaDelegatingInvokeBody() so it passes a pointer to a proper alloca instead of undef. Comment Actions Is this a thing, I assumed lambda is always defined in the module?
This looks like unnecessary special case.
I will try to do this one, I assume we need any alloca and let optimized to eliminate it when it can. Comment Actions Thanks. I see, linkonce_odr can pick the one which was not converted into static function, so with original attributes. Comment Actions LGTM (We might want to consider messing with the rules for calling lambdas that don't capture anything, since there's no way to actually access it from the lambda body. But it's unlikely to matter, and this is the right fix to be consistent with the current rules.) Comment Actions Hi. For an embedded device, prior to this patch, we used to be able to see a significant savings of 28kB when we enabled the attributor. But with this patch, we don't see these savings anymore when enabling the attributor. I don't suppose folks might have any ideas on how this might affect the attributor? I'm going to attempt to make a minimal reproducer. Just chiming in early in case anyone might happen to know off the top of their head. cc @jdoerfert who owns the attributor Comment Actions Possibly the alloca doesn't get eliminated, or doesn't get eliminated early enough for attributor to kick in. Not sure what context would make that significant, off the top of my head. Is the lambda in question defined in an inline function (linkonce_odr)? Maybe we should make EmitLambdaDelegatingInvokeBody emit an always_inline hint; it should be rare that a lambda is both called directly and converted to a function pointer. Comment Actions
To clarify, I mean on the call instruction, not on either function. Comment Actions Ok here's some of my findings. So there's a step in the attributor where it replaces some instructions with unreachable. One step is: *** IR Dump Before AttributorPass on [module] *** ; Function Attrs: inlinehint minsize nounwind optsize define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !12909 { call void @llvm.dbg.value(metadata ptr %0, metadata !12913, metadata !DIExpression()), !dbg !12916 call void @llvm.dbg.value(metadata ptr %1, metadata !12914, metadata !DIExpression()), !dbg !12916 call void @llvm.dbg.value(metadata ptr %2, metadata !12915, metadata !DIExpression()), !dbg !12916 %4 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(ptr noundef nonnull align 1 dereferenceable(1) undef, ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #12, !dbg !12917 ret i8 %4, !dbg !12917 } *** IR Dump After AttributorPass on [module] *** ; Function Attrs: inlinehint minsize nounwind optsize define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !12906 { call void @llvm.dbg.value(metadata ptr %0, metadata !12910, metadata !DIExpression()), !dbg !12913 call void @llvm.dbg.value(metadata ptr %1, metadata !12911, metadata !DIExpression()), !dbg !12913 call void @llvm.dbg.value(metadata ptr %2, metadata !12912, metadata !DIExpression()), !dbg !12913 unreachable, !dbg !12914 } The first argument in the call is an undef but the argument type is also marked as noundef, so this is unreachable. With this patch, in the final IR, the function will instead accept some non-undef argument: define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef nonnull %1, ptr noundef %2) #3 comdat align 2 !dbg !14764 { ;; .. a bunch of llvm.dbg.values tail call void @_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager12ReceiveStateERK26_global_state_State(ptr noundef nonnull align 8 dereferenceable(3864) %0, ptr noundef nonnull align 8 dereferenceable(528) %1) #17, !dbg !14808 ret i8 0, !dbg !14809 } Without this patch, before any passes run, this call originally took an undef: define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPv E_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !14493 { %4 = alloca %"class.pw::Status", align 1 %5 = alloca ptr, align 4 %6 = alloca ptr, align 4 %7 = alloca ptr, align 4 %8 = alloca %"class.pw::Status", align 1 store ptr %0, ptr %5, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %5, metadata !14497, metadata !DIExpression()), !dbg !14500 store ptr %1, ptr %6, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %6, metadata !14498, metadata !DIExpression()), !dbg !14500 store ptr %2, ptr %7, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %7, metadata !14499, metadata !DIExpression()), !dbg !14500 %9 = load ptr, ptr %5, align 4, !dbg !14501 %10 = load ptr, ptr %6, align 4, !dbg !14501, !tbaa !8211 %11 = load ptr, ptr %7, align 4, !dbg !14501, !tbaa !8211 %12 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(p tr noundef nonnull align 1 dereferenceable(1) undef, ptr noundef nonnull align 4 dereferenceable(16) %9, ptr noundef %10, ptr noundef %11) #11, !dbg !14501 %13 = getelementptr inbounds %"class.pw::Status", ptr %8, i32 0, i32 0, !dbg !14501 store i8 %12, ptr %13, align 1, !dbg !14501 call void @llvm.memcpy.p0.p0.i32(ptr align 1 %4, ptr align 1 %8, i32 1, i1 false), !dbg !14501, !tbaa.struct !9432 %14 = getelementptr inbounds %"class.pw::Status", ptr %4, i32 0, i32 0, !dbg !14501 %15 = load i8, ptr %14, align 1, !dbg !14501 ret i8 %15, !dbg !14501 } However, with this patch, it accepts an %8 = alloca %class.anon.61, align 1: define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !14493 { %4 = alloca %"class.pw::Status", align 1 %5 = alloca ptr, align 4 %6 = alloca ptr, align 4 %7 = alloca ptr, align 4 %8 = alloca %class.anon.61, align 1 %9 = alloca %"class.pw::Status", align 1 store ptr %0, ptr %5, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %5, metadata !14497, metadata !DIExpression()), !dbg !14500 store ptr %1, ptr %6, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %6, metadata !14498, metadata !DIExpression()), !dbg !14500 store ptr %2, ptr %7, align 4, !tbaa !8211 call void @llvm.dbg.declare(metadata ptr %7, metadata !14499, metadata !DIExpression()), !dbg !14500 %10 = load ptr, ptr %5, align 4, !dbg !14501 %11 = load ptr, ptr %6, align 4, !dbg !14501, !tbaa !8211 %12 = load ptr, ptr %7, align 4, !dbg !14501, !tbaa !8211 %13 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(ptr noundef nonnull align 1 dereferenceable(1) %8, ptr noundef nonnull align 4 dereferenceable(16) %10, ptr noundef %11, ptr noundef %12) #11, !dbg !14501 %14 = getelementptr inbounds %"class.pw::Status", ptr %9, i32 0, i32 0, !dbg !14501 store i8 %13, ptr %14, align 1, !dbg !14501 call void @llvm.memcpy.p0.p0.i32(ptr align 1 %4, ptr align 1 %9, i32 1, i1 false), !dbg !14501, !tbaa.struct !9432 %15 = getelementptr inbounds %"class.pw::Status", ptr %4, i32 0, i32 0, !dbg !14501 %16 = load i8, ptr %15, align 1, !dbg !14501 ret i8 %16, !dbg !14501 } It seems like the attributor is WAI. The size changes likely come from certain functions being removed altogether since this function is the only caller of a bunch of other functions, but since it's only user is replaced with unreachable, then all those functions get removed. So it looks like this patch keeps the call and all those functions alive by passing in a regular alloca instead of an undef. It sounds like @efriedma's prediction is right in that this alloca isn't deleted early enough. Not sure what the best approach from here might be in ensuring that the call here still be removed so we can reclaim space.
The lambda in question is: template <auto kMethod> static constexpr NanopbMethod SynchronousUnary( uint32_t id, const NanopbMethodSerde& serde) { constexpr SynchronousUnaryFunction wrapper = [](Service& service, const void* req, void* resp) { return CallMethodImplFunction<kMethod>( service, *static_cast<const Request<kMethod>*>(req), *static_cast<Response<kMethod>*>(resp)); }; return NanopbMethod( id, SynchronousUnaryInvoker<AllocateSpaceFor<Request<kMethod>>(), AllocateSpaceFor<Response<kMethod>>()>, Function{.synchronous_unary = wrapper}, serde); } I can provide more context/snippets if needed. Comment Actions
It looks like your code was getting miscompiled? If I'm understanding correctly, without this patch, we assume the lambda body is unreachable, and simply delete all the corresponding code. (That does save space, but not in a useful way.) |