This is an archive of the discontinued LLVM Phabricator instance.

[clang] Create alloca to pass into static lambda
ClosedPublic

Authored by vitalybuka on Aug 19 2022, 9:13 PM.

Details

Summary

"this" parameter of lambda if undef, notnull and differentiable.
So we need to pass something consistent.

Any alloca will work. It will be eliminated as unused later by optimizer.

Otherwise we generate code which Msan is expected to catch.

Diff Detail

Event Timeline

vitalybuka created this revision.Aug 19 2022, 9:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 9:13 PM
vitalybuka requested review of this revision.Aug 19 2022, 9:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 9:13 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

fix for result slot

remove dump

Given that we mark up lambdas with the requirement that the pointer is nonnull/noundef/dereferenceable, we can't fix that by just messing with the attributes. Among other things, we could use a definition from a different module.

Either we relax the requirements globally for all lambdas, or we fix EmitLambdaDelegatingInvokeBody() so it passes a pointer to a proper alloca instead of undef.

Among other things, we could use a definition from a different module.

Is this a thing, I assumed lambda is always defined in the module?

Either we relax the requirements globally for all lambdas,

This looks like unnecessary special case.

or we fix EmitLambdaDelegatingInvokeBody() so it passes a pointer to a proper alloca instead of undef.

I will try to do this one, I assume we need any alloca and let optimized to eliminate it when it can.

Among other things, we could use a definition from a different module.

Is this a thing, I assumed lambda is always defined in the module?

If you have a lambda in an inline function, it ends up linkonce_odr.

Among other things, we could use a definition from a different module.

Is this a thing, I assumed lambda is always defined in the module?

If you have a lambda in an inline function, it ends up linkonce_odr.

Thanks. I see, linkonce_odr can pick the one which was not converted into static function, so with original attributes.

vitalybuka retitled this revision from [clang] Reset some attributed calling lambda to [clang] Create alloca to pass into static lambda.Aug 22 2022, 3:07 PM
vitalybuka edited the summary of this revision. (Show Details)

created alloca instead of changing attributes

efriedma accepted this revision.Aug 23 2022, 12:38 PM

LGTM

(We might want to consider messing with the rules for calling lambdas that don't capture anything, since there's no way to actually access it from the lambda body. But it's unlikely to matter, and this is the right fix to be consistent with the current rules.)

This revision is now accepted and ready to land.Aug 23 2022, 12:38 PM
This revision was automatically updated to reflect the committed changes.

Hi. For an embedded device, prior to this patch, we used to be able to see a significant savings of 28kB when we enabled the attributor. But with this patch, we don't see these savings anymore when enabling the attributor. I don't suppose folks might have any ideas on how this might affect the attributor? I'm going to attempt to make a minimal reproducer. Just chiming in early in case anyone might happen to know off the top of their head.

cc @jdoerfert who owns the attributor

Possibly the alloca doesn't get eliminated, or doesn't get eliminated early enough for attributor to kick in. Not sure what context would make that significant, off the top of my head.

Is the lambda in question defined in an inline function (linkonce_odr)? Maybe we should make EmitLambdaDelegatingInvokeBody emit an always_inline hint; it should be rare that a lambda is both called directly and converted to a function pointer.

EmitLambdaDelegatingInvokeBody emit an always_inline hint

To clarify, I mean on the call instruction, not on either function.

Ok here's some of my findings. So there's a step in the attributor where it replaces some instructions with unreachable. One step is:

*** IR Dump Before AttributorPass on [module] ***
; Function Attrs: inlinehint minsize nounwind optsize
define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !12909 {
  call void @llvm.dbg.value(metadata ptr %0, metadata !12913, metadata !DIExpression()), !dbg !12916
  call void @llvm.dbg.value(metadata ptr %1, metadata !12914, metadata !DIExpression()), !dbg !12916
  call void @llvm.dbg.value(metadata ptr %2, metadata !12915, metadata !DIExpression()), !dbg !12916
  %4 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(ptr noundef nonnull align 1 dereferenceable(1) undef, ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #12, !dbg !12917
  ret i8 %4, !dbg !12917
}
*** IR Dump After AttributorPass on [module] ***
; Function Attrs: inlinehint minsize nounwind optsize
define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !12906 {
  call void @llvm.dbg.value(metadata ptr %0, metadata !12910, metadata !DIExpression()), !dbg !12913
  call void @llvm.dbg.value(metadata ptr %1, metadata !12911, metadata !DIExpression()), !dbg !12913
  call void @llvm.dbg.value(metadata ptr %2, metadata !12912, metadata !DIExpression()), !dbg !12913
  unreachable, !dbg !12914
}

The first argument in the call is an undef but the argument type is also marked as noundef, so this is unreachable. With this patch, in the final IR, the function will instead accept some non-undef argument:

define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef nonnull %1, ptr noundef %2) #3 comdat align 2 !dbg !14764 {
;; .. a bunch of llvm.dbg.values
  tail call void @_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager12ReceiveStateERK26_global_state_State(ptr noundef nonnull align 8 dereferenceable(3864) %0, ptr noundef nonnull align 8 dereferenceable(528) %1) #17, !dbg !14808
  ret i8 0, !dbg !14809
}

Without this patch, before any passes run, this call originally took an undef:

define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPv
E_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !14493 {
  %4 = alloca %"class.pw::Status", align 1
  %5 = alloca ptr, align 4
  %6 = alloca ptr, align 4
  %7 = alloca ptr, align 4
  %8 = alloca %"class.pw::Status", align 1
  store ptr %0, ptr %5, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %5, metadata !14497, metadata !DIExpression()), !dbg !14500
  store ptr %1, ptr %6, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %6, metadata !14498, metadata !DIExpression()), !dbg !14500
  store ptr %2, ptr %7, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %7, metadata !14499, metadata !DIExpression()), !dbg !14500
  %9 = load ptr, ptr %5, align 4, !dbg !14501
  %10 = load ptr, ptr %6, align 4, !dbg !14501, !tbaa !8211
  %11 = load ptr, ptr %7, align 4, !dbg !14501, !tbaa !8211
  %12 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(p
tr noundef nonnull align 1 dereferenceable(1) undef, ptr noundef nonnull align 4 dereferenceable(16) %9, ptr noundef %10, ptr noundef %11) #11, !dbg !14501
  %13 = getelementptr inbounds %"class.pw::Status", ptr %8, i32 0, i32 0, !dbg !14501
  store i8 %12, ptr %13, align 1, !dbg !14501
  call void @llvm.memcpy.p0.p0.i32(ptr align 1 %4, ptr align 1 %8, i32 1, i1 false), !dbg !14501, !tbaa.struct !9432
  %14 = getelementptr inbounds %"class.pw::Status", ptr %4, i32 0, i32 0, !dbg !14501
  %15 = load i8, ptr %14, align 1, !dbg !14501
  ret i8 %15, !dbg !14501
}

However, with this patch, it accepts an %8 = alloca %class.anon.61, align 1:

define linkonce_odr dso_local i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENUlRNS0_7ServiceEPKvPvE_8__invokeESI_SK_SL_(ptr noundef nonnull align 4 dereferenceable(16) %0, ptr noundef %1, ptr noundef %2) #3 comdat align 2 !dbg !14493 {
  %4 = alloca %"class.pw::Status", align 1
  %5 = alloca ptr, align 4
  %6 = alloca ptr, align 4
  %7 = alloca ptr, align 4
  %8 = alloca %class.anon.61, align 1
  %9 = alloca %"class.pw::Status", align 1
  store ptr %0, ptr %5, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %5, metadata !14497, metadata !DIExpression()), !dbg !14500
  store ptr %1, ptr %6, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %6, metadata !14498, metadata !DIExpression()), !dbg !14500
  store ptr %2, ptr %7, align 4, !tbaa !8211
  call void @llvm.dbg.declare(metadata ptr %7, metadata !14499, metadata !DIExpression()), !dbg !14500
  %10 = load ptr, ptr %5, align 4, !dbg !14501
  %11 = load ptr, ptr %6, align 4, !dbg !14501, !tbaa !8211
  %12 = load ptr, ptr %7, align 4, !dbg !14501, !tbaa !8211
  %13 = call i8 @_ZZN2pw3rpc8internal12NanopbMethod16SynchronousUnaryIXadL_ZN2ax3bud7bt_core20global_state_manager18GlobalStateManager20PropagateGlobalStateERK26_global_state_StateR18_pw_protobuf_EmptyEEEES2_jRKNS1_17NanopbMethodSerdeEENKUlRNS0_7ServiceEPKvPvE_clESI_SK_SL_(ptr noundef nonnull align 1 dereferenceable(1) %8, ptr noundef nonnull align 4 dereferenceable(16) %10, ptr noundef %11, ptr noundef %12) #11, !dbg !14501
  %14 = getelementptr inbounds %"class.pw::Status", ptr %9, i32 0, i32 0, !dbg !14501
  store i8 %13, ptr %14, align 1, !dbg !14501
  call void @llvm.memcpy.p0.p0.i32(ptr align 1 %4, ptr align 1 %9, i32 1, i1 false), !dbg !14501, !tbaa.struct !9432
  %15 = getelementptr inbounds %"class.pw::Status", ptr %4, i32 0, i32 0, !dbg !14501
  %16 = load i8, ptr %15, align 1, !dbg !14501
  ret i8 %16, !dbg !14501
}

It seems like the attributor is WAI. The size changes likely come from certain functions being removed altogether since this function is the only caller of a bunch of other functions, but since it's only user is replaced with unreachable, then all those functions get removed. So it looks like this patch keeps the call and all those functions alive by passing in a regular alloca instead of an undef.

It sounds like @efriedma's prediction is right in that this alloca isn't deleted early enough. Not sure what the best approach from here might be in ensuring that the call here still be removed so we can reclaim space.

Is the lambda in question defined in an inline function (linkonce_odr)? Maybe we should make EmitLambdaDelegatingInvokeBody emit an always_inline hint; it should be rare that a lambda is both called directly and converted to a function pointer.

The lambda in question is:

template <auto kMethod>
static constexpr NanopbMethod SynchronousUnary(
    uint32_t id, const NanopbMethodSerde& serde) {
  constexpr SynchronousUnaryFunction wrapper =
      [](Service& service, const void* req, void* resp) {
        return CallMethodImplFunction<kMethod>(
            service,
            *static_cast<const Request<kMethod>*>(req),
            *static_cast<Response<kMethod>*>(resp));
      };
  return NanopbMethod(
      id,
      SynchronousUnaryInvoker<AllocateSpaceFor<Request<kMethod>>(),
                              AllocateSpaceFor<Response<kMethod>>()>,
      Function{.synchronous_unary = wrapper},
      serde);
}

I can provide more context/snippets if needed.

The first argument in the call is an undef but the argument type is also marked as noundef, so this is unreachable

It looks like your code was getting miscompiled? If I'm understanding correctly, without this patch, we assume the lambda body is unreachable, and simply delete all the corresponding code. (That does save space, but not in a useful way.)