This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Don't mark the parameter attribute of resume function as noalias
ClosedPublic

Authored by ChuanqiXu on Dec 4 2022, 11:03 PM.

Details

Summary

Close https://github.com/llvm/llvm-project/issues/59221.

The root cause for the problem is that we marked the parameter of the resume/destroy functions as noalias previously. But this is not true. There are a lot of getelementptr uses for the parameter in the resume/destroy functions. This violates the assumption of noalias. So it would be better to remove this attribute.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Dec 4 2022, 11:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2022, 11:03 PM
ChuanqiXu requested review of this revision.Dec 4 2022, 11:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2022, 11:03 PM
ChuanqiXu added inline comments.Dec 4 2022, 11:05 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
519–525

It is dead. It would be covered by the following change.

849–853

The documentation says the NonNull attribute should be used in the combination of NoUndef.

850–851

Although I suspect the Retcon ABI has the similar problem, this maintains the same behavior with Retcon ABI.

I don't know what you mean about GEPs violating noalias. As I understand it, noalias is like restrict; it says that it can be assumed that nothing aliases the parameter except pointers derived from it. GEPs derive pointers in a way that is explicitly permitted under that.

I don't know what you mean about GEPs violating noalias. As I understand it, noalias is like restrict; it says that it can be assumed that nothing aliases the parameter except pointers derived from it. GEPs derive pointers in a way that is explicitly permitted under that.

Here are my understanding: in the LLVM IR language manual (https://llvm.org/docs/LangRef.html#parameter-attributes), it says:

noalias

This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value.

And in the pointer aliasing rules section (https://llvm.org/docs/LangRef.html#pointer-aliasing-rules), it says:

A pointer value is based on another pointer value according to the following rules:

  • A pointer value formed from a scalar getelementptr operation is based on the pointer-typed operand of the getelementptr.

So a gep result is a pointer value based on the argument. And since the parameter is marked as noalias, we shouldn't access the pointer during the execution of the function. Here is my reading. I am not 100% sure I got things right. @nikic could you take a look?

nikic added a comment.Dec 5 2022, 11:13 PM

@ChuanqiXu It's fine to access both the "plain" noalias argument and a GEP of the noalias argument. What is forbidden is to access the object the noalias argument points to through some other pointer that is unrelated to the argument. It's not really clear to me whether this is the case here, would it be possible to provide the IR directly after the CoroSplit pass?

@ChuanqiXu It's fine to access both the "plain" noalias argument and a GEP of the noalias argument. What is forbidden is to access the object the noalias argument points to through some other pointer that is unrelated to the argument. It's not really clear to me whether this is the case here, would it be possible to provide the IR directly after the CoroSplit pass?

Oh, so I got things wrong. Here is the IR after the CoroSplit pass for test() (and some other optimization):

define internal fastcc void @_ZL4testv.resume(ptr noalias nonnull align 8 dereferenceable(80) %0) #3 personality ptr @__gxx_personality_v0 {
resume.entry:
  %index.addr = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 4
  %index = load i2, ptr %index.addr, align 8
  %switch = icmp eq i2 %index, 0
  br i1 %switch, label %AfterCoroSave, label %AfterCoroSave65

AfterCoroSave:                                    ; preds = %resume.entry
  %.reload.addr = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3
  %destroy.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 8
  store ptr @_ZL1av.cleanup, ptr %destroy.addr.i, align 8
  %__promise.reload.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 16
  %index.addr32.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 32
  store i1 false, ptr %index.addr32.i, align 8
  store i2 1, ptr %index.addr, align 8
  %caller.i.i69 = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 24
  store ptr %0, ptr %caller.i.i69, align 8, !tbaa.struct !10
  tail call void @llvm.experimental.noalias.scope.decl(metadata !18)
  store i32 42, ptr %__promise.reload.addr.i, align 8, !tbaa !3, !alias.scope !18
  store ptr null, ptr %.reload.addr, align 8, !alias.scope !18
  %1 = load ptr, ptr %0, align 8
  musttail call fastcc void %1(ptr nonnull %0) #5, !noalias !18
  ret void

AfterCoroSave65:                                  ; preds = %resume.entry
  %__promise.reload.addr = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2
  %__promise.reload.addr.i68 = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 16
  %2 = load i32, ptr %__promise.reload.addr.i68, align 8, !tbaa !3
  store i32 %2, ptr %__promise.reload.addr, align 8, !tbaa !3
  store ptr null, ptr %0, align 8
  %caller.i53 = getelementptr inbounds i8, ptr %0, i64 24
  %retval.sroa.0.0.copyload.i = load ptr, ptr %caller.i53, align 8, !tbaa.struct !10
  %3 = load ptr, ptr %retval.sroa.0.0.copyload.i, align 8
  musttail call fastcc void %3(ptr nonnull %retval.sroa.0.0.copyload.i) #5
  ret void
}

And this code is actually correct. Problems happens with its caller, the main function:

; Function Attrs: norecurse uwtable
define dso_local noundef i32 @main() local_unnamed_addr #1 personality ptr @__gxx_personality_v0 {
entry:
  %0 = alloca [80 x i8], align 8
  store ptr @_ZL4testv.resume, ptr %0, align 8
  %destroy.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 1
  store ptr @_ZL4testv.cleanup, ptr %destroy.addr.i, align 8
  %__promise.reload.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2
  store i32 123, ptr %__promise.reload.addr.i, align 8, !tbaa !3
  %caller.i.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2, i32 1
  store ptr @_ZNSt7__n486114__noop_coro_frE, ptr %caller.i.i, align 8, !tbaa.struct !10
  %index.addr70.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 4
  store i2 0, ptr %index.addr70.i, align 8
  call fastcc void @_ZL4testv.resume(ptr nonnull %0)
  %.pre = load i32, ptr %__promise.reload.addr.i, align 8, !tbaa !3
  %call1 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %.pre)
  call fastcc void @_ZL4testv.cleanup(ptr nonnull %0)
  ret i32 0
}

And after inlining _ZL4testv.resume into main, the main becomes:

; Function Attrs: norecurse uwtable
define dso_local noundef i32 @main() local_unnamed_addr #1 personality ptr @__gxx_personality_v0 {
entry:
  %0 = alloca [80 x i8], align 8
  store ptr @_ZL4testv.resume, ptr %0, align 8
  %destroy.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 1
  store ptr @_ZL4testv.cleanup, ptr %destroy.addr.i, align 8
  %__promise.reload.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2
  store i32 123, ptr %__promise.reload.addr.i, align 8, !tbaa !3
  %caller.i.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2, i32 1
  store ptr @_ZNSt7__n486114__noop_coro_frE, ptr %caller.i.i, align 8, !tbaa.struct !10
  %index.addr70.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 4
  store i2 0, ptr %index.addr70.i, align 8
  call void @llvm.experimental.noalias.scope.decl(metadata !12)
  %index.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 4
  %index.i = load i2, ptr %index.addr.i, align 8, !alias.scope !12
  %switch.i = icmp eq i2 %index.i, 0
  br i1 %switch.i, label %AfterCoroSave.i, label %AfterCoroSave65.i

AfterCoroSave.i:                                  ; preds = %entry
  %.reload.addr.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3
  %destroy.addr.i.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 8
  store ptr @_ZL1av.cleanup, ptr %destroy.addr.i.i, align 8, !alias.scope !12
  %__promise.reload.addr.i.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 16
  %index.addr32.i.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 32
  store i1 false, ptr %index.addr32.i.i, align 8, !alias.scope !12
  store i2 1, ptr %index.addr.i, align 8, !alias.scope !12
  %caller.i.i69.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 24
  store ptr %0, ptr %caller.i.i69.i, align 8, !tbaa.struct !10, !alias.scope !12
  call void @llvm.experimental.noalias.scope.decl(metadata !15)
  store i32 42, ptr %__promise.reload.addr.i.i, align 8, !tbaa !3, !alias.scope !18
  store ptr null, ptr %.reload.addr.i, align 8, !alias.scope !18
  %1 = call ptr @llvm.coro.subfn.addr(ptr nonnull %0, i8 0), !alias.scope !12, !noalias !15
  call fastcc void %1(ptr nonnull %0) #6, !noalias !15
  br label %_ZL4testv.resume.exit

AfterCoroSave65.i:                                ; preds = %entry
  %__promise.reload.addr.i6 = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 2
  %__promise.reload.addr.i68.i = getelementptr inbounds %_ZL4testv.Frame, ptr %0, i64 0, i32 3, i64 16
  %2 = load i32, ptr %__promise.reload.addr.i68.i, align 8, !tbaa !3, !alias.scope !12
  store i32 %2, ptr %__promise.reload.addr.i6, align 8, !tbaa !3, !alias.scope !12
  store ptr null, ptr %0, align 8, !alias.scope !12
  %caller.i53.i = getelementptr inbounds i8, ptr %0, i64 24
  %retval.sroa.0.0.copyload.i.i = load ptr, ptr %caller.i53.i, align 8, !tbaa.struct !10, !alias.scope !12
  %3 = call ptr @llvm.coro.subfn.addr(ptr %retval.sroa.0.0.copyload.i.i, i8 0), !noalias !12
  call fastcc void %3(ptr %retval.sroa.0.0.copyload.i.i) #6, !noalias !12
  br label %_ZL4testv.resume.exit

_ZL4testv.resume.exit:                            ; preds = %AfterCoroSave65.i, %AfterCoroSave.i
  %.pre = load i32, ptr %__promise.reload.addr.i, align 8, !tbaa !3
  %call1 = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef %.pre)
  ret i32 0
}

Note that now the store i32 42, ptr %__promise.reload.addr.i.i is in !alias.scope !18, and the call call fastcc void %1(ptr nonnull %0) lives in !noalias !15. So that the DSE think the store is dead. And in https://github.com/llvm/llvm-project/blob/3cfe412e4c90718a3c5abe0aaf2209053a490982/llvm/lib/Transforms/Utils/InlineFunction.cpp#L1140-L1154, I see the noalias scope metadata is added if the parameter is noalias. So I thought I found the solution. And here is the patch. Remind me if you need a module level IR reproducer (since it may take some time to reduce it..)

nikic added a comment.Dec 7 2022, 7:44 AM

@ChuanqiXu Your first function already contains:

store i32 42, ptr %__promise.reload.addr.i.i, align 8, !tbaa !3, !alias.scope !18
store ptr null, ptr %.reload.addr, align 8, !alias.scope !18
%1 = load ptr, ptr %0, align 8
musttail call fastcc void %1(ptr nonnull %0) #5, !noalias !18

which says that the store and the call don't alias. I don't think the noalias argument on the function would even matter in that case. The incorrect noalias metadata must be introduced at some earlier point already.

ChuanqiXu abandoned this revision.Dec 8 2022, 7:13 PM

@ChuanqiXu Your first function already contains:

store i32 42, ptr %__promise.reload.addr.i.i, align 8, !tbaa !3, !alias.scope !18
store ptr null, ptr %.reload.addr, align 8, !alias.scope !18
%1 = load ptr, ptr %0, align 8
musttail call fastcc void %1(ptr nonnull %0) #5, !noalias !18

which says that the store and the call don't alias. I don't think the noalias argument on the function would even matter in that case. The incorrect noalias metadata must be introduced at some earlier point already.

Oh, so I misunderstood the semantics of !noalias before. So this patch may not be correct. BTW, do you have any ideas why the store in the first function would't be optimized out?

@ChuanqiXu Your first function already contains:

store i32 42, ptr %__promise.reload.addr.i.i, align 8, !tbaa !3, !alias.scope !18
store ptr null, ptr %.reload.addr, align 8, !alias.scope !18
%1 = load ptr, ptr %0, align 8
musttail call fastcc void %1(ptr nonnull %0) #5, !noalias !18

which says that the store and the call don't alias. I don't think the noalias argument on the function would even matter in that case. The incorrect noalias metadata must be introduced at some earlier point already.

Oh, so I misunderstood the semantics of !noalias before. So this patch may not be correct. BTW, do you have any ideas why the store in the first function would't be optimized out?

This is because the caller may read it. After inlining, we see that it's an alloca that nobody reads afterwards.

@ChuanqiXu Your first function already contains:

store i32 42, ptr %__promise.reload.addr.i.i, align 8, !tbaa !3, !alias.scope !18
store ptr null, ptr %.reload.addr, align 8, !alias.scope !18
%1 = load ptr, ptr %0, align 8
musttail call fastcc void %1(ptr nonnull %0) #5, !noalias !18

which says that the store and the call don't alias. I don't think the noalias argument on the function would even matter in that case. The incorrect noalias metadata must be introduced at some earlier point already.

Oh, so I misunderstood the semantics of !noalias before. So this patch may not be correct. BTW, do you have any ideas why the store in the first function would't be optimized out?

This is because the caller may read it. After inlining, we see that it's an alloca that nobody reads afterwards.

Oh, I got it. Thanks for clarification!

ChuanqiXu reclaimed this revision.Dec 12 2022, 12:34 AM
ChuanqiXu added a reviewer: bcl5980.

Although there were some misunderstandings, now this one looks like the correct fix still. See https://github.com/llvm/llvm-project/issues/59221 for the details. Long story short, for this C++ program (https://compiler-explorer.com/z/6qGcozG93), the optimized frame will be something like:

struct test_frame {
    void (*__resume_)(), // a function pointer points to the `test.resume` function, which can be imaged as the test() function in the example.
    ....
    struct a_frame {
          ...
          void **caller; // may points to test_frame at runtime.
    };
};

And the function a and function test looks just like:

define i32 @a(ptr noalias %alloc_8) {
  %alloc_8_16 = getelementptr ptr, ptr %alloc_8, i64 16
  store i32 42, ptr %alloc_8_16, align 8
  %alloc_8_8 = getelementptr ptr, ptr %alloc_8, i64 8
  %alloc = load ptr, ptr %alloc_8_8, align 8
  %p = load ptr, ptr %alloc, align 8
  %r = call i32 %p(ptr %alloc)
  ret i32 %r
}

define i32 @b(ptr %p) {
entry:
  %alloc = alloca [128 x i8], align 8
  %alloc_8 = getelementptr ptr, ptr %alloc, i64 8
  %alloc_8_8 = getelementptr ptr, ptr %alloc_8, i64 8
  store ptr %alloc, ptr %alloc_8_8, align 8
  store ptr %p, ptr %alloc, align 8
  %r = call i32 @a(ptr nonnull %alloc_8)
  ret i32 %r
}

Here inside the function a, we can access the parameter %alloc_8 by %alloc and we pass %alloc to an unknown function. So it breaks the assumption of noalias parameter.

Note that although only CoroElide optimization can put a frame inside another frame directly, the following case is not valid too:

struct test_frame {
    ....
   void **a_frame; // may points to a_frame at runtime.
};

 struct a_frame {
    void **caller; // may points to test_frame at runtime.
};

Since the C++ language allows the programmer to get the address of coroutine frames, we can't assume the above case wouldn't happen in the source codes. So we can't set the parameter as noalias no matter if CoroElide applies or not. And for other languages, it may be safe if they don't allow the programmers to get the address of coroutine frames.

nikic accepted this revision.Dec 12 2022, 1:29 AM

LG

clang/test/CodeGenCoroutines/pr59221.cpp
86

mitoptimized -> misoptimized?

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
849–853

dereferenceable implies noundef, but I guess it doesn't hurt to be explicit about it.

850

I'd move this FIXME to the call for the retcon ABI.

This revision is now accepted and ready to land.Dec 12 2022, 1:29 AM
ChuanqiXu updated this revision to Diff 482328.Dec 12 2022, 6:16 PM
ChuanqiXu marked 3 inline comments as done.

Address comments.

This revision was landed with ongoing or failed builds.Dec 12 2022, 6:20 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 6:20 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript