This is an archive of the discontinued LLVM Phabricator instance.

Differential D97915

[Coroutines] Handle overaligned frame allocation
Abandoned, Public

Authored by ychen on Mar 3 2021, 11:21 PM.

Details

Reviewers

rjmccall
lxfind
GorNishanov
ChuanqiXu

Summary

by over-allocating and emitting alignTo code to adjust the frame start address.

Motivation: on many platforms, malloc returns memory aligned to at least alignof(std::max_align_t) (usually just 16), regardless of the coroutine frame's preferred alignment (usually specified using alignas() on the promise or on some local variables). Outside of coroutines, this is handled by calling an overloaded operator new that takes an alignment argument. For coroutines, the spec at https://eel.is/c++draft/dcl.fct.def.coroutine#9.1 suggests that the alignment argument is not considered during name lookup.

Mathias Stearn and @lewissbaker suggested this is the proper workaround before the issue is addressed by the spec.

One example showing the issue: https://gcc.godbolt.org/z/rGzaco

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ychen added inline comments. Mar 4 2021, 5:22 PM

clang/lib/CodeGen/CGBuiltin.cpp
4450 ↗	(On Diff #328232)	"No, because the adjustment you have to do in `coro.alloc` isn't just an addition, it's an addition plus a mask, which isn't reversible. Suppose the frame needs to be 32-byte-aligned, but the allocator only promises 8-byte alignment. The problem is that when you go to free a frame pointer, and you see that it's 32-byte-aligned (which, again, it always will be), the pointer you got from the allocator might be the frame pointer minus any of 8, 16, or 24, or it might be exactly the same. The only way to reverse that is to store some sort of cookie, either the amount to subtract or even just the original pointer."

I got myself confused. This makes perfect sense.

"Now, if you could change the entire coroutine ABI, you could make the frame handle that you pass around be the unadjusted pointer and then just repeat the adjustment every time you enter the coroutine. But that doesn't work because the ABI relies on things like the promise being at a reliable offset from the frame handle. I think the best solution would be to figure out a way to use an aligned allocator, which at worst does this in a more systematic way and at best can actually just satisfy your requirement directly without any overhead. If you can't do that, adding an offset to the frame would be best; if you can't do that, doing it as a cookie is okay."

This is very helpful. I'll explore the option of adding an offset to the frame first. If that is not feasible, I'll use the cookie method. Thanks!

I am a little confused about the problem. The example in the link concerns the alignment of the promise, not of the frame. The addresses of the promise and the frame are not the same. It looks like you're trying to do:

+               +-----------------------------------+
|               |                                   |
+---------------+          frame                    |
| padding       |                                   |
+               +-----------------------------------+
                ^
                |
                |
                |
                |
                |
                +

              The address of frame matches the offset of promise.

However, what we should do is:

+               +-----------------------------------+
|               |       +--------------+            |
+---------------+frame  | promise      |            |
| padding       |       <--------------+            |
+               +-----------------------------------+
                ^       |
                |       |
                |       |
                |       |
                |       +
                |       This is what we really want
                +

              The address of frame matches the offset of promise.

If I understand the problem correctly, I think we can handle it in the middle end, provided the information about the promise is still available there.

clang/lib/CodeGen/CGBuiltin.cpp
16756 ↗	(On Diff #328232)	Why do we remove the anonymous namespace here?

Harbormaster completed remote builds in B92120: Diff 328232. Mar 5 2021, 1:07 AM

In D97915#2605398, @ChuanqiXu wrote:

+               +-----------------------------------+
|               |                                   |
+---------------+          frame                    |
| padding       |                                   |
+               +-----------------------------------+
                ^
                |
                |
                |
                |
                |
                +

              The address of frame matches the offset of promise.

However, what we should do is:

+               +-----------------------------------+
|               |       +--------------+            |
+---------------+frame  | promise      |            |
| padding       |       <--------------+            |
+               +-----------------------------------+
                ^       |
                |       |
                |       |
                |       |
                |       +
                |       This is what we really want
                +

              The address of frame matches the offset of promise.

If I understand the problem correctly, I think we can handle it in the middle end, provided the information about the promise is still available there.

Not sure I follow. Inside the frame, the promise is in its desired position. It is not properly aligned because the frame start address is underaligned: malloc usually returns only 16-byte-aligned memory, whereas alignas can make the preferred alignment larger than that.

clang/lib/CodeGen/CGBuiltin.cpp
16756 ↗	(On Diff #328232)	I added a common/helper function that takes `BuiltinAlignArgs` as an argument. Need to move it out of the anonymous namespace to forward declare it.

Let's try to avoid adding a new builtin for what we acknowledge is a workaround. Builtins become part of the language supported by the compiler, so we shouldn't add them casually.

In D97915#2607338, @rjmccall wrote:

Let's try to avoid adding a new builtin for what we acknowledge is a workaround. Builtins become part of the language supported by the compiler, so we shouldn't add them casually.

If we're going to use the aligned new in the future, do we still need this builtin, or is something else preferred?

In D97915#2607567, @ychen wrote:

In D97915#2607338, @rjmccall wrote:

Let's try to avoid adding a new builtin for what we acknowledge is a workaround. Builtins become part of the language supported by the compiler, so we shouldn't add them casually.

If we're going to use the aligned new in the future, do we still need this builtin, or is something else preferred?

Oh, sorry, for some reason I got the impression from the patch that we were adding a new Clang-level builtin. Adding a new LLVM intrinsic seems reasonable to me.

In any case, I don't think we should expose BuiltinAlignArgs outside of CGBuiltin.cpp. Seems like at most we need to add a convenience function on CGBuilderTy to do a pointer round-up-to-alignment operation.

lewissbaker added inline comments. Mar 5 2021, 5:41 PM

clang/lib/CodeGen/CGBuiltin.cpp
4450 ↗	(On Diff #328232)	There was a proposal to extend the coroutine specification with support for the align_val_t overloads of operator new() when allocating coroutine frames. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf

Unfortunately this was not adopted at the time as it was proposed late in the C++20 cycle and there was not yet any implementation experience.

So for now, if the compiler determines that the frame needs to be aligned to a value greater than the default alignment of global operator new, it will need to overallocate, align the frame within that buffer, and store the applied offset somewhere so that it can reconstruct the pointer returned from operator new() and pass it to operator delete() on coroutine_handle::destroy().

Note that there was also a meeting to discuss ABI for coroutine frames with the intent that the major coroutines implementations would all (eventually) agree on a compatible coroutine ABI. The results of this meeting were written up in the doc https://docs.google.com/document/d/1t53lAuQNd3rPqN-VByZabwL6PL2Oyl4zdJxm-gQlhfU/edit?usp=sharing

The end result was that we decided to place any padding needed to align the promise before the resume/destroy function pointers rather than place that padding in-between the function-pointers and the promise. The rationale here being that we can then calculate the address of the promise as a constant offset from the frame address (typically at an offset of two pointers into the frame) rather than the offset being variable depending on the promise type's alignment. This should help building of certain tooling / debuggers / walking async stack-traces etc. as we don't need to know the exact promise type to be able to determine the location of the promise.

The compiler should know exactly how many bytes of padding was added at the start of the frame allocation to get to the frame address and so should be able to translate the coroutine frame address back to the allocation address before destruction - however this also may have an interplay with the support for overaligned frames (which may be required due to overaligned local variables/args and not only based on the promise-type), so I'm mentioning it here.
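
For illustration only, the "padding before the function pointers" layout described in this comment could be computed roughly as below. This is a sketch and not Clang's or LLVM's frame layout code: `FrameLayout`, `computeLayout`, and `kHeaderSize` are hypothetical names, and it assumes the allocation start is already aligned to the promise alignment, which is a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of the "padding before the header" layout.
// Offsets are measured from the start of the allocation.
struct FrameLayout {
  std::size_t FrameOffset;   // leading padding; the frame address points here
  std::size_t PromiseOffset; // where the promise object begins
};

// Size of the resume and destroy function pointers.
constexpr std::size_t kHeaderSize = 2 * sizeof(void *);

// PromiseAlign must be a power of two.
inline FrameLayout computeLayout(std::size_t PromiseAlign) {
  // Round the header size up to the promise alignment; the promise goes there.
  std::size_t PromiseOffset =
      (kHeaderSize + PromiseAlign - 1) & ~(PromiseAlign - 1);
  // Right-justify the header so that it ends exactly where the promise begins.
  return {PromiseOffset - kHeaderSize, PromiseOffset};
}
```

With this arrangement the promise is always `kHeaderSize` bytes past the frame address, whatever the promise alignment is; only the leading padding varies.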

rjmccall added inline comments. Mar 5 2021, 7:15 PM

clang/lib/CodeGen/CGBuiltin.cpp
4450 ↗	(On Diff #328232)	"Note that there was also a meeting to discuss ABI for coroutine frames with the intent that the major coroutines implementations would all (eventually) agree on a compatible coroutine ABI."

Interesting! Did you consider reaching out to the Itanium C++ ABI group, which has prior expertise in the area of standardizing C++ ABIs?

"The end result was that we decided to place any padding needed to align the promise before the resume/destroy function pointers rather than place that padding in-between the function-pointers and the promise."

That is an interesting choice. Is deriving the address of the promise within the frame without knowing what the promise type is actually something that clients need to do? It's not like coroutines carry any reflective information of the sort that exceptions do. Anyway, okay. So the function pointers are supposed to be "right-justified" so that the promise comes immediately afterwards, and the address point is supposed to point at the first function pointer. That is not what Clang implements, or has ever implemented, but I don't foresee any serious problems in adjusting the LLVM coroutine frame layout code to honor that.

"The compiler should know exactly how many bytes of padding was added at the start of the frame allocation to get to the frame address and so should be able to translate the coroutine frame address back to the allocation address before destruction - however this also may have an interplay with the support for overaligned frames (which may be required due to overaligned local variables/args and not only based on the promise-type), so I'm mentioning it here."

Yes, this is basically only true of the adjustment you're talking about for the frame header with overaligned promises. Barring miracles, the only reasonable way to allocate one of these overaligned-promise frames is to round down to the next promise-alignment boundary and then allocate that, and that offset is indeed static given only the promise type's alignment. But the larger allocation can still exceed allocator alignment, whether from the promise type or just local coroutine state, and that extra offset down to the allocated pointer will be dynamic. Fortunately, I don't think going from the frame pointer to the allocated pointer needs to be an ABI-exposed operation, since the frame can only be destroyed by the coroutine itself. That means the details of allocation are essentially entirely implementation-private, and that includes how we adjust the frame pointer for deallocation. Am I missing something?

"Unfortunately this was not adopted at the time as it was proposed late in the C++20 cycle and there was not yet any implementation experience."

Hmm. The role of implementation experience here would have been to point out that you hadn't considered over-alignment in your specification. It sounds more like you were running out of time to write the specification and just punted on an issue in the interests of getting the proposal into the standard. Regardless, it seems to me that the obviously correct design is that, if both an aligned and an unaligned allocation function is available, it's unspecified which one is called, as long as the matching deallocation function is then called later. Are you suggesting that we must not do that?

lewissbaker added inline comments. Mar 7 2021, 4:13 PM

clang/lib/CodeGen/CGBuiltin.cpp
4450 ↗	(On Diff #328232)	"Interesting! Did you consider reaching out to the Itanium C++ ABI group, which has prior expertise in the area of standardizing C++ ABIs?"

I don't recall if they were contacted, although I believe we did discuss doing so at the time. Maybe @GorNishanov remembers? There were LLVM devs (mainly from Google), GCC compiler devs and MS compiler team involved in the discussion.

"Is deriving the address of the promise within the frame without knowing what the promise type is actually something that clients need to do?"

This is something that I have found a desire to be able to do when implementing async stack trace walking. At the moment I ended up having to store basically two pointers to the parent coroutine-frame - one a coroutine_handle<void> so I can resume the parent coroutine and another pointer to an AsyncStackFrame stored within the promise so that I can walk to the parent frame. I would ideally only like to have to store the coroutine_handle and be able to determine from its address the address of the coroutine_handle pointing to the next coroutine-frame, e.g. by assuming that the promise_type has the continuation as the first data-member. At the moment I can't make this assumption because the offset from the coroutine_handle::address() to the promise might be variable depending on the concrete promise_type's alignment.

"That means the details of allocation are essentially entirely implementation-private, and that includes how we adjust the frame pointer for deallocation. Am I missing something?"

I agree with your analysis here.

"The role of implementation experience here would have been to point out that you hadn't considered over-alignment in your specification."

Yes, this issue was identified, albeit fairly late, during the standardisation process as a result of implementation experience and raised and discussed.

"It sounds more like you were running out of time to write the specification and just punted on an issue in the interests of getting the proposal into the standard"

The proposal was already in the standard at the time the issue was identified. The problem was more that there were a couple of options for how to specify it and we didn't have any implementation experience of either way to inform which way should be chosen. So, yes, we effectively punted the decision until later. My preference is the more complicated design of "Option 1" described in P2014.

"it seems to me that the obviously correct design is that, if both an aligned and an unaligned allocation function is available, it's unspecified which one is called, as long as the matching deallocation function is then called later"

Yes, I think this is pretty close to what "Option 1" describes, although I think it describes a slightly more involved overload resolution for deallocation functions than just "call the matching deallocation function".

"Are you suggesting that we must not do that?"

The current specification in C++20 only says that `operator new(size_t)` overloads are called. So a change that caused it to call `operator new(size_t, align_val_t)` overloads would be a non-conforming extension to C++20, although possibly one that users would be happy to have.

I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not necessarily the maximum alignment, e.g. if the alignment of new is 16, the returned pointer can still be a multiple of 32. And that difference matters.

Let's consider a frame that only has the two pointers and a promise with alignment requirement of 64. The alignment of new is 16.
Now you will calculate the difference to be 48, and create a padding of 48 before the frame.
But if the returned pointer from new is actually a multiple of 32 (but not 64), the frame will no longer be aligned to 64 (but (32 + 48) % 64 = 16).
So from what I can tell, if we cannot pass alignment to new, we need to look at the address returned by new dynamically to decide the padding.

In D97915#2632493, @lxfind wrote:

I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not necessarily the maximum alignment, e.g. if the alignment of new is 16, the returned pointer can still be a multiple of 32. And that difference matters.

Let's consider a frame that only has the two pointers and a promise with alignment requirement of 64. The alignment of new is 16.
Now you will calculate the difference to be 48, and create a padding of 48 before the frame.
But if the returned pointer from new is actually a multiple of 32 (but not 64), the frame will no longer be aligned to 64 (but (32 + 48) % 64 = 16).

48 is the maximal possible adjustment needed. For this particular case, EmitBuiltinAlignTo would make the real adjustment 32 since (32 + 32) % 64 == 0.

So from what I can tell, if we cannot pass alignment to new, we need to look at the address returned by new dynamically to decide the padding.

Indeed, that's what EmitBuiltinAlignTo is for.
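
To make the 64/16 case above concrete, here is a small sketch of the arithmetic. It is not the code emitted by EmitBuiltinAlignTo: `alignUp`, `maxPadding`, and `adjustment` are hypothetical helpers that only mirror the add-then-mask idea, assuming power-of-two alignments.

```cpp
#include <cassert>
#include <cstdint>

// Round Addr up to the next multiple of Align (Align must be a power of two).
inline std::uint64_t alignUp(std::uint64_t Addr, std::uint64_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}

// Extra bytes to request up front so that an aligned frame always fits,
// whatever NewAlign-aligned address the allocator returns.
inline std::uint64_t maxPadding(std::uint64_t FrameAlign,
                                std::uint64_t NewAlign) {
  return FrameAlign > NewAlign ? FrameAlign - NewAlign : 0;
}

// Adjustment actually applied to one particular raw pointer at run time.
inline std::uint64_t adjustment(std::uint64_t Raw, std::uint64_t FrameAlign) {
  return alignUp(Raw, FrameAlign) - Raw;
}
```

For a frame alignment of 64 and a new alignment of 16, `maxPadding` is 48, while a raw pointer that is 32 modulo 64 gets an adjustment of 32: the static value only sizes the allocation, and the dynamic value moves the frame.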

ychen mentioned this in D100739: [Coroutines] Handle overaligned frame allocation (2). Apr 18 2021, 10:39 PM

Pursue D100739 instead.

ychen reclaimed this revision. Apr 29 2021, 12:03 AM

Handle deallocation.
Fix tests.

@rjmccall the patch is on the large side. I'll submit a separate patch for the Sema part about searching for two allocators.

Harbormaster completed remote builds in B101573: Diff 341418. Apr 29 2021, 2:06 AM

Found a bug. Will fix.

fix a bug.

ready for review.

fix typo

For coroutine f0 in test/CodeGenCoroutines/coro-alloc.cpp

The allocation looks like this:

; Function Attrs: noinline nounwind optnone mustprogress
define dso_local void @f0() #0 {
entry:
  %0 = alloca %struct.global_new_delete_tag, align 1
  %1 = alloca %struct.global_new_delete_tag, align 1
  %__promise = alloca %"struct.std::experimental::coroutine_traits<void, global_new_delete_tag>::promise_type", align 1
  %ref.tmp = alloca %struct.suspend_always, align 1
  %undef.agg.tmp = alloca %struct.suspend_always, align 1
  %agg.tmp = alloca %"struct.std::experimental::coroutine_handle", align 1
  %agg.tmp2 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
  %undef.agg.tmp3 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
  %ref.tmp4 = alloca %struct.suspend_always, align 1
  %undef.agg.tmp5 = alloca %struct.suspend_always, align 1
  %agg.tmp7 = alloca %"struct.std::experimental::coroutine_handle", align 1
  %agg.tmp8 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
  %undef.agg.tmp9 = alloca %"struct.std::experimental::coroutine_handle.0", align 1
  %2 = bitcast %"struct.std::experimental::coroutine_traits<void, global_new_delete_tag>::promise_type"* %__promise to i8*
  %3 = call token @llvm.coro.id(i32 16, i8* %2, i8* null, i8* null)
  %4 = call i1 @llvm.coro.alloc(token %3)
  br i1 %4, label %coro.alloc, label %coro.init

coro.alloc:                                       ; preds = %entry
  %5 = call i64 @llvm.coro.size.i64()
  %6 = call i64 @llvm.coro.align.i64()
  %7 = sub nsw i64 %6, 16
  %8 = icmp sgt i64 %7, 0
  %9 = select i1 %8, i64 %7, i64 0
  %10 = add i64 %5, %9
  %call = call noalias nonnull i8* @_Znwm(i64 %10) #11
  br label %coro.check.align

coro.check.align:                                 ; preds = %coro.alloc
  %11 = call i64 @llvm.coro.align.i64()
  %12 = icmp ugt i64 %11, 16
  br i1 %12, label %coro.alloc.align, label %coro.init

coro.alloc.align:                                 ; preds = %coro.check.align
  %mask = sub i64 %11, 1
  %intptr = ptrtoint i8* %call to i64
  %over_boundary = add i64 %intptr, %mask
  %inverted_mask = xor i64 %mask, -1
  %aligned_intptr = and i64 %over_boundary, %inverted_mask
  %diff = sub i64 %aligned_intptr, %intptr
  %aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff
  call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 %11) ]
  %13 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
  %14 = getelementptr inbounds i8, i8* %aligned_result, i32 %13
  %15 = bitcast i8* %14 to i8**
  store i8* %call, i8** %15, align 8
  br label %coro.init

coro.init:                                        ; preds = %coro.alloc.align, %coro.check.align, %entry
  %16 = phi i8* [ null, %entry ], [ %call, %coro.check.align ], [ %aligned_result, %coro.alloc.align ]
  %17 = call i8* @llvm.coro.begin(token %3, i8* %16)
  call void @_ZNSt12experimental16coroutine_traitsIJv21global_new_delete_tagEE12promise_type17get_return_objectEv(%"struct.std::experimental::coroutine_traits<void, global_new_delete_tag>::promise_type"* nonnull dereferenceable(1) %__promise)
  call void @_ZNSt12experimental16coroutine_traitsIJv21global_new_delete_tagEE12promise_type15initial_suspendEv(%"struct.std::experimental::coroutine_traits<void, global_new_delete_tag>::promise_type"* nonnull dereferenceable(1) %__promise)
  %call1 = call zeroext i1 @_ZN14suspend_always11await_readyEv(%struct.suspend_always* nonnull dereferenceable(1) %ref.tmp) #2
  br i1 %call1, label %init.ready, label %init.suspend

The deallocation looks like this:

cleanup:                                          ; preds = %final.ready, %final.cleanup, %init.cleanup
  %cleanup.dest.slot.0 = phi i32 [ 0, %final.ready ], [ 2, %final.cleanup ], [ 2, %init.cleanup ]
  %22 = call i8* @llvm.coro.free(token %3, i8* %17)
  %23 = icmp ne i8* %22, null
  br i1 %23, label %coro.free, label %after.coro.free

coro.free:                                        ; preds = %cleanup
  %24 = call i64 @llvm.coro.align.i64()
  %25 = icmp ugt i64 %24, 16
  %26 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
  %27 = getelementptr inbounds i8, i8* %22, i32 %26
  %28 = bitcast i8* %27 to i8**
  %29 = load i8*, i8** %28, align 8
  %30 = select i1 %25, i8* %29, i8* %22
  call void @_ZdlPv(i8* %30) #2
  br label %after.coro.free

after.coro.free:                                  ; preds = %cleanup, %coro.free
  switch i32 %cleanup.dest.slot.0, label %unreachable [
    i32 0, label %cleanup.cont
    i32 2, label %coro.ret
  ]

cleanup.cont:                                     ; preds = %after.coro.free
  br label %coro.ret

coro.ret:                                         ; preds = %cleanup.cont, %after.coro.free, %final.suspend, %init.suspend
  %31 = call i1 @llvm.coro.end(i8* null, i1 false)
  ret void

unreachable:                                      ; preds = %after.coro.free
  unreachable
}
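
As a reading aid, the allocation and deallocation blocks above correspond roughly to the C++ sketch below. It is an approximation under stated assumptions, not what Clang generates: `allocateFrame`, `deallocateFrame`, and `kNewAlign` are hypothetical names, `RawPtrOffset` stands in for the value of `llvm.coro.raw.frame.ptr.offset`, the slot at that offset is assumed to lie inside the frame, and C++17 is assumed for `__STDCPP_DEFAULT_NEW_ALIGNMENT__`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Alignment guaranteed by plain operator new(size_t) (the IR above uses 16).
constexpr std::size_t kNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;

// Allocate FrameSize bytes aligned to FrameAlign using only operator new(size_t).
// Requires RawPtrOffset + sizeof(void *) <= FrameSize.
inline void *allocateFrame(std::size_t FrameSize, std::size_t FrameAlign,
                           std::size_t RawPtrOffset) {
  if (FrameAlign <= kNewAlign)
    return ::operator new(FrameSize); // allocator alignment is already enough

  // Over-allocate by the worst-case adjustment.
  void *Raw = ::operator new(FrameSize + (FrameAlign - kNewAlign));
  auto Addr = reinterpret_cast<std::uintptr_t>(Raw);
  std::uintptr_t Mask = static_cast<std::uintptr_t>(FrameAlign) - 1;
  std::uintptr_t Aligned = (Addr + Mask) & ~Mask;
  char *Frame = static_cast<char *>(Raw) + (Aligned - Addr);

  // Save the raw pointer inside the frame so deallocation can recover it.
  std::memcpy(Frame + RawPtrOffset, &Raw, sizeof(Raw));
  return Frame;
}

inline void deallocateFrame(void *Frame, std::size_t FrameAlign,
                            std::size_t RawPtrOffset) {
  void *Raw = Frame;
  if (FrameAlign > kNewAlign) // only overaligned frames carry a saved pointer
    std::memcpy(&Raw, static_cast<char *>(Frame) + RawPtrOffset, sizeof(Raw));
  ::operator delete(Raw);
}
```

The saved pointer plays the role of the "cookie" mentioned earlier in the review: the aligned frame address alone does not say how far it was moved, so the original pointer has to be stored somewhere.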

Harbormaster completed remote builds in B101695: Diff 341591. Apr 29 2021, 2:20 PM

Harbormaster completed remote builds in B101696: Diff 341592.

May I ask a question that may be too simple? What if the user specifies an alignment of 128 or even 256 for the promise (or any other local variable)? All the discussion so far seems to assume that the largest alignment requirement is 64.

In D97915#2727759, @ChuanqiXu wrote:

May I ask a question that may be too simple? What if the user specifies an alignment of 128 or even 256 for the promise (or any other local variable)? All the discussion so far seems to assume that the largest alignment requirement is 64.

64 is one example. Bitwise operations (coro.alloc.align block in the attached example) should handle all valid alignment numbers.

Add missed Shape.CoroRawFramePtrOffsets.clear();

In D97915#2727787, @ychen wrote:

In D97915#2727759, @ChuanqiXu wrote:

May I ask a question that may be too simple? What if the user specifies an alignment of 128 or even 256 for the promise (or any other local variable)? All the discussion so far seems to assume that the largest alignment requirement is 64.

64 is one example. Bitwise operations (coro.alloc.align block in the attached example) should handle all valid alignment numbers.

Thanks for the example. I recommend adding a comment to the corresponding code; the bit-operation code and the example confused me. I will look into this and the other parts later.

Harbormaster completed remote builds in B101818: Diff 341756. Apr 29 2021, 11:36 PM

This code snippet confused me before:

coro.alloc.align:                                 ; preds = %coro.check.align
  %mask = sub i64 %11, 1
  %intptr = ptrtoint i8* %call to i64
  %over_boundary = add i64 %intptr, %mask
  %inverted_mask = xor i64 %mask, -1
  %aligned_intptr = and i64 %over_boundary, %inverted_mask
  %diff = sub i64 %aligned_intptr, %intptr
  %aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff

This code implies that %diff >= 0. Formally, given Align = 2^m with m > 4 and Address = 16n, we need to prove that:

(Address + Align - 16) & (~(Align - 1)) >= Address

& (~(Align - 1)) sets the lowest m bits to 0. Align - 16 equals 2^m - 16, which is 16*(2^(m-4) - 1), so Address + Align - 16 can be written as 16*(n + 2^(m-4) - 1).
Let X be the value of the lowest m bits of Address + Align - 16.
Because X has m bits, X <= 2^m - 1. Note that X must be 16-aligned, so its lowest 4 bits are zero.
Now,

X <= 2^m - 1 - 1 - 2 - 4 - 8 = 2^m - 16

So the inequality we need to prove becomes:

16*(n + 2^(m-4) - 1) - X >= 16n

The left-hand side is smallest when X takes its largest value, so it suffices to show:

16*(n + 2^(m-4) - 1) - 2^m + 16 >= 16n

which is easy, since the left-hand side simplifies to exactly 16n.

The overall proof looks non-trivial to me. I spent some time figuring it out, and I guess other people won't get it immediately either. I strongly recommend adding a comment and the corresponding proof for this code.

In D97915#2728377, @ChuanqiXu wrote:
This code snippet confused me before:
coro.alloc.align:                                 ; preds = %coro.check.align
  %mask = sub i64 %11, 1
  %intptr = ptrtoint i8* %call to i64
  %over_boundary = add i64 %intptr, %mask
  %inverted_mask = xor i64 %mask, -1
  %aligned_intptr = and i64 %over_boundary, %inverted_mask
  %diff = sub i64 %aligned_intptr, %intptr
  %aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff
This code implies that %diff >= 0. Formally, given Align = 2^m with m > 4 and Address = 16n, we need to prove that:
(Address + Align - 16) & (~(Align - 1)) >= Address
& (~(Align - 1)) sets the lowest m bits to 0. Align - 16 equals 2^m - 16, which is 16*(2^(m-4) - 1), so Address + Align - 16 can be written as 16*(n + 2^(m-4) - 1).
Let X be the value of the lowest m bits of Address + Align - 16.
Because X has m bits, X <= 2^m - 1. Note that X must be 16-aligned, so its lowest 4 bits are zero.
Now,
X <= 2^m - 1 - 1 - 2 - 4 - 8 = 2^m - 16
So the inequality we need to prove becomes:
16*(n + 2^(m-4) - 1) - X >= 16n
The left-hand side is smallest when X takes its largest value, so it suffices to show:
16*(n + 2^(m-4) - 1) - 2^m + 16 >= 16n
which is easy, since the left-hand side simplifies to exactly 16n.

The overall proof looks non-trivial to me. I spent some time figuring it out, and I guess other people won't get it immediately either. I strongly recommend adding a comment and the corresponding proof for this code.

The code is equivalent to

(Address + Align -1)&(~(Align-1)) >= Address

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.
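
As a sanity check on the inequality, a brute-force sketch like the one below can be run over a range of addresses and alignments. `checkAlignUpBounds` is a hypothetical helper written for this illustration (it is not part of the patch); it assumes a new alignment of 16 and a power-of-two Align of at least 16.

```cpp
#include <cassert>
#include <cstdint>

// For every 16-byte-aligned address in a small range, checks that rounding up
// with the mask Align-1 never moves the pointer backwards, never needs more
// than Align-16 bytes of slack, and agrees with the Align-16 form used in the
// proof above.
inline bool checkAlignUpBounds(std::uint64_t Align) {
  const std::uint64_t NewAlign = 16;
  for (std::uint64_t Addr = 0; Addr < 4 * Align; Addr += NewAlign) {
    std::uint64_t Aligned = (Addr + Align - 1) & ~(Align - 1);
    if (Aligned < Addr)
      return false; // %diff would be negative
    if (Aligned - Addr > Align - NewAlign)
      return false; // would not fit in the over-allocated padding
    if (Aligned % Align != 0)
      return false; // result is not aligned
    if (((Addr + Align - NewAlign) & ~(Align - 1)) != Aligned)
      return false; // the two forms disagree for a 16-aligned input
  }
  return true;
}
```

The same check passes for 128 and 256, which matches the earlier answer that the bitwise sequence is not specific to an alignment of 64.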

Plan to rebase this together with the following patch for two lookups (aligned and non-aligned new/delete, and generate code accordingly)

Rebase on D102145.
Dynamically adjust the alignment for allocation and deallocation if the selected allocator does not have std::align_val_t argument. Otherwise, use the aligned allocation/deallocation function.
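
For contrast with the dynamic adjustment, the aligned path can be sketched as follows. This is only an illustration of the standard C++17 aligned allocation functions, not of the patch's codegen; `allocateAligned` and `deallocateAligned` are hypothetical wrapper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// When an allocation function taking std::align_val_t is selected, the
// requested alignment is passed straight through, so no over-allocation and
// no saved raw pointer are needed. Align must be a power of two.
inline void *allocateAligned(std::size_t Size, std::size_t Align) {
  return ::operator new(Size, std::align_val_t(Align));
}

// The matching aligned deallocation function must be used for such memory.
inline void deallocateAligned(void *Ptr, std::size_t Align) {
  ::operator delete(Ptr, std::align_val_t(Align));
}
```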

ychen added a parent revision: D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics. May 9 2021, 7:43 PM

Harbormaster completed remote builds in B103419: Diff 343956. May 9 2021, 7:49 PM

Rebase

ychen added a child revision: D102147: [Clang][Coroutines] Implement P2014R0 Option 1 behind -fcoroutines-aligned-alloc. May 9 2021, 8:34 PM

ChuanqiXu mentioned this in D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics. May 9 2021, 8:53 PM

Harbormaster completed remote builds in B103421: Diff 343959. May 9 2021, 9:18 PM

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say that we should add a comment to avoid confusion.

Since the patch can handle the case where the frontend looks up ::operator new(size_t, align_val_t), this patch should be based on D102147.

Rebase on updated D102145 (use llvm.coro.raw.frame.ptr.addr during allocation)

In D97915#2747142, @ChuanqiXu wrote:

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say that we should add a comment to avoid confusion.

Happy to do it in a separate patch since this patch does not change the implementation of CodeGenFunction::EmitBuiltinAlignTo.

Since the patch can handle the case where the frontend looks up ::operator new(size_t, align_val_t), this patch should be based on D102147.

This patch *could* handle both aligned and normal new/delete, so it doesn't need D102147 to work correctly?
D102147 depends on this patch since it may find a non-aligned new/delete for overaligned frame. In such a case, this patch is required.

Harbormaster completed remote builds in B103890: Diff 344602. May 11 2021, 6:40 PM

ChuanqiXu added inline comments. May 12 2021, 5:00 AM

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	Can't we merge these?
clang/lib/CodeGen/CGCoroutine.cpp
445–483	This code would only work if we use `::operator new(size_t, align_val_t)`, which is implemented in another patch. I would suggest moving this into that one.
522–542	It looks like it would emit a `deallocate` first and then an `alignedDeallocate`, which is very odd. Although I can see that the second `deallocate` wouldn't be emitted because of the `LastCoroFreeUsedForDealloc` check, it still looks odd to me. If the second `deallocate` is never emitted, why do we need to write `emit(deallocate)` twice?
735	Since `hasAlignArg` is called only once, I suggest making it a lambda here, which would make the code easier to read.
737–739	I recommend adding a detailed comment here explaining why we need to over-allocate the frame. It is really hard to understand for people who are new to this code. Otherwise, they would need to use `git blame` to find the commit and this review page to figure out the reasons.
741–763	It may be better to organize it as:
    if (!HasAlignArg) {
      if (auto RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
        auto Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
        AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
        Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);
        EmitBlock(AlignAllocBB2);
      }
      auto *CoroAlign = Builder.CreateCall(
          CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
      ...
    }
919	Is it possible that it would return a nullptr value?

ychen marked an inline comment as done.May 12 2021, 9:59 PM

ychen added inline comments.

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	I'm not sure about the "merge" here. Could you be more explicit?
clang/lib/CodeGen/CGCoroutine.cpp
445–483	It handles both aligned and normal new/delete.
522–542	Agree that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure deallocation and aligned deallocation share one `coro.free`. Otherwise, AFAIK, two `coro.free` calls would get codegen'd:
      %mem = llvm.coro.free()
      br i1 <overalign>, label <aligned-dealloc>, label <dealloc>
    aligned-dealloc:
      use %mem
    dealloc:
      use %mem
On "what's the reason we need to write emit(deallocate) twice?": John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I think it would be helpful to look at the changed tests below to see the patterns. Basically, allocation looks like the following; deallocation is similar.
    void *rawFrame = nullptr;
    ...
    if (llvm.coro.alloc()) {
      size_t size = llvm.coro.size(), align = llvm.coro.align();
      if (align > NEW_ALIGN) {
    #if <an allocation function without std::align_val_t argument is selected by Sema>
        size += align - NEW_ALIGN + sizeof(void *);
        frame = operator new(size);
        rawFrame = frame;
        frame = (frame + align - 1) & ~(align - 1);
    #else
        // If an aligned allocation function is selected.
        frame = operator new(size, align);
    #endif
      } else {
        frame = operator new(size);
      }
    }
The true branch of the #if directive is equivalent to the "coro.alloc.align" block (and "coro.alloc.align2" if `get_return_object_on_allocation_failure` is defined); the false branch is equivalent to the "coro.alloc" block. The above pattern handles both aligned/normal allocation/deallocation, so it is independent of D102147.
735	will do
737–739	will do.
919	Not that I know of, because there is an early return:
    if (!CoroFree) {
      CGF.CGM.Error(Deallocate->getBeginLoc(),
                    "Deallocation expressoin does not refer to coro.free");
      return;
    }

Address feedback.

ychen marked 3 inline comments as done.May 12 2021, 10:58 PM

ChuanqiXu added inline comments.May 12 2021, 11:06 PM

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	Sorry. I mean whether we can merge `Allocate` with `AlignedAllocate` and `Deallocate` with `AlignedDeallocate`, since from the implementation it looks like the values of `Allocate` and `AlignedAllocate` (and likewise `Deallocate` and `AlignedDeallocate`) are the same.
clang/lib/CodeGen/CGCoroutine.cpp
522–542	Thanks. Now I see why the code feels unnatural to me: I think `::operator new(size_t, align_val_t)` shouldn't come up in this patch, since it only becomes available after D102147 is applied. You said this patch is independent of D102147, and I believe it could work without D102147. But it contains code that only takes effect once the successor patch is applied, so I think it does depend on D102147. The ideal arrangement for me is to merge `D102145` into this one (otherwise it is weird that `D102145` only introduces intrinsics that aren't actually used). Then this patch would handle the alignment of variables in the coroutine frame without introducing `::new(size_t, align_val_t)`, and the final patch would do the job of looking up and generating code for `::new(size_t, align_val_t)`. It may be a bit painful to rebase again and again, but I think it is better.
919	Do you think it is better to merge this check here?
    if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc) {
      if (!CoroFree) {
        CGF.CGM.Error(Deallocate->getBeginLoc(),
                      "Deallocation expressoin does not refer to coro.free");
        return something;
      }
      return RValue::get(CurCoro.Data->LastCoroFree);
    }

ychen added inline comments.May 12 2021, 11:19 PM

clang/lib/CodeGen/CGCoroutine.cpp
522–542	I think I know where the confusion comes from. `AlignedDeallocate` is not guaranteed to be an aligned deallocation function. In this patch, in `SemaCoroutine.cpp`, it is set to `Deallocate`, in which case we always dynamically adjust the frame alignment. Once D102147 has landed, `AlignedDeallocate` may or may not be an aligned deallocation function.
"The ideal arrangement for me is to merge D102145 into this one (otherwise it is weird that D102145 only introduces intrinsics that aren't actually used). Then this patch would handle the alignment of variables in the coroutine frame without introducing ::new(size_t, align_val_t), and the final patch would do the job of looking up and generating code for ::new(size_t, align_val_t)."
I was worried about the size of the patch if this is merged with D102145, but if that is preferred by more than one reviewer, I'll go ahead and do that. D102145 is pretty self-contained in that it does not contain clients of the added intrinsics, but the introduced test should cover the expected intrinsic lowering.

ychen added inline comments.May 12 2021, 11:22 PM

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	Oh, this is to pave the way for D102147, where `Allocate` and `AlignedAllocate` could be different. If I do this in D102147, it will also touch `CGCoroutine.cpp`, which I'm trying to avoid since it is intended to be a Sema-only patch.

ychen added inline comments.May 12 2021, 11:29 PM

clang/lib/CodeGen/CGCoroutine.cpp
522–542	Naming is hard. I had a hard time figuring out a better name. `AlignedDeallocate`/`AlignedAllocate` is intended to refer to the allocator/deallocator used for handling an overaligned frame. It does not mean that they refer to an allocator/deallocator with a std::align_val_t argument.

ChuanqiXu added inline comments.May 12 2021, 11:39 PM

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	Yeah, this is the key point where we differ. I think that `D102147` could and should touch the CodeGen part.
clang/lib/CodeGen/CGCoroutine.cpp
522–542	I think merging `D102145` into this one would make it easier for me to understand this patch. For example, the test cases in `D102145` look weird to me, since they don't do over-alignment at all, as we discussed in that thread. Maybe my understanding is not right, but I don't think it is very self-contained. I am OK to wait for opinions from other reviewers.

Harbormaster completed remote builds in B104215: Diff 345053.May 12 2021, 11:52 PM

Merge D102145 by @ChuanqiXu's request.

ychen removed a parent revision: D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.Jun 15 2021, 10:48 AM

Harbormaster completed remote builds in B109335: Diff 352189.Jun 15 2021, 10:48 AM

In D97915#2819871, @ychen wrote:

Merge D102145 by @ChuanqiXu's request.

Thanks a lot. Could you add me as a reviewer? So that I can see this patch in my main page.

ychen added a reviewer: ChuanqiXu.Jun 15 2021, 7:14 PM

It doesn't look easy to review completely. After a rough review, my first suggestion is to remove the content that should be in 'D102147'; I think it makes a complex problem more complex.
I will try to do a more detailed review and test it if possible.

clang/include/clang/AST/StmtCXX.h
356–359 ↗	(On Diff #344602)	As we discussed before, I prefer to merge `Allocate` and `AlignedAllocate` (also `Deallocate` and `AlignedDeallocate`) in this patch. It looks weird that they are the same in one commit.
331–335 ↗	(On Diff #352189)	Minor issue: the indentation of the comments should be the same.
clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗	(On Diff #352189)	We shouldn't add this interface. The actual type of the first argument is BuiltinAlignArgs*, which is defined in a .cpp file. The signature is confusing.

ChuanqiXu added inline comments.Jun 15 2021, 11:23 PM

clang/lib/CodeGen/CGCoroutine.cpp
420	We should capitalize it as 'OverAllocateFrame'.
744–749	`if (HasAlignArg)` should be part of the next patch, 'D102147', right? I don't think it belongs here.
786–787	Maybe we could calculate it in place instead of calling a function that is not designed for `llvm::Value *`. The calculation doesn't look too complex.
796	Is a branch to InitBB missing here?
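
For reference, the in-place calculation suggested for lines 786–787 is the usual power-of-two round-up. The following is a minimal C++ sketch of that arithmetic, not the code in the patch; `AlignUpSteps` and `alignUpSteps` are hypothetical names, and `align` is assumed to be a nonzero power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: the intermediate values of a
// power-of-two round-up.
struct AlignUpSteps {
  std::uint64_t mask;         // align - 1
  std::uint64_t overBoundary; // addr + mask
  std::uint64_t aligned;      // overBoundary with the low bits cleared
  std::uint64_t diff;         // bytes skipped: aligned - addr
};

// Assumes `align` is a nonzero power of two.
inline AlignUpSteps alignUpSteps(std::uint64_t addr, std::uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  AlignUpSteps s;
  s.mask = align - 1;
  s.overBoundary = addr + s.mask;
  s.aligned = s.overBoundary & ~s.mask;
  s.diff = s.aligned - addr;
  return s;
}
```

If the allocator returns a 16-byte-aligned address and the frame needs 64-byte alignment, `diff` is at most 48, which is where a padding of `align - NEW_ALIGN` comes from.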

Remove AlignedAllocator & AlignedDeallocator.
Use 'OverAllocateFrame'
Get rid of if (HasAlignArg)

clang/lib/CodeGen/CGCoroutine.cpp
786–787	I'm open to not calling `EmitBuiltinAlignTo`, which basically means inlining its useful parts. The initial intention was code sharing and readability. What's the benefit of not calling it?
796	`EmitBlock` would handle the case.
clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗	(On Diff #352189)	This is a private function, supposedly only meaningful for the implementation. In that situation do you think it's bad?

The LLVM part is missing now.

clang/lib/CodeGen/CGCoroutine.cpp
437–475	We don't need this in this patch.
477	Capitalize `EmitCheckAlignBasicBlock`
556–557	Do we still need this change?
786–787	Reusing code is good, but my main concern is the design of the interface. The current design smells bad to me. Another reason to implement it in place is that it is not very complex and is easy to understand. Another option is to implement `EmitBuiltinAlignTo` in LLVM (someplace like `Local`); then CodeGenFunction::EmitBuiltinAlignTo and this function could both use it.
clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗	(On Diff #352189)	It makes no sense to me that we can add interfaces casually just because they are private. For users of Clang/LLVM it may be OK, since they wouldn't look into the details. But for developers it matters. For example, I would be very confused on seeing this signature. Why is the type of `Args` void*? What type should I pass in? The smell is really bad.

Harbormaster completed remote builds in B109642: Diff 352617.Jun 17 2021, 5:03 AM

Rebase correctly

Don't use void * in the EmitBuiltinAlignTo signature.

clang/lib/CodeGen/CGCoroutine.cpp
437–475	Do you mean `// Match size_t argument with the one used during allocation.` or the function `emitDynamicAlignedDealloc`? I think either is needed here. Could you please elaborate?
556–557	Nope
786–787	Reusing code is good, but my main concern is the design of the interface. The current design smells bad to me. Another reason to implement it in place is that it is not very complex and is easy to understand. Another option is to implement `EmitBuiltinAlignTo` in LLVM (someplace like `Local`); then CodeGenFunction::EmitBuiltinAlignTo and this function could both use it.
clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗	(On Diff #352189)	It makes no sense to me that we can add interfaces casually just because they are private. For users of Clang/LLVM it may be OK, since they wouldn't look into the details. But for developers it matters. For example, I would be very confused on seeing this signature. Why is the type of `Args` void*? What type should I pass in? The smell is really bad.

Harbormaster completed remote builds in B109858: Diff 352908.Jun 18 2021, 1:18 PM

A remaining question.

What are the semantics if the user specifies their own allocation/deallocation functions?

Previously, we discussed ::operator new and ::operator delete. But what should we do for user-specified allocation/deallocation functions?
It looks like we would treat them just like ::operator new, which makes sense at first glance. The problem is that we decide whether or not to over-allocate a frame by this condition:

coro.align > align of new

But if the user provides their own new, it makes no sense to use the alignment of new as the condition. On the other hand, if the user specifies both their own new function and the alignment requirement for their promise type, is it really better for the compiler to do the extra transformation?

Maybe we need to discuss this with people who are more familiar with the C++ standard.

clang/lib/CodeGen/CGCoroutine.cpp
420	It would be better to add a comment for this function.
421	CoroSizeIdx should be zero all the time in this patch.
431–433	In another comment, I see 'size += align - NEW_ALIGN + sizeof(void *);', but I can't find the sizeof(void *) in this function.
437–475	Sorry, I misunderstood this function earlier.
452–461	We allocate the overaligned frame like: | --- for align --- | --- true frame --- |, and here we set the argument to the address of the true frame. Then I wonder about the memory for the `for align` part. Would we still free it correctly? Maybe I missed something.
466–472	We don't need to handle this condition in this patch.
693	It may be better to rename AlignAllocBB2 to AlignAllocBBCont or something similar.
766	It would be better to add an assert for RetOnFailureBB. At first glance, it looks like it may be nullptr.
780–781	We could remove this assignment and use AlignedUpAddr directly below.
786–787	I guess you forgot to write your reply.
clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗	(On Diff #352189)	It looks like you missed something?
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1133–1137	Why do we move this snippet to the front? Although it is not formally defined, the layout of the current frame would be: | resume func addr | destroy func addr | promise | other things needed |. Moving this to the front may break that.
1141–1144	We need to edit the comment too.

In D97915#2829581, @ChuanqiXu wrote:
A remained question.

what's the semantics if user specified their allocation/deallocation functions?

Previously, we discussed for the ::operator new and ::operator delete. But what would we do for user specified allocation/deallocation functions?
It looks like we would treat them just like ::operator new. And it makes sense at the first glance. But the problem comes from that we judge whether
or not to over allocate a frame by this condition:
coro.align > align of new
But if the user uses their new, it makes no sense to use the align of new as the condition. On the other hand, if user specified their new function and the
alignment requirement for their promise type, would it be better really that the compiler do the extra transformation?

May be we need to discuss with other guys who are more familiar with the C++ standard to answer this.

I think @rjmccall could answer these. My understanding is that user-specified allocation/deallocation has the same semantics as their standard library's counterparts. align of new (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.

clang/lib/CodeGen/CGCoroutine.cpp
431–433	Sorry, that's a stale comment. It should be `size += align - NEW_ALIGN`. The `sizeof(void *)` was meant for the newly added raw memory pointer stored in the frame. In the current implementation, it is factored into the `llvm.coro.size()` calculation, because CoroFrame is responsible for allocating the extra raw memory pointer if it is needed at all.
452–461	"Would we still free them correctly?" Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory pointer to the coroutine frame. Later, at dealloc time, `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory pointer back and free it.
466–472	This handling is for `sized delete` (`void T::operator delete ( void* ptr, std::size_t sz );`), not `aligned delete`. `sized delete` needs the same `size` that is used for `new`. Please check the `f3` test in `coro-alloc.cpp` (the test was missing the CHECK lines for this; I've added them).
786–787	Yep, I meant to say the use of "void *" is removed.
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1133–1137	The intent is to structure the code better; there is no intention to change the frame layout here. My understanding is that `promise` already has a fixed offset ahead of this. `FrameData::Allocas` is ordered, but the order has no defined semantics. No tests seem to fail due to a reordered frame layout. However, I might be wrong. Could you describe how it changes the layout?
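
The sized-delete point from the 466–472 thread can be illustrated outside of a coroutine. This is only a sketch under assumptions: `grownFrameSize` and `SizedDemo` are hypothetical names, not functions from the patch, and the numbers (frame size 128, frame alignment 64, new alignment 16) are picked for illustration. It shows that whatever padded size is passed to `operator new` has to be passed again to the sized `operator delete`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: the size requested from the allocator when the frame
// is more aligned than the allocator promises (size += align - NEW_ALIGN).
inline std::size_t grownFrameSize(std::size_t frameSize, std::size_t frameAlign,
                                  std::size_t newAlign) {
  return frameAlign > newAlign ? frameSize + (frameAlign - newAlign)
                               : frameSize;
}

// Hypothetical type with a class-specific sized delete that records the
// sizes it sees, so the two calls can be compared.
struct SizedDemo {
  static std::size_t lastNewSize;
  static std::size_t lastDeleteSize;

  static void *operator new(std::size_t size) {
    lastNewSize = size;
    return ::operator new(size);
  }
  static void operator delete(void *ptr, std::size_t size) {
    lastDeleteSize = size;
    ::operator delete(ptr);
  }
};

std::size_t SizedDemo::lastNewSize = 0;
std::size_t SizedDemo::lastDeleteSize = 0;
```

Calling `SizedDemo::operator new(grownFrameSize(128, 64, 16))` requests 176 bytes, so the matching `SizedDemo::operator delete` call has to receive 176 as well, not the unpadded 128.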

Address comments

Harbormaster completed remote builds in B110353: Diff 353567.Jun 22 2021, 2:34 AM

In D97915#2832446, @ychen wrote:
I think @rjmccall could answer these. My understanding is that user-specified allocation/deallocation has the same semantics as their standard library's counterparts. align of new (aka __STDCPP_DEFAULT_NEW_ALIGNMENT__) should apply to both.

Yeah, I am just curious about the question and not sure about the answer yet. I agree that it should be safe to treat user-specified allocation/deallocation like ::operator new. Maybe I am being a little pedantic. I am just not sure whether developers would be happy to find that we allocate padding space they didn't ask for when they provided their own new/delete. (Maybe I am too anxious.)

Another problem I find in this patch is that the introduction of the raw frame makes the notion of the frame confusing. For example, the following code aims to calculate the size of the over-allocated frame:

%[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
%[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
%[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])

It makes sense only if llvm.coro.size stands for the size of the 'true frame' (I am not sure what 'raw frame' means now, so let me use 'true frame' temporarily). But the documentation now says that '@llvm.coro.size' returns the size of the frame, which is confusing.
I am not asking to fix this simply by renaming 'llvm.coro.size' or rephrasing the documentation. I am thinking about the semantics of the coroutine intrinsics after we introduce the concept of a 'raw frame'.

clang/lib/CodeGen/CGCoroutine.cpp
424	So if this would also be called for deallocation, the function name is confusing. I think it may be better to rename it to something like 'GetSizeOfOverAlignedFrame' (the suggested name isn't great either). By the way, I am now wondering why we wouldn't use llvm.coro.size directly and make the middle end handle it. What do you think?
452–461	To make it clear, what's the definition of 'raw ptr'? From the context, I think it means the true frame in the diagram above. So this confuses me: "llvm.coro.raw.frame.ptr.addr is called again to load the raw memory pointer back and free it." If the raw memory means the true frame, that may not be right, since the 'for align' part wouldn't be freed.
466–472	Hmm, I understand it a bit more now.
764	What do you think about replacing EmitBuiltinAlignTo with an in-place calculation?
clang/test/CodeGenCoroutines/coro-alloc.cpp
106	This defines the variable 'MEM' again, conflicting with the definition at line 89. Does it matter?
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1133–1137	Sorry, I made a mistake. This move should be OK. My bad.

In D97915#2832667, @ChuanqiXu wrote:
Yeah, I just curious about the question and not sure about the answer yet. I agree with that it should be safe if we treat user-specified allocation/deallocation as ::operator new. Maybe I am a little bit of pedantry. I just not sure if the developer would be satisfied when they find we allocate padding space they didn't want when they offered a new/delete method. (Maybe I am too anxious).

Another problem I find in this patch is that the introduction for raw frame makes the frame confusing. For example, the following codes is aim to calculate the size for the over allocated frame:
%[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
%[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
%[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
It makes sense only if llvm.coro.size stands for the size of 'true frame' (I am not sure what's the meaning for raw frame now. Let me use 'true frame' temporarily). But the document now says that '@llvm.coro.size' returns the size of the frame. It's confusing
I am not require to fix it by rename 'llvm.coro.size' or rephrase the document simply. I am thinking about the the semantics of coroutine intrinsics after we introduce the concept of 'raw frame'.

These are great points. The semantics of some coroutine intrinsics need a little tweaking; otherwise they are confusing with the introduction of the raw frame (suggestions for alternative names are much appreciated), which I defined in the updated patch (Coroutines.rst). The docs for llvm.coro.size mention the coroutine frame, a term I've used verbatim in the docs for llvm.coro.raw.frame.ptr.*.
Using your diagram below: raw frame is | --- for align --- | --- true frame --- |. Coroutine frame is | --- true frame --- |. (BTW, llvm.coro.begin docs are stale which I updated in the patch, please take a look @ChuanqiXu @rjmccall @lxfind).

clang/lib/CodeGen/CGCoroutine.cpp
424	"So if this would also be called for deallocation, the function name is confusing. I think it may be better to rename it to something like 'GetSizeOfOverAlignedFrame' (the suggested name isn't great either)."
Renamed to `GrowFrameSize` since there are similar uses of `grow` in LLVM. Please let me know if it makes sense.
"By the way, I am now wondering why we wouldn't use llvm.coro.size directly and make the middle end handle it. What do you think?"
I think it works to a certain degree, and I attempted it with D100739. In theory, the over-alignment handling should be dealt with in the front end, since that's an ABI issue (not specified in any ABI documentation, though). However, coroutines are special since the optimizer could change the alignment, so some work has to be delayed until CoroSplit(CoroFrame) time. In D100739, I was trying to argue that more than one front end may need this over-alignment handling, hence it might be OK to implement all the work in LLVM. @rjmccall (https://reviews.llvm.org/D100739#2718681) seems not to be a fan of that and thinks `llvm.coro.raw.frame.ptr.offset` could be the way forward, hence the current design, which lets the front end do all the work and leaves only the required piece to LLVM. That required piece is, at CoroSplit(CoroFrame) time, to decide whether the frame is over-aligned and, if so, add the `raw frame pointer` to the frame itself.
452–461	I've updated `Coroutines.rst` with a hopefully better explanation of the semantics of the newly added intrinsics. With the above diagram, the `raw frame` is the whole thing, `| --- for align --- | --- true frame --- |`, and the `raw frame ptr` points to the left bar of `for align`.
764	I think with the interface issue being fixed, it is preferable to call it but I don't feel strongly about it so I just went ahead inlined `EmitBuiltinAlignTo` to help review/discussion.
clang/test/CodeGenCoroutines/coro-alloc.cpp
106	It should not matter. It works like variable definitions. Uses always get the most recent definitions. "FileCheck variables can be defined multiple times, and substitutions always get the latest value. Variables can also be substituted later on the same line they were defined on."
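
To make the raw frame / coroutine frame distinction from the 452–461 thread concrete, here is a minimal C++ sketch of the round trip: the raw pointer returned by the allocator is saved in a slot inside the aligned frame, and deallocation starts from the frame address alone. `DemoFrame`, its field layout, and the helper names are hypothetical; the real frame layout is decided by CoroFrame, and the sketch assumes the allocator returns memory aligned to at least `newAlign`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical frame layout, for illustration only.
struct alignas(64) DemoFrame {
  void (*resume)(DemoFrame *);
  void (*destroy)(DemoFrame *);
  void *rawFramePtr; // slot that remembers what the allocator returned
  char payload[40];
};

// Plays the role of a "raw frame pointer offset": distance from the frame
// address to the slot holding the raw pointer.
constexpr std::size_t kRawPtrOffset = offsetof(DemoFrame, rawFramePtr);

// Over-allocates, aligns up, and saves the raw pointer inside the frame.
inline void *demoAllocFrame(std::size_t newAlign) {
  constexpr std::size_t size = sizeof(DemoFrame);
  constexpr std::size_t align = alignof(DemoFrame);
  assert(newAlign <= align);
  void *raw = ::operator new(size + (align - newAlign));
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t aligned =
      (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  char *frame = static_cast<char *>(raw) + (aligned - addr);
  std::memcpy(frame + kRawPtrOffset, &raw, sizeof(raw));
  return frame;
}

// Loads the raw pointer back, given only the frame address.
inline void *demoRawFramePtr(void *frame) {
  void *raw = nullptr;
  std::memcpy(&raw, static_cast<char *>(frame) + kRawPtrOffset, sizeof(raw));
  return raw;
}

// Frees the whole raw frame, including the alignment padding before the frame.
inline void demoFreeFrame(void *frame) {
  ::operator delete(demoRawFramePtr(frame));
}
```

Because the pointer passed to `::operator delete` is the saved raw pointer rather than the frame address, the `for align` padding is released together with the frame.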

Update intrinsics documentation.
Inline EmitBuiltinAlignTo as emitAlignUpTo.
Address other comments.

Harbormaster completed remote builds in B110550: Diff 353855.Jun 23 2021, 2:32 AM

In D97915#2835147, @ychen wrote:
These are great points. The semantics of some coroutine intrinsics needs a little bit of tweaking otherwise they are confusing with the introduction of raw frame (suggestions for alternative names are much appreciated) which I defined in the updated patch (Coroutines.rst). Docs for llvm.coro.size mentions coroutine frame which I've used verbatim in the docs for llvm.coro.raw.frame.ptr.*.
Using your diagram below: raw frame is | --- for align --- | --- true frame --- |. Coroutine frame is | --- true frame --- |. (BTW, llvm.coro.begin docs are stale which I updated in the patch, please take a look @ChuanqiXu @rjmccall @lxfind).

Thanks for clarifying. Let's solve the semantics problem first.
With the introduction of 'raw frame', I think it's necessary to introduce this concept in the 'Switched-Resume Lowering' section or even the 'Introduction' section of the document. Adding a section that explains the terminology would be fine too.

Then why do we define both 'llvm.coro.raw.frame.ptr.offset' and 'llvm.coro.raw.frame.ptr.addr'? They seem to refer to the same value in the end. It looks like 'llvm.coro.raw.frame.ptr.offset' is trying to solve the memory leak problem. But I think we could use llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (maybe we need to add an intrinsic llvm.coro.raw.size). Then we could omit a field in the frame to save space.

I am also a little confused about the design again, since in the past we treated the value of CoroBegin as the address of the coroutine frame, and now it looks like it is the raw frame. Let me reconsider whether that is OK.

llvm/docs/Coroutines.rst
974	'`llvm.coro.align`' `llvm.coro.align`
1065	"coroutine frame": From the implementation, it looks like this is the `raw frame`. I am not sure if it is problematic now, since the CoroElide pass would convert the frame to an alloca.
1098	"the coroutine frame": It should be the raw frame now, shouldn't it?

Thanks for clarifying. Let's solve the semantics problem first.
With the introduction about 'raw frame', I think it's necessary to introduce this concept in the section 'Switched-Resume Lowering' or even the section 'Introduction' in the document. Add a section to tell the terminology is satisfied too.

Done.

Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to solve the problem about memory leak. But I think we could use llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe we need to add an intrinsic llvm.coro.raw.size). Then we can omit a field in the frame to save space.

("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address instead of raw frame pointer)

Apologies for the confusion. I briefly explained it here https://reviews.llvm.org/D102145#2752445, but I think it was not clear. "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine frame field storing the raw frame pointer" only after insertSpills in CoroFrame.cpp. Before that, it is actually an alloca storing the raw frame pointer (try grepping "alloc.frame.ptr" in this review page). Using "llvm.coro.raw.frame.ptr.offset" instead of "llvm.coro.raw.frame.ptr.addr" is doable and looks like the listing below; please check line 31. The downside is that the write to the coroutine frame is a direct write rather than going through an alloca. That is unusual because all fields in the frame are stored as one of: 1. special/header fields, 2. allocas, 3. spills. Doing the write indirectly through an alloca makes me more comfortable. The tradeoff is one extra intrinsic, "llvm.coro.raw.frame.ptr.addr". What do you think?

19 coro.alloc.align:                                 ; preds = %coro.alloc.check.align
20   %3 = sub nsw i64 64, 16
21   %4 = add i64 128, %3
22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
23   %mask = sub i64 64, 1
24   %intptr = ptrtoint i8* %call1 to i64
25   %over_boundary = add i64 %intptr, %mask
26   %inverted_mask = xor i64 %mask, -1
27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
28   %diff = sub i64 %aligned_intptr, %intptr
29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) ]
31   store i8* %call1, i8** %alloc.frame.ptr, align 8                     

     ; Replace line 31 with the lines below, and make sure lines 46-48 are skipped.
     ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
     ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
     ; %addr1 = bitcast i8* %addr to i8**
     ; store i8* %call1, i8** %addr1, align 8


32   br label %coro.init.from.coro.alloc.align
33
34 coro.init.from.coro.alloc.align:                  ; preds = %coro.alloc.align
35   %aligned_result.coro.init = phi i8* [ %aligned_result, %coro.alloc.align ]
36   br label %coro.init
37
38 coro.init:                                        ; preds = %coro.init.from.entry, %coro.init.from.coro.alloc.align, %coro.init.from.coro.alloc
39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result.coro.init, %coro.init.from.coro.alloc.align ]
40   %FramePtr = bitcast i8* %5 to %f0.Frame*
41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 0
42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, align 8
43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void (%f0.Frame*)* @f0.cleanup
44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 1
45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 2
47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
48   store i8* %8, i8** %7, align 8
49   br label %AllocaSpillBB
50
51 AllocaSpillBB:                                    ; preds = %coro.init
52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 4
53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 5
54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 6
55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 7
56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 8
57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 10
58   br label %PostSpill

Then I am a little confused by the design again: in the past we treated the value of CoroBegin as the address of the coroutine frame, and now it looks like it is the raw frame. Let me reconsider whether that is OK.

The value returned by CoroBegin is still the coroutine frame, not the raw frame, even when the frame is overaligned. You can check this in the code above.
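
For readers who find the IR dense, the arithmetic on lines 20 to 27 of the listing above can be sketched in C++ roughly as follows. The function names `overAllocSize` and `alignUp` are illustrative only and do not appear in the patch; the constants in the comments (frame size 128, frame alignment 64, allocator alignment 16) are the ones visible in the IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes to request from the allocator (IR lines 20-21). The allocator
// already guarantees `newAlign`, so at most `frameAlign - newAlign`
// bytes of padding are needed in front of the frame.
inline std::size_t overAllocSize(std::size_t frameSize,
                                 std::size_t frameAlign,
                                 std::size_t newAlign) {
  return frameSize + (frameAlign - newAlign); // 128 + (64 - 16) in the IR
}

// Round `addr` up to the next multiple of `align`, which must be a
// power of two (IR lines 23-27: add the mask, then clear the low bits).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t align) {
  const std::uintptr_t mask = align - 1;
  return (addr + mask) & ~mask;
}
```

In the worst case the allocator returns an address that is 16 bytes past a 64-byte boundary, so 48 bytes of padding are consumed and the 128-byte frame still fits in the 176 bytes requested.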

Add description of raw frame in Coroutines.rst.

Harbormaster completed remote builds in B111644: Diff 355401.Jun 29 2021, 7:21 PM

I am a little busy this week. I will try to look into this next week if possible.

In D97915#2848816, @ychen wrote:

Thanks for clarifying!

I don't understand why we need to store the address of the raw frame in the coroutine frame. For example, %call1 in your example is the address of the raw frame. Can we then use the value %call1 everywhere we want to use the address of the coroutine frame?
If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in the frontend wherever we need the address of the raw frame. Then, in the middle end, we could simply replace llvm.coro.raw.frame with %call1. Similarly, we could define an intrinsic llvm.coro.raw.frame.size. As far as I can tell from the code, the address of the coroutine frame is mainly used for deallocation, so I guess this should be fine.

Then the code generated now looks roughly like:

if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);

This looks redundant, since the then branch and the else branch look very similar. I understand the redundancy would be eliminated in the middle end, but another problem is the redundant implementation in clang. Maybe we could solve that by refactoring.
But I am wondering whether it is possible to use another pattern (assuming llvm.coro.alloc returns true):

%raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
%true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we need a better name
call @llvm.coro.begin(coro.id, %true.frame.ptr)

Then llvm.coro.frame could simply return %raw.frame.ptr if the alignment is already satisfied (the alignment needed is not greater than NEW_ALIGN). Otherwise it would simply align the pointer up for the coroutine frame. There are many APIs for this in Align.h.
And for the destruction, we could emit:

call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size())

This way, I guess we would get a simpler implementation and simpler generated code.

BTW, if we choose to do so, the semantics of llvm.coro.raw.frame.ptr and llvm.coro.size would change slightly. They would stand for the address and size of the coroutine frame when no overalignment is needed.

What do you think about this?
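
As a rough illustration of the semantics proposed in this comment, the two intrinsics could behave like the C++ helpers below. This is only a sketch of the proposal as described here: `rawFrameSize` and `frameFromRaw` are hypothetical stand-ins for the suggested `llvm.coro.raw.frame.size` and `llvm.coro.frame`, not code from the patch, and `newAlign` stands for NEW_ALIGN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of llvm.coro.raw.frame.size: the plain frame size
// when the allocator's alignment suffices, otherwise the padded size.
inline std::size_t rawFrameSize(std::size_t frameSize,
                                std::size_t frameAlign,
                                std::size_t newAlign) {
  return frameAlign > newAlign ? frameSize + (frameAlign - newAlign)
                               : frameSize;
}

// Hypothetical model of the proposed llvm.coro.frame: pass the raw
// pointer through unchanged when no overalignment is needed, otherwise
// round it up to `frameAlign` (a power of two).
inline std::uintptr_t frameFromRaw(std::uintptr_t raw,
                                   std::size_t frameAlign,
                                   std::size_t newAlign) {
  if (frameAlign <= newAlign)
    return raw;
  const std::uintptr_t mask = frameAlign - 1;
  return (raw + mask) & ~mask;
}
```

With helpers like these, the frontend could emit one code path for both the normal and the overaligned case and leave the choice to the middle end.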

llvm/docs/Coroutines.rst
65	Since we overalign the coroutine frame only for switched-resume lowering coroutines, it may be better to move this section under the switched-resume lowering section.
67	I would prefer to replace "sometimes" with a clearer condition, such as "when the required alignment is greater than 16".

Move raw frame description to Switched-Resume section.

In D97915#2859237, @ChuanqiXu wrote:

I was confused by this too, and @rjmccall explained it here: https://reviews.llvm.org/D97915/new/#2604871. Basically, we cannot statically recover the raw frame pointer (%call1) from the coroutine frame pointer at deallocation time.
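
A small C++ sketch may make the point concrete: two different raw pointers can round up to the same aligned address, so the adjustment cannot be undone without remembering the original pointer. Everything below is illustrative and assumed, not taken from the patch: `DemoFrame` is a hypothetical layout whose third field plays the role of the raw frame pointer slot, and the allocation pads by `alignof(DemoFrame) - 1` bytes, which is more conservative than the `align - 16` used in the IR because it assumes nothing about the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical overaligned frame: two header slots, then the cookie.
struct alignas(64) DemoFrame {
  void (*resume)(DemoFrame *);
  void (*destroy)(DemoFrame *);
  void *rawPtr; // the pointer the allocator actually returned
  char payload[104];
};

// Round `addr` up to a multiple of `align` (a power of two).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t align) {
  return (addr + (align - 1)) & ~(align - 1);
}

// Over-allocate with the unaligned operator new, align up, and record
// the raw pointer inside the frame.
inline DemoFrame *allocateDemoFrame() {
  const std::size_t padding = alignof(DemoFrame) - 1;
  void *raw = ::operator new(sizeof(DemoFrame) + padding);
  const std::uintptr_t aligned =
      alignUp(reinterpret_cast<std::uintptr_t>(raw), alignof(DemoFrame));
  DemoFrame *frame = new (reinterpret_cast<void *>(aligned)) DemoFrame{};
  frame->rawPtr = raw;
  return frame;
}

// Deallocation must pass the raw pointer, not the frame pointer.
inline void deallocateDemoFrame(DemoFrame *frame) {
  void *raw = frame->rawPtr; // read the cookie before ending the lifetime
  frame->~DemoFrame();
  ::operator delete(raw);
}
```

For example, `alignUp(0x1010, 64)` and `alignUp(0x1020, 64)` both yield `0x1040`, which is why the frame pointer alone does not identify the allocation.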

llvm/docs/Coroutines.rst
1098	It is still "coroutine frame".

In D97915#2860916, @ychen wrote:

Oh, now I understand why we need to store the address of the raw frame. Another question: what do you think about combining this pattern:

if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);

into this one:

%true.frame.ptr = call @llvm.coro.create.frame(new(call @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
; It would be lowered to store the address of the raw frame to the alloca in the middle end if needed
call @llvm.coro.begin(coro.id, %true.frame.ptr)

and this one:

call @delete(call @llvm.coro.raw.frame.ptr(), call @llvm.coro.raw.frame.size())
; Then use `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever we want.

It looks like we could then generate the same code in the frontend for both normal and overaligned coroutines.

Harbormaster completed remote builds in B112703: Diff 356838.Jul 6 2021, 7:55 PM

In D97915#2860984, @ChuanqiXu wrote:
In D97915#2860916, @ychen wrote:
In D97915#2859237, @ChuanqiXu wrote:
In D97915#2848816, @ychen wrote:
Thanks for clarifying. Let's solve the semantics problem first.
With the introduction about 'raw frame', I think it's necessary to introduce this concept in the section 'Switched-Resume Lowering' or even the section 'Introduction' in the document. Add a section to tell the terminology is satisfied too.

Done.

Then why we defined both 'llvm.coro.raw.frame.ptr.offset' and 'llvm.coro.raw.frame.ptr.addr' together? It looks like refer to the same value finally. It looks like 'llvm.coro.raw.frame.ptr.offset' are trying to solve the problem about memory leak. But I think we could use llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (Maybe we need to add an intrinsic llvm.coro.raw.size). Then we can omit a field in the frame to save space.

("llvm.coro.raw.frame.ptr.offset" is an offset from coroutine frame address instead of raw frame pointer)

Apologies for the confusion. I've briefly explained it here https://reviews.llvm.org/D102145#2752445 I think it is not clear. "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine frame field storing the raw frame pointer" only after insertSpills in CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the raw frame pointer (try grepping "alloc.frame.ptr" in this review page). Using "llvm.coro.raw.frame.ptr.offset" instead of "llvm.coro.raw.frame.ptr.addr" is doable which looks like below, please check line 31. The downside is that the write to coroutine frame is not through an alloca but a direct write. It is unusual because all fields in the frame are stored as 1. special/header fields 2. alloca 3. splills. Doing the write indirectly as Alloca makes me comfortable. The tradeoff is one extra intrinsic "llvm.coro.raw.frame.ptr.addr". What do you think?
19 coro.alloc.align:                                 ; preds = %coro.alloc.check.align
20   %3 = sub nsw i64 64, 16
21   %4 = add i64 128, %3
22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
23   %mask = sub i64 64, 1
24   %intptr = ptrtoint i8* %call1 to i64
25   %over_boundary = add i64 %intptr, %mask
26   %inverted_mask = xor i64 %mask, -1
27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
28   %diff = sub i64 %aligned_intptr, %intptr
29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) ]
31   store i8* %call1, i8** %alloc.frame.ptr, align 8                     

     ; Replace line 31 with below, and must makes sure line 46~line 48 is skipped.
     ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
     ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
     ; %addr1 = bitcast i8* %addr to i8**
     ; store i8* %call1, i8** %addr1, align 8


32   br label %coro.init.from.coro.alloc.align
33
34 coro.init.from.coro.alloc.align:                  ; preds = %coro.alloc.align
35   %aligned_result.coro.init = phi i8* [ %aligned_result, %coro.alloc.align ]
36   br label %coro.init
37
38 coro.init:                                        ; preds = %coro.init.from.entry, %coro.init.from.coro.alloc.align, %cor
   o.init.from.coro.alloc
39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result
   .coro.init, %coro.init.from.coro.alloc.align ]
40   %FramePtr = bitcast i8* %5 to %f0.Frame*
41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 0
42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, align 8
43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void (%f0.Frame*)* @f0.cleanup
44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 1
45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 2
47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
48   store i8* %8, i8** %7, align 8
49   br label %AllocaSpillBB
50
51 AllocaSpillBB:                                    ; preds = %coro.init
52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 4
53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 5
54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 6
55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 7
56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 8
57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 10
58   br label %PostSpill
Then I am a little confused for the design again, since we would treat the value for CoroBegin as the address of coroutine frame in the past and it looks like to be the raw frame now. Let me reconsider if it is OK.

The returned value of CoroBegin is still coroutine frame not a raw frame even if the frame is overaligned. You could check the above code.
Thanks for clarifying!

I don't understand why we need to store the address of the raw coroutine frame in the coroutine frame. For example, %call1 in your example is the address of the raw frame. Can we then use the value %call1 wherever we need that address?
If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in the frontend whenever we need the address of the raw frame, and in the middle end simply replace llvm.coro.raw.frame with %call1. Similarly, we could define an intrinsic llvm.coro.raw.frame.size. As far as I can tell from the code, the address of the coroutine frame is mainly used for deallocation, so I guess this should be fine.

Then the code generated now looks roughly like:
if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);
It looks redundant, since the then and else parts are very similar. I understand that the redundancy would be eliminated in the middle end, but the implementation in clang is redundant too. Maybe we could solve that by refactoring.
But I am wondering whether it is possible to use another pattern (assuming llvm.coro.alloc returns true):
%raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
%true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we need a better name
call @llvm.coro.begin(coro.id, %true.frame.ptr)
Then llvm.coro.frame could simply return %raw.frame.ptr if the alignment is already satisfied (the alignment needed is no greater than NEW_ALIGN); otherwise it would align the coroutine frame address up. There are many APIs for this in Align.h.
And for the destruction, we could emit:
call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size())
In this way, I guess we would get a simpler implementation and simpler generated code.

BTW, if we choose to do so, the semantics of llvm.coro.raw.frame.ptr and llvm.coro.size would change slightly: they would stand for the address and size of the coroutine frame when no over-alignment is needed.

What do you think about this?
I was confused by this too, and @rjmccall explained it here https://reviews.llvm.org/D97915/new/#2604871. Basically, we cannot statically recover the "raw frame pointer" (%call1) from the coroutine frame pointer at deallocation time.
Oh, now I understand why we need to store the address of the raw frame. Another question: what do you think about combining the pattern:
if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);
into this one:
%true.frame.ptr = call @llvm.coro.create.frame(new(call @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
; It would be lowered to store the address of the raw frame to the alloca in the middle end if needed
call @llvm.coro.begin(coro.id, %true.frame.ptr)
and this one:
call @delete(call @llvm.coro.raw.frame.ptr(), call @llvm.coro.raw.frame.size()) ; Then use `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever we want.
It looks like we could then generate the same code in the frontend for normal and over-aligned coroutines.

Yeah, I think it works for this patch alone. It shifts the semantic lowering from Clang to LLVM but does not reduce the amount of work. For future language support like D102147, @llvm.coro.create.frame would need to be repurposed for the new semantics, which seems a sign that this should be implemented in the frontend.

In D97915#2861036, @ychen wrote:
In D97915#2860984, @ChuanqiXu wrote:
In D97915#2860916, @ychen wrote:
In D97915#2859237, @ChuanqiXu wrote:
In D97915#2848816, @ychen wrote:
Thanks for clarifying. Let's solve the semantics problem first.
Now that the 'raw frame' is introduced, I think it is necessary to explain this concept in the 'Switched-Resume Lowering' section, or even the 'Introduction' section, of the document. Adding a section that explains the terminology would be fine too.

Done.

Then why do we define both 'llvm.coro.raw.frame.ptr.offset' and 'llvm.coro.raw.frame.ptr.addr'? They seem to refer to the same value in the end. It looks like 'llvm.coro.raw.frame.ptr.offset' is trying to solve the memory leak problem. But I think we could use llvm.coro.raw.frame.ptr.addr directly instead of traversing the frame (maybe we need to add an intrinsic llvm.coro.raw.size). Then we can omit a field in the frame to save space.

("llvm.coro.raw.frame.ptr.offset" is an offset from the coroutine frame address, not from the raw frame pointer)

Apologies for the confusion. I briefly explained it here https://reviews.llvm.org/D102145#2752445 but I think it was not clear. "llvm.coro.raw.frame.ptr.addr" is conceptually "the address of a coroutine frame field storing the raw frame pointer" only after insertSpills in CoroFrame.cpp. Before that, "llvm.coro.raw.frame.ptr.addr" is actually an alloca storing the raw frame pointer (try grepping "alloc.frame.ptr" in this review page). Using "llvm.coro.raw.frame.ptr.offset" instead of "llvm.coro.raw.frame.ptr.addr" is doable and looks like below; please check line 31. The downside is that the write to the coroutine frame is a direct write rather than going through an alloca. That is unusual, because every field in the frame is one of: 1. special/header fields, 2. allocas, 3. spills. Doing the write indirectly through an alloca makes me more comfortable. The tradeoff is one extra intrinsic, "llvm.coro.raw.frame.ptr.addr". What do you think?
19 coro.alloc.align:                                 ; preds = %coro.alloc.check.align
20   %3 = sub nsw i64 64, 16
21   %4 = add i64 128, %3
22   %call1 = call noalias nonnull i8* @_Znwm(i64 %4) #13
23   %mask = sub i64 64, 1
24   %intptr = ptrtoint i8* %call1 to i64
25   %over_boundary = add i64 %intptr, %mask
26   %inverted_mask = xor i64 %mask, -1
27   %aligned_intptr = and i64 %over_boundary, %inverted_mask
28   %diff = sub i64 %aligned_intptr, %intptr
29   %aligned_result = getelementptr inbounds i8, i8* %call1, i64 %diff
30   call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 64) ]
31   store i8* %call1, i8** %alloc.frame.ptr, align 8

     ; Replace line 31 with the following, and make sure lines 46-48 are skipped.
     ; %poff = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
     ; %addr = getelementptr inbounds i8, i8* %aligned_result, i32 %poff
     ; %addr1 = bitcast i8* %addr to i8**
     ; store i8* %call1, i8** %addr1, align 8

32   br label %coro.init.from.coro.alloc.align
33
34 coro.init.from.coro.alloc.align:                  ; preds = %coro.alloc.align
35   %aligned_result.coro.init = phi i8* [ %aligned_result, %coro.alloc.align ]
36   br label %coro.init
37
38 coro.init:                                        ; preds = %coro.init.from.entry, %coro.init.from.coro.alloc.align, %coro.init.from.coro.alloc
39   %5 = phi i8* [ %.coro.init, %coro.init.from.entry ], [ %call.coro.init, %coro.init.from.coro.alloc ], [ %aligned_result.coro.init, %coro.init.from.coro.alloc.align ]
40   %FramePtr = bitcast i8* %5 to %f0.Frame*
41   %resume.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 0
42   store void (%f0.Frame*)* @f0.resume, void (%f0.Frame*)** %resume.addr, align 8
43   %6 = select i1 true, void (%f0.Frame*)* @f0.destroy, void (%f0.Frame*)* @f0.cleanup
44   %destroy.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 1
45   store void (%f0.Frame*)* %6, void (%f0.Frame*)** %destroy.addr, align 8
46   %7 = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 2
47   %8 = load i8*, i8** %alloc.frame.ptr, align 8
48   store i8* %8, i8** %7, align 8
49   br label %AllocaSpillBB
50
51 AllocaSpillBB:                                    ; preds = %coro.init
52   %.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 4
53   %ref.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 5
54   %agg.tmp.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 6
55   %ref.tmp5.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 7
56   %agg.tmp8.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 8
57   %__promise.reload.addr = getelementptr inbounds %f0.Frame, %f0.Frame* %FramePtr, i32 0, i32 10
58   br label %PostSpill
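For readers who find the IR hard to follow, the over-allocation and align-up sequence on IR lines 20 to 31 can be modeled in plain C++. This is only a sketch under stated assumptions: `alignUp`, `OverAlignedAlloc` and `allocateOverAligned` are hypothetical names that do not appear in the patch, and the allocator is assumed to guarantee `newAlign`-byte alignment (16 in the IR above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical helper: round addr up to a power-of-two alignment
// (IR lines 23-27: mask, add, invert, and).
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t align) {
  std::uintptr_t mask = align - 1;
  return (addr + mask) & ~mask;
}

// Hypothetical pair: the pointer the allocator returned (%call1) and the
// aligned frame address (%aligned_result).
struct OverAlignedAlloc {
  void *raw;
  void *frame;
};

// Assumption: operator new returns memory aligned to at least newAlign,
// so (frameAlign - newAlign) bytes of padding are enough (IR lines 20-21).
inline OverAlignedAlloc allocateOverAligned(std::size_t frameSize,
                                            std::size_t frameAlign,
                                            std::size_t newAlign) {
  std::size_t padding = frameAlign > newAlign ? frameAlign - newAlign : 0;
  void *raw = ::operator new(frameSize + padding);
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t diff = alignUp(addr, frameAlign) - addr; // IR line 28
  // The raw pointer must be kept, because it is what gets deallocated later.
  return OverAlignedAlloc{raw, static_cast<char *>(raw) + diff};
}
```

With `frameSize = 128`, `frameAlign = 64` and `newAlign = 16`, this requests 176 bytes, which matches `add i64 128, %3` in the IR.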
Then I am a little confused about the design again: in the past we treated the value of CoroBegin as the address of the coroutine frame, and now it looks like it is the raw frame. Let me reconsider whether that is OK.

The value returned by CoroBegin is still the coroutine frame, not the raw frame, even if the frame is overaligned. You can check the code above.
Thanks for clarifying!

I don't understand why we need to store the address of the raw coroutine frame in the coroutine frame. For example, %call1 in your example is the address of the raw frame. Can we then use the value %call1 wherever we need that address?
If yes, I think we could emit an intrinsic called 'llvm.coro.raw.frame' in the frontend whenever we need the address of the raw frame, and in the middle end simply replace llvm.coro.raw.frame with %call1. Similarly, we could define an intrinsic llvm.coro.raw.frame.size. As far as I can tell from the code, the address of the coroutine frame is mainly used for deallocation, so I guess this should be fine.

Then the code generated now looks roughly like:
if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);
It looks redundant, since the then and else parts are very similar. I understand that the redundancy would be eliminated in the middle end, but the implementation in clang is redundant too. Maybe we could solve that by refactoring.
But I am wondering whether it is possible to use another pattern (assuming llvm.coro.alloc returns true):
%raw.frame.ptr = new(call @llvm.coro.raw.frame.size())
%true.frame.ptr = call @llvm.coro.frame(%raw.frame.ptr, NEW_ALIGN) ; we need a better name
call @llvm.coro.begin(coro.id, %true.frame.ptr)
Then llvm.coro.frame could simply return %raw.frame.ptr if the alignment is already satisfied (the alignment needed is no greater than NEW_ALIGN); otherwise it would align the coroutine frame address up. There are many APIs for this in Align.h.
And for the destruction, we could emit:
call @delete(%raw.frame.ptr, call @llvm.coro.raw.frame.size())
In this way, I guess we would get a simpler implementation and simpler generated code.

BTW, if we choose to do so, the semantics of llvm.coro.raw.frame.ptr and llvm.coro.size would change slightly: they would stand for the address and size of the coroutine frame when no over-alignment is needed.

What do you think about this?
I was confused by this too, and @rjmccall explained it here https://reviews.llvm.org/D97915/new/#2604871. Basically, we cannot statically recover the "raw frame pointer" (%call1) from the coroutine frame pointer at deallocation time.
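A small C++ sketch may make the irreversibility concrete. It counts how many raw allocator addresses round up to the same aligned frame address; `alignUp` and `countRawCandidates` are hypothetical helpers written for this illustration, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round addr up to a power-of-two alignment.
inline std::uintptr_t alignUp(std::uintptr_t addr, std::uintptr_t align) {
  return (addr + align - 1) & ~(align - 1);
}

// Count the raw addresses (multiples of allocAlign) that alignUp maps to the
// given frame address. More than one candidate means the raw pointer cannot
// be recomputed from the frame pointer alone.
inline int countRawCandidates(std::uintptr_t frame, std::uintptr_t frameAlign,
                              std::uintptr_t allocAlign) {
  int candidates = 0;
  for (std::uintptr_t raw = frame - (frameAlign - allocAlign); raw <= frame;
       raw += allocAlign) {
    if (alignUp(raw, frameAlign) == frame)
      ++candidates;
  }
  return candidates;
}
```

For a 32-byte-aligned frame and an allocator that promises only 8-byte alignment, `countRawCandidates(4096, 32, 8)` finds four candidates (the frame address itself, or 8, 16 or 24 bytes below it), which is why the raw pointer has to be stored somewhere.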
Oh, now I understand why we need to store the address of the raw frame. Another question: what do you think about combining the pattern:
if (should over align) {
   /// ...
   mem = ...
} else {
   /// ...
   mem = ...
}
coro.begin(id, mem);
into this one:
%true.frame.ptr = call @llvm.coro.create.frame(new(call @llvm.coro.raw.frame.size()), NEW_ALIGN) ; we need a better name
; It would be lowered to store the address of the raw frame to the alloca in the middle end if needed
call @llvm.coro.begin(coro.id, %true.frame.ptr)
and this one:
call @delete(call @llvm.coro.raw.frame.ptr(), call @llvm.coro.raw.frame.size()) ; Then use `llvm.coro.raw.frame.ptr()` and `llvm.coro.raw.frame.size()` directly whenever we want.
It looks like we could then generate the same code in the frontend for normal and over-aligned coroutines.
Yeah, I think it works for this patch alone. It shifts the semantic lowering from Clang to LLVM but does not reduce the amount of work. For future language support like D102147, @llvm.coro.create.frame would need to be repurposed for the new semantics, which seems a sign that this should be implemented in the frontend.

For language support, ::operator new(size_t, align_t), I think it could be implemented like:

%allocated = call @new(call @llvm.coro.raw.frame.size(), align_val)
%true.frame.ptr = call @llvm.coro.create.frame(%allocated, 0) ; if the second argument is 0, it means `llvm.coro.create.frame` could be lowered to `%allocated` simply.
call @llvm.coro.begin(coro.id, %true.frame.ptr)

That does not look hard to implement, and we would not need to refactor the CodeGen part much. In this way, I think the main effort to support ::operator new(size_t, align_t) would be in the Sema part, and little work would remain in the CodeGen part. It would not touch the middle end either.

It shifts the semantic lowering from Clang to LLVM but does not perform less work.

I think it would be simpler. At least, we don't need to emit getReturnStmtOnAllocFailure twice and we don't need to touch CallCoroDelete either. And we don't need to organize the basic blocks in CodeGenCoroutineBody. And we could emit a simpler AlignupTo (although it could be simplified further, I believe).

And the extra work we need to do is to compare the alignment requirement of the coroutine frame with the second argument of llvm.coro.create.frame to see whether we need to over-align the coroutine frame.
If yes, we need to lower llvm.coro.create.frame to compute the true address of the coroutine frame and store the raw frame address.
If no, we can simply return %allocated.
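The lowering decision described above could be modeled roughly as follows. This is a sketch only: `llvm.coro.create.frame` is an intrinsic proposed in this thread, not an existing one, and `LoweredFrame` and `lowerCreateFrame` are hypothetical names. Addresses are plain integers so that the logic is easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of lowering the proposed llvm.coro.create.frame.
struct LoweredFrame {
  std::uintptr_t frame;  // address passed on to llvm.coro.begin
  bool storeRawPointer;  // whether the raw address must be kept for delete
};

// guaranteedAlign is the second argument of the proposed intrinsic. Following
// the suggestion in the thread, 0 means the allocator already honored the
// frame alignment (the aligned operator new case).
inline LoweredFrame lowerCreateFrame(std::uintptr_t allocated,
                                     std::uintptr_t guaranteedAlign,
                                     std::uintptr_t frameAlign) {
  if (guaranteedAlign == 0 || frameAlign <= guaranteedAlign)
    return {allocated, false}; // "If no": return %allocated unchanged
  std::uintptr_t mask = frameAlign - 1;
  return {(allocated + mask) & ~mask, true}; // "If yes": align up, keep raw
}
```

For example, `lowerCreateFrame(4112, 16, 64)` yields frame address 4160 and asks for the raw pointer to be stored, while `lowerCreateFrame(4112, 16, 16)` returns 4112 unchanged.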

In D97915#2861178, @ChuanqiXu wrote:
For language support, ::operator new(size_t, align_t), I think it could be implemented like:
%allocated = call @new(call @llvm.coro.raw.frame.size(), align_val)
%true.frame.ptr = call @llvm.coro.create.frame(%allocated, 0) ; if the second argument is 0, it means `llvm.coro.create.frame` could be lowered to `%allocated` simply.
call @llvm.coro.begin(coro.id, %true.frame.ptr)
It looks not hard to implement. And we don't need to refactor the CodeGen part a lot. In this way, I think the main effort to support ::operator new(size_t, align_t) would be in the Sema part and the works remained in CodeGen part would be little. It wouldn't touch the middle end part neither.

I agree that something like this is simpler implementation-wise. Although in spirit, it is similar to D100739 which we decided not to pursue. What is the middle-end? is it LLVM?

It shifts the semantic lowering from Clang to LLVM but does not perform less work.

I think it would be simpler.

I agree it would be simpler.

At least, we don't need to emit getReturnStmtOnAllocFailure twice and

getReturnStmtOnAllocFailure is called twice but the code is emitted once.

we don't need to touch CallCoroDelete neither.

If we don't touch CallCoroDelete, then we cannot do the equivalent work in LLVM, since there is no concept of a deallocation function there.

And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

And we could emit simpler AlignupTo (Although it could be simplified further, I believe).

D100739 has a version of it.

And the extra work we need to do is to compare the alignment requirement for the coroutine frame with the second argument of llvm.coro.create.frame to see if we need to over align coroutine frame.
If yes, we need to lower the llvm.coro.create.frame to compute the true address for the coroutine frame and store the raw frame address.
If no, we could return %allocated simply.

Yes, it is similar to CoroFrame.cpp changes in D100739.

In D97915#2862419, @ychen wrote:

It looks not hard to implement. And we don't need to refactor the CodeGen part a lot. In this way, I think the main effort to support ::operator new(size_t, align_t) would be in the Sema part and the works remained in CodeGen part would be little. It wouldn't touch the middle end part neither.

I agree that something like this is simpler implementation-wise. Although in spirit, it is similar to D100739 which we decided not to pursue. What is the middle-end? is it LLVM?

It shifts the semantic lowering from Clang to LLVM but does not perform less work.

I think it would be simpler.

I agree it would be simpler.

At least, we don't need to emit getReturnStmtOnAllocFailure twice and

getReturnStmtOnAllocFailure is called twice but the code is emitted once.

we don't need to touch CallCoroDelete neither.

If we don't touch CallCoroDelete, then we cannot do the equivalent work in LLVM since there is no concept of deallocation function concept there.

And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

And we could emit simpler AlignupTo (Although it could be simplified further, I believe).

D102147 has a version of it.

And the extra work we need to do is to compare the alignment requirement for the coroutine frame with the second argument of llvm.coro.create.frame to see if we need to over align coroutine frame.
If yes, we need to lower the llvm.coro.create.frame to compute the true address for the coroutine frame and store the raw frame address.
If no, we could return %allocated simply.

Yes, it is similar to CoroFrame.cpp changes in D102147.

I agree that something like this is simpler implementation-wise. Although in spirit, it is similar to D100739 which we decided not to pursue. What is the middle-end? is it LLVM?

It is frustrating to go against a decision made earlier. But I cannot find the conclusion in D100739 that we should not do this in the middle end.

getReturnStmtOnAllocFailure is called twice but the code is emitted once.

I don't think so. getReturnStmtOnAllocFailure is called twice and emitted twice. The code emitted by the frontend should be:

AllocBB:
  ; ...
  getReturnStmtOnAllocFailure()

AlignedAllocBB:
  ; ...
  getReturnStmtOnAllocFailure()

I understand that one of the branches would be pruned in the middle end. But in the frontend, both are emitted. This is redundant in both the compiler code and the generated code.

If we don't touch CallCoroDelete, then we cannot do the equivalent work in LLVM since there is no concept of deallocation function concept there.

Sorry for the confusion. I mean that we could change the semantics of llvm.coro.free to return the address of the raw frame, and change the Sema part to pass llvm.coro.raw.frame.size as the second argument to the deallocator.
That should be easy and clean to implement, and we would not need to touch CallCoroDelete either.
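As an illustration of the deallocation side, here is a C++ sketch in which the frame records its raw pointer and a function modeling the proposed llvm.coro.free semantics returns that raw address. `FrameHeader`, `createFrame`, `coroFreeAddress` and `destroyFrame` are hypothetical names; the real frame layout is decided in CoroFrame.cpp. The sketch calls unsized `operator delete` for brevity, whereas the proposal would also pass the raw frame size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical frame header with a slot for the raw allocation pointer.
struct FrameHeader {
  void (*resume)(void *);
  void (*destroy)(void *);
  void *raw; // pointer originally returned by the allocator
};

// Over-allocate, align up, and record the raw pointer inside the frame.
// frameAlign must be a power of two and at least alignof(FrameHeader).
inline void *createFrame(std::size_t frameSize, std::size_t frameAlign) {
  assert(frameSize >= sizeof(FrameHeader));
  void *raw = ::operator new(frameSize + frameAlign); // conservative padding
  std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t mask = static_cast<std::uintptr_t>(frameAlign) - 1;
  std::uintptr_t aligned = (addr + mask) & ~mask;
  void *frame = static_cast<char *>(raw) + (aligned - addr);
  new (frame) FrameHeader{nullptr, nullptr, raw};
  return frame;
}

// Models the proposed llvm.coro.free: map the frame address back to the
// address that must be handed to the deallocator.
inline void *coroFreeAddress(void *frame) {
  return static_cast<FrameHeader *>(frame)->raw;
}

inline void destroyFrame(void *frame) {
  ::operator delete(coroFreeAddress(frame));
}
```

The frontend would then keep emitting the same delete call in both cases, with only the meaning of the address it receives changing.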

And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

I don't think it is delayed. I think the task of organizing the BBs would be eliminated if we moved the main work to the middle end.
If we move the decision of whether to over-align there, we can use the same IR structure for both cases. I think that is simpler, cleaner and more natural.

And we could emit simpler AlignupTo (Although it could be simplified further, I believe).

D102147 has a version of it.

And the extra work we need to do is to compare the alignment requirement for the coroutine frame with the second argument of llvm.coro.create.frame to see if we need to over align coroutine frame.
If yes, we need to lower the llvm.coro.create.frame to compute the true address for the coroutine frame and store the raw frame address.
If no, we could return %allocated simply.

Yes, it is similar to CoroFrame.cpp changes in D102147.

I guess you referred to the wrong revision, since I cannot find anything related in D102147.

In D97915#2863581, @ChuanqiXu wrote:

In D97915#2862419, @ychen wrote:

It looks not hard to implement. And we don't need to refactor the CodeGen part a lot. In this way, I think the main effort to support ::operator new(size_t, align_t) would be in the Sema part and the works remained in CodeGen part would be little. It wouldn't touch the middle end part neither.

I agree that something like this is simpler implementation-wise. Although in spirit, it is similar to D100739 which we decided not to pursue. What is the middle-end? is it LLVM?

It shifts the semantic lowering from Clang to LLVM but does not perform less work.

I think it would be simpler.

I agree it would be simpler.

At least, we don't need to emit getReturnStmtOnAllocFailure twice and

getReturnStmtOnAllocFailure is called twice but the code is emitted once.

we don't need to touch CallCoroDelete neither.

If we don't touch CallCoroDelete, then we cannot do the equivalent work in LLVM since there is no concept of deallocation function concept there.

And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

And we could emit simpler AlignupTo (Although it could be simplified further, I believe).

D102147 has a version of it.

And the extra work we need to do is to compare the alignment requirement for the coroutine frame with the second argument of llvm.coro.create.frame to see if we need to over align coroutine frame.
If yes, we need to lower the llvm.coro.create.frame to compute the true address for the coroutine frame and store the raw frame address.
If no, we could return %allocated simply.

Yes, it is similar to CoroFrame.cpp changes in D102147.

I agree that something like this is simpler implementation-wise. Although in spirit, it is similar to D100739 which we decided not to pursue. What is the middle-end? is it LLVM?

It is frustrating to break the decision made before. But I don't find the conclusion in D100739 that we shouldn't complete this in middle end.

Yeah, it is not explicitly stated there. That's just my conclusion based on @rjmccall's suggestion (https://reviews.llvm.org/D100739#2717582) and my following responses. I do think your proposal works for the dynamic allocation cases; however, it would need major changes when an aligned allocator/deallocator comes into play in the future, once the issue is fixed at the language level. That is the primary reason that most of the implementation is in the frontend and the LLVM coroutine intrinsics are mostly agnostic of the alignment issue (the newly added llvm.coro.raw.frame.ptr.* are the two exceptions, and no existing intrinsic is changed in either API or semantics).

getReturnStmtOnAllocFailure is called twice but the code is emitted once.

I don't think so. getReturnStmtOnAllocFailure is called twice and emitted twice. The code emitted by the frontend should be:
AllocBB:
  ; ...
  getReturnStmtOnAllocFailure()

AlignedAllocBB:
  ; ...
  getReturnStmtOnAllocFailure()
I understand that one of the branch would be pruned in the middle end. But in the frontend, they are emitted twice. And it's redundant in both compiler codes and generated codes.

Got you. I meant that the alloc failure block is emitted once, and I agree that both the aligned and the normal alloc blocks are emitted.

If we don't touch CallCoroDelete, then we cannot do the equivalent work in LLVM since there is no concept of deallocation function concept there.

Sorry for the confusion. I mean that we could change the semantics of llvm.coro.free to return the address of the raw frame, and touch the Sema part to pass llvm.coro.raw.frame.size as the second argument to the deallocator.
That should be easy and clean to implement, and we wouldn't need to touch CallCoroDelete either.
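
To illustrate why a sized deallocator has to receive the same (possibly grown) size that was passed to the allocator, here is a minimal C++ sketch. The helpers `rawFrameSize`, `trackedNew`, `trackedDelete` and `sizesMatchAfterRoundTrip` are hypothetical and are not part of Clang or LLVM. The `size + align - newAlign` formula simply mirrors the `size += align - NEW_ALIGN` adjustment discussed in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical bookkeeping: remember the sizes the allocator and the sized
// deallocator were asked for, so they can be compared afterwards.
static std::size_t LastAllocSize = 0;
static std::size_t LastFreeSize = 0;

void *trackedNew(std::size_t Size) {
  LastAllocSize = Size;
  return ::operator new(Size);
}

void trackedDelete(void *Ptr, std::size_t Size) {
  LastFreeSize = Size;
  ::operator delete(Ptr);
}

// Size of the raw allocation: the frame is grown only when its alignment
// exceeds what plain operator new guarantees (NewAlign).
constexpr std::size_t rawFrameSize(std::size_t FrameSize, std::size_t Align,
                                   std::size_t NewAlign) {
  return Align > NewAlign ? FrameSize + (Align - NewAlign) : FrameSize;
}

// Allocate and free one raw frame, passing the same computed size to both
// sides, and report whether the two recorded sizes agree.
bool sizesMatchAfterRoundTrip(std::size_t FrameSize, std::size_t Align,
                              std::size_t NewAlign) {
  const std::size_t RawSize = rawFrameSize(FrameSize, Align, NewAlign);
  void *Raw = trackedNew(RawSize);
  trackedDelete(Raw, RawSize);
  return LastAllocSize == LastFreeSize;
}
```

Computing the size in one place and reusing it on both sides is the property the `f3` sized-delete test in `coro-alloc.cpp` is said to check. The sketch does not model how the real intrinsics are lowered.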

And we don't organize the basic blocks in the CodeGenCoroutineBody.

That's true. It is delayed until CoroSplit.

I don't think it is delayed. I think the task of organizing the BBs would be eliminated if we moved the main work into the middle end.
If the middle end decides whether to over-align or not, we could use the same IR structure. I think that is simpler, cleaner and more natural.

Agreed. But like I said above, IMHO that's a secondary goal here. LLVM had better deal only with optimizations, not semantics (yes, coroutines are special, but the principle still applies).

And we could emit a simpler AlignUpTo (although I believe it could be simplified further).

D102147 has a version of it.
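
For reference, the rounding that `emitAlignUpTo` builds in IR (mask = align - 1, add the mask, then AND with the inverted mask) can be written in plain C++ as below. This is only a sketch of the arithmetic, assuming a power-of-two alignment. The names `alignUpTo` and `isPowerOfTwo` are illustrative and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// True when X is a non-zero power of two, which the mask trick requires.
constexpr bool isPowerOfTwo(std::uintptr_t X) {
  return X != 0 && (X & (X - 1)) == 0;
}

// Round Addr up to the next multiple of Align (Align must be a power of two).
// Adding Align - 1 steps past the next boundary unless Addr is already on
// one; masking with ~(Align - 1) then clears the low bits.
constexpr std::uintptr_t alignUpTo(std::uintptr_t Addr, std::uintptr_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}
```

An already aligned address is returned unchanged, which is why the adjustment alone cannot be reversed at deallocation time without storing the original pointer.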

And the extra work we need to do is to compare the alignment requirement of the coroutine frame with the second argument of llvm.coro.create.frame to see whether we need to over-align the coroutine frame.
If yes, we need to lower llvm.coro.create.frame to compute the true address of the coroutine frame and store the raw frame address.
If no, we can simply return %allocated.

Yes, it is similar to CoroFrame.cpp changes in D102147.

I guess you are referring to the wrong revision, since I can't find anything related in D102147.

Sorry for the typo. It should be D100739.
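
The lowering described in the quoted proposal above (compare alignments, then either return the allocation directly or compute the aligned address and remember the raw one) can be sketched in ordinary C++ as follows. This is a hedged illustration, not the patch's implementation. `allocateFrame`, `deallocateFrame` and `kNewAlign` are hypothetical names, `alignof(std::max_align_t)` is assumed to stand in for the target's new-alignment, and the raw pointer is stored at offset 0 of the frame purely for simplicity (the patch instead lets CoroFrame choose the slot and exposes it through `llvm.coro.raw.frame.ptr.offset`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumption: plain operator new returns storage aligned to at least this.
constexpr std::size_t kNewAlign = alignof(std::max_align_t);

// Allocate a frame of Size bytes aligned to Align (a power of two).
void *allocateFrame(std::size_t Size, std::size_t Align) {
  if (Align <= kNewAlign)
    return ::operator new(Size); // No over-alignment: use the block as is.

  assert(Size >= sizeof(void *) && "frame needs room for the raw pointer");
  // Over-allocate: the raw block is kNewAlign-aligned, so at most
  // Align - kNewAlign bytes of padding are needed in front of the frame.
  void *Raw = ::operator new(Size + (Align - kNewAlign));
  const auto Addr = reinterpret_cast<std::uintptr_t>(Raw);
  const auto Mask = static_cast<std::uintptr_t>(Align) - 1;
  void *Frame = reinterpret_cast<void *>((Addr + Mask) & ~Mask);
  // Remember the raw pointer inside the frame so it can be freed later.
  *static_cast<void **>(Frame) = Raw;
  return Frame;
}

// Free a frame obtained from allocateFrame with the same Align.
void deallocateFrame(void *Frame, std::size_t Align) {
  if (Align <= kNewAlign) {
    ::operator delete(Frame);
    return;
  }
  // Load the raw pointer back; freeing Frame itself would be wrong whenever
  // padding was inserted in front of it.
  ::operator delete(*static_cast<void **>(Frame));
}
```

Because `Align` is a compile-time property of the frame, the `Align <= kNewAlign` test is the kind of branch the optimizer is expected to fold, so only one of the two paths survives in practice.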

In D97915#2863615, @ychen wrote:

That's just my conclusion based on @rjmccall's suggestion (https://reviews.llvm.org/D100739#2717582) and my following responses.

I guess you got the conclusion from this:

2d. Use the correct allocator for the frame alignment; both allocators are (allowed to be) ODR-used, but only one would be dynamically used. This is what would be necessary for the implementation I suggested above. In reality there won't be any dynamic overhead because we should always be able to fold the branch after allocation.

It looks like it is necessary to emit two BBs in the frontend that use the different allocators, and then prune one branch in the middle end.

But I still feel that there are redundancies in the current implementation. Let me think more about it.

avogelsgesang added a subscriber: avogelsgesang. Jul 13 2021, 10:41 AM

ChuanqiXu mentioned this in D106248: [Coroutines] Overalign coroutine frame when frame alignment exceeds the alignment limit. Jul 18 2021, 7:23 PM

327141f is landed as an alternative.

Herald added a project: Restricted Project. Sep 23 2022, 5:27 PM

Revision Contents

Path	Size
clang/lib/CodeGen/CGCoroutine.cpp	209 lines
clang/test/CodeGenCoroutines/coro-alloc.cpp	88 lines
clang/test/CodeGenCoroutines/coro-cleanup.cpp	42 lines
clang/test/CodeGenCoroutines/coro-gro.cpp	1 line
llvm/docs/Coroutines.rst	101 lines
llvm/include/llvm/IR/Intrinsics.td	3 lines
llvm/lib/Transforms/Coroutines/ (five files)	65 lines, 41 lines, 4 lines, 47 lines, 13 lines
llvm/test/Transforms/Coroutines/coro-frame-overalign.ll	78 lines

Diff 356838

clang/lib/CodeGen/CGCoroutine.cpp

//===----- CGCoroutine.cpp - Emit LLVM Code for C++ coroutines ------------===//		//===----- CGCoroutine.cpp - Emit LLVM Code for C++ coroutines ------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This contains code dealing with C++ code generation of coroutines.		// This contains code dealing with C++ code generation of coroutines.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGCleanup.h"		#include "CGCleanup.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "llvm/ADT/ScopeExit.h"
#include "clang/AST/StmtCXX.h"		#include "clang/AST/StmtCXX.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
		#include "llvm/ADT/ScopeExit.h"
		#include "llvm/IR/BasicBlock.h"
		#include "llvm/IR/IntrinsicInst.h"
		#include <cstdint>

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;

using llvm::Value;		using llvm::Value;
using llvm::BasicBlock;		using llvm::BasicBlock;

namespace {		namespace {
[44 unchanged lines hidden]	struct clang::CodeGen::CGCoroData {
// Stores the llvm.coro.begin emitted in the function so that we can replace		// Stores the llvm.coro.begin emitted in the function so that we can replace
// all coro.frame intrinsics with direct SSA value of coro.begin that returns		// all coro.frame intrinsics with direct SSA value of coro.begin that returns
// the address of the coroutine frame of the current coroutine.		// the address of the coroutine frame of the current coroutine.
llvm::CallInst *CoroBegin = nullptr;		llvm::CallInst *CoroBegin = nullptr;

// Stores the last emitted coro.free for the deallocate expressions, we use it		// Stores the last emitted coro.free for the deallocate expressions, we use it
// to wrap dealloc code with if(auto mem = coro.free) dealloc(mem).		// to wrap dealloc code with if(auto mem = coro.free) dealloc(mem).
llvm::CallInst *LastCoroFree = nullptr;		llvm::CallInst *LastCoroFree = nullptr;
		bool LastCoroFreeUsedForDealloc = false;

// If coro.id came from the builtin, remember the expression to give better		// If coro.id came from the builtin, remember the expression to give better
// diagnostic. If CoroIdExpr is nullptr, the coro.id was created by		// diagnostic. If CoroIdExpr is nullptr, the coro.id was created by
// EmitCoroutineBody.		// EmitCoroutineBody.
CallExpr const *CoroIdExpr = nullptr;		CallExpr const *CoroIdExpr = nullptr;
};		};

// Defining these here allows to keep CGCoroData private to this file.		// Defining these here allows to keep CGCoroData private to this file.
[321 unchanged lines hidden]	if (Bundles.empty()) {
// either to a cleanup block or a block with EH resume instruction.		// either to a cleanup block or a block with EH resume instruction.
auto *ResumeBB = CGF.getEHResumeBlock(/*isCleanup=*/true);		auto *ResumeBB = CGF.getEHResumeBlock(/*isCleanup=*/true);
auto *CleanupContBB = CGF.createBasicBlock("cleanup.cont");		auto *CleanupContBB = CGF.createBasicBlock("cleanup.cont");
CGF.Builder.CreateCondBr(CoroEnd, ResumeBB, CleanupContBB);		CGF.Builder.CreateCondBr(CoroEnd, ResumeBB, CleanupContBB);
CGF.EmitBlock(CleanupContBB);		CGF.EmitBlock(CleanupContBB);
}		}
}		}
};		};

		// If the coroutine frame is overaligned and only an allocation function
		ChuanqiXu [Done]: We should capitalize it as 'OverAllocateFrame'.
		ChuanqiXu [Done]: It looks like we had better add a comment for this function.
		// that does not take `std::align_val_t` is available, the proper alignement
		ChuanqiXu [Not Done]: CoroSizeIdx should be zero all the time in this patch.
		// for coroutine frame is achieved by allocating more memory than needed and
		// dynamically adjust the frame start address at runtime.
		void GrowFrameSize(CodeGenFunction &CGF, llvm::CallInst *CI, bool IsAlloc) {
		ChuanqiXu [Not Done]: If this is also called for deallocation, the function name is confusing. It may be better to rename it to something like 'GetSizeOfOverAlignedFrame' (the suggested name isn't great either). By the way, I am now wondering why we wouldn't use llvm.coro.size directly and let the middle end handle it. What do you think?
		ychen (author) [Done]:
		> If this is also called for deallocation, the function name is confusing. It may be better to rename it to something like 'GetSizeOfOverAlignedFrame' (the suggested name isn't great either).
		Renamed to `GrowFrameSize` since there are similar uses of `grow` in LLVM. Please let me know if it makes sense.
		> By the way, I am now wondering why we wouldn't use llvm.coro.size directly and let the middle end handle it. What do you think?
		I think it works to a certain degree; I attempted it with D100739. In theory, the over-alignment handling should be dealt with in the front end since it is an ABI issue (although it is not specified in any ABI documentation). However, coroutines are special because the optimizer can change the alignment, so some work must be delayed until CoroSplit (CoroFrame) time. In D100739, I argued that more than one front end may need this over-alignment handling, so it might be OK to implement all of the work in LLVM. @rjmccall (https://reviews.llvm.org/D100739#2718681) does not seem to be a fan of that and thinks `llvm.coro.raw.frame.ptr.offset` could be the way forward. Hence the current design, which lets the front end do all the work and leaves only the required piece to LLVM. That required piece is to decide, at CoroSplit (CoroFrame) time, whether the frame is over-aligned and, if so, to add the `raw frame pointer` to the frame itself.
		unsigned CoroSizeIdx = IsAlloc ? 0 : 1;
		CGBuilderTy &Builder = CGF.Builder;
		auto OrigIP = Builder.saveIP();
		Builder.SetInsertPoint(CI);
		llvm::Function *CoroAlign =
		CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_align, CGF.SizeTy);
		const auto &TI = CGF.CGM.getContext().getTargetInfo();
		unsigned AlignOfNew = TI.getNewAlign() / TI.getCharWidth();
		auto *AlignCall = Builder.CreateCall(CoroAlign);
		ChuanqiXu [Done]: In other comments, I find 'size += align - NEW_ALIGN + sizeof(void*);'. But I don't find sizeof(void*) in this function.
		ychen (author) [Done]: Sorry, that's a stale comment. It should be `size += align - NEW_ALIGN`. The `sizeof(void*)` was meant for the newly added raw memory pointer stored in the frame. In the current implementation, `sizeof(void*)` is factored into the `llvm.coro.size()` calculation, because CoroFrame is responsible for allocating the extra raw memory pointer if it is needed at all.
		auto *AlignOfNewInt = llvm::ConstantInt::get(CGF.SizeTy, AlignOfNew, true);
		auto *Diff = Builder.CreateNSWSub(AlignCall, AlignOfNewInt);
		auto *NewCoroSize = Builder.CreateAdd(CI->getArgOperand(CoroSizeIdx), Diff);
		CI->setArgOperand(CoroSizeIdx, NewCoroSize);
		Builder.restoreIP(OrigIP);
		}

		void EmitDynamicAlignedDealloc(CodeGenFunction &CGF,
		llvm::BasicBlock *AlignedFreeBB,
		llvm::CallInst *CoroFree) {
		llvm::CallInst *Dealloc = nullptr;
		for (llvm::User *U : CoroFree->users()) {
		if (auto *CI = dyn_cast<llvm::CallInst>(U))
		if (CI->getParent() == CGF.Builder.GetInsertBlock())
		Dealloc = CI;
		}
		assert(Dealloc);

		CGF.Builder.SetInsertPoint(AlignedFreeBB->getFirstNonPHI());

		// Replace `coro.free` argument with the address from coroutine frame.

		llvm::Function *RawFramePtrOffsetIntrin = CGF.CGM.getIntrinsic(
		llvm::Intrinsic::coro_raw_frame_ptr_offset, CGF.Int32Ty);
		auto *RawFramePtrOffset = CGF.Builder.CreateCall(RawFramePtrOffsetIntrin);
		auto *FramePtrAddrStart =
		CGF.Builder.CreateInBoundsGEP(CoroFree, {RawFramePtrOffset});
		auto *FramePtrAddr = CGF.Builder.CreatePointerCast(
		ChuanqiXu [Not Done]: We allocate the over-aligned frame like this:
		| --- for align --- | --- true frame --- |
		And here we set the argument to the address of the true frame. So I wonder what happens to the memory of the `for align` part. Would we still free it correctly? Maybe I missed something.
		ychen (author) [Done]:
		> Would we still free it correctly?
		Yes, that's the tricky part. Using `f0` of `coro-alloc.cpp` as an example, `llvm.coro.raw.frame.ptr.addr` is called at alloc time to save the raw memory pointer to the coroutine frame. Later, at dealloc time, `llvm.coro.raw.frame.ptr.addr` is called again to load the raw memory pointer back and free it.
		ChuanqiXu [Not Done]: To make it clear, what is the definition of 'raw ptr'? From the context, I think it means the true frame in the diagram above. So this confuses me:
		> llvm.coro.raw.frame.ptr.addr is called again to load the raw memory pointer back and free it.
		If the raw memory means the true frame, that may not be right, since the `for align` part wouldn't be freed.
		ychen (author) [Done]: I've updated `coroutine.rst` with a hopefully better explanation of the semantics of the newly added intrinsics. In the diagram above, the `raw frame` is the whole thing `| --- for align --- | --- true frame --- |`, and the `raw frame ptr` points to the left bar of `for align`.
		FramePtrAddrStart, CGF.Int8PtrTy->getPointerTo());
		auto *FramePtr =
		CGF.Builder.CreateLoad({FramePtrAddr, CGF.getPointerAlign()});
		Dealloc->setArgOperand(0, FramePtr);

		// Match size_t argument with the one used during allocation.

		assert(Dealloc->getNumArgOperands() >= 1);
		if (Dealloc->getNumArgOperands() > 1) {
		// Size may only be the second argument of allocator call.
		if (auto *CoroSize =
		ChuanqiXu [Not Done]: We don't need to handle this condition in this patch.
		ychen (author) [Done]: This handling is for `sized delete` (`void T::operator delete ( void* ptr, std::size_t sz );`), not for `aligned delete`. `sized delete` needs the same `size` that was used for `new`. Please check the `f3` test in `coro-alloc.cpp` (the test was missing the CHECK lines for this; I've added them).
		ChuanqiXu [Not Done]: Hmm, I understand it a bit more now.
		dyn_cast<llvm::IntrinsicInst>(Dealloc->getArgOperand(1)))
		if (CoroSize->getIntrinsicID() == llvm::Intrinsic::coro_size)
		GrowFrameSize(CGF, Dealloc, /*IsAlloc*/ false);
		ChuanqiXu [Not Done]: We don't need this in this patch.
		ychen (author) [Done]: Do you mean `// Match size_t argument with the one used during allocation.` or the function `emitDynamicAlignedDealloc`? I think both are needed here. Could you please elaborate?
		ChuanqiXu [Not Done]: Sorry, I misunderstood this function earlier.
		}

		ChuanqiXu [Done]: Capitalize `EmitCheckAlignBasicBlock`.
		CGF.Builder.SetInsertPoint(AlignedFreeBB);
		}

		void EmitCheckAlignBasicBlock(CodeGenFunction &CGF,
		llvm::BasicBlock *CheckAlignBB,
		llvm::BasicBlock *AlignBB,
		ChuanqiXu [Done]: This code would only work if we use `::operator new(size_t, align_val_t)`, which is implemented in another patch. I would suggest moving it into that one.
		ychen (author) [Done]: It handles both aligned and normal new/delete.
		llvm::BasicBlock *NonAlignBB) {
		CGF.EmitBlock(CheckAlignBB);

		auto &Builder = CGF.Builder;
		auto &TI = CGF.CGM.getContext().getTargetInfo();
		unsigned NewAlign = TI.getNewAlign() / TI.getCharWidth();
		auto *CoroAlign = Builder.CreateCall(
		CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_align, CGF.SizeTy));
		auto *AlignOfNew = llvm::ConstantInt::get(CGF.SizeTy, NewAlign);
		auto *Cmp =
		Builder.CreateICmp(llvm::CmpInst::ICMP_UGT, CoroAlign, AlignOfNew);
		Builder.CreateCondBr(Cmp, AlignBB, NonAlignBB);
}		}

namespace {
// Make sure to call coro.delete on scope exit.		// Make sure to call coro.delete on scope exit.
struct CallCoroDelete final : public EHScopeStack::Cleanup {		struct CallCoroDelete final : public EHScopeStack::Cleanup {
Stmt *Deallocate;		Stmt *Deallocate;

// Emit "if (coro.free(CoroId, CoroBegin)) Deallocate;"		// Emit "if (coro.free(CoroId, CoroBegin)) Deallocate;"

// Note: That deallocation will be emitted twice: once for a normal exit and		// Note: That deallocation will be emitted twice: once for a normal exit and
// once for exceptional exit. This usage is safe because Deallocate does not		// once for exceptional exit. This usage is safe because Deallocate does not
// contain any declarations. The SubStmtBuilder::makeNewAndDeleteExpr()		// contain any declarations. The SubStmtBuilder::makeNewAndDeleteExpr()
// builds a single call to a deallocation function which is safe to emit		// builds a single call to a deallocation function which is safe to emit
// multiple times.		// multiple times.
void Emit(CodeGenFunction &CGF, Flags) override {		void Emit(CodeGenFunction &CGF, Flags) override {
// Remember the current point, as we are going to emit deallocation code		// Remember the current point, as we are going to emit deallocation code
// first to get to coro.free instruction that is an argument to a delete		// first to get to coro.free instruction that is an argument to a delete
// call.		// call.
BasicBlock *SaveInsertBlock = CGF.Builder.GetInsertBlock();		BasicBlock *SaveInsertBlock = CGF.Builder.GetInsertBlock();

		auto *CheckAlignBB = CGF.createBasicBlock("coro.free.check.align");
		auto *AlignedFreeBB = CGF.createBasicBlock("coro.free.align");
auto *FreeBB = CGF.createBasicBlock("coro.free");		auto *FreeBB = CGF.createBasicBlock("coro.free");
		auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");

		EmitCheckAlignBasicBlock(CGF, CheckAlignBB, AlignedFreeBB, FreeBB);

CGF.EmitBlock(FreeBB);		CGF.EmitBlock(FreeBB);
CGF.EmitStmt(Deallocate);		CGF.EmitStmt(Deallocate);
		CGF.Builder.CreateBr(AfterFreeBB);
auto *AfterFreeBB = CGF.createBasicBlock("after.coro.free");
CGF.EmitBlock(AfterFreeBB);

// We should have captured coro.free from the emission of deallocate.		// We should have captured coro.free from the emission of deallocate.
auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;		auto *CoroFree = CGF.CurCoro.Data->LastCoroFree;
		CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = true;
if (!CoroFree) {		if (!CoroFree) {
CGF.CGM.Error(Deallocate->getBeginLoc(),		CGF.CGM.Error(Deallocate->getBeginLoc(),
"Deallocation expressoin does not refer to coro.free");		"Deallocation expressoin does not refer to coro.free");
return;		return;
}		}

		CGF.EmitBlock(AlignedFreeBB);
		CGF.EmitStmt(Deallocate);
		CGF.CurCoro.Data->LastCoroFreeUsedForDealloc = false;
		EmitDynamicAlignedDealloc(CGF, AlignedFreeBB, CoroFree);

		CGF.EmitBlock(AfterFreeBB);

// Get back to the block we were originally and move coro.free there.		// Get back to the block we were originally and move coro.free there.
		ChuanqiXu [Not Done]: It looks like this would emit a `deallocate` first and then an `alignedDeallocate`, which is very odd. Although I can see that the second `deallocate` wouldn't be emitted because of the `LastCoroFreeUsedForDealloc` check, it still looks odd to me. If the second `deallocate` never shows up, why do we need to write `emit(deallocate)` twice?
		ychen (author) [Done]: Agreed that `LastCoroFreeUsedForDealloc` is a bit confusing. It makes sure deallocation and aligned deallocation share one `coro.free`. Otherwise, AFAIK, two `coro.free` calls would get codegen'd.
		    %mem = llvm.coro.free()
		    br i1 <overalign>, label <aligned-dealloc>, label <dealloc>
		    aligned-dealloc:
		      use %mem
		    dealloc:
		      use %mem
		> what's the reason we need to write emit(deallocate) twice?
		John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I think it would be helpful to look at the changed tests below to see the patterns. Basically, for allocation it looks like the following; for deallocation it would be similar.
		    void *rawFrame = nullptr;
		    ...
		    if (llvm.coro.alloc()) {
		      size_t size = llvm.coro.size(), align = llvm.coro.align();
		      if (align > NEW_ALIGN) {
		    #if <an allocation function without std::align_val_t argument is selected by Sema>
		        size += align - NEW_ALIGN + sizeof(void*);
		        frame = operator new(size);
		        rawFrame = frame;
		        frame = (frame + align - 1) & ~(align - 1);
		    #else
		        // If an aligned allocation function is selected.
		        frame = operator new(size, align);
		    #endif
		      } else {
		        frame = operator new(size);
		      }
		    }
		The true branch of the #if directive is equivalent to the "coro.alloc.align" block (and "coro.alloc.align2" if `get_return_object_on_allocation_failure` is defined); the false branch is equivalent to the "coro.alloc" block. The above pattern handles both aligned/normal allocation/deallocation, so it is independent of D102147.
		ChuanqiXu [Not Done]: Thanks. Now I see why the code doesn't look natural to me: I think `::operator new(size_t, align_val_t)` shouldn't appear in this patch, since it only becomes available after D102147 is applied. You said this patch is independent of D102147, and I believe this patch could work without D102147. But it contains code that only works once the successor patch is applied, so I think it does depend on D102147. The ideal relationship, to me, is to merge `D102145` into this one (otherwise it is weird that `D102145` only introduces some intrinsics that aren't actually used). Then this patch should handle the alignment of variables in the coroutine frame without introducing `::new(size_t, align_val_t)`, and the final patch could do the job of searching for and generating code for `::new(size_t, align_val_t)`. It may be a bit painful to rebase again and again, but I think it is better.
		ychen (author) [Done]: I think I know where the confusion comes from. `AlignedDeallocate` is not guaranteed to be an aligned allocator. In this patch, in `SemaCoroutine.cpp`, it is set to `Deallocate`, in which case we always dynamically adjust the frame alignment. Once D102147 has landed, `AlignedDeallocate` may or may not be an aligned allocator.
		> The ideal relationship, to me, is to merge D102145 into this one (otherwise it is weird that D102145 only introduces some intrinsics that aren't actually used). Then this patch should handle the alignment of variables in the coroutine frame without introducing ::new(size_t, align_val_t), and the final patch could do the job of searching for and generating code for ::new(size_t, align_val_t).
		I was worried about the size of the patch if this is merged with D102145, but if that is preferred by more than one reviewer, I'll go ahead and do it. D102145 is pretty self-contained: it does not contain clients of the added intrinsics, but the introduced test should cover the expected intrinsic lowering.
		ychen (author) [Done]: Naming is hard. I had a hard time figuring out a better name. `AlignedDeallocate`/`AlignedAllocate` are intended to refer to the allocator/deallocator used for handling an overaligned frame, not to an allocator/deallocator with a std::align_val_t argument.
		ChuanqiXu [Not Done]: I think merging `D102145` into this one would make this patch easier for me to understand. For example, the test cases in `D102145` look weird to me since they don't do over-alignment at all, as we discussed in that thread. Maybe my understanding is not right, but I don't think it is that self-contained. I am OK to wait for opinions from other reviewers.
auto *InsertPt = SaveInsertBlock->getTerminator();		auto *InsertPt = SaveInsertBlock->getTerminator();
CoroFree->moveBefore(InsertPt);		CoroFree->moveBefore(InsertPt);
CGF.Builder.SetInsertPoint(InsertPt);		CGF.Builder.SetInsertPoint(InsertPt);

// Add if (auto *mem = coro.free) Deallocate;		// Add if (auto *mem = coro.free) Deallocate;
auto *NullPtr = llvm::ConstantPointerNull::get(CGF.Int8PtrTy);		auto *NullPtr = llvm::ConstantPointerNull::get(CGF.Int8PtrTy);
auto *Cond = CGF.Builder.CreateICmpNE(CoroFree, NullPtr);		auto *Cond = CGF.Builder.CreateICmpNE(CoroFree, NullPtr);
CGF.Builder.CreateCondBr(Cond, FreeBB, AfterFreeBB);		CGF.Builder.CreateCondBr(Cond, CheckAlignBB, AfterFreeBB);

// No longer need old terminator.		// No longer need old terminator.
InsertPt->eraseFromParent();		InsertPt->eraseFromParent();
CGF.Builder.SetInsertPoint(AfterFreeBB);		CGF.Builder.SetInsertPoint(AfterFreeBB);
}		}
explicit CallCoroDelete(Stmt *DeallocStmt) : Deallocate(DeallocStmt) {}		explicit CallCoroDelete(Stmt *DeallocStmt) : Deallocate(DeallocStmt) {}
};		};
		ChuanqiXu [Done]: Do we still need this change?
		ychen (author) [Done]: Nope.
}		}

namespace {		namespace {
struct GetReturnObjectManager {		struct GetReturnObjectManager {
CodeGenFunction &CGF;		CodeGenFunction &CGF;
CGBuilderTy &Builder;		CGBuilderTy &Builder;
const CoroutineBodyStmt &S;		const CoroutineBodyStmt &S;

[61 unchanged lines hidden]	static void emitBodyAndFallthrough(CodeGenFunction &CGF,
const CoroutineBodyStmt &S, Stmt *Body) {		const CoroutineBodyStmt &S, Stmt *Body) {
CGF.EmitStmt(Body);		CGF.EmitStmt(Body);
const bool CanFallthrough = CGF.Builder.GetInsertBlock();		const bool CanFallthrough = CGF.Builder.GetInsertBlock();
if (CanFallthrough)		if (CanFallthrough)
if (Stmt *OnFallthrough = S.getFallthroughHandler())		if (Stmt *OnFallthrough = S.getFallthroughHandler())
CGF.EmitStmt(OnFallthrough);		CGF.EmitStmt(OnFallthrough);
}		}

		static llvm::Value *emitAlignUpTo(CodeGenFunction &CGF, llvm::Value *Src,
		llvm::Value *Align, const Expr *E) {
		auto &Builder = CGF.Builder;
		llvm::Type *SrcType = Src->getType();
		llvm::IntegerType *IntType = llvm::IntegerType::get(
		CGF.getLLVMContext(),
		CGF.CGM.getDataLayout().getIndexTypeSizeInBits(SrcType));

		llvm::Value *Alignment = Align;
		auto *One = llvm::ConstantInt::get(IntType, 1);
		llvm::Value *Mask = Builder.CreateSub(Alignment, One, "mask");

		llvm::Value *SrcAddr = Builder.CreatePtrToInt(Src, IntType, "intptr");
		llvm::Value *SrcForMask = Builder.CreateAdd(SrcAddr, Mask, "over_boundary");
		llvm::Value *InvertedMask = Builder.CreateNot(Mask, "inverted_mask");
		llvm::Value *Result =
		Builder.CreateAnd(SrcForMask, InvertedMask, "aligned_result");

		Result->setName("aligned_intptr");
		llvm::Value *Difference = Builder.CreateSub(Result, SrcAddr, "diff");

		unsigned addressSpace = cast<llvm::PointerType>(SrcType)->getAddressSpace();
		llvm::PointerType *destType = CGF.Int8PtrTy;
		if (addressSpace)
		destType = llvm::Type::getInt8PtrTy(CGF.getLLVMContext(), addressSpace);
		Value *Base = Src;
		if (SrcType != destType)
		Base = Builder.CreateBitCast(Src, destType);

		if (CGF.getLangOpts().isSignedOverflowDefined())
		Result = Builder.CreateGEP(Base, Difference, "aligned_result");
		else
		Result = Builder.CreateInBoundsGEP(Base, Difference, "aligned_result");
		Result = Builder.CreatePointerCast(Result, SrcType);

		if (Alignment->getType() != CGF.IntPtrTy)
		Alignment =
		Builder.CreateIntCast(Alignment, CGF.IntPtrTy, false, "casted.align");

		Builder.CreateAlignmentAssumption(CGF.CGM.getDataLayout(), Result, Alignment);

		assert(Result->getType() == SrcType);
		return Result;
		}

void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {		void CodeGenFunction::EmitCoroutineBody(const CoroutineBodyStmt &S) {
auto *NullPtr = llvm::ConstantPointerNull::get(Builder.getInt8PtrTy());		auto *NullPtr = llvm::ConstantPointerNull::get(Builder.getInt8PtrTy());
auto &TI = CGM.getContext().getTargetInfo();		auto &TI = CGM.getContext().getTargetInfo();
unsigned NewAlign = TI.getNewAlign() / TI.getCharWidth();		unsigned NewAlign = TI.getNewAlign() / TI.getCharWidth();

auto *EntryBB = Builder.GetInsertBlock();		auto *EntryBB = Builder.GetInsertBlock();
auto *AllocBB = createBasicBlock("coro.alloc");		auto *AllocBB = createBasicBlock("coro.alloc");
		auto *AlignAllocBB = createBasicBlock("coro.alloc.align");
		auto *CheckAlignBB = createBasicBlock("coro.alloc.check.align");
auto *InitBB = createBasicBlock("coro.init");		auto *InitBB = createBasicBlock("coro.init");
auto *FinalBB = createBasicBlock("coro.final");		auto *FinalBB = createBasicBlock("coro.final");
auto *RetBB = createBasicBlock("coro.ret");		auto *RetBB = createBasicBlock("coro.ret");
		llvm::BasicBlock *RetOnFailureBB = nullptr;
		llvm::BasicBlock *AlignAllocBBCont = nullptr;
		ChuanqiXu [Done]: It may be better to rename AlignAllocBB2 to AlignAllocBBCont or something similar.

auto *CoroId = Builder.CreateCall(		auto *CoroId = Builder.CreateCall(
CGM.getIntrinsic(llvm::Intrinsic::coro_id),		CGM.getIntrinsic(llvm::Intrinsic::coro_id),
{Builder.getInt32(NewAlign), NullPtr, NullPtr, NullPtr});		{Builder.getInt32(NewAlign), NullPtr, NullPtr, NullPtr});
createCoroData(*this, CurCoro, CoroId);		createCoroData(*this, CurCoro, CoroId);
CurCoro.Data->SuspendBB = RetBB;		CurCoro.Data->SuspendBB = RetBB;
assert(ShouldEmitLifetimeMarkers &&		assert(ShouldEmitLifetimeMarkers &&
"Must emit lifetime intrinsics for coroutines");		"Must emit lifetime intrinsics for coroutines");

// Backend is allowed to elide memory allocations, to help it, emit		// Backend is allowed to elide memory allocations, to help it, emit
// auto mem = coro.alloc() ? 0 : ... allocation code ...;		// auto mem = coro.alloc() ? 0 : ... allocation code ...;
auto *CoroAlloc = Builder.CreateCall(		auto *CoroAlloc = Builder.CreateCall(
CGM.getIntrinsic(llvm::Intrinsic::coro_alloc), {CoroId});		CGM.getIntrinsic(llvm::Intrinsic::coro_alloc), {CoroId});

Builder.CreateCondBr(CoroAlloc, AllocBB, InitBB);		Builder.CreateCondBr(CoroAlloc, CheckAlignBB, InitBB);

		EmitCheckAlignBasicBlock(*this, CheckAlignBB, AlignAllocBB, AllocBB);

EmitBlock(AllocBB);		EmitBlock(AllocBB);
auto *AllocateCall = EmitScalarExpr(S.getAllocate());		auto *AllocateCall = EmitScalarExpr(S.getAllocate());
auto *AllocOrInvokeContBB = Builder.GetInsertBlock();		auto *AllocOrInvokeContBB = Builder.GetInsertBlock();

// Handle allocation failure if 'ReturnStmtOnAllocFailure' was provided.		// Handle allocation failure if 'ReturnStmtOnAllocFailure' was provided.
if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {		if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
auto *RetOnFailureBB = createBasicBlock("coro.ret.on.failure");		RetOnFailureBB = createBasicBlock("coro.ret.on.failure");

// See if allocation was successful.		// See if allocation was successful.
auto *NullPtr = llvm::ConstantPointerNull::get(Int8PtrTy);
auto *Cond = Builder.CreateICmpNE(AllocateCall, NullPtr);		auto *Cond = Builder.CreateICmpNE(AllocateCall, NullPtr);
Builder.CreateCondBr(Cond, InitBB, RetOnFailureBB);		Builder.CreateCondBr(Cond, InitBB, RetOnFailureBB);

// If not, return OnAllocFailure object.		// If not, return OnAllocFailure object.
EmitBlock(RetOnFailureBB);		EmitBlock(RetOnFailureBB);
EmitStmt(RetOnAllocFailure);		EmitStmt(RetOnAllocFailure);
}		}
else {		else {
Builder.CreateBr(InitBB);		Builder.CreateBr(InitBB);
}		}

		EmitBlock(AlignAllocBB);
		auto *AlignedAllocateCall = EmitScalarExpr(S.getAllocate());

		// The codegen'd IR looks like:
		ChuanqiXu (Done): Since `hasAlignArg` is called only once, I suggest making it a lambda here, which would make the code easier to read.
		ychen (Done): Will do.
		// void *rawFrame = nullptr;
		// ...
		// if (llvm.coro.alloc()) {
		// size_t size = llvm.coro.size(), align = llvm.coro.align();
		ChuanqiXu (Done): I recommend adding a detailed comment here explaining why we need to over-allocate the frame. It is really hard to understand for people who are new to this code. Otherwise, they would need to use `git blame` to find the commit id and this review page to figure out the reasons.
		ychen (Done): Will do.
		// if (align > NEW_ALIGN) {
		// size += align - NEW_ALIGN;
		// frame = operator new(size);
		// rawFrame = frame;
		// frame = (frame + align - 1) & ~(align - 1);
		// } else {
		// frame = operator new(size);
		// }
		// }
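
The pseudocode above can be sketched in plain C++ as follows. This is a minimal illustration, not the code the patch emits: `kNewAlign`, `FrameAllocation`, and `allocateOveraligned` are hypothetical names, and `kNewAlign` is assumed to equal `alignof(std::max_align_t)` (16 on common 64-bit targets), standing in for NEW_ALIGN.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumption: plain operator new returns storage aligned to at least this.
constexpr std::size_t kNewAlign = alignof(std::max_align_t);

struct FrameAllocation {
  void *raw;   // pointer returned by operator new; needed later to free
  void *frame; // aligned-up address used as the start of the frame
};

// Over-allocate by (align - kNewAlign) bytes, then round the address up.
// Because `raw` is already kNewAlign-aligned, the next `align` boundary is
// at most (align - kNewAlign) bytes away, so `size` bytes still fit.
inline FrameAllocation allocateOveraligned(std::size_t size, std::size_t align) {
  assert((align & (align - 1)) == 0 && "align must be a power of two");
  if (align <= kNewAlign) {
    void *p = ::operator new(size);
    return {p, p};
  }
  void *raw = ::operator new(size + (align - kNewAlign));
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  auto aligned = (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  return {raw, reinterpret_cast<void *>(aligned)};
}
```

The mask step discards the low bits, so the raw pointer cannot be recovered from the aligned address alone. That is why the sketch returns both pointers, and why the patch stores the raw pointer for the deallocation path.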

		ChuanqiXu (Done): `if (HasAlignArg)` belongs in the next patch, D102147, right? I don't think it should be here.
		// size += align - NEW_ALIGN
		GrowFrameSize(*this, cast<llvm::CallInst>(AlignedAllocateCall),
		/*IsAlloc*/ true);
		if (S.getReturnStmtOnAllocFailure()) {
		auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
		AlignAllocBBCont = createBasicBlock("coro.alloc.align2");
		assert(RetOnFailureBB);
		Builder.CreateCondBr(Cond, AlignAllocBBCont, RetOnFailureBB);
		EmitBlock(AlignAllocBBCont);
		}
		// frame = (frame + align - 1) & ~(align - 1)
		auto *CoroAlign =
		Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
		auto *AlignedUpAddr =
		ChuanqiXu (Done): It may be better to organize it as:
		    if (!HasAlignArg) {
		      if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
		        auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
		        AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
		        Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);
		        EmitBlock(AlignAllocBB2);
		      }
		      auto *CoroAlign = Builder.CreateCall(
		          CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
		      ...
		    }
		emitAlignUpTo(*this, AlignedAllocateCall, CoroAlign, S.getAllocate());
		ChuanqiXu (Not Done): What do you think about replacing the `EmitBuiltinAlignTo` call with an in-place computation?
		ychen (Done): I think that, with the interface issue fixed, calling it is preferable, but I don't feel strongly about it, so I went ahead and inlined `EmitBuiltinAlignTo` to help review and discussion.
		// rawFrame = frame
		auto *RawFramePtrAddrIntrin =
		ChuanqiXu (Done): It would be better to add an assert for RetOnFailureBB. At first glance, it looks like it could be nullptr.
		CGM.getIntrinsic(llvm::Intrinsic::coro_raw_frame_ptr_addr);
		auto *RawFramePtrAddr = Builder.CreateCall(RawFramePtrAddrIntrin);
		Builder.CreateStore(AlignedAllocateCall,
		{RawFramePtrAddr, getPointerAlign()});

EmitBlock(InitBB);		EmitBlock(InitBB);

// Pass the result of the allocation to coro.begin.		// Pass the result of the allocation to coro.begin.
auto *Phi = Builder.CreatePHI(VoidPtrTy, 2);		auto *Phi = Builder.CreatePHI(VoidPtrTy, 3);
Phi->addIncoming(NullPtr, EntryBB);		Phi->addIncoming(NullPtr, EntryBB);
Phi->addIncoming(AllocateCall, AllocOrInvokeContBB);		Phi->addIncoming(AllocateCall, AllocOrInvokeContBB);
		Phi->addIncoming(AlignedUpAddr,
		AlignAllocBBCont ? AlignAllocBBCont : AlignAllocBB);

auto *CoroBegin = Builder.CreateCall(		auto *CoroBegin = Builder.CreateCall(
		ChuanqiXu (Done): We could remove this assignment and use AlignedUpAddr directly below.
CGM.getIntrinsic(llvm::Intrinsic::coro_begin), {CoroId, Phi});		CGM.getIntrinsic(llvm::Intrinsic::coro_begin), {CoroId, Phi});
CurCoro.Data->CoroBegin = CoroBegin;		CurCoro.Data->CoroBegin = CoroBegin;

GetReturnObjectManager GroManager(*this, S);		GetReturnObjectManager GroManager(*this, S);
GroManager.EmitGroAlloca();		GroManager.EmitGroAlloca();

		ChuanqiXu (Not Done): Maybe we could calculate it in place instead of trying to call a function that is not designed for llvm::Value. The calculation doesn't look too complex.
		ychen (Done): I'm open to not calling `EmitBuiltinAlignTo`, which basically means inlining the useful parts of `EmitBuiltinAlignTo`. The initial intention was code sharing and readability. What's the benefit of not calling it?
		ChuanqiXu (Not Done): Reusing code is good, but my main concern is the design of the interfaces. The current design smells bad to me. Another reason to implement it in place is that it is not very complex and is easy to understand. Another option is to implement `EmitBuitinAlign` in LLVM (someplace like `Local`), so that CodeGenFunction::EmitBuitinAlign and this function could both use it.
		ychen (Done): > Reusing code is good, but my main concern is the design of the interfaces. The current design smells bad to me. Another reason to implement it in place is that it is not very complex and is easy to understand. Another option is to implement `EmitBuitinAlign` in LLVM (someplace like `Local`), so that CodeGenFunction::EmitBuitinAlign and this function could both use it.
		ChuanqiXu (Not Done): I guess you forgot to write what you wanted to say.
		ychen (Done): Yep, I meant to say the use of "void *" is removed.
CurCoro.Data->CleanupJD = getJumpDestInCurrentScope(RetBB);		CurCoro.Data->CleanupJD = getJumpDestInCurrentScope(RetBB);
{		{
CGDebugInfo *DI = getDebugInfo();		CGDebugInfo *DI = getDebugInfo();
ParamReferenceReplacerRAII ParamReplacer(LocalDeclMap);		ParamReferenceReplacerRAII ParamReplacer(LocalDeclMap);
CodeGenFunction::RunCleanupsScope ResumeScope(*this);		CodeGenFunction::RunCleanupsScope ResumeScope(*this);
EHStack.pushCleanup<CallCoroDelete>(NormalAndEHCleanup, S.getDeallocate());		EHStack.pushCleanup<CallCoroDelete>(NormalAndEHCleanup, S.getDeallocate());

// Create mapping between parameters and copy-params for coroutine function.		// Create mapping between parameters and copy-params for coroutine function.
auto ParamMoves = S.getParamMoves();		auto ParamMoves = S.getParamMoves();
		ChuanqiXu (Not Done): Is a branch to InitBB missing here?
		ychen (Done): `EmitBlock` would handle the case.
assert(		assert(
(ParamMoves.size() == 0 || (ParamMoves.size() == FnArgs.size())) &&		(ParamMoves.size() == 0 || (ParamMoves.size() == FnArgs.size())) &&
"ParamMoves and FnArgs should be the same size for coroutine function");		"ParamMoves and FnArgs should be the same size for coroutine function");
if (ParamMoves.size() == FnArgs.size() && DI)		if (ParamMoves.size() == FnArgs.size() && DI)
for (const auto Pair : llvm::zip(FnArgs, ParamMoves))		for (const auto Pair : llvm::zip(FnArgs, ParamMoves))
DI->getCoroutineParameterMappings().insert(		DI->getCoroutineParameterMappings().insert(
{std::get<0>(Pair), std::get<1>(Pair)});		{std::get<0>(Pair), std::get<1>(Pair)});

[... 104 lines not shown ...]
case llvm::Intrinsic::coro_frame: {
return RValue::get(NullPtr);		return RValue::get(NullPtr);
}		}
// The following three intrinsics take a token parameter referring to a token		// The following three intrinsics take a token parameter referring to a token
// returned by earlier call to @llvm.coro.id. Since we cannot represent it in		// returned by earlier call to @llvm.coro.id. Since we cannot represent it in
// builtins, we patch it up here.		// builtins, we patch it up here.
case llvm::Intrinsic::coro_alloc:		case llvm::Intrinsic::coro_alloc:
case llvm::Intrinsic::coro_begin:		case llvm::Intrinsic::coro_begin:
case llvm::Intrinsic::coro_free: {		case llvm::Intrinsic::coro_free: {
		// Make deallocation and aligned deallocation share one `coro.free`.
		if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc)
		return RValue::get(CurCoro.Data->LastCoroFree);
		ChuanqiXu (Not Done): Is it possible that it would return a nullptr value?
		ychen (Done): Not that I know of, because there is an early return:
		    if (!CoroFree) {
		      CGF.CGM.Error(Deallocate->getBeginLoc(),
		                    "Deallocation expressoin does not refer to coro.free");
		      return;
		    }
		ChuanqiXu (Not Done): Do you think it is better to merge this check here?
		    if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc) {
		      if (!CoroFree) {
		        CGF.CGM.Error(Deallocate->getBeginLoc(),
		                      "Deallocation expressoin does not refer to coro.free");
		        return something;
		      }
		      return RValue::get(CurCoro.Data->LastCoroFree);
		    }

if (CurCoro.Data && CurCoro.Data->CoroId) {		if (CurCoro.Data && CurCoro.Data->CoroId) {
Args.push_back(CurCoro.Data->CoroId);		Args.push_back(CurCoro.Data->CoroId);
break;		break;
}		}
CGM.Error(E->getBeginLoc(), "this builtin expect that __builtin_coro_id has"		CGM.Error(E->getBeginLoc(), "this builtin expect that __builtin_coro_id has"
" been used earlier in this function");		" been used earlier in this function");
// Fallthrough to the next case to add TokenNone as the first argument.		// Fallthrough to the next case to add TokenNone as the first argument.
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
[... 32 lines not shown ...]

clang/test/CodeGenCoroutines/coro-alloc.cpp

[... 51 lines not shown ...]
struct promise_type {
void return_void() {}		void return_void() {}
};		};
};		};

// CHECK-LABEL: f0(		// CHECK-LABEL: f0(
extern "C" void f0(global_new_delete_tag) {		extern "C" void f0(global_new_delete_tag) {
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
// CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])		// CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
// CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]		// CHECK: br i1 %[[NeedAlloc]], label %[[CheckAlignBB:.+]], label %[[InitBB:.+]]

		// CHECK: [[CheckAlignBB]]:
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
		// CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]

// CHECK: [[AllocBB]]:		// CHECK: [[AllocBB]]:
		// CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
		// CHECK-NEXT: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
		// CHECK-NEXT: br label %[[InitBB:.+]]

		// CHECK: [[AlignAllocBB]]:
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
// CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
		// CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
		// CHECK: %[[MEM2:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[NEWSIZE]])
		// CHECK: %[[ALIGN2:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[ALIGNED:.+]] = getelementptr inbounds i8, i8* %[[MEM2]],
		// CHECK: call void @llvm.assume(i1 true) [ "align"(i8* %[[ALIGNED]], i64 %[[ALIGN2]]) ]
		// CHECK: %[[ADDR:.+]] = call i8** @llvm.coro.raw.frame.ptr.addr()
		// CHECK: store i8* %[[MEM2]], i8** %[[ADDR]], align 8
// CHECK: br label %[[InitBB]]		// CHECK: br label %[[InitBB]]

// CHECK: [[InitBB]]:		// CHECK: [[InitBB]]:
// CHECK: %[[PHI:.+]] = phi i8* [ null, %{{.+}} ], [ %call, %[[AllocBB]] ]		// CHECK: %[[PHI:.+]] = phi i8* [ null, %{{.+}} ], [ %[[MEM]], %[[AllocBB]] ], [ %[[ALIGNED]], %[[AlignAllocBB]] ]
// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[ID]], i8* %[[PHI]])		// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[ID]], i8* %[[PHI]])

// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])		// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
// CHECK: %[[NeedDealloc:.+]] = icmp ne i8* %[[MEM]], null		// CHECK: %[[NeedDealloc:.+]] = icmp ne i8* %[[MEM]], null
// CHECK: br i1 %[[NeedDealloc]], label %[[FreeBB:.+]], label %[[Afterwards:.+]]		// CHECK: br i1 %[[NeedDealloc]], label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]

		// CHECK: [[CheckAlignBB]]:
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
		// CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]

// CHECK: [[FreeBB]]:		// CHECK: [[FreeBB]]:
// CHECK: call void @_ZdlPv(i8* %[[MEM]])		// CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
// CHECK: br label %[[Afterwards]]		// CHECK-NEXT: br label %[[Afterwards]]

		// CHECK: [[AlignedFreeBB]]:
		// CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
		// CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM]], i32 %[[OFFSET]]
		// CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
		// CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
		ChuanqiXu (Not Done): This defines the variable 'MEM' again, conflicting with the definition at line 89. Does it matter?
		ychen (Done): It should not matter. It works like variable definitions: uses always get the most recent definition. From the FileCheck documentation: "FileCheck variables can be defined multiple times, and substitutions always get the latest value. Variables can also be substituted later on the same line they were defined on."
		// CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
		// CHECK-NEXT: br label %[[Afterwards]]

// CHECK: [[Afterwards]]:		// CHECK: [[Afterwards]]:
// CHECK: ret void		// CHECK: ret void
co_return;		co_return;
}		}

struct promise_new_tag {};		struct promise_new_tag {};

[... 66 lines not shown ...]

// A coroutine that takes a single pointer argument should not invoke this		// A coroutine that takes a single pointer argument should not invoke this
// placement form operator. [dcl.fct.def.coroutine]/7 dictates that lookup for		// placement form operator. [dcl.fct.def.coroutine]/7 dictates that lookup for
// allocation functions matching the coroutine function's signature be done		// allocation functions matching the coroutine function's signature be done
// within the scope of the promise type's class.		// within the scope of the promise type's class.
// CHECK-LABEL: f1b(		// CHECK-LABEL: f1b(
extern "C" void f1b(promise_matching_global_placement_new_tag, dummy *) {		extern "C" void f1b(promise_matching_global_placement_new_tag, dummy *) {
// CHECK: call noalias nonnull i8* @_Znwm(i64		// CHECK: call noalias nonnull i8* @_Znwm(i64
		// CHECK-NOT: call noalias nonnull i8* @_ZnwmSt11align_val_t(i64
co_return;		co_return;
}		}

struct promise_delete_tag {};		struct promise_delete_tag {};

template<>		template<>
struct std::experimental::coroutine_traits<void, promise_delete_tag> {		struct std::experimental::coroutine_traits<void, promise_delete_tag> {
struct promise_type {		struct promise_type {
[... 9 lines not shown ...]
extern "C" void f2(promise_delete_tag) {		extern "C" void f2(promise_delete_tag) {
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])		// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])

// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(		// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(
// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])		// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv18promise_delete_tagEE12promise_typedlEPv(i8* %[[MEM]])		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv18promise_delete_tagEE12promise_typedlEPv(i8* %[[MEM]])
		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv18promise_delete_tagEE12promise_typedlEPv(i8*
co_return;		co_return;
}		}

struct promise_sized_delete_tag {};		struct promise_sized_delete_tag {};

template<>		template<>
struct std::experimental::coroutine_traits<void, promise_sized_delete_tag> {		struct std::experimental::coroutine_traits<void, promise_sized_delete_tag> {
struct promise_type {		struct promise_type {
void operator delete(void*, unsigned long);		void operator delete(void*, unsigned long);
void get_return_object() {}		void get_return_object() {}
suspend_always initial_suspend() { return {}; }		suspend_always initial_suspend() { return {}; }
suspend_always final_suspend() noexcept { return {}; }		suspend_always final_suspend() noexcept { return {}; }
void return_void() {}		void return_void() {}
};		};
};		};

// CHECK-LABEL: f3(		// CHECK-LABEL: f3(
extern "C" void f3(promise_sized_delete_tag) {		extern "C" void f3(promise_sized_delete_tag) {
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])		// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])

// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(		// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(
// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])		// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
		// CHECK: call i64 @llvm.coro.align.i64()
		// CHECK: br i1 {{.*}}, label %[[AlignFreeBB:.+]], label %[[FreeBB:.+]]

		// CHECK: [[FreeBB]]:
// CHECK: %[[SIZE2:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE2:.+]] = call i64 @llvm.coro.size.i64()
// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv24promise_sized_delete_tagEE12promise_typedlEPvm(i8* %[[MEM]], i64 %[[SIZE2]])		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv24promise_sized_delete_tagEE12promise_typedlEPvm(i8* %[[MEM]], i64 %[[SIZE2]])

		// CHECK: [[AlignFreeBB]]:
		// CHECK: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
		// CHECK: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM]], i32 %[[OFFSET]]
		// CHECK: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
		// CHECK: %[[MEM2:.+]] = load i8*, i8** %[[ADDR2]], align 8
		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[DIFF:.+]] = sub nsw i64 %[[ALIGN]], 16
		// CHECK: %[[SIZE2:.+]] = add i64 %[[SIZE]], %[[DIFF]]
		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv24promise_sized_delete_tagEE12promise_typedlEPvm(i8* %[[MEM2]], i64 %[[SIZE2]])

co_return;		co_return;
}		}
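
The sized-delete checks in `f3` expect the overaligned path to reload the raw pointer from the frame and to pass the grown size (`size + align - 16`) to the deallocation function. The sketch below mirrors that pairing in plain C++. It is an illustration under stated assumptions, not the code the patch emits: `allocFrame`, `freeFrame`, `sizedDelete`, `lastFreed`, and the `rawPtrOffset` parameter are hypothetical names, `sizedDelete` stands in for a promise type's `operator delete(void*, size_t)`, and `kNewAlign` is assumed to be `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Assumption: plain operator new returns storage aligned to at least this.
constexpr std::size_t kNewAlign = alignof(std::max_align_t);

struct FreedRecord {
  void *ptr;
  std::size_t size;
};

// Records the most recent sized deallocation so callers can inspect it.
inline FreedRecord &lastFreed() {
  static FreedRecord record{nullptr, 0};
  return record;
}

// Hypothetical stand-in for a promise type's sized operator delete.
inline void sizedDelete(void *p, std::size_t size) {
  lastFreed() = FreedRecord{p, size};
  ::operator delete(p);
}

// Allocation side: over-allocate, align up, and stash the raw pointer inside
// the frame at rawPtrOffset (which must leave room for one pointer).
inline void *allocFrame(std::size_t size, std::size_t align,
                        std::size_t rawPtrOffset) {
  if (align <= kNewAlign)
    return ::operator new(size);
  void *raw = ::operator new(size + (align - kNewAlign));
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  auto aligned = (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  void *frame = reinterpret_cast<void *>(aligned);
  std::memcpy(static_cast<char *>(frame) + rawPtrOffset, &raw, sizeof raw);
  return frame;
}

// Deallocation side: reload the raw pointer and free with the grown size.
inline void freeFrame(void *frame, std::size_t size, std::size_t align,
                      std::size_t rawPtrOffset) {
  if (align <= kNewAlign) {
    sizedDelete(frame, size);
    return;
  }
  void *raw = nullptr;
  std::memcpy(&raw, static_cast<char *>(frame) + rawPtrOffset, sizeof raw);
  sizedDelete(raw, size + (align - kNewAlign));
}
```

Passing the original `size` on the overaligned path would under-report the allocation to a sized deallocation function, which is why both sides apply the same `align - kNewAlign` adjustment.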

struct promise_on_alloc_failure_tag {};		struct promise_on_alloc_failure_tag {};

template<>		template<>
struct std::experimental::coroutine_traits<int, promise_on_alloc_failure_tag> {		struct std::experimental::coroutine_traits<int, promise_on_alloc_failure_tag> {
struct promise_type {		struct promise_type {
int get_return_object() { return 0; }		int get_return_object() { return 0; }
suspend_always initial_suspend() { return {}; }		suspend_always initial_suspend() { return {}; }
suspend_always final_suspend() noexcept { return {}; }		suspend_always final_suspend() noexcept { return {}; }
void return_void() {}		void return_void() {}
static int get_return_object_on_allocation_failure() { return -1; }		static int get_return_object_on_allocation_failure() { return -1; }
};		};
};		};

// CHECK-LABEL: f4(		// CHECK-LABEL: f4(
extern "C" int f4(promise_on_alloc_failure_tag) {		extern "C" int f4(promise_on_alloc_failure_tag) {
// CHECK: %[[RetVal:.+]] = alloca i32		// CHECK: %[[RetVal:.+]] = alloca i32
// CHECK: %[[Gro:.+]] = alloca i32		// CHECK: %[[Gro:.+]] = alloca i32
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
		// CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[OKBB:.+]]

		// CHECK: [[CheckAlignBB]]:
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]], 16
		// CHECK: br i1 %[[CMP]], label %[[AlignAllocBB:.+]], label %[[AllocBB:.+]]

		// CHECK: [[AllocBB]]:
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
// CHECK: %[[MEM:.+]] = call noalias i8* @_ZnwmRKSt9nothrow_t(i64 %[[SIZE]], %"struct.std::nothrow_t"* nonnull align 1 dereferenceable(1) @_ZStL7nothrow)		// CHECK: %[[MEM:.+]] = call noalias i8* @_ZnwmRKSt9nothrow_t(i64 %[[SIZE]], %"struct.std::nothrow_t"* nonnull align 1 dereferenceable(1) @_ZStL7nothrow)
// CHECK: %[[OK:.+]] = icmp ne i8* %[[MEM]], null		// CHECK: %[[OK:.+]] = icmp ne i8* %[[MEM]], null
// CHECK: br i1 %[[OK]], label %[[OKBB:.+]], label %[[ERRBB:.+]]		// CHECK: br i1 %[[OK]], label %[[OKBB]], label %[[ERRBB:.+]]

// CHECK: [[ERRBB]]:		// CHECK: [[ERRBB]]:
// CHECK: %[[FailRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type39get_return_object_on_allocation_failureEv(		// CHECK: %[[FailRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type39get_return_object_on_allocation_failureEv(
// CHECK: store i32 %[[FailRet]], i32* %[[RetVal]]		// CHECK: store i32 %[[FailRet]], i32* %[[RetVal]]
// CHECK: br label %[[RetBB:.+]]		// CHECK: br label %[[RetBB:.+]]

		// CHECK: [[AlignAllocBB]]:
		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[PAD:.+]] = sub nsw i64 %[[ALIGN]], 16
		// CHECK: %[[NEWSIZE:.+]] = add i64 %[[SIZE]], %[[PAD]]
		// CHECK: %[[MEM2:.+]] = call noalias i8* @_ZnwmRKSt9nothrow_t(i64 %[[NEWSIZE]], %"struct.std::nothrow_t"* nonnull align 1 dereferenceable(1) @_ZStL7nothrow)
		// CHECK: %[[OK:.+]] = icmp ne i8* %[[MEM2]], null
		// CHECK: br i1 %[[OK]], label %[[AlignAllocBBCont:.+]], label %[[ERRBB:.+]]

		// CHECK: [[AlignAllocBBCont]]:
		// CHECK: %[[ALIGN2:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[ALIGNED:.+]] = getelementptr inbounds i8, i8* %[[MEM2]],
		// CHECK: call void @llvm.assume(i1 true) [ "align"(i8* %[[ALIGNED]], i64 %[[ALIGN2]]) ]
		// CHECK: %[[ADDR:.+]] = call i8** @llvm.coro.raw.frame.ptr.addr()
		// CHECK: store i8* %[[MEM2]], i8** %[[ADDR]], align 8
		// CHECK: br label %[[OKBB]]

// CHECK: [[OKBB]]:		// CHECK: [[OKBB]]:
// CHECK: %[[OkRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type17get_return_objectEv(		// CHECK: %[[OkRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type17get_return_objectEv(
// CHECK: store i32 %[[OkRet]], i32* %[[Gro]]		// CHECK: store i32 %[[OkRet]], i32* %[[Gro]]

// CHECK: %[[Tmp1:.*]] = load i32, i32* %[[Gro]]		// CHECK: %[[Tmp1:.*]] = load i32, i32* %[[Gro]]
// CHECK-NEXT: store i32 %[[Tmp1]], i32* %[[RetVal]]		// CHECK-NEXT: store i32 %[[Tmp1]], i32* %[[RetVal]]
// CHECK-NEXT: %[[Gro_CAST:.+]] = bitcast i32* %[[Gro]] to i8*		// CHECK-NEXT: %[[Gro_CAST:.+]] = bitcast i32* %[[Gro]] to i8*
// CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* %[[Gro_CAST]]) #2		// CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* %[[Gro_CAST]]) #2
// CHECK-NEXT: br label %[[RetBB]]		// CHECK-NEXT: br label %[[RetBB]]

// CHECK: [[RetBB]]:		// CHECK: [[RetBB]]:
// CHECK: %[[LoadRet:.+]] = load i32, i32* %[[RetVal]], align 4		// CHECK: %[[LoadRet:.+]] = load i32, i32* %[[RetVal]], align 4
// CHECK: ret i32 %[[LoadRet]]		// CHECK: ret i32 %[[LoadRet]]
co_return;		co_return;
}		}

clang/test/CodeGenCoroutines/coro-cleanup.cpp

[... 72 lines not shown ...]
void f() {

// CHECK: [[Cont]]:		// CHECK: [[Cont]]:
// CHECK-NEXT: br label %[[Cont2:.+]]		// CHECK-NEXT: br label %[[Cont2:.+]]
// CHECK: [[Cont2]]:		// CHECK: [[Cont2]]:
// CHECK-NEXT: br label %[[Cleanup:.+]]		// CHECK-NEXT: br label %[[Cleanup:.+]]

// CHECK: [[Cleanup]]:		// CHECK: [[Cleanup]]:
// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_typeD1Ev(
// CHECK: %[[Mem0:.+]] = call i8* @llvm.coro.free(		// CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
// CHECK: call void @_ZdlPv(i8* %[[Mem0]]		// CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]

		// CHECK: [[CheckAlignBB]]:
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
		// CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]

		// CHECK: [[FreeBB]]:
		// CHECK: call void @_ZdlPv(i8* %[[MEM0]]
		// CHECK: br label %[[Afterwards]]

		// CHECK: [[AlignedFreeBB]]:
		// CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
		// CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
		// CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
		// CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
		// CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
		// CHECK-NEXT: br label %[[Afterwards]]

// CHECK: [[Dealloc]]:		// CHECK: [[Dealloc]]:
// CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(		// CHECK: %[[MEM0:.+]] = call i8* @llvm.coro.free(
// CHECK: call void @_ZdlPv(i8* %[[Mem]])		// CHECK: br i1 %{{.*}}, label %[[CheckAlignBB:.+]], label %[[Afterwards:.+]]

		// CHECK: [[CheckAlignBB]]:
		// CHECK: %[[ALIGN:.+]] = call i64 @llvm.coro.align.i64()
		// CHECK: %[[CMP:.+]] = icmp ugt i64 %[[ALIGN]],
		// CHECK: br i1 %[[CMP]], label %[[AlignedFreeBB:.+]], label %[[FreeBB:.+]]

		// CHECK: [[FreeBB]]:
		// CHECK: call void @_ZdlPv(i8* %[[MEM0]]
		// CHECK: br label %[[Afterwards]]

		// CHECK: [[AlignedFreeBB]]:
		// CHECK-NEXT: %[[OFFSET:.+]] = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
		// CHECK-NEXT: %[[ADDR:.+]] = getelementptr inbounds i8, i8* %[[MEM0]], i32 %[[OFFSET]]
		// CHECK-NEXT: %[[ADDR2:.+]] = bitcast i8* %[[ADDR]] to i8**
		// CHECK-NEXT: %[[MEM:.+]] = load i8*, i8** %[[ADDR2]], align 8
		// CHECK-NEXT: call void @_ZdlPv(i8* %[[MEM]])
		// CHECK-NEXT: br label %[[Afterwards]]

co_return;		co_return;
}		}

// CHECK-LABEL: define{{.*}} void @_Z1gv(		// CHECK-LABEL: define{{.*}} void @_Z1gv(
void g() {		void g() {
for (;;)		for (;;)
co_await suspend_always{};		co_await suspend_always{};
// Since this is the endless loop there should be no fallthrough handler (call to 'return_void').		// Since this is the endless loop there should be no fallthrough handler (call to 'return_void').
// CHECK-NOT: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_type11return_voidEv		// CHECK-NOT: call void @_ZNSt12experimental16coroutine_traitsIJvEE12promise_type11return_voidEv
}		}

clang/test/CodeGenCoroutines/coro-gro.cpp

[... 62 lines not shown ...]
int f() {
// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_type11return_voidEv(		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_type11return_voidEv(
// CHECK: call void @_ZN7CleanupD1Ev(		// CHECK: call void @_ZN7CleanupD1Ev(

// Destroy promise and free the memory.		// Destroy promise and free the memory.

// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeD1Ev(
// CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(		// CHECK: %[[Mem:.+]] = call i8* @llvm.coro.free(
// CHECK: call void @_ZdlPv(i8* %[[Mem]])		// CHECK: call void @_ZdlPv(i8* %[[Mem]])
		// CHECK: call void @_ZdlPv(i8* %{{.*}})

// Initialize retval from Gro and destroy Gro		// Initialize retval from Gro and destroy Gro

// CHECK: %[[Conv:.+]] = call i32 @_ZN7GroTypecviEv(		// CHECK: %[[Conv:.+]] = call i32 @_ZN7GroTypecviEv(
// CHECK: store i32 %[[Conv]], i32* %[[RetVal]]		// CHECK: store i32 %[[Conv]], i32* %[[RetVal]]
// CHECK: %[[IsActive:.+]] = load i1, i1* %[[GroActive]]		// CHECK: %[[IsActive:.+]] = load i1, i1* %[[GroActive]]
// CHECK: br i1 %[[IsActive]], label %[[CleanupGro:.+]], label %[[Done:.+]]		// CHECK: br i1 %[[IsActive]], label %[[CleanupGro:.+]], label %[[Done:.+]]

// CHECK: [[CleanupGro]]:		// CHECK: [[CleanupGro]]:
// CHECK: call void @_ZN7GroTypeD1Ev(		// CHECK: call void @_ZN7GroTypeD1Ev(
// CHECK: br label %[[Done]]		// CHECK: br label %[[Done]]

// CHECK: [[Done]]:		// CHECK: [[Done]]:
// CHECK: %[[LoadRet:.+]] = load i32, i32* %[[RetVal]]		// CHECK: %[[LoadRet:.+]] = load i32, i32* %[[RetVal]]
// CHECK: ret i32 %[[LoadRet]]		// CHECK: ret i32 %[[LoadRet]]
}		}

llvm/docs/Coroutines.rst

	Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines
	the initial entrypoint of the coroutine, which executes until a suspend point			the initial entrypoint of the coroutine, which executes until a suspend point
	is first reached. The remainder of the original coroutine function is split			is first reached. The remainder of the original coroutine function is split
	out into some number of "resume functions". Any state which must persist			out into some number of "resume functions". Any state which must persist
	across suspensions is stored in the coroutine frame. The resume functions			across suspensions is stored in the coroutine frame. The resume functions
	must somehow be able to handle either a "normal" resumption, which continues			must somehow be able to handle either a "normal" resumption, which continues
	the normal execution of the coroutine, or an "abnormal" resumption, which			the normal execution of the coroutine, or an "abnormal" resumption, which
	must unwind the coroutine without attempting to suspend it.			must unwind the coroutine without attempting to suspend it.

	Switched-Resume Lowering			Switched-Resume Lowering
				ChuanqiXu: Since we over align coroutine frame for switched-resume lowering coroutines only, it may be better to move this section under switched-resume lowering section.
	------------------------			------------------------

				ChuanqiXu: I prefer to reword "sometimes" to a clearer condition. Like "When the align required is bigger than 16".
	In LLVM's standard switched-resume lowering, signaled by the use of			In LLVM's standard switched-resume lowering, signaled by the use of
	`llvm.coro.id`, the coroutine frame is stored as part of a "coroutine			`llvm.coro.id`, the coroutine frame is stored as part of a "coroutine
	object" which represents a handle to a particular invocation of the			object" which represents a handle to a particular invocation of the
	coroutine. All coroutine objects support a common ABI allowing certain			coroutine. All coroutine objects support a common ABI allowing certain
	features to be used without knowing anything about the coroutine's			features to be used without knowing anything about the coroutine's
	implementation:			implementation:

	- A coroutine object can be queried to see if it has reached completion			- A coroutine object can be queried to see if it has reached completion
	▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines
	allocation to be elided due to inlining. This protocol is discussed			allocation to be elided due to inlining. This protocol is discussed
	in further detail below.			in further detail below.

	The frontend may generate code to call the coroutine function directly;			The frontend may generate code to call the coroutine function directly;
	this will become a call to the ramp function and will return a pointer			this will become a call to the ramp function and will return a pointer
	to the coroutine object. The frontend should always resume or destroy			to the coroutine object. The frontend should always resume or destroy
	the coroutine using the corresponding intrinsics.			the coroutine using the corresponding intrinsics.

				raw frame: When the required coroutine frame alignment is greater than
				__STDCPP_DEFAULT_NEW_ALIGNMENT__, more space than the size of the coroutine
				frame must be allocated to satisfy the frame's alignment requirement. In this
				case, the address returned by the memory allocator may differ from the
				coroutine frame start address. The address returned by the memory allocator is
				called the "raw frame pointer". The coroutine frame start address is at a
				non-negative offset from the "raw frame pointer". The maximal gap between the
				two is `llvm.coro.align() - __STDCPP_DEFAULT_NEW_ALIGNMENT__`, whereas the
				actual gap is a runtime property. When a coroutine frame is overaligned, the
				"raw frame pointer" may be stored in the coroutine frame, and it can be
				retrieved using the `llvm.coro.raw.frame.ptr.*` intrinsics.

	Returned-Continuation Lowering			Returned-Continuation Lowering
	------------------------------			------------------------------

	In returned-continuation lowering, signaled by the use of			In returned-continuation lowering, signaled by the use of
	`llvm.coro.id.retcon` or `llvm.coro.id.retcon.once`, some aspects of			`llvm.coro.id.retcon` or `llvm.coro.id.retcon.once`, some aspects of
	the ABI must be handled more explicitly by the frontend.			the ABI must be handled more explicitly by the frontend.

	In this lowering, every suspend point takes a list of "yielded values"			In this lowering, every suspend point takes a list of "yielded values"
	▲ Show 20 Lines • Show All 809 Lines • ▼ Show 20 Lines
	None			None

	Semantics:			Semantics:
	""""""""""			""""""""""

	The `coro.size` intrinsic is lowered to a constant representing the size of			The `coro.size` intrinsic is lowered to a constant representing the size of
	the coroutine frame.			the coroutine frame.

				.. _coro.align:

				'llvm.coro.align' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i32 @llvm.coro.align.i32()
				declare i64 @llvm.coro.align.i64()

				Overview:
				"""""""""

				The '``llvm.coro.align``' intrinsic returns the alignment of the coroutine frame
				ChuanqiXu: > '``llvm.coro.align``' `llvm.coro.align`
				in bytes.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				The `coro.align` intrinsic is lowered to a constant representing the alignment
				of the coroutine frame.

				.. _coro.raw.frame.ptr.offset:

				'llvm.coro.raw.frame.ptr.offset' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
				declare i64 @llvm.coro.raw.frame.ptr.offset.i64()

				Overview:
				"""""""""

				The '``llvm.coro.raw.frame.ptr.offset``' intrinsic returns the byte offset of
				the `raw frame pointer` in coroutine frame. This is only supported for
				switched-resume coroutines. The return value is undefined when the coroutine
				frame is not overaligned.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				The `coro.raw.frame.ptr.offset` intrinsic is lowered to a constant representing
				the byte offset of the `raw frame pointer` in the coroutine frame. The
				`raw frame pointer` is the pointer returned by the allocator for the coroutine
				frame. The address returned by `llvm.coro.begin` is at a non-negative offset
				from the `raw frame pointer`. The return value is undefined when the coroutine
				frame is not overaligned.

				.. _coro.raw.frame.ptr.addr:

				'llvm.coro.raw.frame.ptr.addr' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i8** @llvm.coro.raw.frame.ptr.addr()

				Overview:
				"""""""""

				The '``llvm.coro.raw.frame.ptr.addr``' intrinsic returns the address storing the
				`raw frame pointer` in the coroutine frame. This is only supported for
				switched-resume coroutines. The return value is undefined when the coroutine
				frame is not overaligned.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				The `coro.raw.frame.ptr.addr` intrinsic is lowered to the address of the
				coroutine frame field storing the `raw frame pointer`.

	.. _coro.begin:			.. _coro.begin:

	'llvm.coro.begin' Intrinsic			'llvm.coro.begin' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	::			::

	declare i8* @llvm.coro.begin(token <id>, i8* <mem>)			declare i8* @llvm.coro.begin(token <id>, i8* <mem>)

	Overview:			Overview:
	"""""""""			"""""""""

	The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.			The '``llvm.coro.begin``' intrinsic returns an address of the coroutine frame.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument is a token returned by a call to '``llvm.coro.id``'			The first argument is a token returned by a call to '``llvm.coro.id``'
	identifying the coroutine.			identifying the coroutine.

	The second argument is a pointer to a block of memory where coroutine frame			The second argument is a pointer to a block of memory where coroutine frame
				ChuanqiXu: > coroutine frame From the implementation, it looks like `raw frame`. I am not sure if it is problematic now since CoroElide pass would convert the frame to an alloca.
	will be stored if it is allocated dynamically. This pointer is ignored			will be stored if it is allocated dynamically. This pointer is ignored
	for returned-continuation coroutines.			for returned-continuation coroutines.

	Semantics:			Semantics:
	""""""""""			""""""""""

	Depending on the alignment requirements of the objects in the coroutine frame			`coro.begin` returns its second argument.
	and/or on the codegen compactness reasons the pointer returned from `coro.begin`
	may be at offset to the `%mem` argument. (This could be beneficial if
	instructions that express relative access to data can be more compactly encoded
	with small positive and negative offsets).

	A frontend should emit exactly one `coro.begin` intrinsic per coroutine.			A frontend should emit exactly one `coro.begin` intrinsic per coroutine.

	.. _coro.free:			.. _coro.free:

	'llvm.coro.free' Intrinsic			'llvm.coro.free' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	::			::
	Show All 9 Lines
	supported for returned-continuation coroutines.			supported for returned-continuation coroutines.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument is a token returned by a call to '``llvm.coro.id``'			The first argument is a token returned by a call to '``llvm.coro.id``'
	identifying the coroutine.			identifying the coroutine.

	The second argument is a pointer to the coroutine frame. This should be the same			The second argument is a pointer to the coroutine frame. This should be the same
				ChuanqiXu: > the coroutine frame It should be the raw frame now, isn't it?
				ychen: It is still "coroutine frame".
	pointer that was returned by prior `coro.begin` call.			pointer that was returned by prior `coro.begin` call.

	Example (custom deallocation function):			Example (custom deallocation function):
	"""""""""""""""""""""""""""""""""""""""			"""""""""""""""""""""""""""""""""""""""

	.. code-block:: llvm			.. code-block:: llvm

	cleanup:			cleanup:
	▲ Show 20 Lines • Show All 774 Lines • Show Last 20 Lines
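
The "raw frame pointer" scheme described in the Coroutines.rst change above can be sketched in plain C++. This is only an illustration under stated assumptions, not the code the patch emits: `Frame`, `allocateFrame`, and `freeFrame` are hypothetical names, the frame layout is made up, and C++17 is assumed so that `__STDCPP_DEFAULT_NEW_ALIGNMENT__` is defined. The sketch over-allocates by the maximal gap, rounds the start address up, and stores the allocator's pointer in the frame so that deallocation can recover it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for an overaligned coroutine frame (not LLVM's layout).
struct alignas(64) Frame {
  void *RawPtr;     // the "raw frame pointer" returned by the allocator
  char Payload[56]; // placeholder for promise, spills, and so on
};

// Over-allocate by the maximal gap, then round the start address up.
Frame *allocateFrame() {
  constexpr std::size_t Align = alignof(Frame);
  constexpr std::size_t DefaultAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
  static_assert(Align > DefaultAlign, "sketch assumes an overaligned frame");
  constexpr std::size_t MaxGap = Align - DefaultAlign;

  void *Raw = ::operator new(sizeof(Frame) + MaxGap);
  auto Addr = reinterpret_cast<std::uintptr_t>(Raw);
  // Add-and-mask adjustment; it is not reversible from the aligned address alone.
  std::uintptr_t Aligned =
      (Addr + Align - 1) & ~(static_cast<std::uintptr_t>(Align) - 1);
  assert(Aligned - Addr <= MaxGap && "actual gap never exceeds the maximal gap");

  Frame *F = ::new (reinterpret_cast<void *>(Aligned)) Frame{};
  F->RawPtr = Raw; // remember what the allocator returned
  return F;
}

// Deallocate through the stored raw pointer, not the aligned frame address.
void freeFrame(Frame *F) {
  void *Raw = F->RawPtr;
  F->~Frame();
  ::operator delete(Raw);
}
```

Because the adjustment is an addition followed by a mask, the original pointer cannot be recomputed from the frame address, which is why the sketch keeps it in a frame field.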

llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 1,259 Lines • ▼ Show 20 Lines	def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_token_ty, llvm_ptr_ty],
NoCapture<ArgIndex<1>>]>;		NoCapture<ArgIndex<1>>]>;
def int_coro_end : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty], []>;		def int_coro_end : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty], []>;
def int_coro_end_async		def int_coro_end_async
: Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty, llvm_vararg_ty], []>;		: Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty, llvm_vararg_ty], []>;

def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;		def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_coro_noop : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;		def int_coro_noop : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;		def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
		def int_coro_align : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
		def int_coro_raw_frame_ptr_offset : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
		def int_coro_raw_frame_ptr_addr : Intrinsic<[llvm_ptrptr_ty], [], [IntrNoMem]>;

def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;		def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;
def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;		def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;
def int_coro_suspend_retcon : Intrinsic<[llvm_any_ty], [llvm_vararg_ty], []>;		def int_coro_suspend_retcon : Intrinsic<[llvm_any_ty], [llvm_vararg_ty], []>;
def int_coro_prepare_retcon : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],		def int_coro_prepare_retcon : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_coro_alloca_alloc : Intrinsic<[llvm_token_ty],		def int_coro_alloca_alloc : Intrinsic<[llvm_token_ty],
[llvm_anyint_ty, llvm_i32_ty], []>;		[llvm_anyint_ty, llvm_i32_ty], []>;
▲ Show 20 Lines • Show All 479 Lines • Show Last 20 Lines

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

Show All 15 Lines

#include "CoroInternal.h"		#include "CoroInternal.h"
#include "llvm/ADT/BitVector.h"		#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/Analysis/PtrUseVisitor.h"		#include "llvm/Analysis/PtrUseVisitor.h"
#include "llvm/Analysis/StackLifetime.h"		#include "llvm/Analysis/StackLifetime.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/OptimizedStructLayout.h"		#include "llvm/Support/OptimizedStructLayout.h"
▲ Show 20 Lines • Show All 494 Lines • ▼ Show 20 Lines	uint64_t getStructSize() const {
return StructSize;		return StructSize;
}		}

Align getStructAlign() const {		Align getStructAlign() const {
assert(IsFinished && "not yet finished!");		assert(IsFinished && "not yet finished!");
return StructAlign;		return StructAlign;
}		}

		SmallVector<Field, 8> &getFields() { return Fields; }

FieldIDType getLayoutFieldIndex(FieldIDType Id) const {		FieldIDType getLayoutFieldIndex(FieldIDType Id) const {
assert(IsFinished && "not yet finished!");		assert(IsFinished && "not yet finished!");
return Fields[Id].LayoutFieldIndex;		return Fields[Id].LayoutFieldIndex;
}		}

Field getLayoutField(FieldIDType Id) const {		Field getLayoutField(FieldIDType Id) const {
assert(IsFinished && "not yet finished!");		assert(IsFinished && "not yet finished!");
return Fields[Id];		return Fields[Id];
▲ Show 20 Lines • Show All 579 Lines • ▼ Show 20 Lines	if (Shape.ABI == coro::ABI::Switch) {
SwitchIndexFieldId = B.addField(IndexType, None);		SwitchIndexFieldId = B.addField(IndexType, None);
} else {		} else {
assert(PromiseAlloca == nullptr && "lowering doesn't support promises");		assert(PromiseAlloca == nullptr && "lowering doesn't support promises");
}		}

// Because multiple allocas may own the same field slot,		// Because multiple allocas may own the same field slot,
// we add allocas to field here.		// we add allocas to field here.
B.addFieldForAllocas(F, FrameData, Shape);		B.addFieldForAllocas(F, FrameData, Shape);

		// Create an entry for every spilled value.
		for (auto &S : FrameData.Spills) {
		FieldIDType Id = B.addField(S.first->getType(), None);
		FrameData.setFieldIndex(S.first, Id);
		}
		ChuanqiXu: Why we move this snippet to the front? Although it is not defined, the layout for the current frame would be: | resume func addr | destroy func addr | promise | other things needed | Move this to the front may break this.
		ychen: The intent is to structure the code better, no intention to change the frame layout here. My understanding is that `promise` already has a fixed offset ahead of this. `FrameData::Allocas` is ordered but there is no defined semantics. There seem no tests failing due to reordered frame layout. However, I might be wrong. Could you describe how it changes the layout?
		ChuanqiXu: Sorry, I made a mistake. This move should be OK. My bad.

		Optional<FieldIDType> FramePtrField = None;
		if (Shape.ABI == coro::ABI::Switch) {
// Add PromiseAlloca to Allocas list so that		// Add PromiseAlloca to Allocas list so that
// 1. updateLayoutIndex could update its index after		// 1. updateLayoutIndex could update its index after
// `performOptimizedStructLayout`		// `performOptimizedStructLayout`
// 2. it is processed in insertSpills.		// 2. it is processed in insertSpills.
		ChuanqiXu: We need to edit the comment too.
if (Shape.ABI == coro::ABI::Switch && PromiseAlloca)		if (PromiseAlloca)
// We assume that the promise alloca won't be modified before		// We assume that the promise alloca won't be modified before
// CoroBegin and no alias will be created before CoroBegin.		// CoroBegin and no alias will be created before CoroBegin.
FrameData.Allocas.emplace_back(		FrameData.Allocas.emplace_back(
PromiseAlloca, DenseMap<Instruction *, llvm::Optional<APInt>>{}, false);		PromiseAlloca, DenseMap<Instruction *, llvm::Optional<APInt>>{},
// Create an entry for every spilled value.		false);
for (auto &S : FrameData.Spills) {
FieldIDType Id = B.addField(S.first->getType(), None);		Align FrameAlign =
FrameData.setFieldIndex(S.first, Id);		std::max_element(
		B.getFields().begin(), B.getFields().end(),
		[](auto &F1, auto &F2) { return F1.Alignment < F2.Alignment; })
		->Alignment;

		// Check for over-alignment.
		Value *PtrAddr =
		ConstantPointerNull::get(Type::getInt8PtrTy(C)->getPointerTo());
		unsigned NewAlign = Shape.getSwitchCoroId()->getAlignment();
		bool NeedFramePtrField = Shape.CoroRawFramePtrOffsets.size() > 0 ||
		Shape.CoroRawFramePtrAddrs.size() > 0;
		if (NeedFramePtrField && NewAlign && FrameAlign > NewAlign) {
		BasicBlock &Entry = F.getEntryBlock();
		IRBuilder<> Builder(&Entry, Entry.getFirstInsertionPt());

		// Reserve frame space for raw frame pointer.
		Value *Mem = Shape.CoroBegin->getMem();
		AllocaInst *FramePtrAddr =
		Builder.CreateAlloca(Mem->getType(), nullptr, "alloc.frame.ptr");
		PtrAddr = FramePtrAddr;
		FramePtrField = B.addFieldForAlloca(FramePtrAddr);
		FrameData.setFieldIndex(FramePtrAddr, *FramePtrField);
		FrameData.Allocas.emplace_back(
		FramePtrAddr, DenseMap<Instruction *, llvm::Optional<APInt>>{}, true);
		}

		for (CoroRawFramePtrAddrInst *C : Shape.CoroRawFramePtrAddrs) {
		C->replaceAllUsesWith(PtrAddr);
		C->eraseFromParent();
		}
}		}

B.finish(FrameTy);		B.finish(FrameTy);
FrameData.updateLayoutIndex(B);		FrameData.updateLayoutIndex(B);
Shape.FrameAlign = B.getStructAlign();		Shape.FrameAlign = B.getStructAlign();
Shape.FrameSize = B.getStructSize();		Shape.FrameSize = B.getStructSize();

switch (Shape.ABI) {		switch (Shape.ABI) {
case coro::ABI::Switch: {		case coro::ABI::Switch: {
// In the switch ABI, remember the switch-index field.		// In the switch ABI, remember the switch-index field.
auto IndexField = B.getLayoutField(*SwitchIndexFieldId);		auto IndexField = B.getLayoutField(*SwitchIndexFieldId);
Shape.SwitchLowering.IndexField = IndexField.LayoutFieldIndex;		Shape.SwitchLowering.IndexField = IndexField.LayoutFieldIndex;
Shape.SwitchLowering.IndexAlign = IndexField.Alignment.value();		Shape.SwitchLowering.IndexAlign = IndexField.Alignment.value();
Shape.SwitchLowering.IndexOffset = IndexField.Offset;		Shape.SwitchLowering.IndexOffset = IndexField.Offset;

		if (FramePtrField) {
		FieldIDType FieldIdx = B.getLayoutFieldIndex(*FramePtrField);
		Shape.SwitchLowering.FramePtrOffset =
		DL.getStructLayout(FrameTy)->getElementOffset(FieldIdx);
		}

// Also round the frame size up to a multiple of its alignment, as is		// Also round the frame size up to a multiple of its alignment, as is
// generally expected in C/C++.		// generally expected in C/C++.
Shape.FrameSize = alignTo(Shape.FrameSize, Shape.FrameAlign);		Shape.FrameSize = alignTo(Shape.FrameSize, Shape.FrameAlign);
break;		break;
}		}

// In the retcon ABI, remember whether the frame is inline in the storage.		// In the retcon ABI, remember whether the frame is inline in the storage.
case coro::ABI::Retcon:		case coro::ABI::Retcon:
▲ Show 20 Lines • Show All 1,527 Lines • Show Last 20 Lines
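
The over-alignment check and the size round-up in the CoroFrame.cpp hunk above can be illustrated outside LLVM. The following is a simplified sketch, not the patch's code: `Field`, `maxFieldAlign`, `alignUp`, and `needsRawFramePtr` are hypothetical names, alignments are plain integers assumed to be powers of two, and the extra condition that a raw-frame-pointer intrinsic is actually used is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for the frame builder's field record.
struct Field {
  std::uint64_t Size;
  std::uint64_t Alignment; // assumed to be a power of two
};

// Largest alignment among all fields; Fields must be non-empty.
std::uint64_t maxFieldAlign(const std::vector<Field> &Fields) {
  assert(!Fields.empty() && "max_element on an empty range has no result");
  return std::max_element(Fields.begin(), Fields.end(),
                          [](const Field &A, const Field &B) {
                            return A.Alignment < B.Alignment;
                          })
      ->Alignment;
}

// Round Size up to the next multiple of Align (a power of two).
std::uint64_t alignUp(std::uint64_t Size, std::uint64_t Align) {
  return (Size + Align - 1) & ~(Align - 1);
}

// A raw frame pointer slot is only worth reserving when the allocator's
// guaranteed alignment is known (non-zero) and smaller than the frame's.
bool needsRawFramePtr(std::uint64_t FrameAlign, std::uint64_t AllocatorAlign) {
  return AllocatorAlign != 0 && FrameAlign > AllocatorAlign;
}
```

For example, with fields aligned to 8, 4, and 64 bytes the frame alignment is 64, so an allocator that only guarantees 16 bytes would require the extra slot, while one that guarantees 64 would not.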

llvm/lib/Transforms/Coroutines/CoroInstr.h

Show All 21 Lines
// the Coroutine library.		// the Coroutine library.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TRANSFORMS_COROUTINES_COROINSTR_H		#ifndef LLVM_LIB_TRANSFORMS_COROUTINES_COROINSTR_H
#define LLVM_LIB_TRANSFORMS_COROUTINES_COROINSTR_H		#define LLVM_LIB_TRANSFORMS_COROUTINES_COROINSTR_H

#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
		#include "llvm/IR/Intrinsics.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

namespace llvm {		namespace llvm {

/// This class represents the llvm.coro.subfn.addr instruction.		/// This class represents the llvm.coro.subfn.addr instruction.
class LLVM_LIBRARY_VISIBILITY CoroSubFnInst : public IntrinsicInst {		class LLVM_LIBRARY_VISIBILITY CoroSubFnInst : public IntrinsicInst {
enum { FrameArg, IndexArg };		enum { FrameArg, IndexArg };

▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines
public:		public:
AllocaInst *getPromise() const {		AllocaInst *getPromise() const {
Value *Arg = getArgOperand(PromiseArg);		Value *Arg = getArgOperand(PromiseArg);
return isa<ConstantPointerNull>(Arg)		return isa<ConstantPointerNull>(Arg)
? nullptr		? nullptr
: cast<AllocaInst>(Arg->stripPointerCasts());		: cast<AllocaInst>(Arg->stripPointerCasts());
}		}

		unsigned getAlignment() const {
		return cast<ConstantInt>(getArgOperand(AlignArg))->getZExtValue();
		}

void clearPromise() {		void clearPromise() {
Value *Arg = getArgOperand(PromiseArg);		Value *Arg = getArgOperand(PromiseArg);
setArgOperand(PromiseArg,		setArgOperand(PromiseArg,
ConstantPointerNull::get(Type::getInt8PtrTy(getContext())));		ConstantPointerNull::get(Type::getInt8PtrTy(getContext())));
if (isa<AllocaInst>(Arg))		if (isa<AllocaInst>(Arg))
return;		return;
assert((isa<BitCastInst>(Arg) || isa<GetElementPtrInst>(Arg)) &&		assert((isa<BitCastInst>(Arg) || isa<GetElementPtrInst>(Arg)) &&
"unexpected instruction designating the promise");		"unexpected instruction designating the promise");
▲ Show 20 Lines • Show All 462 Lines • ▼ Show 20 Lines	public:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::coro_size;		return I->getIntrinsicID() == Intrinsic::coro_size;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

		/// This represents the llvm.coro.align instruction.
		class LLVM_LIBRARY_VISIBILITY CoroAlignInst : public IntrinsicInst {
		public:
		// Methods to support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::coro_align;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

		/// This represents the llvm.coro.raw.frame.ptr.offset instruction.
		class LLVM_LIBRARY_VISIBILITY CoroRawFramePtrOffsetInst : public IntrinsicInst {
		public:
		// Methods to support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::coro_raw_frame_ptr_offset;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

		/// This represents the llvm.coro.raw.frame.ptr.addr instruction.
		class LLVM_LIBRARY_VISIBILITY CoroRawFramePtrAddrInst : public IntrinsicInst {
		public:
		// Methods to support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::coro_raw_frame_ptr_addr;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {		class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {
enum { FrameArg, UnwindArg };		enum { FrameArg, UnwindArg };

public:		public:
bool isFallthrough() const { return !isUnwind(); }		bool isFallthrough() const { return !isUnwind(); }
bool isUnwind() const {		bool isUnwind() const {
return cast<Constant>(getArgOperand(UnwindArg))->isOneValue();		return cast<Constant>(getArgOperand(UnwindArg))->isOneValue();
}		}
▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines

llvm/lib/Transforms/Coroutines/CoroInternal.h

Show First 20 Lines • Show All 93 Lines • ▼ Show 20 Lines
};		};

// Holds structural Coroutine Intrinsics for a particular function and other		// Holds structural Coroutine Intrinsics for a particular function and other
// values used during CoroSplit pass.		// values used during CoroSplit pass.
struct LLVM_LIBRARY_VISIBILITY Shape {		struct LLVM_LIBRARY_VISIBILITY Shape {
CoroBeginInst *CoroBegin;		CoroBeginInst *CoroBegin;
SmallVector<AnyCoroEndInst *, 4> CoroEnds;		SmallVector<AnyCoroEndInst *, 4> CoroEnds;
SmallVector<CoroSizeInst *, 2> CoroSizes;		SmallVector<CoroSizeInst *, 2> CoroSizes;
		SmallVector<CoroAlignInst *, 2> CoroAligns;
		SmallVector<CoroRawFramePtrOffsetInst *, 2> CoroRawFramePtrOffsets;
		SmallVector<CoroRawFramePtrAddrInst *, 2> CoroRawFramePtrAddrs;
SmallVector<AnyCoroSuspendInst *, 4> CoroSuspends;		SmallVector<AnyCoroSuspendInst *, 4> CoroSuspends;
SmallVector<CallInst*, 2> SwiftErrorOps;		SmallVector<CallInst*, 2> SwiftErrorOps;

// Field indexes for special fields in the switch lowering.		// Field indexes for special fields in the switch lowering.
struct SwitchFieldIndex {		struct SwitchFieldIndex {
enum {		enum {
Resume,		Resume,
Destroy		Destroy
Show All 20 Lines	struct LLVM_LIBRARY_VISIBILITY Shape {

struct SwitchLoweringStorage {		struct SwitchLoweringStorage {
SwitchInst *ResumeSwitch;		SwitchInst *ResumeSwitch;
AllocaInst *PromiseAlloca;		AllocaInst *PromiseAlloca;
BasicBlock *ResumeEntryBlock;		BasicBlock *ResumeEntryBlock;
unsigned IndexField;		unsigned IndexField;
unsigned IndexAlign;		unsigned IndexAlign;
unsigned IndexOffset;		unsigned IndexOffset;
		unsigned FramePtrOffset;
bool HasFinalSuspend;		bool HasFinalSuspend;
};		};

struct RetconLoweringStorage {		struct RetconLoweringStorage {
Function *ResumePrototype;		Function *ResumePrototype;
Function *Alloc;		Function *Alloc;
Function *Dealloc;		Function *Dealloc;
BasicBlock *ReturnBlock;		BasicBlock *ReturnBlock;
▲ Show 20 Lines • Show All 143 Lines • Show Last 20 Lines

llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Show First 20 Lines • Show All 1,040 Lines • ▼ Show 20 Lines	static void updateAsyncFuncPointerContextSize(coro::Shape &Shape) {
auto *NewContextSize = ConstantInt::get(OrigContextSize->getType(),		auto *NewContextSize = ConstantInt::get(OrigContextSize->getType(),
Shape.AsyncLowering.ContextSize);		Shape.AsyncLowering.ContextSize);
auto *NewFuncPtrStruct = ConstantStruct::get(		auto *NewFuncPtrStruct = ConstantStruct::get(
FuncPtrStruct->getType(), OrigRelativeFunOffset, NewContextSize);		FuncPtrStruct->getType(), OrigRelativeFunOffset, NewContextSize);

Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);		Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
}		}

static void replaceFrameSize(coro::Shape &Shape) {		static void replaceFrameSizeAndAlign(coro::Shape &Shape) {
if (Shape.ABI == coro::ABI::Async)		if (Shape.ABI == coro::ABI::Async)
updateAsyncFuncPointerContextSize(Shape);		updateAsyncFuncPointerContextSize(Shape);

if (Shape.CoroSizes.empty())		if (!Shape.CoroSizes.empty()) {
return;

// In the same function all coro.sizes should have the same result type.		// In the same function all coro.sizes should have the same result type.
auto *SizeIntrin = Shape.CoroSizes.back();		auto *SizeIntrin = Shape.CoroSizes.back();
Module *M = SizeIntrin->getModule();		Module *M = SizeIntrin->getModule();
const DataLayout &DL = M->getDataLayout();		const DataLayout &DL = M->getDataLayout();
auto Size = DL.getTypeAllocSize(Shape.FrameTy);		auto Size = DL.getTypeAllocSize(Shape.FrameTy);
auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);		auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);

for (CoroSizeInst *CS : Shape.CoroSizes) {		for (CoroSizeInst *CS : Shape.CoroSizes) {
CS->replaceAllUsesWith(SizeConstant);		CS->replaceAllUsesWith(SizeConstant);
CS->eraseFromParent();		CS->eraseFromParent();
}		}
}		}

		if (!Shape.CoroAligns.empty()) {
		auto *Intrin = Shape.CoroAligns.back();
		auto *AlignConstant =
		ConstantInt::get(Intrin->getType(), Shape.FrameAlign.value());

		for (CoroAlignInst *CS : Shape.CoroAligns) {
		CS->replaceAllUsesWith(AlignConstant);
		CS->eraseFromParent();
		}
		}

		if (!Shape.CoroRawFramePtrOffsets.empty()) {
		auto *Intrin = Shape.CoroRawFramePtrOffsets.back();
		auto *FramePtrOffset = ConstantInt::get(
		Intrin->getType(), Shape.SwitchLowering.FramePtrOffset);

		for (CoroRawFramePtrOffsetInst *CS : Shape.CoroRawFramePtrOffsets) {
		CS->replaceAllUsesWith(FramePtrOffset);
		CS->eraseFromParent();
		}
		}
		}

// Create a global constant array containing pointers to functions provided and		// Create a global constant array containing pointers to functions provided and
// set Info parameter of CoroBegin to point at this constant. Example:		// set Info parameter of CoroBegin to point at this constant. Example:
//		//
// @f.resumers = internal constant [2 x void(%f.frame)]		// @f.resumers = internal constant [2 x void(%f.frame)]
// [void(%f.frame) @f.resume, void(%f.frame) @f.destroy]		// [void(%f.frame) @f.resume, void(%f.frame) @f.destroy]
// define void @f() {		// define void @f() {
// ...		// ...
// call i8* @llvm.coro.begin(i8* null, i32 0, i8* null,		// call i8* @llvm.coro.begin(i8* null, i32 0, i8* null,
▲ Show 20 Lines • Show All 716 Lines • ▼ Show 20 Lines	static coro::Shape splitCoroutine(Function &F,
removeUnreachableBlocks(F);		removeUnreachableBlocks(F);

coro::Shape Shape(F, ReuseFrameSlot);		coro::Shape Shape(F, ReuseFrameSlot);
if (!Shape.CoroBegin)		if (!Shape.CoroBegin)
return Shape;		return Shape;

simplifySuspendPoints(Shape);		simplifySuspendPoints(Shape);
buildCoroutineFrame(F, Shape);		buildCoroutineFrame(F, Shape);
replaceFrameSize(Shape);		replaceFrameSizeAndAlign(Shape);

// If there are no suspend points, no split required, just remove		// If there are no suspend points, no split required, just remove
// the allocation and deallocation blocks, they are not needed.		// the allocation and deallocation blocks, they are not needed.
if (Shape.CoroSuspends.empty()) {		if (Shape.CoroSuspends.empty()) {
handleNoSuspendCoroutine(Shape);		handleNoSuspendCoroutine(Shape);
} else {		} else {
switch (Shape.ABI) {		switch (Shape.ABI) {
case coro::ABI::Switch:		case coro::ABI::Switch:
(464 lines hidden)

llvm/lib/Transforms/Coroutines/Coroutines.cpp

(228 lines hidden)	void coro::updateCallGraph(Function &ParentFunc, ArrayRef<Function *> NewFuncs,

SCC.initialize(Nodes);		SCC.initialize(Nodes);
}		}

static void clear(coro::Shape &Shape) {		static void clear(coro::Shape &Shape) {
Shape.CoroBegin = nullptr;		Shape.CoroBegin = nullptr;
Shape.CoroEnds.clear();		Shape.CoroEnds.clear();
Shape.CoroSizes.clear();		Shape.CoroSizes.clear();
		Shape.CoroAligns.clear();
		Shape.CoroRawFramePtrOffsets.clear();
		Shape.CoroRawFramePtrAddrs.clear();
Shape.CoroSuspends.clear();		Shape.CoroSuspends.clear();

Shape.FrameTy = nullptr;		Shape.FrameTy = nullptr;
Shape.FramePtr = nullptr;		Shape.FramePtr = nullptr;
Shape.AllocaSpillBlock = nullptr;		Shape.AllocaSpillBlock = nullptr;
}		}

static CoroSaveInst *createCoroSave(CoroBeginInst *CoroBegin,		static CoroSaveInst *createCoroSave(CoroBeginInst *CoroBegin,
(18 lines hidden)	void coro::Shape::buildFrom(Function &F) {
for (Instruction &I : instructions(F)) {		for (Instruction &I : instructions(F)) {
if (auto II = dyn_cast<IntrinsicInst>(&I)) {		if (auto II = dyn_cast<IntrinsicInst>(&I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
continue;		continue;
case Intrinsic::coro_size:		case Intrinsic::coro_size:
CoroSizes.push_back(cast<CoroSizeInst>(II));		CoroSizes.push_back(cast<CoroSizeInst>(II));
break;		break;
		case Intrinsic::coro_align:
		CoroAligns.push_back(cast<CoroAlignInst>(II));
		break;
		case Intrinsic::coro_raw_frame_ptr_offset:
		CoroRawFramePtrOffsets.push_back(cast<CoroRawFramePtrOffsetInst>(II));
		break;
		case Intrinsic::coro_raw_frame_ptr_addr:
		CoroRawFramePtrAddrs.push_back(cast<CoroRawFramePtrAddrInst>(II));
		break;
case Intrinsic::coro_frame:		case Intrinsic::coro_frame:
CoroFrames.push_back(cast<CoroFrameInst>(II));		CoroFrames.push_back(cast<CoroFrameInst>(II));
break;		break;
case Intrinsic::coro_save:		case Intrinsic::coro_save:
// After optimizations, coro_suspends using this coro_save might have		// After optimizations, coro_suspends using this coro_save might have
// been removed, remember orphaned coro_saves to remove them later.		// been removed, remember orphaned coro_saves to remove them later.
if (II->use_empty())		if (II->use_empty())
UnusedCoroSaves.push_back(cast<CoroSaveInst>(II));		UnusedCoroSaves.push_back(cast<CoroSaveInst>(II));
(91 lines hidden)	void coro::Shape::buildFrom(Function &F) {
switch (auto IdIntrinsic = Id->getIntrinsicID()) {		switch (auto IdIntrinsic = Id->getIntrinsicID()) {
case Intrinsic::coro_id: {		case Intrinsic::coro_id: {
auto SwitchId = cast<CoroIdInst>(Id);		auto SwitchId = cast<CoroIdInst>(Id);
this->ABI = coro::ABI::Switch;		this->ABI = coro::ABI::Switch;
this->SwitchLowering.HasFinalSuspend = HasFinalSuspend;		this->SwitchLowering.HasFinalSuspend = HasFinalSuspend;
this->SwitchLowering.ResumeSwitch = nullptr;		this->SwitchLowering.ResumeSwitch = nullptr;
this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();		this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
this->SwitchLowering.ResumeEntryBlock = nullptr;		this->SwitchLowering.ResumeEntryBlock = nullptr;
		this->SwitchLowering.FramePtrOffset = 0;

for (auto AnySuspend : CoroSuspends) {		for (auto AnySuspend : CoroSuspends) {
auto Suspend = dyn_cast<CoroSuspendInst>(AnySuspend);		auto Suspend = dyn_cast<CoroSuspendInst>(AnySuspend);
if (!Suspend) {		if (!Suspend) {
#ifndef NDEBUG		#ifndef NDEBUG
AnySuspend->dump();		AnySuspend->dump();
#endif		#endif
report_fatal_error("coro.id must be paired with coro.suspend");		report_fatal_error("coro.id must be paired with coro.suspend");
(367 lines hidden)

llvm/test/Transforms/Coroutines/coro-frame-overalign.ll

This file was added.

				; Check that `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and
				; `llvm.coro.raw.frame.ptr.addr` are lowered correctly.
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				%PackedStruct = type <{ i64 }>

				declare void @consume(%PackedStruct*, i32, i32, i8**)
				declare void @consume2(i32, i32)

				define i8* @f() "coroutine.presplit"="1" {
				entry:
				%data = alloca %PackedStruct, align 32
				%id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%align = call i32 @llvm.coro.align.i32()
				%offset = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
				%addr = call i8** @llvm.coro.raw.frame.ptr.addr()
				call void @consume(%PackedStruct* %data, i32 %align, i32 %offset, i8** %addr)
				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				br label %cleanup

				cleanup:
				%align2 = call i32 @llvm.coro.align.i32()
				%offset2 = call i32 @llvm.coro.raw.frame.ptr.offset.i32()
				call void @consume2(i32 %align2, i32 %offset2)
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend
				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; See if the raw frame pointer was inserted into the frame.
				; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }

				; See if we used correct index to access frame addr field (field 2).
				; CHECK-LABEL: @f(
				; CHECK: %alloc.frame.ptr = alloca i8*, align 8
				; CHECK: %[[FIELD:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK: %[[ADDR:.+]] = load i8*, i8** %alloc.frame.ptr, align 8
				; CHECK: store i8* %[[ADDR]], i8** %[[FIELD]], align 8
				; CHECK: %[[DATA:.+]] = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 5
				; CHECK: call void @consume(%PackedStruct* %[[DATA]], i32 32, i32 16, i8** %[[FIELD]])
				; CHECK: ret i8*

				; See if `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` are lowered
				; correctly during deallocation.
				; CHECK-LABEL: @f.destroy(
				; CHECK: call void @consume2(i32 32, i32 16)
				; CHECK: call void @free(i8* %{{.*}})

				; CHECK-LABEL: @f.cleanup(
				; CHECK: call void @consume2(i32 32, i32 16)
				; CHECK: call void @free(i8*

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i32 @llvm.coro.align.i32()
				declare i32 @llvm.coro.raw.frame.ptr.offset.i32()
				declare i8** @llvm.coro.raw.frame.ptr.addr()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare double @print(double)
				declare void @free(i8*)
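
For readers of the test above: it expects `llvm.coro.align` to fold to 32 and `llvm.coro.raw.frame.ptr.offset` to fold to 16, with the raw allocator pointer stored in frame field 2. The C++ below is only a rough sketch of how a frontend might use those two constants, assuming an over-allocate-then-align scheme as described in the summary. It is not the code Clang emits for this patch. The names `kFrameAlign`, `kRawPtrOffset`, `allocateOveralignedFrame`, and `freeOveralignedFrame` are hypothetical, and the padding of `kFrameAlign - 1` bytes is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for the constants the test expects after lowering.
constexpr std::size_t kFrameAlign = 32;   // llvm.coro.align
constexpr std::size_t kRawPtrOffset = 16; // llvm.coro.raw.frame.ptr.offset

// Over-allocate, round the start address up to kFrameAlign, and record the
// pointer malloc returned inside the frame so it can be freed later.
// Assumes frameSize >= kRawPtrOffset + sizeof(void *).
void *allocateOveralignedFrame(std::size_t frameSize) {
  void *raw = std::malloc(frameSize + kFrameAlign - 1);
  if (!raw)
    return nullptr;
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t aligned =
      (addr + kFrameAlign - 1) & ~static_cast<std::uintptr_t>(kFrameAlign - 1);
  auto *frame = static_cast<unsigned char *>(raw) + (aligned - addr);
  // The rounding is not reversible, so keep the original pointer in the frame.
  std::memcpy(frame + kRawPtrOffset, &raw, sizeof(raw));
  return frame;
}

// Recover the pointer malloc returned from the frame and free that, not the
// aligned frame address.
void freeOveralignedFrame(void *frame) {
  void *raw = nullptr;
  std::memcpy(&raw, static_cast<unsigned char *>(frame) + kRawPtrOffset,
              sizeof(raw));
  std::free(raw);
}
```

Storing the raw pointer at a fixed offset corresponds to the "adding an offset to the frame" option from the review discussion: the frame handle stays aligned and the promise stays at a reliable offset, while deallocation can still find the address to pass to `free`.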