This is an archive of the discontinued LLVM Phabricator instance.

Paths

clang/lib/CodeGen/CGBuiltin.cpp
clang/test/CodeGenCoroutines/coro-alloc.cpp
clang/test/CodeGenCoroutines/coro-builtins.c
clang/test/CodeGenCoroutines/coro-gro.cpp
llvm/docs/Coroutines.rst
llvm/include/llvm/IR/Intrinsics.td
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
llvm/lib/Transforms/Coroutines/CoroInstr.h
llvm/lib/Transforms/Coroutines/CoroInternal.h
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
llvm/lib/Transforms/Coroutines/Coroutines.cpp
llvm/test/Transforms/Coroutines/coro-overalign.ll

Differential D100739

[Coroutines] Handle overaligned frame allocation (2)
Abandoned · Public

Authored by ychen on Apr 18 2021, 10:39 PM.

Details

Reviewers

rjmccall
lxfind
ChuanqiXu

Summary

This is an alternative to D97915 which missed proper deallocation of the
over-allocated frame. This patch handles both allocations and deallocations.

Both user-defined promise/local variables and compiler-synthesized local variables can cause the coroutine frame to be overaligned. Some related discussion is in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2014r0.pdf (issue #3).

Unlike D97915, this patch implements the over-allocation in the backend instead of the frontend, for these reasons:

The alloca of the raw frame pointer (suppose we inserted it in the frontend) would be included in the non-overaligned frame unless we teach CoroFrame how to elide it.
Extra code is inserted only when the frame is known to be overaligned.
The implementation is simpler.
Clients can turn it on or off by using llvm.coro.size.aligned instead of llvm.coro.size, which tells LLVM that it should handle overaligned frames.

Overall approach:

Overalignment handling is only performed when llvm.coro.size.aligned is used.
llvm.coro.size.aligned returns sizeof(coroutine frame) + max(0, alignof(coroutine frame) - __STDCPP_DEFAULT_NEW_ALIGNMENT__), which equals llvm.coro.size when the coroutine frame is not overaligned.
In CoroFrame, as soon as the frame is known to be overaligned, extra code is emitted to 1) create a new alloca that stores the address returned by malloc, 2) adjust the frame pointer dynamically so that it is correctly aligned, and 3) remember the frame index of the new alloca for use during deallocation.
In coro::replaceCoroFree, when heap allocation cannot be elided, instead of free(<frame ptr>) emit Value *v = gep <frame ptr>, 0, frame_ptr_addr_index; free(load(v)).
Clang switches to llvm.coro.size.aligned. If Clang later gains the ability to handle this by itself, it can switch back to llvm.coro.size.
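
The steps above can be sketched in ordinary C++, as a rough analogue rather than the code the patch emits. `Frame`, `RawPtr`, `sizeAligned`, `allocateFrame`, and `freeFrame` are hypothetical names chosen for illustration. The sketch assumes the default `operator new` returns memory aligned to `DefaultNewAlign` and that the frame alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Alignment guaranteed by the default operator new. The fallback for
// pre-C++17 compilers is an assumption made for this sketch.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t DefaultNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t DefaultNewAlign = alignof(std::max_align_t);
#endif

// Hypothetical stand-in for an overaligned coroutine frame. RawPtr plays the
// role of the extra slot that remembers the address the allocator returned.
struct alignas(64) Frame {
  void *RawPtr;
  char Payload[100];
};

// Mirrors the formula given for llvm.coro.size.aligned in the summary.
constexpr std::size_t sizeAligned(std::size_t Size, std::size_t Align) {
  return Size + (Align > DefaultNewAlign ? Align - DefaultNewAlign : 0);
}

// Over-allocate, round the start address up, and remember the raw pointer.
Frame *allocateFrame() {
  void *Raw = ::operator new(sizeAligned(sizeof(Frame), alignof(Frame)));
  std::uintptr_t Mask = alignof(Frame) - 1;
  std::uintptr_t Addr = reinterpret_cast<std::uintptr_t>(Raw);
  std::uintptr_t Aligned = (Addr + Mask) & ~Mask;
  Frame *F = new (reinterpret_cast<void *>(Aligned)) Frame{};
  F->RawPtr = Raw;
  return F;
}

// Free the address the allocator returned, not the adjusted frame address.
void freeFrame(Frame *F) {
  assert(F && F->RawPtr);
  ::operator delete(F->RawPtr);
}
```

Freeing through the stored raw pointer corresponds to the free(load(v)) step: the adjusted frame address is generally not an address the allocator handed out.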

https://reviews.llvm.org/P8260 is an IR diff (llvm.coro.size vs llvm.coro.size.aligned) for the test case in llvm/test/Transforms/Coroutines/coro-padding.ll.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ychen created this revision. · Apr 18 2021, 10:39 PM

Herald added a subscriber: hiraditya. · Apr 18 2021, 10:39 PM

ychen requested review of this revision. · Apr 18 2021, 10:39 PM

Herald added projects: Restricted Project, Restricted Project. · Apr 18 2021, 10:39 PM

Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.

fix comment

Fix comment. Ready for review.

Harbormaster completed remote builds in B99419: Diff 338428. · Apr 18 2021, 11:34 PM

Harbormaster completed remote builds in B99421: Diff 338431. · Apr 18 2021, 11:59 PM

Harbormaster completed remote builds in B99420: Diff 338430. · Apr 19 2021, 12:14 AM

I haven't looked into the details yet; I will try to.
But from my understanding of this problem, a correct solution should keep the previous behavior if the end user doesn't specify alignas for promise_type. I mean, it shouldn't make so many test cases fail.

The alloca of the raw frame pointer (suppose we insert it in the frontend) would be included in the non-overaligned frame if we don't teach CoroFrame how to elide it.

This is confusing to me; maybe I need to look into the code. First, the raw frame pointer isn't inserted in the frontend. Do you mean coro.begin? Then, what's the relationship between eliding and overalignment? That also confuses me.

In D100739#2700273, @ChuanqiXu wrote:

I haven't looked into the details yet; I will try to.
But from my understanding of this problem, a correct solution should keep the previous behavior if the end user doesn't specify alignas for promise_type. I mean, it shouldn't make so many test cases fail.

The failures are mostly due to the intrinsic signature change from llvm.coro.size() to llvm.coro.size(i1 alloc). The rest occur because the align argument of llvm.coro.id is an illegal 0 in most of the tests; this patch reads that value, which triggers assertions.

The alloca of the raw frame pointer (suppose we insert it in the frontend) would be included in the non-overaligned frame if we don't teach CoroFrame how to elide it.

This is confusing to me; maybe I need to look into the code. First, the raw frame pointer isn't inserted in the frontend. Do you mean coro.begin? Then, what's the relationship between eliding and overalignment? That also confuses me.

We don't know whether the frame is overaligned until the frame type is decided in CoroFrame. *Suppose* we did this in the frontend, adding the raw frame pointer alloca regardless of whether the frame is overaligned. The code generated by CGCoroutine would look like this:

%1 = alloca i8*
%2 = call i8* @malloc(i64 %x)
store i8* %2, i8** %1
; ...
%11 = load i8*, i8** %1
call void @free(i8* %11)

The alloca %1 must be there even if the frame is not overaligned. Then, in CoroFrame, we have to check whether the frame is really overaligned and, if not, reverse the above pattern. Otherwise, the alloca %1 would sit in the frame needlessly, and later optimizations could not reliably remove it.

This patch defers adding the alloca until CoroFrame, where we know for sure that the frame is overaligned.

ychen edited the summary of this revision. · Apr 19 2021, 9:24 PM

In D100739#2700324, @ychen wrote:
This patch defers adding the alloca until CoroFrame, where we know for sure that the frame is overaligned.

The semantics of alloca and free in the example look similar to those of coro.begin and coro.end.

lxfind added inline comments. · Apr 20 2021, 5:36 PM

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	Do we need to change __builtin_coro_size? The argument will always be 1, right? It only starts to change in LLVM intrinsics, if I read the impl correctly.

ychen added inline comments. · Apr 20 2021, 5:52 PM

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	Yeah, it is always 1 for Clang until the spec is fixed (then we could revert it back to 0). Other clients using `__builtin_coro_size` may use 0 if they don't care about overaligned frames or can handle overaligned frames themselves.

There is something about these two patches that still confuses me. Maybe I don't understand the problem correctly.
The example shows that if the user uses alignas to specify the alignment of promise_type, the actual alignment of the promise is incorrect, because the compiler doesn't call an allocation function that takes an alignment, which the spec doesn't specify.
These two patches then try to add padding to the frame type to get the alignment of promise_type right. There is a gap here: the problem comes from the alignment of the promise, but we want to solve it by overaligning the frame. That seems odd to me.
I wonder if it is possible to simply adjust the alignment of the promise alloca, like:

%promise = alloca %promise_type, align 64

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	BTW, is it OK to edit the `builtin`s directly? Unlike an intrinsic, which is only visible inside the compiler, a builtin can be used by any end user. Although I know very few users would use the `__builtin_coro` APIs, I wonder whether there is any guiding principle for editing the `builtin`s.
llvm/docs/Coroutines.rst
958	Maybe I am missing something. I think these two intrinsics should take an i8* argument that specifies the coroutine handle. Otherwise, it may be confusing when some coroutines get inlined into another coroutine.

ychen added inline comments. · Apr 20 2021, 8:37 PM

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	BTW, is it OK to edit the builtins directly? Unlike an intrinsic, which is only visible inside the compiler, a builtin can be used by any end user. Although I know very few users would use the __builtin_coro APIs, I wonder whether there is any guiding principle for editing the builtins.

I think it is OK to change these if it is justified, like anything else. Builtins and intrinsics are interfaces on different levels. I'm trying to make __builtin_coro_size consistent with llvm.coro.size because I don't have a good reason not to. (This assumes that we keep the opt-in overaligned frame handling in LLVM even after the spec is fixed, since it helps solve a practical problem and the maintenance cost is low.)

ChuanqiXu added inline comments. · Apr 20 2021, 8:51 PM

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	It doesn't make sense to me that we need to change the signature for `__builtin_coro_size` in this patch. In other words, why do we need to change `__builtin_coro_size` ? What are problems that can't be solved if we don't change `__builtin_coro_size`? At least, if it is necessary to change `__builtin_coro_size`, we could make it in successive patches.

lxfind added inline comments. · Apr 21 2021, 2:38 PM

clang/docs/LanguageExtensions.rst
2689 ↗	(On Diff #338431)	Yeah, I agree with ChuanqiXu; there is no need to make the builtin exactly the same as the LLVM intrinsic just because they have the same name. Many of them are different even though they have the same name.

Thanks for working on this.
I am still having a bit of a hard time understanding the solution.
A few questions:

I assume this patch is to solve the problem where the promise object is not aligned according to its alignof annotation, right? The title/wording is a bit misleading. Usually "handling XXX" means XXX is a situation/problem that wasn't handled properly before and is being handled here. I don't really understand what "handle overaligned frame allocation" means. Isn't under-aligned frame allocation the problem?
What is the purpose of coro.align intrinsic?
Could you provide some examples of what the IR might look like after this patch? Either that or a more detailed explanation of how this works in the summary.
Do you think it might be cleaner to introduce a new variant of coro.size instead of adding arguments to it? For example, coro.size.aligned(). This way, you can avoid changing any test file for non-switch-lowering test files, but focus on all switch-lowering tests.
Typically, coro.free is used by a comparison with nullptr. This is to enable CoroElide. See: https://llvm.org/docs/Coroutines.html#llvm-coro-free-intrinsic. So I don't think you can load from it directly.

ychen edited the summary of this revision. · Apr 22 2021, 11:11 PM

ychen edited the summary of this revision.

ychen edited the summary of this revision. · Apr 22 2021, 11:48 PM

What is the purpose of the builtin? Where is it being used? Typically you *can't* change the signature of a builtin because the builtin is itself a language feature that's documented to have a particular signature. If you've made a builtin purely for use in generated AST, that's pretty unfortunate, and you should consider whether you actually have to do that instead of e.g. synthesizing a call to an allocation function the same way that we do in new expressions.

Address feedback.

In D100739#2706973, @lxfind wrote:

Thanks for working on this.
I am still having a bit of a hard time understanding the solution.
A few questions:

I assume this patch is to solve the problem where the promise object is not aligned according to its alignof annotation, right? The title/wording is a bit misleading. Usually "handling XXX" means XXX is a situation/problem that wasn't handled properly before and is being handled here. I don't really understand what "handle overaligned frame allocation" means. Isn't under-aligned frame allocation the problem?

Sorry for the confusion. I think either overaligned or under-aligned could be used to describe the problem: either "Handle overaligned frame" or "Fix under-aligned frame". Since the C++ spec defines the former but not the latter (https://en.cppreference.com/w/cpp/language/object#Alignment), my first intuition was to use the term "overaligned". Under-alignment is the undesired outcome that should be fixed (probably too late to handle, I assume). Also, overaligned is a static property whereas under-aligned is a runtime property. From the compiler's perspective, I think overaligned should be preferred. That said, I don't feel strongly about this. I could switch to "under-aligned" if that feels more intuitive.

What is the purpose of coro.align intrinsic?

To communicate frame alignment to the frontend. It shouldn't be needed for this patch, so I've removed it.

Could you provide some examples of what the IR might look like after this patch? Either that or a more detailed explanation of how this works in the summary.

Yep, please see the updated description. And a new test is added.

Do you think it might be cleaner to introduce a new variant of coro.size instead of adding arguments to it? For example, coro.size.aligned(). This way, you can avoid changing any test file for non-switch-lowering test files, but focus on all switch-lowering tests.

Agree, I've thought about variadic intrinsic and this new intrinsic, I think using new intrinsic is more flexible and avoids test fixup.

Typically, coro.free is used by a comparison with nullptr. This is to enable CoroElide. See: https://llvm.org/docs/Coroutines.html#llvm-coro-free-intrinsic. So I don't think you can load from it directly.

Agree, I've changed to do it in coro::replaceCoroFree.

@ChuanqiXu @lxfind Thanks a lot for the feedback. I've updated the description and addressed the existing comments. Please take a look.

In D100739#2711181, @rjmccall wrote:

What is the purpose of the builtin? Where is it being used? Typically you *can't* change the signature of a builtin because the builtin is itself a language feature that's documented to have a particular signature. If you've made a builtin purely for use in generated AST, that's pretty unfortunate, and you should consider whether you actually have to do that instead of e.g. synthesizing a call to an allocation function the same way that we do in new expressions.

Well, the intention was not to use it *only* in the AST; clients could also use it to ask LLVM to handle overaligned frames. I'm not sure how many use cases there would be, so I've removed it in the updated patch.

Fix typo.

fix typo.

ychen edited the summary of this revision. · Apr 23 2021, 12:16 AM

Harbormaster completed remote builds in B100476: Diff 339900. · Apr 23 2021, 1:07 AM

Harbormaster completed remote builds in B100479: Diff 339906. · Apr 23 2021, 1:30 AM

Harbormaster completed remote builds in B100478: Diff 339902. · Apr 23 2021, 1:42 AM

This is an alternative to D97915 which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.

If D97915 is not needed, we should abandon it.

The example shown in D97915 is:

#include <experimental/coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <cassert>
#include <cstdint>

struct task{
  struct alignas(64) promise_type {
    task get_return_object() { return {}; }
    std::experimental::suspend_never initial_suspend() { return {}; }
    std::experimental::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
  using handle = std::experimental::coroutine_handle<promise_type>;
};

auto switch_to_new_thread() {
  struct awaitable {
    bool await_ready() { return false; }
    void await_suspend(task::handle h) {
      auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
      std::cout << i << std::endl;
      assert(i % 64 == 0);
    }
    void await_resume() {}
  };
  return awaitable{};
}

task resuming_on_new_thread() {
  co_await switch_to_new_thread();
}

int main() {
  resuming_on_new_thread();
}

The assertion would fail. If this is the root problem, I think we could adjust the alignment of the promise alloca like:

%promise = alloca %promise_type, align 8

into

%promise = alloca %promise_type, align 128

In other words, if this is the problem we need to solve, I think we could do it in a simpler way.

Then I looked into the document you gave in the summary. Issue #3 says the frontend can't do some of the work during template instantiation because the frontend doesn't know the alignment and size of the coroutine frame. But judging from the implementation, that doesn't look like the problem this patch wants to solve.

I am really confused about the problem. Could you please restate it in more detail? For example, would it make the alignment incorrect, as in the example above? Or do we want the frontend to get alignment information? Then what would be affected? From the title, I can guess that the frame would get bigger. But how big would it be? Who would control and determine the final size?

Sorry for the confusion. I think either overaligned or under-aligned could be used to describe the problem: either "Handle overaligned frame" or "Fix under-aligned frame". Since the C++ spec defines the former but not the latter (https://en.cppreference.com/w/cpp/language/object#Alignment), my first intuition was to use the term "overaligned". Under-alignment is the undesired outcome that should be fixed (probably too late to handle, I assume). Also, overaligned is a static property whereas under-aligned is a runtime property. From the compiler's perspective, I think overaligned should be preferred. That said, I don't feel strongly about this. I could switch to "under-aligned" if that feels more intuitive.

"Handle" is probably not the right word to be used here. What follows "handle" is typically a legit situation that already occurred but is not currently handled properly. Here "overaligned frame" doesn't already occur. From what I understand, you really just want to support promise object alignment. So why not just say that directly?
To add on that, I do think you need to describe the problem in more detail in the description. It's indeed still confusing.

In D100739#2711698, @ChuanqiXu wrote:

This is an alternative to D97915 which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.

If D97915 is not needed, we should abandon it.

The example shown in D97915 is:

#include <experimental/coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <cassert>
#include <cstdint>

struct task{
  struct alignas(64) promise_type {
    task get_return_object() { return {}; }
    std::experimental::suspend_never initial_suspend() { return {}; }
    std::experimental::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
  using handle = std::experimental::coroutine_handle<promise_type>;
};

auto switch_to_new_thread() {
  struct awaitable {
    bool await_ready() { return false; }
    void await_suspend(task::handle h) {
      auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
      std::cout << i << std::endl;
      assert(i % 64 == 0);
    }
    void await_resume() {}
  };
  return awaitable{};
}

task resuming_on_new_thread() {
  co_await switch_to_new_thread();
}

int main() {
  resuming_on_new_thread();
}

The assertion would fail. If this is the root problem, I think we could adjust the alignment of the promise alloca like:

The problem is that any member of the coroutine frame could be overaligned (thus making the frame overaligned), including the promise, local variables, and spills. The problem is *not* specific to the promise.

%promise = alloca %promise_type, align 8
into
%promise = alloca %promise_type, align 128
In other words, if this is the problem we need to solve, I think we could do it in a simpler way.

This may not fix the problem.

Then I looked into the document you gave in the summary. Issue #3 says the frontend can't do some of the work during template instantiation because the frontend doesn't know the alignment and size of the coroutine frame. But judging from the implementation, that doesn't look like the problem this patch wants to solve.

I meant to use that as a reference to help describe the problem (but not the solution). The document itself includes both problem statements (issue #3) and solutions (frontend-based) that are unrelated to this patch. It doesn't look that useful in this case, so please disregard it.

I am really confused about the problem. Could you please restate it in more detail? For example, would it make the alignment incorrect, as in the example above? Or do we want the frontend to get alignment information? Then what would be affected? From the title, I can guess that the frame would get bigger. But how big would it be? Who would control and determine the final size?

Understood.

There are two kinds of alignment: the alignment of a type/object at compile time (ABI-specified or user-specified), and the alignment that an object of that type actually gets at runtime. The compiler assumes that the alignment of a struct is the maximal alignment of all its members. However, that assumption may not hold at runtime, where the memory allocator may return a memory block with insufficient alignment, which causes some members to be aligned incorrectly.
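
The distinction can be illustrated with a small C++ sketch. `Promise`, `FrameLayout`, and `isAlignedTo` are hypothetical names used only for this illustration; the struct is a stand-in for a coroutine frame, not the layout LLVM actually builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One overaligned member is enough to raise the alignment of the enclosing
// struct. This is the compile-time (static) property.
struct alignas(64) Promise {
  int Value;
};

struct FrameLayout {
  void *ResumeFn;
  Promise P;
  int Local;
};

static_assert(alignof(FrameLayout) == 64,
              "struct alignment is the maximum of its members' alignments");

// Whether a particular address satisfies an alignment is a runtime property:
// it depends on what the allocator happened to return.
inline bool isAlignedTo(const void *Ptr, std::size_t Align) {
  assert(Align != 0);
  return reinterpret_cast<std::uintptr_t>(Ptr) % Align == 0;
}
```

An address that is only 16-byte aligned passes `isAlignedTo(p, 16)` but may fail `isAlignedTo(p, 64)`, which is the situation the failing assertion in the example program exposes.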

For C++ coroutines, the default memory allocator currently guarantees only 16-byte-aligned memory blocks. When any member of the coroutine frame (promise, local variables, spills, etc.) has alignment > 16, the frame becomes overaligned. This can only be fixed dynamically at runtime, by over-allocating memory and then adjusting the frame start address so that it is correctly aligned.

For example, suppose malloc returns the 16-byte-aligned address 16. How do we make it 64-byte aligned? Align 16 up to the next 64-byte-aligned address, which is 64, so the adjustment is 64-16=48.

Similarly, suppose malloc returns the 16-byte-aligned address 32. Align 32 up to the next 64-byte-aligned address, which is 64, so the adjustment is 64-32=32.

Similarly, suppose malloc returns the 16-byte-aligned address 48. Align 48 up to the next 64-byte-aligned address, which is 64, so the adjustment is 64-48=16.

Finally, suppose malloc returns the 16-byte-aligned address 64. It is already 64-byte aligned, so the adjustment is 64-64=0.

So the maximal adjustment is 64-16=48. We don't know until runtime whether the address X returned by malloc satisfies (X % 64 == 0), (X % 64 == 16), (X % 64 == 32), or (X % 64 == 48), so we must emit extra code that handles all cases (using bitwise operations).
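
The four cases above can be covered by a single bitwise round-up. The following C++ sketch is only an illustration of that arithmetic; `alignUp` and `adjustment` are hypothetical helper names, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Addr up to the next multiple of Align (Align must be a power of two).
constexpr std::uintptr_t alignUp(std::uintptr_t Addr, std::uintptr_t Align) {
  return (Addr + Align - 1) & ~(Align - 1);
}

// Number of bytes to skip so that Addr becomes Align-aligned.
constexpr std::uintptr_t adjustment(std::uintptr_t Addr, std::uintptr_t Align) {
  return alignUp(Addr, Align) - Addr;
}

// The worked examples from the text, checked at compile time.
static_assert(adjustment(16, 64) == 48, "64-16=48");
static_assert(adjustment(32, 64) == 32, "64-32=32");
static_assert(adjustment(48, 64) == 16, "64-48=16");
static_assert(adjustment(64, 64) == 0, "64-64=0");
```

Because the input is already 16-byte aligned, the adjustment never exceeds 48 bytes for a 64-byte alignment, which is why over-allocating by 48 bytes is enough.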

In D100739#2712268, @lxfind wrote:

Sorry for the confusion. I think either overaligned or under-aligned could be used to describe the problem: either "Handle overaligned frame" or "Fix under-aligned frame". Since the C++ spec defines the former but not the latter (https://en.cppreference.com/w/cpp/language/object#Alignment), my first intuition was to use the term "overaligned". Under-alignment is the undesired outcome that should be fixed (probably too late to handle, I assume). Also, overaligned is a static property whereas under-aligned is a runtime property. From the compiler's perspective, I think overaligned should be preferred. That said, I don't feel strongly about this. I could switch to "under-aligned" if that feels more intuitive.

Here "overaligned frame" doesn't already occur.

It does occur. The check `FrameAlign > Shape.getSwitchCoroId()->getAlignment()` reflects the C++ spec's definition of overaligned.

From what I understand, you really just want to support promise object alignment. So why not just say that directly?

This patch does not deal with promise in any specific way. It treats promise just like any other frame members.

To add on that, I do think you need to describe the problem in more detail in the description. It's indeed still confusing.

Yep, will do that.

ychen mentioned this in D97915: [Coroutines] Handle overaligned frame allocation. · Apr 23 2021, 3:12 PM

ychen added inline comments. · Apr 25 2021, 6:55 PM

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
854	This seems to align with what this mem argument is designed for: https://reviews.llvm.org/D66230#1631860

In D100739#2713579, @ychen wrote:
So the maximal adjustment is 64-16=48. We don't know until runtime whether the address X returned by malloc satisfies (X % 64 == 0), (X % 64 == 16), (X % 64 == 32), or (X % 64 == 48), so we must emit extra code that handles all cases (using bitwise operations).

Thanks for the explanation. I think I understand the problem now. My understanding of this patch's solution is: if the alignment of the original frame is 64, then we allocate (64+48) bytes for the new frame. The original frame becomes an alloca with align 64 in the new frame, so the actual frame gets the right alignment. Do I have the problem and the solution right?

I am wondering if there is a simpler solution. For example, after we construct the frame, we can get the alignment requirement of the frame. Then, if the alignment is bigger than 16, we could lower the coro.begin to an aligned new instead of the default new. It looks like that implementation would be much simpler.

Yes, if you can dynamically choose to use an aligned allocator, that's clearly just much better.

Right now:

Intrinsic::coro_size_aligned : overaligned frame: over-allocate, adjust start address; non-overaligned frame: no-op
Intrinsic::coro_size : overaligned frame: no-op; non-overaligned frame: no-op

Do you mean removing Intrinsic::coro_size_aligned and making Intrinsic::coro_size behave as follows?

Intrinsic::coro_size : overaligned frame: over-allocate, adjust start address; non-overaligned frame: no-op

That makes sense to me; I just want to confirm first.

That's not really what I meant, no. I meant it would be better to find a way to use an allocator that promises to return a well-aligned value when possible. We've talked about this before; that will require the C++ committee to update the design.

I think the cleanest design for coro_size/align would be that they reflect the unadjusted requirements, and the frontend is expected to emit code which satisfies those requirements. In the absence of an aligned allocator, that means generating code like:

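if (llvm.coro.alloc()) {
  // size and align are the unadjusted frame requirements.
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  size_t allocSize = size;
  if (align > NEW_ALIGN)
    allocSize += align - NEW_ALIGN + sizeof(void*);  // padding + raw pointer slot
  frame = operator new(allocSize);
  if (align > NEW_ALIGN) {
    auto rawFrame = frame;
    frame = (frame + align - 1) & ~(align - 1);  // round up to align
    // Store the raw pointer just past the frame, where deallocation reads it.
    *(void**) (frame + size) = rawFrame;
  }
}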

and then on the deallocation side:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    frame = *(void**) (frame + size);
  operator delete(frame);
}
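
The allocation and deallocation pseudocode above can be sketched as ordinary C++. This is a minimal illustration, not the code a frontend would emit: allocateFrame, deallocateFrame, and kNewAlign are hypothetical names standing in for the operator new call, the operator delete call, and NEW_ALIGN, and size/align are passed as plain arguments instead of coming from llvm.coro.size and llvm.coro.align. It assumes align is a power of two and that operator new returns memory aligned to at least kNewAlign for the sizes requested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Hypothetical stand-in for NEW_ALIGN in the pseudocode.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t kNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t kNewAlign = alignof(std::max_align_t);
#endif

// Allocate a frame of `size` bytes aligned to `align` using only the
// unaligned operator new. For overaligned frames the raw pointer returned
// by operator new is stored immediately after the frame's `size` bytes.
void *allocateFrame(std::size_t size, std::size_t align) {
  if (align <= kNewAlign)
    return ::operator new(size);
  // Worst-case padding plus one slot for the raw pointer.
  std::size_t allocSize = size + (align - kNewAlign) + sizeof(void *);
  void *raw = ::operator new(allocSize);
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  auto aligned = (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  char *frame = reinterpret_cast<char *>(aligned);
  // memcpy avoids assuming that frame + size is pointer-aligned.
  std::memcpy(frame + size, &raw, sizeof(raw));
  return frame;
}

// Free a frame obtained from allocateFrame with the same size and align.
void deallocateFrame(void *frame, std::size_t size, std::size_t align) {
  if (align > kNewAlign) {
    void *raw;
    std::memcpy(&raw, static_cast<char *>(frame) + size, sizeof(raw));
    frame = raw;  // recover the pointer operator new actually returned
  }
  ::operator delete(frame);
}
```

Note that the raw pointer is written at the unadjusted size, so the deallocation side can find it using only llvm.coro.size-style information.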

That's all quite annoying, but it does extend quite nicely to cover the presence of an aligned allocator when the committee gets around to ratifying that this is what should happen:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    frame = operator new(size, std::align_val_t(align));
  else
    frame = operator new(size);
}

and then on the deallocation side:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    operator delete(frame, std::align_val_t(align));
  else
    operator delete(frame);
}
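
For comparison, the aligned-allocator variant is much shorter when written as plain C++17. Again this is only a sketch under assumptions: allocateFrameAligned and deallocateFrameAligned are hypothetical names, the size and alignment are ordinary arguments, and the global aligned operator new/delete overloads stand in for whatever allocator the promise type would select.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Pick the aligned or unaligned global operator new based on the
// frame alignment (requires C++17 for std::align_val_t).
void *allocateFrameAligned(std::size_t size, std::size_t align) {
  if (align > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
    return ::operator new(size, std::align_val_t(align));
  return ::operator new(size);
}

// The deallocation must use the overload matching the allocation.
void deallocateFrameAligned(void *frame, std::size_t align) {
  if (align > __STDCPP_DEFAULT_NEW_ALIGNMENT__)
    ::operator delete(frame, std::align_val_t(align));
  else
    ::operator delete(frame);
}

// Small helper used to check the result.
bool isAligned(const void *p, std::size_t align) {
  return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}
```

Both branches reference an allocator, but once the alignment is a known constant only one of them survives constant folding.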

*(void**) (frame + size) = rawFrame; means we always need extra space to store the raw frame pointer. If we either do what the patch currently does, or add another intrinsic, say "llvm.coro.raw.frame.ptr.index", and emit *(void**) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;, it is likely that the extra pointer could reuse some existing padding in the frame. There is an example of this in https://reviews.llvm.org/P8260. What do you think?

In D100739#2715964, @ChuanqiXu wrote:
In D100739#2713579, @ychen wrote:
In D100739#2711698, @ChuanqiXu wrote:
This is an alternative to D97915 which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.

If D97915 is not needed, we should abandon it.

The example shown in D97915 is:
#include <experimental/coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <cassert>

struct task{
  struct alignas(64) promise_type {
    task get_return_object() { return {}; }
    std::experimental::suspend_never initial_suspend() { return {}; }
    std::experimental::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
  using handle = std::experimental::coroutine_handle<promise_type>;
};

auto switch_to_new_thread() {
  struct awaitable {
    bool await_ready() { return false; }
    void await_suspend(task::handle h) {
      auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
      std::cout << i << std::endl;
      assert(i % 64 == 0);
    }
    void await_resume() {}
  };
  return awaitable{};
}

task resuming_on_new_thread() {
  co_await switch_to_new_thread();
}

int main() {
  resuming_on_new_thread();
}
The assertion would fail. If this is the root problem, I think we could adjust the align for the promise alloca like:
The problem is that any member of the coroutine frame could be overaligned (thus make the frame overaligned) including promise, local variables, spills. The problem is *not* specific to promise.
%promise = alloca %promise_type, align 8
into
%promise = alloca %promise_type, align 128
In other words, if this is the problem we need to solve, I think we could solve it in a simpler way.
This may not fix the problem.

Then I looked into the document you gave in the summary. Issue #3 says the frontend can't do some work during template instantiation because the frontend doesn't know the alignment and size of the coroutine. But judging from the implementation, that does not look like the problem this patch wants to solve.

I meant to use that as a reference to help describe the problem (but not the solution). The document itself includes both problem statements (issue#3) and solutions (frontend-based) which are totally unrelated to this patch. It looks like it is not that useful in this case so please disregard that.

I am really confused about the problem. Could you please restate your problem in more detail? For example, would it make the alignment incorrect like in the example above? Or do we want the frontend to get alignment information? Then what would be affected? From the title, I can guess the size of the frame would get bigger. But how big would it be? Who would control and determine the final size?

understood.

There are two kinds of alignment: the alignment of a type/object at compile time (ABI-specified or user-specified), and the alignment an object of that type actually gets at runtime. The compiler assumes that the alignment of a struct is the maximal alignment of all its members. However, that assumption may not hold at runtime, where the memory allocator may return a memory block with insufficient alignment, which causes some members to be aligned incorrectly.

Thanks for the explanation. I think I understand the problem now. My understanding of this patch's solution is: if the alignment of the original frame is 64, then we allocate (64+48) space for the new frame. The original frame becomes an alloca with align 64 inside the new frame, so the actual frame gets the right alignment. Do I have the problem and solution right?

I am wondering if there is a simpler solution. For example, after we construct the frame, we know its alignment requirement. Then, if the alignment is bigger than 16, we could lower the coro.begin to aligned new instead of default new. It looks like that implementation would be much simpler.

Oh, right now the C++ coroutine standard is written in such a way that the aligned new is not searched for by the frontend. The limitation will be lifted in C++23 (hopefully).

In D100739#2717582, @rjmccall wrote:

That's not really what I meant, no. I meant it would be better to find a way to use an allocator that promises to return a well-aligned value when possible. We've talked about this before; that will require the C++ committee to update the design.

I have a question. Does this mean we can't use an aligned allocator until the C++ committee updates the wording? For example, I know Clang/LLVM implemented coroutines before they became part of the standard. I mean, is it possible to use an aligned allocator to solve the problem here?

In D100739#2718514, @ychen wrote:

Oh, right now the C++ coroutine standard is written in such a way that the aligned new is not searched for by the frontend. The limitation will be lifted in C++23 (hopefully).

I see. I am curious about the relationship between compiler implementation and the language standard. For example, Clang/LLVM implemented coroutines before they became standard. What I am curious about is whether Clang/LLVM should implement features based only on accepted proposals.

Here are the options I think the committee might take:

1. Always select an unaligned allocator and force implementors to dynamically align. This seems unlikely to me.
2. Allow an aligned allocator to be selected. The issue here is that we cannot know until coroutine splitting whether the frame has a new-extended alignment. So there are some sub-options:

2a. Always use an aligned allocator if available, even if the frame ends up not being aligned. I think this is unlikely.
2b. Use the correct allocator for the frame alignment, using the alignment inferred from the immediate coroutine body. This would force implementations to avoid doing anything prior to coroutine splitting that would increase the frame's alignment beyond the derived limit. This would be quite annoying for implementors, and it would have strange performance cliffs, but it's theoretically simple.
2c. Use the correct allocator for the frame alignment; only that allocator is formally ODR-used. This would force implementations to not ODR-use the right allocator until coroutine lowering, which is not really reasonable; we should be able to dissuade the committee from picking this option.
2d. Use the correct allocator for the frame alignment; both allocators are (allowed to be) ODR-used, but only one would be dynamically used. This is what would be necessary for the implementation I suggested above. In reality there won't be any dynamic overhead because we should always be able to fold the branch after allocation.

I think 2d is obvious enough that we could go ahead and implement it pending standardization. There's still a question of what to do if the promise class only provides an unaligned operator new, but the only reasonable behavior is to dynamically align: unlike with new expressions, there's no way the promise class could be expected to know/satisfy the appropriate alignment requirement in general, so assuming alignment is not acceptable. (Neither is rejecting the program if it doesn't provide both — this wouldn't be compatible with existing behavior.)

*(void**) (frame + size) = rawFrame; means we always need extra space to store the raw frame pointer. If we either do what the patch currently does, or add another intrinsic, say "llvm.coro.raw.frame.ptr.index", and emit *(void**) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;, it is likely that the extra pointer could reuse some existing padding in the frame. There is an example of this in https://reviews.llvm.org/P8260. What do you think?

You're right that there might be space in the frame we could re-use. I was thinking that it would be a shame to always add storage to the frame for the raw frame pointer, but maybe the contract of llvm.coro.raw.frame.ptr.offset could be that it's only meaningful if the frame has extended alignment. Coroutine splitting would determine if any of the frame members was over-aligned and add a raw-pointer field if so. We'd be stuck allocating space in the frame even when allocation was elided, but stack space is cheap.

It does need to be an offset instead of a type index, though; the frontend will be emitting a GEP and will not know the frame type.
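
To illustrate why a byte offset is enough for the frontend, here is a small C++ sketch. Everything in it is hypothetical: the Frame layout is made up for illustration, kRawFramePtrOffset stands in for the value the proposed llvm.coro.raw.frame.ptr.offset intrinsic would yield, and the helpers mimic what a frontend-emitted GEP plus load/store would do on an opaque frame pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical frame layout chosen by coroutine splitting. The overaligned
// promise leaves padding after the two function pointers, and the raw frame
// pointer is placed in that padding instead of growing the frame.
struct alignas(64) Frame {
  void (*resume)(void *);
  void (*destroy)(void *);
  void *rawFramePtr;             // fits in padding before the promise
  alignas(64) char promise[64];
};

// Stand-in for the constant the proposed offset intrinsic would produce.
constexpr std::size_t kRawFramePtrOffset = offsetof(Frame, rawFramePtr);

// Frontend-side view: only an opaque pointer and a byte offset are known,
// not the Frame type itself.
void storeRawFramePtr(void *frame, std::size_t offset, void *raw) {
  std::memcpy(static_cast<char *>(frame) + offset, &raw, sizeof(raw));
}

void *loadRawFramePtr(const void *frame, std::size_t offset) {
  void *raw;
  std::memcpy(&raw, static_cast<const char *>(frame) + offset, sizeof(raw));
  return raw;
}
```

In this made-up layout the extra pointer does not change sizeof(Frame), which is the padding reuse discussed above; a field index would not help here because the helpers never see the struct type.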

Sounds good to me. Thanks. I'll go ahead with llvm.coro.raw.frame.ptr.offset.

Reopened D97915 to address the feedback. Closing this one.

In D100739#2718528, @ChuanqiXu wrote:

I see. I am curious about the relationship between compiler implementation and the language standard. For example, Clang/LLVM implemented coroutines before they became standard. What I am curious about is whether Clang/LLVM should implement features based only on accepted proposals.

Not a C++ language expert myself. But I think a proposal does not have to be formally accepted to kick-start the implementation (as long as the overall design is decided and the proposal is very likely to be accepted).

I see. I still prefer to use the aligned allocator to solve the problem, although you and the other reviewers prefer to use the overaligned frame.

It will be a hybrid of both: when the frontend picks the non-aligned version because the aligned version is not available, we still have to use the overaligned frame. I'll send a separate patch for the aligned allocator search.

ychen mentioned this in D106248: [Coroutines] Overalign coroutine frame when frame alignment exceeds the alignment limit. Jul 20 2021, 10:43 PM

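Revision Contents

Path                                                 Size
clang/lib/CodeGen/CGBuiltin.cpp                      2 lines
clang/test/CodeGenCoroutines/coro-alloc.cpp          14 lines
clang/test/CodeGenCoroutines/coro-builtins.c         2 lines
clang/test/CodeGenCoroutines/coro-gro.cpp            2 lines
llvm/docs/Coroutines.rst                             29 lines
llvm/include/llvm/IR/Intrinsics.td                   1 line
llvm/lib/Transforms/Coroutines/CoroFrame.cpp         115 lines
llvm/lib/Transforms/Coroutines/CoroInstr.h           16 lines
llvm/lib/Transforms/Coroutines/CoroInternal.h        6 lines
llvm/lib/Transforms/Coroutines/CoroSplit.cpp         43 lines
llvm/lib/Transforms/Coroutines/Coroutines.cpp        29 lines
llvm/test/Transforms/Coroutines/coro-overalign.ll    81 lines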

Diff 339906

clang/lib/CodeGen/CGBuiltin.cpp

[4,427 lines not shown]

   case Builtin::BI__fastfail:
     return RValue::get(EmitMSVCBuiltinExpr(MSVCIntrin::__fastfail, E));

   case Builtin::BI__builtin_coro_size: {
     auto & Context = getContext();
     auto SizeTy = Context.getSizeType();
     auto T = Builder.getIntNTy(Context.getTypeSize(SizeTy));
-    Function *F = CGM.getIntrinsic(Intrinsic::coro_size, T);
+    Function *F = CGM.getIntrinsic(Intrinsic::coro_size_aligned, T);
     return RValue::get(Builder.CreateCall(F));
   }

   case Builtin::BI__builtin_coro_id:
     return EmitCoroutineIntrinsic(E, Intrinsic::coro_id);
   case Builtin::BI__builtin_coro_promise:
     return EmitCoroutineIntrinsic(E, Intrinsic::coro_promise);
   case Builtin::BI__builtin_coro_resume:

[13,533 lines not shown]

clang/test/CodeGenCoroutines/coro-alloc.cpp

[54 lines not shown]

 // CHECK-LABEL: f0(
 extern "C" void f0(global_new_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
   // CHECK: %[[NeedAlloc:.+]] = call i1 @llvm.coro.alloc(token %[[ID]])
   // CHECK: br i1 %[[NeedAlloc]], label %[[AllocBB:.+]], label %[[InitBB:.+]]

   // CHECK: [[AllocBB]]:
-  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
   // CHECK: %[[MEM:.+]] = call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])
   // CHECK: br label %[[InitBB]]

   // CHECK: [[InitBB]]:
   // CHECK: %[[PHI:.+]] = phi i8* [ null, %{{.+}} ], [ %call, %[[AllocBB]] ]
   // CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[ID]], i8* %[[PHI]])

   // CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
[20 lines not shown; context: struct promise_type {]
     suspend_always final_suspend() noexcept { return {}; }
     void return_void() {}
   };
 };

 // CHECK-LABEL: f1(
 extern "C" void f1(promise_new_tag ) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
-  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
   // CHECK: call i8* @_ZNSt12experimental16coroutine_traitsIJv15promise_new_tagEE12promise_typenwEm(i64 %[[SIZE]])

   // CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(
   // CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
   // CHECK: call void @_ZdlPv(i8* %[[MEM]])
   co_return;
 }

[12 lines not shown]
 };

 // CHECK-LABEL: f1a(
 extern "C" void f1a(promise_matching_placement_new_tag, int x, float y , double z) {
   // CHECK: store i32 %x, i32* %x.addr, align 4
   // CHECK: store float %y, float* %y.addr, align 4
   // CHECK: store double %z, double* %z.addr, align 8
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
-  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
   // CHECK: %[[INT:.+]] = load i32, i32* %x.addr, align 4
   // CHECK: %[[FLOAT:.+]] = load float, float* %y.addr, align 4
   // CHECK: %[[DOUBLE:.+]] = load double, double* %z.addr, align 8
   // CHECK: call i8* @_ZNSt12experimental16coroutine_traitsIJv34promise_matching_placement_new_tagifdEE12promise_typenwEmS1_ifd(i64 %[[SIZE]], i32 %[[INT]], float %[[FLOAT]], double %[[DOUBLE]])
   co_return;
 }

 // Declare a placement form operator new, such as the one described in
[33 lines not shown; context: struct promise_type {]
     suspend_always final_suspend() noexcept { return {}; }
     void return_void() {}
   };
 };

 // CHECK-LABEL: f2(
 extern "C" void f2(promise_delete_tag) {
   // CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
-  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()
+  // CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
   // CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])

   // CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(
   // CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
   // CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv18promise_delete_tagEE12promise_typedlEPv(i8* %[[MEM]])
   co_return;
 }

struct promise_sized_delete_tag {};		struct promise_sized_delete_tag {};

template<>		template<>
struct std::experimental::coroutine_traits<void, promise_sized_delete_tag> {		struct std::experimental::coroutine_traits<void, promise_sized_delete_tag> {
struct promise_type {		struct promise_type {
void operator delete(void*, unsigned long);		void operator delete(void*, unsigned long);
void get_return_object() {}		void get_return_object() {}
suspend_always initial_suspend() { return {}; }		suspend_always initial_suspend() { return {}; }
suspend_always final_suspend() noexcept { return {}; }		suspend_always final_suspend() noexcept { return {}; }
void return_void() {}		void return_void() {}
};		};
};		};

// CHECK-LABEL: f3(		// CHECK-LABEL: f3(
extern "C" void f3(promise_sized_delete_tag) {		extern "C" void f3(promise_sized_delete_tag) {
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])		// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[SIZE]])

// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(		// CHECK: %[[FRAME:.+]] = call i8* @llvm.coro.begin(
// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])		// CHECK: %[[MEM:.+]] = call i8* @llvm.coro.free(token %[[ID]], i8* %[[FRAME]])
// CHECK: %[[SIZE2:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE2:.+]] = call i64 @llvm.coro.size.aligned.i64()
// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv24promise_sized_delete_tagEE12promise_typedlEPvm(i8* %[[MEM]], i64 %[[SIZE2]])		// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJv24promise_sized_delete_tagEE12promise_typedlEPvm(i8* %[[MEM]], i64 %[[SIZE2]])
co_return;		co_return;
}		}

struct promise_on_alloc_failure_tag {};		struct promise_on_alloc_failure_tag {};

template<>		template<>
struct std::experimental::coroutine_traits<int, promise_on_alloc_failure_tag> {		struct std::experimental::coroutine_traits<int, promise_on_alloc_failure_tag> {
struct promise_type {		struct promise_type {
int get_return_object() { return 0; }		int get_return_object() { return 0; }
suspend_always initial_suspend() { return {}; }		suspend_always initial_suspend() { return {}; }
suspend_always final_suspend() noexcept { return {}; }		suspend_always final_suspend() noexcept { return {}; }
void return_void() {}		void return_void() {}
static int get_return_object_on_allocation_failure() { return -1; }		static int get_return_object_on_allocation_failure() { return -1; }
};		};
};		};

// CHECK-LABEL: f4(		// CHECK-LABEL: f4(
extern "C" int f4(promise_on_alloc_failure_tag) {		extern "C" int f4(promise_on_alloc_failure_tag) {
// CHECK: %[[RetVal:.+]] = alloca i32		// CHECK: %[[RetVal:.+]] = alloca i32
// CHECK: %[[Gro:.+]] = alloca i32		// CHECK: %[[Gro:.+]] = alloca i32
// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16		// CHECK: %[[ID:.+]] = call token @llvm.coro.id(i32 16
// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
// CHECK: %[[MEM:.+]] = call noalias i8* @_ZnwmRKSt9nothrow_t(i64 %[[SIZE]], %"struct.std::nothrow_t"* nonnull align 1 dereferenceable(1) @_ZStL7nothrow)		// CHECK: %[[MEM:.+]] = call noalias i8* @_ZnwmRKSt9nothrow_t(i64 %[[SIZE]], %"struct.std::nothrow_t"* nonnull align 1 dereferenceable(1) @_ZStL7nothrow)
// CHECK: %[[OK:.+]] = icmp ne i8* %[[MEM]], null		// CHECK: %[[OK:.+]] = icmp ne i8* %[[MEM]], null
// CHECK: br i1 %[[OK]], label %[[OKBB:.+]], label %[[ERRBB:.+]]		// CHECK: br i1 %[[OK]], label %[[OKBB:.+]], label %[[ERRBB:.+]]

// CHECK: [[ERRBB]]:		// CHECK: [[ERRBB]]:
// CHECK: %[[FailRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type39get_return_object_on_allocation_failureEv(		// CHECK: %[[FailRet:.+]] = call i32 @_ZNSt12experimental16coroutine_traitsIJi28promise_on_alloc_failure_tagEE12promise_type39get_return_object_on_allocation_failureEv(
// CHECK: store i32 %[[FailRet]], i32* %[[RetVal]]		// CHECK: store i32 %[[FailRet]], i32* %[[RetVal]]
// CHECK: br label %[[RetBB:.+]]		// CHECK: br label %[[RetBB:.+]]

clang/test/CodeGenCoroutines/coro-builtins.c

Show All 14 Lines	void f(int n) {
__builtin_coro_id(32, &promise, 0, 0);		__builtin_coro_id(32, &promise, 0, 0);

// CHECK-NEXT: call i1 @llvm.coro.alloc(token %[[COROID]])		// CHECK-NEXT: call i1 @llvm.coro.alloc(token %[[COROID]])
__builtin_coro_alloc();		__builtin_coro_alloc();

// CHECK-NEXT: call i8* @llvm.coro.noop()		// CHECK-NEXT: call i8* @llvm.coro.noop()
__builtin_coro_noop();		__builtin_coro_noop();

// CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.i64()		// CHECK-NEXT: %[[SIZE:.+]] = call i64 @llvm.coro.size.aligned.i64()
// CHECK-NEXT: %[[MEM:.+]] = call i8* @myAlloc(i64 %[[SIZE]])		// CHECK-NEXT: %[[MEM:.+]] = call i8* @myAlloc(i64 %[[SIZE]])
// CHECK-NEXT: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[COROID]], i8* %[[MEM]])		// CHECK-NEXT: %[[FRAME:.+]] = call i8* @llvm.coro.begin(token %[[COROID]], i8* %[[MEM]])
__builtin_coro_begin(myAlloc(__builtin_coro_size()));		__builtin_coro_begin(myAlloc(__builtin_coro_size()));

// CHECK-NEXT: call void @llvm.coro.resume(i8* %[[FRAME]])		// CHECK-NEXT: call void @llvm.coro.resume(i8* %[[FRAME]])
__builtin_coro_resume(__builtin_coro_frame());		__builtin_coro_resume(__builtin_coro_frame());

// CHECK-NEXT: call void @llvm.coro.destroy(i8* %[[FRAME]])		// CHECK-NEXT: call void @llvm.coro.destroy(i8* %[[FRAME]])

clang/test/CodeGenCoroutines/coro-gro.cpp

	struct Cleanup { ~Cleanup(); };			struct Cleanup { ~Cleanup(); };
	void doSomething() noexcept;			void doSomething() noexcept;

	// CHECK: define{{.*}} i32 @_Z1fv(			// CHECK: define{{.*}} i32 @_Z1fv(
	int f() {			int f() {
	// CHECK: %[[RetVal:.+]] = alloca i32			// CHECK: %[[RetVal:.+]] = alloca i32
	// CHECK: %[[GroActive:.+]] = alloca i1			// CHECK: %[[GroActive:.+]] = alloca i1

	// CHECK: %[[Size:.+]] = call i64 @llvm.coro.size.i64()			// CHECK: %[[Size:.+]] = call i64 @llvm.coro.size.aligned.i64()
	// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[Size]])			// CHECK: call noalias nonnull i8* @_Znwm(i64 %[[Size]])
	// CHECK: store i1 false, i1* %[[GroActive]]			// CHECK: store i1 false, i1* %[[GroActive]]
	// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeC1Ev(			// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_typeC1Ev(
	// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_type17get_return_objectEv(			// CHECK: call void @_ZNSt12experimental16coroutine_traitsIJiEE12promise_type17get_return_objectEv(
	// CHECK: store i1 true, i1* %[[GroActive]]			// CHECK: store i1 true, i1* %[[GroActive]]

	Cleanup cleanup;			Cleanup cleanup;
	doSomething();			doSomething();

llvm/docs/Coroutines.rst

	None			None

	Semantics:			Semantics:
	""""""""""			""""""""""

	The `coro.size` intrinsic is lowered to a constant representing the size of			The `coro.size` intrinsic is lowered to a constant representing the size of
	the coroutine frame.			the coroutine frame.

				.. _coro.size.aligned:

				'llvm.coro.size.aligned' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				::

				declare i32 @llvm.coro.size.aligned.i32()
				declare i64 @llvm.coro.size.aligned.i64()
				ChuanqiXu: Maybe I was missing something. I think these two intrinsics should take one i8* argument to specify the coroutine handle. Otherwise, it may be confusing if a coroutine gets inlined into another coroutine.

				Overview:
				"""""""""

				The '``llvm.coro.size.aligned``' intrinsic returns the number of bytes
				allocated by a memory allocator to store a `coroutine frame`_. It is usually
				greater than or equal to '``llvm.coro.size``'.

				Arguments:
				""""""""""

				None

				Semantics:
				""""""""""

				Using this intrinsic indicates to LLVM that it should handle overaligned
				`coroutine frame`_ by requesting more memory than needed to store a
				`coroutine frame`_ to satisfy its memory alignment requirement. This is only
				supported for switched-resume coroutines.
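
As a rough illustration of the over-allocation scheme described in the Semantics text above, the following stand-alone C++ sketch requests extra bytes, aligns the frame pointer inside the block, and keeps the raw pointer for deallocation. It is not code from this patch: `OverAlignedBlock`, `allocateOverAligned`, and `deallocateOverAligned` are hypothetical names, `std::align` stands in for the IR the patch emits, and the sketch assumes that plain `operator new` returns memory aligned to at least the default new alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical record, for illustration only: the raw pointer returned by the
// allocator and the aligned pointer where the frame would live.
struct OverAlignedBlock {
  void *Raw;
  void *Aligned;
};

// Assumption: plain operator new returns memory aligned to at least this value.
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
constexpr std::size_t DefaultNewAlign = __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
constexpr std::size_t DefaultNewAlign = alignof(std::max_align_t);
#endif

inline OverAlignedBlock allocateOverAligned(std::size_t Size, std::size_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "Align must be a power of two");
  // Request only as many extra bytes as the worst-case padding needs.
  const std::size_t Extra = Align > DefaultNewAlign ? Align - DefaultNewAlign : 0;
  std::size_t Space = Size + Extra;
  void *Raw = ::operator new(Space);
  void *Ptr = Raw;
  // std::align bumps Ptr to the next Align boundary if Size bytes still fit.
  void *Aligned = std::align(Align, Size, Ptr, Space);
  assert(Aligned && "extra bytes should be enough under the stated assumption");
  return {Raw, Aligned};
}

// Deallocation must use the raw pointer, not the aligned one.
inline void deallocateOverAligned(const OverAlignedBlock &Block) {
  ::operator delete(Block.Raw);
}
```

The point of the sketch is the pairing: the aligned pointer is what the coroutine body would use, while the raw pointer has to survive until deallocation, which is why the patch stores it in a frame field.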

	.. _coro.begin:			.. _coro.begin:

	'llvm.coro.begin' Intrinsic			'llvm.coro.begin' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	::			::

	declare i8* @llvm.coro.begin(token <id>, i8* <mem>)			declare i8* @llvm.coro.begin(token <id>, i8* <mem>)


llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 1,231 Lines • ▼ Show 20 Lines	def int_coro_free : Intrinsic<[llvm_ptr_ty], [llvm_token_ty, llvm_ptr_ty],
NoCapture<ArgIndex<1>>]>;		NoCapture<ArgIndex<1>>]>;
def int_coro_end : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty], []>;		def int_coro_end : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty], []>;
def int_coro_end_async		def int_coro_end_async
: Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty, llvm_vararg_ty], []>;		: Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_i1_ty, llvm_vararg_ty], []>;

def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;		def int_coro_frame : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_coro_noop : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;		def int_coro_noop : Intrinsic<[llvm_ptr_ty], [], [IntrNoMem]>;
def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;		def int_coro_size : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
		def int_coro_size_aligned : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;

def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;		def int_coro_save : Intrinsic<[llvm_token_ty], [llvm_ptr_ty], []>;
def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;		def int_coro_suspend : Intrinsic<[llvm_i8_ty], [llvm_token_ty, llvm_i1_ty], []>;
def int_coro_suspend_retcon : Intrinsic<[llvm_any_ty], [llvm_vararg_ty], []>;		def int_coro_suspend_retcon : Intrinsic<[llvm_any_ty], [llvm_vararg_ty], []>;
def int_coro_prepare_retcon : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],		def int_coro_prepare_retcon : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_coro_alloca_alloc : Intrinsic<[llvm_token_ty],		def int_coro_alloca_alloc : Intrinsic<[llvm_token_ty],
[llvm_anyint_ty, llvm_i32_ty], []>;		[llvm_anyint_ty, llvm_i32_ty], []>;

llvm/lib/Transforms/Coroutines/CoroFrame.cpp

//===- CoroFrame.cpp - Builds and manipulates coroutine frame -------------===//		//===- CoroFrame.cpp - Builds and manipulates coroutine frame -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// This file contains classes used to discover if for a particular value		// This file contains classes used to discover if for a particular value
// there is a path from use to definition that crosses a suspend block.		// there is a path from use to definition that crosses a suspend block.
//		//
// Using the information discovered we form a Coroutine Frame structure to		// Using the information discovered we form a Coroutine Frame structure to
// contain those values. All uses of those values are replaced with appropriate		// contain those values. All uses of those values are replaced with appropriate
// GEP + load from the coroutine frame. At the point of the definition we spill		// GEP + load from the coroutine frame. At the point of the definition we spill
// the value into the coroutine frame.		// the value into the coroutine frame.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "CoroInstr.h"
#include "CoroInternal.h"		#include "CoroInternal.h"
#include "llvm/ADT/BitVector.h"		#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/Analysis/PtrUseVisitor.h"		#include "llvm/Analysis/PtrUseVisitor.h"
#include "llvm/Analysis/StackLifetime.h"		#include "llvm/Analysis/StackLifetime.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"
		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
		#include "llvm/IR/Instruction.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/OptimizedStructLayout.h"		#include "llvm/Support/OptimizedStructLayout.h"
#include "llvm/Support/circular_raw_ostream.h"		#include "llvm/Support/circular_raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/PromoteMemToReg.h"		#include "llvm/Transforms/Utils/PromoteMemToReg.h"
▲ Show 20 Lines • Show All 445 Lines • ▼ Show 20 Lines	uint64_t getStructSize() const {
return StructSize;		return StructSize;
}		}

Align getStructAlign() const {		Align getStructAlign() const {
assert(IsFinished && "not yet finished!");		assert(IsFinished && "not yet finished!");
return StructAlign;		return StructAlign;
}		}

		SmallVector<Field, 8> &getFields() { return Fields; }

FieldIDType getLayoutFieldIndex(FieldIDType Id) const {		FieldIDType getLayoutFieldIndex(FieldIDType Id) const {
assert(IsFinished && "not yet finished!");		assert(IsFinished && "not yet finished!");
return Fields[Id].LayoutFieldIndex;		return Fields[Id].LayoutFieldIndex;
}		}
};		};
} // namespace		} // namespace

void FrameDataInfo::updateLayoutIndex(FrameTypeBuilder &B) {		void FrameDataInfo::updateLayoutIndex(FrameTypeBuilder &B) {
▲ Show 20 Lines • Show All 208 Lines • ▼ Show 20 Lines	for (auto &F : Fields) {
assert(Ty->getElementType(F.LayoutFieldIndex) == F.Ty);		assert(Ty->getElementType(F.LayoutFieldIndex) == F.Ty);
assert(Layout->getElementOffset(F.LayoutFieldIndex) == F.Offset);		assert(Layout->getElementOffset(F.LayoutFieldIndex) == F.Offset);
}		}
#endif		#endif

IsFinished = true;		IsFinished = true;
}		}

		// Adapted from CodeGenFunction::EmitBuiltinAlignTo.
		static Value *emitAlignUpTo(IRBuilder<> &Builder, Value *Src, uint64_t Align) {
		const DataLayout &DL = Builder.GetInsertBlock()->getModule()->getDataLayout();

		auto *SrcType = cast<PointerType>(Src->getType());
		IntegerType *IntType = IntegerType::get(Builder.getContext(),
		DL.getIndexTypeSizeInBits(SrcType));
		Value *Alignment = ConstantInt::get(IntType, Align);
		auto *One = ConstantInt::get(IntType, 1);
		Value *Mask = Builder.CreateSub(Alignment, One, "mask");
		Value *SrcAddr = Builder.CreatePtrToInt(Src, IntType, "intptr");

		// When aligning up we have to first add the mask to ensure we go over the
		// next alignment value and then align down to the next valid multiple.
		// By adding the mask, we ensure that align_up on an already aligned
		// value will not change the value.
		Value *SrcForMask = Builder.CreateAdd(SrcAddr, Mask, "over_boundary");

		// Invert the mask to only clear the lower bits.
		Value *InvertedMask = Builder.CreateNot(Mask, "inverted_mask");
		Value *Result = Builder.CreateAnd(SrcForMask, InvertedMask, "aligned_result");

		Result->setName("aligned_intptr");
		Value *Difference = Builder.CreateSub(Result, SrcAddr, "diff");
		// The result must point to the same underlying allocation. This means we
		// can use an inbounds GEP to enable better optimization.

		PointerType *DestType = Builder.getInt8PtrTy();
		if (unsigned AddrSpace = SrcType->getAddressSpace())
		DestType = Type::getInt8PtrTy(Builder.getContext(), AddrSpace);

		Value *Base = Src;
		if (SrcType != DestType)
		Base = Builder.CreateBitCast(Src, DestType);

		// Out-of-bound case could not happen.
		Result = Builder.CreateGEP(Base, Difference, "aligned_result");
		Result = Builder.CreatePointerCast(Result, SrcType);

		Type *IntPtrTy = Builder.getIntPtrTy(DL);
		if (Alignment->getType() != IntPtrTy)
		Alignment =
		Builder.CreateIntCast(Alignment, IntPtrTy, false, "casted.align");
		(void)Builder.CreateAlignmentAssumption(DL, Result, Alignment);
		assert(Result->getType() == SrcType);
		return Result;
		}
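
The add-mask-then-clear sequence in `emitAlignUpTo` can be checked on plain integers. The sketch below is a hypothetical stand-alone helper (`alignUp` is not part of the patch) and assumes the alignment is a nonzero power of two, as the mask trick requires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round Addr up to the next multiple of Align.
// Adding (Align - 1) steps past the next boundary unless Addr is already
// aligned; clearing the low bits then lands exactly on a boundary.
constexpr std::uint64_t alignUp(std::uint64_t Addr, std::uint64_t Align) {
  return (Addr + (Align - 1)) & ~(Align - 1);
}

// An already aligned address is left unchanged.
static_assert(alignUp(0x1000, 64) == 0x1000, "aligned input must not move");
// Any address past a boundary moves up to the next one.
static_assert(alignUp(0x1001, 64) == 0x1040, "rounds up to next boundary");
static_assert(alignUp(0x103f, 64) == 0x1040, "rounds up to next boundary");
```

The difference `alignUp(Addr, Align) - Addr` corresponds to the `diff` value that the patch feeds into the GEP.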

// Build a struct that will keep state for an active coroutine.		// Build a struct that will keep state for an active coroutine.
// struct f.frame {		// struct f.frame {
// ResumeFnTy ResumeFnAddr;		// ResumeFnTy ResumeFnAddr;
// ResumeFnTy DestroyFnAddr;		// ResumeFnTy DestroyFnAddr;
// int ResumeIndex;		// int ResumeIndex;
// ... promise (if present) ...		// ... promise (if present) ...
// ... spills ...		// ... spills ...
// };		// };
Show All 38 Lines	if (Shape.ABI == coro::ABI::Switch) {
SwitchIndexFieldId = B.addField(IndexType, None);		SwitchIndexFieldId = B.addField(IndexType, None);
} else {		} else {
assert(PromiseAlloca == nullptr && "lowering doesn't support promises");		assert(PromiseAlloca == nullptr && "lowering doesn't support promises");
}		}

// Because multiple allocas may own the same field slot,		// Because multiple allocas may own the same field slot,
// we add allocas to field here.		// we add allocas to field here.
B.addFieldForAllocas(F, FrameData, Shape);		B.addFieldForAllocas(F, FrameData, Shape);

		// Create an entry for every spilled value.
		for (auto &S : FrameData.Spills) {
		FieldIDType Id = B.addField(S.first->getType(), None);
		FrameData.setFieldIndex(S.first, Id);
		}

		Optional<FieldIDType> FramePtrField = None;
		if (Shape.ABI == coro::ABI::Switch) {
// Add PromiseAlloca to Allocas list so that		// Add PromiseAlloca to Allocas list so that
// 1. updateLayoutIndex could update its index after		// 1. updateLayoutIndex could update its index after
// `performOptimizedStructLayout`		// `performOptimizedStructLayout`
// 2. it is processed in insertSpills.		// 2. it is processed in insertSpills.
if (Shape.ABI == coro::ABI::Switch && PromiseAlloca)		if (PromiseAlloca)
// We assume that the promise alloca won't be modified before		// We assume that the promise alloca won't be modified before
// CoroBegin and no alias will be created before CoroBegin.		// CoroBegin and no alias will be created before CoroBegin.
FrameData.Allocas.emplace_back(		FrameData.Allocas.emplace_back(
PromiseAlloca, DenseMap<Instruction *, llvm::Optional<APInt>>{}, false);		PromiseAlloca, DenseMap<Instruction *, llvm::Optional<APInt>>{},
// Create an entry for every spilled value.		false);
for (auto &S : FrameData.Spills) {
FieldIDType Id = B.addField(S.first->getType(), None);		Align FrameAlign =
FrameData.setFieldIndex(S.first, Id);		std::max_element(
		B.getFields().begin(), B.getFields().end(),
		[](auto &F1, auto &F2) { return F1.Alignment < F2.Alignment; })
		->Alignment;

		// Check for over-alignment.
		if (!Shape.CoroSizeAligneds.empty() &&
		FrameAlign > Shape.getSwitchCoroId()->getAlignment()) {
		BasicBlock &Entry = F.getEntryBlock();
		IRBuilder<> Builder(&Entry, Entry.getFirstInsertionPt());

		// Save raw frame pointer to alloca
		Value *Mem = Shape.CoroBegin->getMem();
		ychen (author): This seems to align with what this mem argument is designed for: https://reviews.llvm.org/D66230#1631860
		AllocaInst *FramePtrAddr =
		Builder.CreateAlloca(Mem->getType(), nullptr, "alloc.frame.ptr");
		Builder.SetInsertPoint(Shape.CoroBegin);
		Value *MockMem = Builder.CreatePointerCast(FramePtrAddr, Mem->getType());
		Builder.CreateStore(MockMem, FramePtrAddr);

		// Adjust frame pointer value.
		Value *NewMem = emitAlignUpTo(Builder, MockMem, FrameAlign.value());
		Mem->replaceAllUsesWith(NewMem);
		MockMem->replaceAllUsesWith(Mem);
		cast<Instruction>(MockMem)->eraseFromParent();

		// Add alloca to frame.
		FramePtrField = B.addFieldForAlloca(FramePtrAddr);
		FrameData.setFieldIndex(FramePtrAddr, *FramePtrField);
		FrameData.Allocas.emplace_back(
		FramePtrAddr, DenseMap<Instruction *, llvm::Optional<APInt>>{}, true);
		}
}		}

B.finish(FrameTy);		B.finish(FrameTy);
FrameData.updateLayoutIndex(B);		FrameData.updateLayoutIndex(B);
Shape.FrameAlign = B.getStructAlign();		Shape.FrameAlign = B.getStructAlign();
Shape.FrameSize = B.getStructSize();		Shape.FrameSize = B.getStructSize();

switch (Shape.ABI) {		switch (Shape.ABI) {
case coro::ABI::Switch:		case coro::ABI::Switch:
// In the switch ABI, remember the switch-index field.		// In the switch ABI, remember the switch-index field.
Shape.SwitchLowering.IndexField =		Shape.SwitchLowering.IndexField =
B.getLayoutFieldIndex(*SwitchIndexFieldId);		B.getLayoutFieldIndex(*SwitchIndexFieldId);

		if (FramePtrField)
		Shape.SwitchLowering.FramePtrField =
		B.getLayoutFieldIndex(*FramePtrField);

// Also round the frame size up to a multiple of its alignment, as is		// Also round the frame size up to a multiple of its alignment, as is
// generally expected in C/C++.		// generally expected in C/C++.
Shape.FrameSize = alignTo(Shape.FrameSize, Shape.FrameAlign);		Shape.FrameSize = alignTo(Shape.FrameSize, Shape.FrameAlign);
break;		break;

// In the retcon ABI, remember whether the frame is inline in the storage.		// In the retcon ABI, remember whether the frame is inline in the storage.
case coro::ABI::Retcon:		case coro::ABI::Retcon:
case coro::ABI::RetconOnce: {		case coro::ABI::RetconOnce: {

llvm/lib/Transforms/Coroutines/CoroInstr.h

public:		public:
AllocaInst *getPromise() const {		AllocaInst *getPromise() const {
Value *Arg = getArgOperand(PromiseArg);		Value *Arg = getArgOperand(PromiseArg);
return isa<ConstantPointerNull>(Arg)		return isa<ConstantPointerNull>(Arg)
? nullptr		? nullptr
: cast<AllocaInst>(Arg->stripPointerCasts());		: cast<AllocaInst>(Arg->stripPointerCasts());
}		}

		unsigned getAlignment() const {
		return cast<ConstantInt>(getArgOperand(AlignArg))->getZExtValue();
		}

void clearPromise() {		void clearPromise() {
Value *Arg = getArgOperand(PromiseArg);		Value *Arg = getArgOperand(PromiseArg);
setArgOperand(PromiseArg,		setArgOperand(PromiseArg,
ConstantPointerNull::get(Type::getInt8PtrTy(getContext())));		ConstantPointerNull::get(Type::getInt8PtrTy(getContext())));
if (isa<AllocaInst>(Arg))		if (isa<AllocaInst>(Arg))
return;		return;
assert((isa<BitCastInst>(Arg) || isa<GetElementPtrInst>(Arg)) &&		assert((isa<BitCastInst>(Arg) || isa<GetElementPtrInst>(Arg)) &&
"unexpected instruction designating the promise");		"unexpected instruction designating the promise");
▲ Show 20 Lines • Show All 462 Lines • ▼ Show 20 Lines	public:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::coro_size;		return I->getIntrinsicID() == Intrinsic::coro_size;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

		/// This represents the llvm.coro.size.aligned instruction.
		class LLVM_LIBRARY_VISIBILITY CoroSizeAlignedInst : public IntrinsicInst {
		public:
		// Methods to support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::coro_size_aligned;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {		class LLVM_LIBRARY_VISIBILITY AnyCoroEndInst : public IntrinsicInst {
enum { FrameArg, UnwindArg };		enum { FrameArg, UnwindArg };

public:		public:
bool isFallthrough() const { return !isUnwind(); }		bool isFallthrough() const { return !isUnwind(); }
bool isUnwind() const {		bool isUnwind() const {
return cast<Constant>(getArgOperand(UnwindArg))->isOneValue();		return cast<Constant>(getArgOperand(UnwindArg))->isOneValue();
}		}

llvm/lib/Transforms/Coroutines/CoroInternal.h

#define CORO_PRESPLIT_ATTR "coroutine.presplit"		#define CORO_PRESPLIT_ATTR "coroutine.presplit"
#define UNPREPARED_FOR_SPLIT "0"		#define UNPREPARED_FOR_SPLIT "0"
#define PREPARED_FOR_SPLIT "1"		#define PREPARED_FOR_SPLIT "1"
#define ASYNC_RESTART_AFTER_SPLIT "2"		#define ASYNC_RESTART_AFTER_SPLIT "2"

#define CORO_DEVIRT_TRIGGER_FN "coro.devirt.trigger"		#define CORO_DEVIRT_TRIGGER_FN "coro.devirt.trigger"

namespace coro {		namespace coro {
		struct Shape;

bool declaresIntrinsics(const Module &M,		bool declaresIntrinsics(const Module &M,
const std::initializer_list<StringRef>);		const std::initializer_list<StringRef>);
void replaceCoroFree(CoroIdInst *CoroId, bool Elide);		void replaceCoroFree(CoroIdInst *CoroId, bool Elide, Shape *Shape = nullptr);
void updateCallGraph(Function &Caller, ArrayRef<Function *> Funcs,		void updateCallGraph(Function &Caller, ArrayRef<Function *> Funcs,
CallGraph &CG, CallGraphSCC &SCC);		CallGraph &CG, CallGraphSCC &SCC);
/// Recover a dbg.declare prepared by the frontend and emit an alloca		/// Recover a dbg.declare prepared by the frontend and emit an alloca
/// holding a pointer to the coroutine frame.		/// holding a pointer to the coroutine frame.
void salvageDebugInfo(		void salvageDebugInfo(
SmallDenseMap<llvm::Value *, llvm::AllocaInst *, 4> &DbgPtrAllocaCache,		SmallDenseMap<llvm::Value *, llvm::AllocaInst *, 4> &DbgPtrAllocaCache,
DbgDeclareInst *DDI);		DbgDeclareInst *DDI);

Show All 35 Lines
};		};

// Holds structural Coroutine Intrinsics for a particular function and other		// Holds structural Coroutine Intrinsics for a particular function and other
// values used during CoroSplit pass.		// values used during CoroSplit pass.
struct LLVM_LIBRARY_VISIBILITY Shape {		struct LLVM_LIBRARY_VISIBILITY Shape {
CoroBeginInst *CoroBegin;		CoroBeginInst *CoroBegin;
SmallVector<AnyCoroEndInst *, 4> CoroEnds;		SmallVector<AnyCoroEndInst *, 4> CoroEnds;
SmallVector<CoroSizeInst *, 2> CoroSizes;		SmallVector<CoroSizeInst *, 2> CoroSizes;
		SmallVector<CoroSizeAlignedInst *, 2> CoroSizeAligneds;
SmallVector<AnyCoroSuspendInst *, 4> CoroSuspends;		SmallVector<AnyCoroSuspendInst *, 4> CoroSuspends;
SmallVector<CallInst*, 2> SwiftErrorOps;		SmallVector<CallInst*, 2> SwiftErrorOps;

// Field indexes for special fields in the switch lowering.		// Field indexes for special fields in the switch lowering.
struct SwitchFieldIndex {		struct SwitchFieldIndex {
enum {		enum {
Resume,		Resume,
Destroy		Destroy
Show All 17 Lines	struct LLVM_LIBRARY_VISIBILITY Shape {

bool ReuseFrameSlot;		bool ReuseFrameSlot;

struct SwitchLoweringStorage {		struct SwitchLoweringStorage {
SwitchInst *ResumeSwitch;		SwitchInst *ResumeSwitch;
AllocaInst *PromiseAlloca;		AllocaInst *PromiseAlloca;
BasicBlock *ResumeEntryBlock;		BasicBlock *ResumeEntryBlock;
unsigned IndexField;		unsigned IndexField;
		Optional<unsigned> FramePtrField;
bool HasFinalSuspend;		bool HasFinalSuspend;
};		};

struct RetconLoweringStorage {		struct RetconLoweringStorage {
Function *ResumePrototype;		Function *ResumePrototype;
Function *Alloc;		Function *Alloc;
Function *Dealloc;		Function *Dealloc;
BasicBlock *ReturnBlock;		BasicBlock *ReturnBlock;
▲ Show 20 Lines • Show All 120 Lines • ▼ Show 20 Lines	struct LLVM_LIBRARY_VISIBILITY Shape {
/// \param CG - if non-null, will be updated for the new call		/// \param CG - if non-null, will be updated for the new call
Value *emitAlloc(IRBuilder<> &Builder, Value *Size, CallGraph *CG) const;		Value *emitAlloc(IRBuilder<> &Builder, Value *Size, CallGraph *CG) const;

/// Deallocate memory according to the rules of the active lowering.		/// Deallocate memory according to the rules of the active lowering.
///		///
/// \param CG - if non-null, will be updated for the new call		/// \param CG - if non-null, will be updated for the new call
void emitDealloc(IRBuilder<> &Builder, Value *Ptr, CallGraph *CG) const;		void emitDealloc(IRBuilder<> &Builder, Value *Ptr, CallGraph *CG) const;

Shape() = default;
explicit Shape(Function &F, bool ReuseFrameSlot = false)		explicit Shape(Function &F, bool ReuseFrameSlot = false)
: ReuseFrameSlot(ReuseFrameSlot) {		: ReuseFrameSlot(ReuseFrameSlot) {
buildFrom(F);		buildFrom(F);
}		}
void buildFrom(Function &F);		void buildFrom(Function &F);
};		};

void buildCoroutineFrame(Function &F, Shape &Shape);		void buildCoroutineFrame(Function &F, Shape &Shape);
CallInst *createMustTailCall(DebugLoc Loc, Function *MustTailCallFn,		CallInst *createMustTailCall(DebugLoc Loc, Function *MustTailCallFn,
ArrayRef<Value *> Arguments, IRBuilder<> &);		ArrayRef<Value *> Arguments, IRBuilder<> &);
} // End namespace coro.		} // End namespace coro.
} // End namespace llvm		} // End namespace llvm

#endif		#endif

llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Show First 20 Lines • Show All 956 Lines • ▼ Show 20 Lines	void CoroCloner::create() {

// Salvage debug info that points into the coroutine frame.		// Salvage debug info that points into the coroutine frame.
salvageDebugInfo();		salvageDebugInfo();

// Eliminate coro.free from the clones, replacing it with 'null' in cleanup,		// Eliminate coro.free from the clones, replacing it with 'null' in cleanup,
// to suppress deallocation code.		// to suppress deallocation code.
if (Shape.ABI == coro::ABI::Switch)		if (Shape.ABI == coro::ABI::Switch)
coro::replaceCoroFree(cast<CoroIdInst>(VMap[Shape.CoroBegin->getId()]),		coro::replaceCoroFree(cast<CoroIdInst>(VMap[Shape.CoroBegin->getId()]),
/*Elide=*/ FKind == CoroCloner::Kind::SwitchCleanup);		/*Elide=*/FKind == CoroCloner::Kind::SwitchCleanup,
		&Shape);
}		}

// Create a resume clone by cloning the body of the original function, setting		// Create a resume clone by cloning the body of the original function, setting
// new entry block and replacing coro.suspend with an appropriate value to force		// new entry block and replacing coro.suspend with an appropriate value to force
// resume or cleanup pass for every suspend point.		// resume or cleanup pass for every suspend point.
static Function *createClone(Function &F, const Twine &Suffix,		static Function *createClone(Function &F, const Twine &Suffix,
coro::Shape &Shape, CoroCloner::Kind FKind) {		coro::Shape &Shape, CoroCloner::Kind FKind) {
CoroCloner Cloner(F, Suffix, Shape, FKind);		CoroCloner Cloner(F, Suffix, Shape, FKind);
Show All 22 Lines	static void updateAsyncFuncPointerContextSize(coro::Shape &Shape) {

Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);		Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
}		}

static void replaceFrameSize(coro::Shape &Shape) {		static void replaceFrameSize(coro::Shape &Shape) {
if (Shape.ABI == coro::ABI::Async)		if (Shape.ABI == coro::ABI::Async)
updateAsyncFuncPointerContextSize(Shape);		updateAsyncFuncPointerContextSize(Shape);

if (Shape.CoroSizes.empty())		if (!Shape.CoroSizes.empty()) {
return;

// In the same function all coro.sizes should have the same result type.		// In the same function all coro.sizes should have the same result type.
auto *SizeIntrin = Shape.CoroSizes.back();		auto *SizeIntrin = Shape.CoroSizes.back();
Module *M = SizeIntrin->getModule();		Module *M = SizeIntrin->getModule();
const DataLayout &DL = M->getDataLayout();		const DataLayout &DL = M->getDataLayout();
auto Size = DL.getTypeAllocSize(Shape.FrameTy);		auto Size = DL.getTypeAllocSize(Shape.FrameTy);
auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);		auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);

for (CoroSizeInst *CS : Shape.CoroSizes) {		for (CoroSizeInst *CS : Shape.CoroSizes) {
CS->replaceAllUsesWith(SizeConstant);		CS->replaceAllUsesWith(SizeConstant);
CS->eraseFromParent();		CS->eraseFromParent();
}		}
}		}

		if (!Shape.CoroSizeAligneds.empty()) {
		auto *SizeIntrin = Shape.CoroSizeAligneds.back();
		Module *M = SizeIntrin->getModule();
		const DataLayout &DL = M->getDataLayout();
		auto Size = DL.getTypeAllocSize(Shape.FrameTy);

		uint64_t FrameAlign = Shape.FrameAlign.value();
		uint64_t NewAlign = Shape.getSwitchCoroId()->getAlignment();
		uint64_t Extra = FrameAlign > NewAlign ? FrameAlign - NewAlign : 0;
		auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size + Extra);

		for (CoroSizeAlignedInst *CS : Shape.CoroSizeAligneds) {
		CS->replaceAllUsesWith(SizeConstant);
		CS->eraseFromParent();
		}
		}
		}
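
The size that replaces `llvm.coro.size.aligned` in the hunk above is the frame size plus the worst-case padding. A minimal stand-alone sketch of that arithmetic follows; `overAllocatedSize` is a hypothetical name, and the parameters mirror `Size`, `FrameAlign`, and `NewAlign` from the patch. It assumes both alignments are powers of two, so an address that is a multiple of `NewAlign` is at most `FrameAlign - NewAlign` bytes below the next multiple of `FrameAlign`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the size computation: frame size plus
// the extra bytes needed when the frame is more aligned than the allocator
// guarantees, and no extra bytes otherwise.
constexpr std::uint64_t overAllocatedSize(std::uint64_t FrameSize,
                                          std::uint64_t FrameAlign,
                                          std::uint64_t NewAlign) {
  return FrameSize + (FrameAlign > NewAlign ? FrameAlign - NewAlign : 0);
}

// Overaligned frame: 64-byte alignment on a 16-byte allocator needs 48 extra.
static_assert(overAllocatedSize(128, 64, 16) == 176, "48 bytes of slack");
// Not overaligned: the result equals the plain frame size.
static_assert(overAllocatedSize(128, 8, 16) == 128, "no slack needed");
static_assert(overAllocatedSize(128, 16, 16) == 128, "no slack needed");
```

When the frame is not overaligned, the result matches what `llvm.coro.size` would return, which is consistent with the summary of this revision.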

// Create a global constant array containing pointers to functions provided and		// Create a global constant array containing pointers to functions provided and
// set Info parameter of CoroBegin to point at this constant. Example:		// set Info parameter of CoroBegin to point at this constant. Example:
//		//
// @f.resumers = internal constant [2 x void(%f.frame*)*]		// @f.resumers = internal constant [2 x void(%f.frame*)*]
// [void(%f.frame*)* @f.resume, void(%f.frame*)* @f.destroy]		// [void(%f.frame*)* @f.resume, void(%f.frame*)* @f.destroy]
// define void @f() {		// define void @f() {
// ...		// ...
// call i8* @llvm.coro.begin(i8* null, i32 0, i8* null,		// call i8* @llvm.coro.begin(i8* null, i32 0, i8* null,
▲ Show 20 Lines • Show All 217 Lines • ▼ Show 20 Lines
// frame if possible.		// frame if possible.
static void handleNoSuspendCoroutine(coro::Shape &Shape) {		static void handleNoSuspendCoroutine(coro::Shape &Shape) {
auto *CoroBegin = Shape.CoroBegin;		auto *CoroBegin = Shape.CoroBegin;
auto *CoroId = CoroBegin->getId();		auto *CoroId = CoroBegin->getId();
auto *AllocInst = CoroId->getCoroAlloc();		auto *AllocInst = CoroId->getCoroAlloc();
switch (Shape.ABI) {		switch (Shape.ABI) {
case coro::ABI::Switch: {		case coro::ABI::Switch: {
auto SwitchId = cast<CoroIdInst>(CoroId);		auto SwitchId = cast<CoroIdInst>(CoroId);
coro::replaceCoroFree(SwitchId, /*Elide=*/AllocInst != nullptr);		coro::replaceCoroFree(SwitchId, /*Elide=*/AllocInst != nullptr, &Shape);
if (AllocInst) {		if (AllocInst) {
IRBuilder<> Builder(AllocInst);		IRBuilder<> Builder(AllocInst);
auto *Frame = Builder.CreateAlloca(Shape.FrameTy);		auto *Frame = Builder.CreateAlloca(Shape.FrameTy);
Frame->setAlignment(Shape.FrameAlign);		Frame->setAlignment(Shape.FrameAlign);
auto *VFrame = Builder.CreateBitCast(Frame, Builder.getInt8PtrTy());		auto *VFrame = Builder.CreateBitCast(Frame, Builder.getInt8PtrTy());
AllocInst->replaceAllUsesWith(Builder.getFalse());		AllocInst->replaceAllUsesWith(Builder.getFalse());
AllocInst->eraseFromParent();		AllocInst->eraseFromParent();
CoroBegin->replaceAllUsesWith(VFrame);		CoroBegin->replaceAllUsesWith(VFrame);
[962 lines not shown]

llvm/lib/Transforms/Coroutines/Coroutines.cpp

[16 lines not shown]
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
[133 lines not shown; context: if (M.getNamedValue(Name))]
return true;		return true;
}		}

return false;		return false;
}		}

// Replace all coro.frees associated with the provided CoroId either with 'null'		// Replace all coro.frees associated with the provided CoroId either with 'null'
// if Elide is true and with its frame parameter otherwise.		// if Elide is true and with its frame parameter otherwise.
void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide) {		void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide, Shape *Shape) {
SmallVector<CoroFreeInst *, 4> CoroFrees;		SmallVector<CoroFreeInst *, 4> CoroFrees;
for (User *U : CoroId->users())		for (User *U : CoroId->users())
if (auto CF = dyn_cast<CoroFreeInst>(U))		if (auto CF = dyn_cast<CoroFreeInst>(U))
CoroFrees.push_back(CF);		CoroFrees.push_back(CF);

if (CoroFrees.empty())		if (CoroFrees.empty())
return;		return;

Value *Replacement =		LLVMContext &Ctx = CoroId->getContext();
Elide ? ConstantPointerNull::get(Type::getInt8PtrTy(CoroId->getContext()))		PointerType *Int8PtrTy = Type::getInt8PtrTy(Ctx);
		Value *Replacement = Elide ? ConstantPointerNull::get(Int8PtrTy)
: CoroFrees.front()->getFrame();		: CoroFrees.front()->getFrame();

		if (!Elide && Shape && Shape->SwitchLowering.FramePtrField) {
		unsigned FramePtrField = *Shape->SwitchLowering.FramePtrField;
		for (CoroFreeInst *CF : CoroFrees) {
		IRBuilder<> Builder(CF);
		Value *FramePtr =
		Builder.CreateBitCast(Replacement, Shape->FrameTy->getPointerTo());
		Value *GEP = Builder.CreateConstGEP2_32(Shape->FrameTy, FramePtr, 0,
		FramePtrField);
		Value *LI = Builder.CreateLoad(Int8PtrTy, GEP, "raw.frame.ptr");
		CF->replaceAllUsesWith(LI);
		CF->eraseFromParent();
		}
		return;
		}

for (CoroFreeInst *CF : CoroFrees) {		for (CoroFreeInst *CF : CoroFrees) {
CF->replaceAllUsesWith(Replacement);		CF->replaceAllUsesWith(Replacement);
CF->eraseFromParent();		CF->eraseFromParent();
}		}
}		}
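
The deallocation path above replaces each `coro.free` with a load of the raw allocation pointer stored in the frame. A minimal sketch of that idea in ordinary C++ follows. Everything in it is hypothetical and for illustration only: the `Frame` layout, `createFrame`, `frameMemoryToFree` and `destroyFrame` are not names from the patch, and `malloc` is assumed to return memory aligned to `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical overaligned frame; rawFramePtr plays the role of the
// frame field indexed by SwitchLowering.FramePtrField in the patch.
struct alignas(32) Frame {
  void (*resume)(Frame *);
  void (*destroy)(Frame *);
  void *rawFramePtr; // address originally returned by malloc
  long long data;
};

// Over-allocate, align the frame start, and remember the raw pointer.
Frame *createFrame() {
  constexpr std::size_t allocAlign = alignof(std::max_align_t); // assumed malloc guarantee
  constexpr std::size_t extra =
      alignof(Frame) > allocAlign ? alignof(Frame) - allocAlign : 0;
  void *raw = std::malloc(sizeof(Frame) + extra);
  if (!raw)
    return nullptr;
  auto addr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t mask = static_cast<std::uintptr_t>(alignof(Frame)) - 1;
  std::uintptr_t aligned = (addr + mask) & ~mask;
  assert(aligned - addr <= extra && "allocator did not honor assumed alignment");
  Frame *frame = new (static_cast<char *>(raw) + (aligned - addr)) Frame{};
  frame->rawFramePtr = raw;
  return frame;
}

// What a lowered coro.free would yield: the raw pointer, not the frame address.
void *frameMemoryToFree(Frame *frame) { return frame->rawFramePtr; }

void destroyFrame(Frame *frame) { std::free(frameMemoryToFree(frame)); }
```

Passing the aligned frame address to `free` instead would be undefined behavior whenever the adjustment is non-zero, which is why the raw pointer has to live somewhere the resume and destroy functions can reach.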

// FIXME: This code is stolen from CallGraph::addToCallGraph(Function *F), which		// FIXME: This code is stolen from CallGraph::addToCallGraph(Function *F), which
// happens to be private. It is better for this functionality exposed by the		// happens to be private. It is better for this functionality exposed by the
[68 lines not shown; context: void coro::Shape::buildFrom(Function &F) {]
for (Instruction &I : instructions(F)) {		for (Instruction &I : instructions(F)) {
if (auto II = dyn_cast<IntrinsicInst>(&I)) {		if (auto II = dyn_cast<IntrinsicInst>(&I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
continue;		continue;
case Intrinsic::coro_size:		case Intrinsic::coro_size:
CoroSizes.push_back(cast<CoroSizeInst>(II));		CoroSizes.push_back(cast<CoroSizeInst>(II));
break;		break;
		case Intrinsic::coro_size_aligned:
		CoroSizeAligneds.push_back(cast<CoroSizeAlignedInst>(II));
		break;
case Intrinsic::coro_frame:		case Intrinsic::coro_frame:
CoroFrames.push_back(cast<CoroFrameInst>(II));		CoroFrames.push_back(cast<CoroFrameInst>(II));
break;		break;
case Intrinsic::coro_save:		case Intrinsic::coro_save:
// After optimizations, coro_suspends using this coro_save might have		// After optimizations, coro_suspends using this coro_save might have
// been removed, remember orphaned coro_saves to remove them later.		// been removed, remember orphaned coro_saves to remove them later.
if (II->use_empty())		if (II->use_empty())
UnusedCoroSaves.push_back(cast<CoroSaveInst>(II));		UnusedCoroSaves.push_back(cast<CoroSaveInst>(II));
[91 lines not shown; context: void coro::Shape::buildFrom(Function &F) {]
switch (auto IdIntrinsic = Id->getIntrinsicID()) {		switch (auto IdIntrinsic = Id->getIntrinsicID()) {
case Intrinsic::coro_id: {		case Intrinsic::coro_id: {
auto SwitchId = cast<CoroIdInst>(Id);		auto SwitchId = cast<CoroIdInst>(Id);
this->ABI = coro::ABI::Switch;		this->ABI = coro::ABI::Switch;
this->SwitchLowering.HasFinalSuspend = HasFinalSuspend;		this->SwitchLowering.HasFinalSuspend = HasFinalSuspend;
this->SwitchLowering.ResumeSwitch = nullptr;		this->SwitchLowering.ResumeSwitch = nullptr;
this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();		this->SwitchLowering.PromiseAlloca = SwitchId->getPromise();
this->SwitchLowering.ResumeEntryBlock = nullptr;		this->SwitchLowering.ResumeEntryBlock = nullptr;
		this->SwitchLowering.FramePtrField = None;

for (auto AnySuspend : CoroSuspends) {		for (auto AnySuspend : CoroSuspends) {
auto Suspend = dyn_cast<CoroSuspendInst>(AnySuspend);		auto Suspend = dyn_cast<CoroSuspendInst>(AnySuspend);
if (!Suspend) {		if (!Suspend) {
#ifndef NDEBUG		#ifndef NDEBUG
AnySuspend->dump();		AnySuspend->dump();
#endif		#endif
report_fatal_error("coro.id must be paired with coro.suspend");		report_fatal_error("coro.id must be paired with coro.suspend");
[367 lines not shown]

llvm/test/Transforms/Coroutines/coro-overalign.ll

This file was added.

				; Check that we will emit extra code to handle overaligned frame.
				; RUN: opt < %s -coro-split -S | FileCheck %s
				; RUN: opt < %s -passes=coro-split -S | FileCheck %s

				%PackedStruct = type <{ i64 }>

				declare void @consume(%PackedStruct*)

				define i8* @f() "coroutine.presplit"="1" {
				entry:
				%data = alloca %PackedStruct, align 32
				%id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.aligned.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				call void @consume(%PackedStruct* %data)
				%0 = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %0, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				call void @consume(%PackedStruct* %data)
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend
				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				; See if the frame pointer was inserted.
				; CHECK-LABEL: %f.Frame = type { void (%f.Frame*)*, void (%f.Frame*)*, i8*, i1, [7 x i8], %PackedStruct }

				; See if we over-allocate, adjust the frame pointer start address, and use an
				; alloca to save the raw frame pointer.
				; CHECK-LABEL: @f(
				;CHECK: %alloc.frame.ptr = alloca i8*, align 8
				;CHECK: %id = call token @llvm.coro.id(i32 16, i8* null, i8* null, i8* bitcast ([3 x void (%f.Frame*)*]* @f.resumers to i8*))
				;CHECK: %alloc = call i8* @malloc(i32 56)
				;CHECK: store i8* %alloc, i8** %alloc.frame.ptr, align 8
				;CHECK: %intptr = ptrtoint i8* %alloc to i64
				;CHECK: %over_boundary = add i64 %intptr, 31
				;CHECK: %aligned_intptr = and i64 %over_boundary, -32
				;CHECK: %diff = sub i64 %aligned_intptr, %intptr
				;CHECK: %aligned_result = getelementptr i8, i8* %alloc, i64 %diff
				;CHECK: call void @llvm.assume(i1 true) [ "align"(i8* %aligned_result, i64 32) ]
				;CHECK: %hdl = call noalias nonnull i8* @llvm.coro.begin(token %id, i8* %aligned_result)

				; See if we emit correct deallocation code.

				; CHECK-LABEL: @f.resume(
				; CHECK: %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
				; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
				; CHECK-NEXT: ret void

				; CHECK-LABEL: @f.destroy(
				; CHECK: %0 = getelementptr %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
				; CHECK-NEXT: %raw.frame.ptr = load i8*, i8** %0, align 8
				; CHECK-NEXT: call void @free(i8* %raw.frame.ptr)
				; CHECK-NEXT: ret void

				; CHECK-LABEL: @f.cleanup(
				; CHECK: call void @free(i8* null)
				; CHECK-NEXT: ret void

				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.aligned.i32()
				declare i8 @llvm.coro.suspend(token, i1)
				declare void @llvm.coro.resume(i8*)
				declare void @llvm.coro.destroy(i8*)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)
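
The adjustment sequence checked above (`add 31`, `and -32`, `sub`, `getelementptr`) is the usual round-up-to-alignment idiom. A rough C++ equivalent is sketched below; `OverAlignedBlock` and `allocateOverAligned` are hypothetical names, and the sketch assumes `malloc` returns memory aligned to at least `newAlign`, with both alignments being powers of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical result type: the raw pointer must be kept for free(),
// the aligned pointer is where the frame would start.
struct OverAlignedBlock {
  void *raw;
  void *aligned;
};

// Over-allocate by (frameAlign - newAlign) bytes and round the start
// address up to frameAlign, mirroring the IR in the CHECK lines.
OverAlignedBlock allocateOverAligned(std::size_t size, std::size_t frameAlign,
                                     std::size_t newAlign) {
  assert(frameAlign != 0 && (frameAlign & (frameAlign - 1)) == 0);
  assert(newAlign != 0 && (newAlign & (newAlign - 1)) == 0);
  std::size_t extra = frameAlign > newAlign ? frameAlign - newAlign : 0;
  void *raw = std::malloc(size + extra);
  if (!raw)
    return {nullptr, nullptr};
  auto intptr = reinterpret_cast<std::uintptr_t>(raw);
  std::uintptr_t mask = static_cast<std::uintptr_t>(frameAlign) - 1;
  std::uintptr_t overBoundary = intptr + mask;      // add i64 %intptr, 31
  std::uintptr_t alignedIntptr = overBoundary & ~mask; // and i64 ..., -32
  std::uintptr_t diff = alignedIntptr - intptr;     // sub
  void *aligned = static_cast<char *>(raw) + diff;  // getelementptr i8
  return {raw, aligned};
}
```

Because the raw address is a multiple of `newAlign`, the adjustment never exceeds `frameAlign - newAlign`, so the padded allocation is always large enough to hold the frame at the aligned address.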

Revision Contents

Diff 339906
