Index: clang/docs/DebuggingCoroutines.rst
===================================================================
--- /dev/null
+++ clang/docs/DebuggingCoroutines.rst
@@ -0,0 +1,371 @@
+========================
+Debugging C++ Coroutines
+========================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+For performance and other reasons, the implementation of C++ coroutines in
+Clang consists of two parts: the Clang part, which performs semantic analysis,
+and the LLVM part, which constructs and optimizes the coroutine frame.
+However, this design hurts the debuggability of C++ coroutines. The compiler
+generally generates debug information in the frontend, because debug
+information is highly language specific, but the frontend can't emit debug
+information for coroutine frames because the frames are constructed in the
+middle end. To mitigate this problem, we try to generate debug information for
+coroutine frames in the middle end. This document describes how to use that
+debug information when debugging coroutines.
+
+Note: because much language-related information is missing in the middle end,
+we might not be able to produce as much debug information as we would like.
+
+Terminology
+============
+
+Because C++20 coroutines are a new feature and the language specification
+describes them tersely, the terminology in use is inconsistent. This section
+specifies the terminology used in this document.
+
+coroutine type
+---------------
+Here, a coroutine type is a type that can be the return type of a coroutine
+function. (A coroutine function is a function that contains any of
+``co_await``, ``co_yield`` or ``co_return``.) ``Task`` and ``Generator`` are
+well-known coroutine types.
+
+coroutine
+---------------
+
+Technically, a coroutine is a suspendable function. In practice, however, we
+often say "coroutine" when we mean a coroutine instance. For example:
+
+.. code-block:: c++
+
+  std::vector<Task> Coros; // Task is a coroutine type.
+  for (int i = 0; i < 3; i++)
+    Coros.push_back(CoroTask()); // CoroTask is a coroutine function, which
+                                 // would return a coroutine type 'Task'.
+
+In practice, we generally say "Coros contains 3 coroutines" after the above
+code snippet, but this is not strictly correct. More precisely, we should say
+"Coros contains 3 coroutine instances" or "Coros contains 3 coroutine objects".
+
+In this document, we use the term "coroutine" to mean "coroutine instance",
+since neither the term "coroutine instance" nor the term "coroutine object" is
+well defined.
+
+coroutine frame
+---------------
+
+The language specification uses the term "coroutine state" to describe the
+allocated storage. On the compiler side, we generally use the term "coroutine
+frame" to describe the generated data structure that holds all the needed
+information.
+
+The structure of coroutine frames
+=================================
+
+The structure of coroutine frames is fixed as:
+
+.. code-block:: c++
+
+  struct {
+    void (*__r)(); // function pointer to resume function
+    void (*__d)(); // function pointer to destroy function
+    promise_type;  // corresponding promise_type
+    ... // any other needed information
+  }
+
+Since the debugger can map a function address to a function name, we can
+identify the coroutine function once we have the address of a coroutine
+frame.
+
+Print promise_type
+==================
+
+Every coroutine has a promise_type, and coroutines of the same coroutine type
+generally share the same promise_type. When stopped at a breakpoint inside a
+coroutine, we can print its promise_type in the debugger with:
+
+.. parsed-literal::
+
+  p __promise
+
+To print the promise_type of another coroutine, we only need to know the
+address of its coroutine frame. For example, if the address of the coroutine
+frame is ``0x416eb0`` and the type of its promise_type is
+``task::promise_type``, we can print the promise_type with:
+
+.. parsed-literal::
+
+  p (task::promise_type)*(0x416eb0+0x10)
+
+This works because the ABI guarantees that the promise_type is located at
+offset 16 (``0x10``) from the start of the coroutine frame.
+
+Print coroutine frames
+======================
+
+Debug information for the coroutine frame is generated in the middle end, so
+the debugger can print the coroutine frame. As with the promise_type, when
+stopped at a breakpoint inside a coroutine, we can print its frame with:
+
+.. parsed-literal::
+
+  p __coro_frame
+
+To print the frame of another coroutine whose frame address is known, say
+``0x418eb0``, we can use the following commands (example in gdb):
+
+.. parsed-literal::
+
+  (gdb) # Read the resume function address stored at the start of the frame
+  (gdb) p/x *0x418eb0
+  $1 = 0x4019e0
+  (gdb) # Get the linkage name for the coroutine
+  (gdb) x 0x4019e0
+  0x4019e0 <_ZL9coro_taski>:	0xe5894855
+  (gdb) # The coroutine frame type is 'linkage name + __coro_frame_ty'
+  (gdb) p  (_ZL9coro_taski__coro_frame_ty)*(0x418eb0)
+  $2 = {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {...}, ...}
+
+The commands above rely on two facts:
+(1) The name of the debug type of a coroutine frame is the linkage name of the
+coroutine function plus the ``__coro_frame_ty`` suffix, because all coroutines
+created from the same coroutine function share the same frame type.
+(2) The coroutine function can be found from the address of the coroutine
+frame, since the frame starts with the address of the resume function.
+
+These commands can be wrapped in a debugger script for convenience.
+
+Examples to print coroutine frames
+----------------------------------
+
+The following example shows what a printed coroutine frame looks like:
+
+.. code-block:: c++
+
+  #include <coroutine>
+  #include <iostream>
+  
+  struct task{
+    struct promise_type {
+      task get_return_object() { return std::coroutine_handle<promise_type>::from_promise(*this); }
+      std::suspend_always initial_suspend() { return {}; }
+      std::suspend_always final_suspend() noexcept { return {}; }
+      void return_void() noexcept {}
+      void unhandled_exception() noexcept {}
+  
+      int count = 0;
+    };
+  
+    void resume() noexcept {
+      handle.resume();
+    }
+  
+    task(std::coroutine_handle<promise_type> hdl) : handle(hdl) {}
+    ~task() {
+      if (handle)
+        handle.destroy();
+    }
+  
+    std::coroutine_handle<> handle;
+  };
+  
+  class await_counter : public std::suspend_always {
+  public:
+      template<class PromiseType>
+      void await_suspend(std::coroutine_handle<PromiseType> handle) noexcept {
+          handle.promise().count++;
+      }
+  };
+  
+  static task coro_task(int v) {
+    int a = v;
+    co_await await_counter{};
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+    co_await await_counter{};
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+  }
+   
+  int main() {
+    task t = coro_task(43);
+    t.resume();
+    t.resume();
+    t.resume();
+    return 0;
+  }
+
+In debug mode (``-O0 -g``), the printed frame looks something like:
+
+.. parsed-literal::
+
+  {__resume_fn = 0x4019e0 <coro_task(int)>, __destroy_fn = 0x402000 <coro_task(int)>, __promise = {count = 1}, v = 43, a = 45, __coro_index = 1 '\001', struct_std__suspend_always_0 = {__int_8 = 0 '\000'},
+    class_await_counter_1 = {__int_8 = 0 '\000'}, class_await_counter_2 = {__int_8 = 0 '\000'}, struct_std__suspend_always_3 = {__int_8 = 0 '\000'}}
+
+The result is quite readable: the names and values of ``v`` and ``a`` are shown
+directly, and so are the temporaries of type ``await_counter`` and
+``std::suspend_always``. The ``__coro_index`` field is the index of the suspend
+point at which the coroutine is suspended.
+
+However, things change when optimizations are turned on:
+
+.. parsed-literal::
+
+  {__resume_fn = 0x401280 <coro_task(int)>, __destroy_fn = 0x401390 <coro_task(int)>, __promise = {count = 1}, __int_32_0 = 43, __coro_index = 1 '\001'}
+
+Many unused values are optimized out, and so is the name of the local variable
+``a``: we only know that it is an int. Although ``__int_32_0`` clearly stands
+for ``a`` in this simple case, the mapping is far less clear in complex cases.
+
+Another important point is that, with optimizations on, a source variable does
+not map exactly to a slot in the frame. Here is an example:
+
+.. code-block:: c++
+
+  static task coro_task(int v) {
+    int a = v;
+    co_await await_counter{};
+    a++; // __int_32_0 is 43 here
+    std::cout << a << "\n";
+    a++; // __int_32_0 is still 43 here 
+    std::cout << a << "\n";
+    a++; // __int_32_0 is still 43 here!
+    std::cout << a << "\n";
+    co_await await_counter{};
+    a++; // __int_32_0 is still 43 here!!
+    std::cout << a << "\n";
+    a++; // Why is __int_32_0 still 43 here?
+    std::cout << a << "\n";
+  }
+
+In the above example, if we step through the code, we find that the value of
+``__int_32_0`` in the coroutine frame never changes: it is always 43 after the
+initial suspend, which is surprising. The reason is that the compiler tries to
+eliminate loads and stores as much as possible, so the above code is optimized
+into something like:
+
+.. code-block:: c++
+
+  static task coro_task(int v) {
+    store v to __int_32_0 in the frame 
+    co_await await_counter{};
+    a = load __int_32_0
+    std::cout << a+1 << "\n";
+    std::cout << a+2 << "\n";
+    std::cout << a+3 << "\n";
+    co_await await_counter{};
+    a = load __int_32_0
+    std::cout << a+4 << "\n";
+    std::cout << a+5 << "\n";
+  }
+
+It now makes sense why the value of ``__int_32_0`` is always 43:
+``__int_32_0`` is not ``a``. It is just a helper slot created by the
+compiler. So the tip here is not to assume that the variables in the coroutine
+frame are the same as the variables in the C++ source. They are related but
+not identical.
+
+Get the suspended points
+========================
+
+A common and important requirement when debugging coroutines is to find the
+suspend points. In other words, we want to know where a coroutine is
+suspended and what it is waiting for.
+
+The simplest solution is to inspect the ``__coro_index`` variable in the
+coroutine frame. It works well for simple straight-line cases like the one
+above.
+
+But it might not be so simple in more complex situations. In these cases, we
+can record the suspending line number with help from the coroutine library.
+
+For example:
+
+.. code-block:: c++
+  
+  // For all the promise_type we want:
+  class promise_type {
+    ...
+    unsigned line_number = 0xffffffff; // newly added member
+  };
+
+  #include <source_location>
+  
+  // For all the awaiter type we need:
+  class awaiter {
+    ...
+    template <typename Promise>
+    void await_suspend(std::coroutine_handle<Promise> handle,
+                       std::source_location sl = std::source_location::current()) {
+          ...
+          handle.promise().line_number = sl.line();
+    }
+  };
+
+With ``std::source_location``, we can record the line number of the await
+expression. Recall that we can locate the coroutine function from the address
+of the coroutine frame, so together these let us locate the suspend point.
+
+The downside is that users pay an additional runtime cost. But this is
+consistent with C++'s philosophy: "Pay for what you use".
+
+Get the asynchronous stack
+==========================
+
+Another important requirement when debugging coroutines is to print the
+asynchronous stack. In other words, we want to know the asynchronous caller
+of a coroutine. This is easy to do, since many implementations of coroutine
+types store a ``std::coroutine_handle<> continuation`` in the promise_type.
+The ``continuation`` generally refers to the coroutine that is awaiting the
+current coroutine, in other words, its asynchronous parent.
+
+Once we know the address of a coroutine frame, we can find its promise_type
+and the continuation stored in it, which identifies the asynchronous parent.
+Repeating this step gives the asynchronous grandparent and so on, so we can
+print the whole asynchronous stack.
+
+The above logic is easy to capture in a debugger script.
+
+Get the living coroutines
+=========================
+
+Another common question when debugging coroutines is: "Is it possible to print
+all the coroutines, as we can for threads?"
+
+This is technically possible but not recommended, since it is expensive. The
+solution looks roughly like this:
+
+.. code-block:: c++
+
+  inline std::unordered_set<void*> lived_coroutines;
+  // For all promise_type we want to record
+  class promise_type {
+  public:
+      promise_type() {
+          // Note to avoid data races
+          lived_coroutines.insert(std::coroutine_handle<promise_type>::from_promise(*this).address());
+      }
+      ~promise_type() {
+          // Note to avoid data races
+          lived_coroutines.erase(std::coroutine_handle<promise_type>::from_promise(*this).address());
+      }
+  };
+
+The above code snippet records the frame addresses of all live coroutines in
+``lived_coroutines``. Given a frame address, we can find the coroutine's
+function, its promise_type and the other members of its frame, so we can
+print all live coroutines in this way.
+
+Please note that this is expensive and that we need to avoid data races.
Index: clang/docs/index.rst
===================================================================
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -49,6 +49,7 @@
    HLSLSupport
    ThinLTO
    APINotes
+   DebuggingCoroutines
    CommandGuide/index
    FAQ