Index: clang/docs/DebuggingCoroutines.rst
===================================================================
--- /dev/null
+++ clang/docs/DebuggingCoroutines.rst
@@ -0,0 +1,371 @@
+========================
+Debugging C++ Coroutines
+========================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+For several reasons, chiefly performance, the implementation of C++ coroutines
+in Clang consists of two parts: the Clang part, which performs semantic
+analysis, and the LLVM part, which constructs and optimizes the coroutine
+frame. This design hurts the debuggability of C++ coroutines. Debug
+information is highly language-specific, so the compiler normally generates it
+in the frontend. The frontend, however, cannot emit debug information for
+coroutine frames because the frames are constructed in the middle end. To
+mitigate this problem, we generate debug information for coroutine frames in
+the middle end. This document describes how to use that debug information
+when debugging.
+
+Note: because much language-level information is missing in the middle end,
+we may not be able to emit as much information as we would like.
+
+Terminology
+============
+
+C++20 coroutines are a new feature, and the language specification describes
+them tersely, so the terminology in use today is inconsistent. This section
+defines the terms used in this document.
+
+coroutine type
+---------------
+
+A coroutine type is a type that can be the return type of a coroutine
+function. (A coroutine function is a function that contains any of
+'co_await', 'co_yield' and 'co_return'.) 'Task' and 'Generator' are
+well-known coroutine types.
+
+coroutine
+---------------
+
+Technically, a coroutine is a suspendable function. In practice, however,
+'coroutine' usually refers to a coroutine instance. For example:
+
+.. code-block:: c++
+
+  std::vector<Task> Coros; // Task is a coroutine type.
+  for (int i = 0; i < 3; i++)
+    Coros.push_back(CoroTask()); // CoroTask is a coroutine function that
+                                 // returns the coroutine type 'Task'.
+
+In practice, we would say "Coros contains 3 coroutines" after the above
+snippet. Strictly speaking, this is not accurate: we should say "Coros
+contains 3 coroutine instances" or "Coros contains 3 coroutine objects".
+
+In this document, 'coroutine' means 'coroutine instance', because neither
+'coroutine instance' nor 'coroutine object' is a well-defined term.
+
+coroutine frame
+---------------
+
+The language specification uses 'coroutine state' to describe the allocated
+storage. On the compiler side, we generally use 'coroutine frame' to describe
+the generated data structure that holds all the information the coroutine
+needs.
+
+The structure of coroutine frames
+=================================
+
+The structure of a coroutine frame is fixed as:
+
+.. code-block:: c++
+
+  struct {
+    void (*__r)(); // function pointer to resume function
+    void (*__d)(); // function pointer to destroy function
+    promise_type;  // Corresponding promise_type
+    ...            // Any other needed information
+  }
+
+Because a debugger can map a function address to a function name, we can
+identify the coroutine function once we have the address of a coroutine
+frame.
+
+Print promise_type
+==================
+
+Every coroutine has a promise_type, and coroutines of the same coroutine type
+share the same promise_type. When stopped at a breakpoint inside a coroutine,
+we can print its promise_type with:
+
+.. parsed-literal::
+
+  p __promise
+
+To print the promise_type of another coroutine, we only need the address of
+that coroutine's frame. For example, if the coroutine frame is at `0x416eb0`
+and the promise_type is `task::promise_type`, we can print the promise_type
+with:
+
+.. parsed-literal::
+
+  p (task::promise_type)*(0x416eb0+0x10)
+
+This works because the ABI guarantees that the promise_type sits at offset 16
+(0x10) from the start of the coroutine frame.
+
+Print coroutine frames
+======================
+
+The middle end generates debug information for the coroutine frame, so we can
+print the frame as well. As with promise_type, when stopped at a breakpoint
+inside a coroutine, we can print its coroutine frame with:
+
+.. parsed-literal::
+
+  p __coro_frame
+
+To print the coroutine frame of another coroutine whose frame address is
+'0x418eb0', we can use the following commands (example in gdb):
+
+.. parsed-literal::
+
+  (gdb) # Read the resume function address stored at the start of the frame
+  (gdb) p/x *0x418eb0
+  $1 = 0x4019e0
+  (gdb) # Get the linkage name for the coroutine
+  (gdb) x 0x4019e0
+  0x4019e0 <_ZL9coro_taski>: 0xe5894855
+  (gdb) # The coroutine frame type is 'linkage name + __coro_frame_ty'
+  (gdb) p (_ZL9coro_taski__coro_frame_ty)*(0x418eb0)
+  $2 = {__resume_fn = 0x4019e0 , __destroy_fn = 0x402000 , __promise = {...}, ...}
+
+The commands above rely on two facts:
+
+(1) The name of the debug type of a coroutine frame is the 'linkage_name' of
+    the coroutine function plus the '__coro_frame_ty' suffix, because all
+    coroutines created from the same coroutine function share one frame type.
+(2) The coroutine function name can be recovered from the address of the
+    coroutine frame.
+
+These commands are easy to wrap in a debugger script.
+
+Examples to print coroutine frames
+----------------------------------
+
+The following example shows what a printed coroutine frame looks like:
+
+.. code-block:: c++
+
+  #include <coroutine>
+  #include <iostream>
+
+  struct task{
+    struct promise_type {
+      task get_return_object() { return std::coroutine_handle<promise_type>::from_promise(*this); }
+      std::suspend_always initial_suspend() { return {}; }
+      std::suspend_always final_suspend() noexcept { return {}; }
+      void return_void() noexcept {}
+      void unhandled_exception() noexcept {}
+
+      int count = 0;
+    };
+
+    void resume() noexcept {
+      handle.resume();
+    }
+
+    task(std::coroutine_handle<promise_type> hdl) : handle(hdl) {}
+    ~task() {
+      if (handle)
+        handle.destroy();
+    }
+
+    std::coroutine_handle<> handle;
+  };
+
+  class await_counter : public std::suspend_always {
+    public:
+      template<class PromiseType>
+      void await_suspend(std::coroutine_handle<PromiseType> handle) noexcept {
+          handle.promise().count++;
+      }
+  };
+
+  static task coro_task(int v) {
+    int a = v;
+    co_await await_counter{};
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+    co_await await_counter{};
+    a++;
+    std::cout << a << "\n";
+    a++;
+    std::cout << a << "\n";
+  }
+
+  int main() {
+    task t = coro_task(43);
+    t.resume();
+    t.resume();
+    t.resume();
+    return 0;
+  }
+
+In debug mode (O0 + g), the printed result looks something like:
+
+.. parsed-literal::
+
+  {__resume_fn = 0x4019e0 , __destroy_fn = 0x402000 , __promise = {count = 1}, v = 43, a = 45, __coro_index = 1 '\001', struct_std__suspend_always_0 = {__int_8 = 0 '\000'},
+    class_await_counter_1 = {__int_8 = 0 '\000'}, class_await_counter_2 = {__int_8 = 0 '\000'}, struct_std__suspend_always_3 = {__int_8 = 0 '\000'}}
+
+The result is easy to read: the names and values of 'v' and 'a' are shown
+directly. We can also see the temporaries for `await_counter` and
+`std::suspend_always`. `__coro_index` is the index of the suspend point at
+which the coroutine is currently suspended.
+
+However, things change once optimizations are turned on:
+
+.. parsed-literal::
+
+  {__resume_fn = 0x401280 , __destroy_fn = 0x401390 , __promise = {count = 1}, __int_32_0 = 43, __coro_index = 1 '\001'}
+
+Many unused values are optimized out, but so is the name of the local variable
+'a': the frame only records that the slot holds a 32-bit integer. In this
+simple case we can guess that `__int_32_0` corresponds to 'a', but in complex
+cases the mapping is much less clear.
+
+Another important point with optimizations enabled is that a variable in the
+source code should not be tied too tightly to a slot in the frame. Here is an
+example:
+
+.. code-block:: c++
+
+  static task coro_task(int v) {
+    int a = v;
+    co_await await_counter{};
+    a++; // __int_32_0 is 43 here
+    std::cout << a << "\n";
+    a++; // __int_32_0 is still 43 here
+    std::cout << a << "\n";
+    a++; // __int_32_0 is still 43 here!
+    std::cout << a << "\n";
+    co_await await_counter{};
+    a++; // __int_32_0 is still 43 here!!
+    std::cout << a << "\n";
+    a++; // Why is __int_32_0 still 43 here?
+    std::cout << a << "\n";
+  }
+
+If we step through the example above, the value of `__int_32_0` in the
+coroutine frame looks wrong: it stays at 43 after the initial suspend, which
+is surprising. The reason is that the compiler eliminates as many loads and
+stores as it can, so the code above is effectively optimized to:
+
+.. code-block:: c++
+
+  // Pseudocode: what the optimized coroutine effectively does.
+  static task coro_task(int v) {
+    store v to __int_32_0 in the frame
+    co_await await_counter{};
+    a = load __int_32_0
+    std::cout << a+1 << "\n";
+    std::cout << a+2 << "\n";
+    std::cout << a+3 << "\n";
+    co_await await_counter{};
+    a = load __int_32_0
+    std::cout << a+4 << "\n";
+    std::cout << a+5 << "\n";
+  }
+
+Now it is clear why the value of `__int_32_0` is always 43: `__int_32_0` is
+not 'a'. It is a helper variable created by the compiler. The tip here is not
+to assume that the variables in the coroutine frame are the same as the
+variables in the C++ source code. They are related but not equal.
+
+Get the suspended points
+========================
+
+A common and important requirement when debugging coroutines is to find the
+suspended points. In other words, we want to know where a coroutine is
+suspended and what it is waiting for.
+
+The simplest solution is to inspect the `__coro_index` variable in the
+coroutine frame. This works well for simple straight-line code such as the
+example above.
+
+In more complex situations it may not be so simple. In those cases, we can
+obtain the suspending line number by cooperating with the coroutine library.
+
+For example:
+
+.. code-block:: c++
+
+  // For all the promise_type we want:
+  class promise_type {
+    ...
+
+    unsigned line_number = 0xffffffff;
+  };
+
+  #include <source_location>
+
+  // For all the awaiter type we need:
+  class awaiter {
+    ...
+    template <typename Promise>
+    void await_suspend(std::coroutine_handle<Promise> handle,
+                       std::source_location sl = std::source_location::current()) {
+          ...
+          handle.promise().line_number = sl.line();
+    }
+  };
+
+With `std::source_location`, we can record the line number of the await.
+Recall that we can locate the coroutine function from the address of the
+coroutine, so together these let us locate suspended points precisely.
+
+The downside is that users pay an additional runtime cost. But this is
+consistent with C++'s philosophy: "Pay for what you use".
+
+Get the asynchronous stack
+==========================
+
+Another important requirement when debugging coroutines is to print the
+asynchronous stack. In other words, we want to know the asynchronous caller
+of a coroutine. This is easy to do because many implementations of coroutine
+types store a 'std::coroutine_handle<> continuation' in the promise_type.
+The 'continuation' generally refers to the coroutine that is awaiting the
+current coroutine, in other words, its asynchronous parent.
+
+Once we know the address of a coroutine, we can find its promise_type and the
+continuation stored there, which gives us the asynchronous parent.
+From there it is easy to find the asynchronous grandparent and so on, so we
+can print the whole asynchronous stack.
+
+This logic is easy to capture in a debugger script.
+
+Get the living coroutines
+=========================
+
+Another common question when debugging coroutines is: "Is it possible to print
+all the coroutines, as we can for threads?"
+
+This is technically possible but not recommended because it is expensive. The
+solution is roughly:
+
+.. code-block:: c++
+
+  inline std::unordered_set<void*> lived_coroutines;
+  // For all promise_type we want to record
+  class promise_type {
+  public:
+      promise_type() {
+          // Note to avoid data races
+          lived_coroutines.insert(std::coroutine_handle<promise_type>::from_promise(*this).address());
+      }
+      ~promise_type() {
+          // Note to avoid data races
+          lived_coroutines.erase(std::coroutine_handle<promise_type>::from_promise(*this).address());
+      }
+  };
+
+The snippet above records the addresses of all living coroutines in
+`lived_coroutines`. Given an address, we can find the coroutine's function,
+its promise_type and the other members of its frame, so we can print every
+living coroutine in this way.
+
+Please note that this is expensive and that we need to avoid data races.
Index: clang/docs/index.rst
===================================================================
--- clang/docs/index.rst
+++ clang/docs/index.rst
@@ -49,6 +49,7 @@
    HLSLSupport
    ThinLTO
    APINotes
+   DebuggingCoroutines
    CommandGuide/index
    FAQ