Page MenuHomePhabricator

[LLDB] Devirtualize coroutine promise types for `std::coroutine_handle`
ClosedPublic

Authored by avogelsgesang on Aug 24 2022, 5:10 PM.

Details

Summary

This commit teaches the std::coroutine_handle pretty-printer to
devirtualize type-erased promise types. This is particularly useful to
resonstruct call stacks, either of asynchronous control flow or of
recursive invocations of std::generator. For the example recently
introduced by https://reviews.llvm.org/D132451, printing the __promise
variable now shows

(std::__coroutine_traits_sfinae<task, void>::promise_type) __promise = {
  continuation = coro frame = 0x555555562430 {
    resume = 0x0000555555556310 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
    destroy = 0x0000555555556700 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
    promise = {
      continuation = coro frame = 0x5555555623e0 {
        resume = 0x0000555555557070 (a.out`task detail::chain_fn<2>() at llvm-nested-example.cpp:66)
        destroy = 0x0000555555557460 (a.out`task detail::chain_fn<2>() at llvm-nested-example.cpp:66)
        promise = {
          ...
        }
      }
      result = 0
    }
  }
  result = 0
}

(shortened to keep the commit message readable) instead of

(std::__coroutine_traits_sfinae<task, void>::promise_type) __promise = {
  continuation = coro frame = 0x555555562430 {
    resume = 0x0000555555556310 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
    destroy = 0x0000555555556700 (a.out`task detail::chain_fn<1>() at llvm-nested-example.cpp:66)
  }
  result = 0
}

Note how the new debug output reveals the complete asynchronous call
stack: our own function resumes chain_fn<1> which in turn will resume
chain_fn<2> and so on. Thereby this change allows users of lldb to
inspect the logical coroutine call stack without using any custom debug
scripts (although the display is still a bit clumsy. It would be nicer
to also integrate this into lldb's backtrace feature, but I don't know
how to do so)

The devirtualization currently works by introspecting the function
pointed to by the destroy pointer. (The resume pointer is not worth
much, given that for the final suspend point resume is set to a
nullptr. We have to use the destroy pointer instead.) We then look
for a __promise variable inside the destroy function. This
__promise variable is synthetically generated by LLVM, and looking at
its type reveals the type-erased promise_type.

This approach only works for clang-generated code, though. While gcc
also adds a _Coro_promise variable to the resume function, it does
not do so for the destroy function. However, we can't use the resume
function, as it will be reset to a nullptr at the final suspension
point. For the time being, I am happy with de-virtualization only working
for clang. A follow-up commit will further improve devirtualization and
also expose the variables spilled to the coroutine frame. As part of
this, I will also revisit gcc support.

Diff Detail

Event Timeline

avogelsgesang created this revision.Aug 24 2022, 5:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 5:10 PM
avogelsgesang requested review of this revision.Aug 24 2022, 5:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2022, 5:10 PM
avogelsgesang edited reviewers, added: ChuanqiXu; removed: Chuanfeng.Aug 24 2022, 5:11 PM

LGTM basically. Thanks!

ChuanqiXu added a comment.EditedAug 24 2022, 7:30 PM

The only concern is that if it would be not so easy to read if there are too many levels? (Consider the 30levels example in D132451). If it would be better to print ... at certain level? Or this is already handled by lldb?

The only concern is that if it would be not so easy to read if there are too many levels? (Consider the 30levels example in D132451). If it would be better to print ... at certain level? Or this is already handled by lldb?

Agree, it would be better to limit the printing at some point. However, afaict, limiting the depth to which children are expanded cannot be done from inside the data formatter/synthetic children frontend. I am not sure if lldb already has a separate mechanism in place to limit printing in this case. If it does not, I think it makes sense to add it, but that would be a separate commit

The only concern is that if it would be not so easy to read if there are too many levels? (Consider the 30levels example in D132451). If it would be better to print ... at certain level? Or this is already handled by lldb?

Agree, it would be better to limit the printing at some point. However, afaict, limiting the depth to which children are expanded cannot be done from inside the data formatter/synthetic children frontend. I am not sure if lldb already has a separate mechanism in place to limit printing in this case. If it does not, I think it makes sense to add it, but that would be a separate commit

Yeah, I am not objecting this. I understand it is bad to make perfect the enemy of better.

labath added a subscriber: labath.Aug 25 2022, 8:11 AM

The only concern is that if it would be not so easy to read if there are too many levels? (Consider the 30levels example in D132451). If it would be better to print ... at certain level? Or this is already handled by lldb?

Agree, it would be better to limit the printing at some point. However, afaict, limiting the depth to which children are expanded cannot be done from inside the data formatter/synthetic children frontend. I am not sure if lldb already has a separate mechanism in place to limit printing in this case. If it does not, I think it makes sense to add it, but that would be a separate commit

The best (I think) mechanism we have is the "pointer depth" limit (defaulting to 1). It works fine on regular objects, but requires some care with synthetic children. The fact that it does not kick in here leads me to believe that the data formatter is creating the synthetic children as non pointer objects (e.g. by automatically dereferencing any nested pointers), even though the structures clearly contain some pointers inside. That is a problem. Not only it creates excessively large outputs, it can even cause lldb to hang (looping endlessly while trying to print "all" children) if the data structures it is trying to print are circular (e.g. due to corruption).

avogelsgesang added a comment.EditedAug 25 2022, 9:01 AM

leads me to believe that the data formatter is creating the synthetic children as non pointer objects (e.g. by automatically dereferencing any nested pointers), even though the structures clearly contain some pointers inside

Yes, that is correct. The formatter (introduced recently in https://reviews.llvm.org/D132415) does dereference the pointer internally. It replaces the

(std::coroutine_handle<>) hdl = {
  __handle_ = 0x55555555b2a0
}

by

(std::coroutine_handle<>) hdl = coro frame = 0x55555555b2a0 {
    resume = 0x0000555555555a10 (a.out`coro_task(int, int) at llvm-example.cpp:36)
    destroy = 0x0000555555556090 (a.out`coro_task(int, int) at llvm-example.cpp:36)
}

by dereferencing the pointer and directly exposing its children.

I could not find a good way to not dereference the pointer internally. Maybe you can help me find a better way?

I tried to just adjust the pointer type of the __handle_ member, but then I got multiple problems:

(1) Unnecessary clutter

(std::coroutine_handle<>) hdl = {
    __handle_ = coro frame = 0x55555555b2a0 {
        resume = 0x0000555555555a10 (a.out`coro_task(int, int) at llvm-example.cpp:36)
        destroy = 0x0000555555556090 (a.out`coro_task(int, int) at llvm-example.cpp:36)
    }
}

The additional __handle_ in it is just visual clutter.

(2) Not expanded by default

By default, if I write p hdl, LLDB did not expand its children and just displayed

(std::coroutine_handle<>) hdl = {
  __handle_ = 0x55555555b2a0
}

without an indication that there was more to see inside the pointer.

Also, when I write p hdl.__handle_, LLDB did not expand the pointer. Because now the hdl.__handle_ access is done with C++ semantics, returning a void* and hence the data formatter for std::exception_handle did not kick in

leads me to believe that the data formatter is creating the synthetic children as non pointer objects (e.g. by automatically dereferencing any nested pointers), even though the structures clearly contain some pointers inside

Yes, that is correct. The formatter (introduced recently in https://reviews.llvm.org/D132415) does dereference the pointer internally. It replaces the

(std::coroutine_handle<>) hdl = {
  __handle_ = 0x55555555b2a0
}

by

(std::coroutine_handle<>) hdl = coro frame = 0x55555555b2a0 {
    resume = 0x0000555555555a10 (a.out`coro_task(int, int) at llvm-example.cpp:36)
    destroy = 0x0000555555556090 (a.out`coro_task(int, int) at llvm-example.cpp:36)
}

The question isn't so much about the dereferencing of actual physical pointers, but more of a any "logical" pointers in the data structures and how they're being presented by the formatters. Most of our formatters dereference some kinds of pointers. For example, a std::set formatter may need to dereference hundreds of pointers to display all of the members. That is OK(*), because those elements are logically "inside" that set. Just like you cannot have a struct S hold itself as a member (only a pointer to S), you also cannot have a std::set contain itself (it would either have to be a pointer, or a set of a different type). And, I think, neither should a coroutine "contain" another coroutine. I don't know much about coroutines, but it seems like your goal is to format them like a linked list, so may the (synthetic) object which represents the "next" coroutine (promise?) in the list should be of a pointer type, forcing the lldb (and user) to do an actual dereference operation before viewing the next element. Maybe that could be achieved by changing the type of __promise.continuation.promise (I'm using the example from your patch description) from std::...::promise_type to std::...::promise_type * ?

(*) However, the formatter should be wary of loops due to corrupted data structures. That's why our std::list formatter has an internal loop detection to prevent it from getting stuck on corrupted lists. AFAIK, our map formatter does not have that, probably because noone ran into that problem yet.

I don't know much about coroutines, but it seems like your goal is to format them like a linked list

Yes, and no. Coroutines are a compiler-provided type-erasure mechanism which (among other things) erase the type of the contained promise.
One common use case is that the library types (std::generator, the upcoming std::lazy, user-defined types, ...) store a "next"/"continuation" pointer inside that promise.

so maybe the (synthetic) object which represents the "next" coroutine (promise?) in the list should be of a pointer type, forcing the lldb (and user) to do an actual dereference operation before viewing the next element. Maybe that could be achieved by changing the type of __promise.continuation.promise (I'm using the example from your patch description) from std::...::promise_type to std::...::promise_type * ?

Good idea. That should work!

I will prepare a separate review which uses this approach. I would prefer to still first land this patch, then D132735, because I want to reuse helper functions from those two commits for making the proposed change

avogelsgesang added a comment.EditedAug 26 2022, 7:05 AM

I don't know much about coroutines, but it seems like your goal is to format them like a linked list

actually, my preferred goal would be to show them as a logical, user-level thread. Such that you can type

thread backtrace cxxcoro:0x55555555b2a0

to get the backtrace of the logical coroutine thread routed at the coroutine_handle at address 0x55555555b2a0, or maybe even

thread backtrace cxxcoro:hdl

where hdl is evaluated as an expression to identify the coroutine handle from where to dump the backtrace.

Also, it would be neat if those logical threads show up in thread list...

But it seems there is currently no infrastructure yet in LLDB for logical threads provided by LanguageRuntime plugins.

I guess at some point, I will write an RFC about that on discourse. But before that, I will first do some more exploration on how LLDB works and I will first grab the low-hanging fruits (like a data formatter for std::coroutine_handle and patching LLVM to emit the necessary debug info)

disable tests on gcc

@labath D132815 addresses your point: With that change, the promise member now is typed as a pointer.

I would prefer to land the patches in the order D132624 (this review), D132735, D132815.

@labath D132815 addresses your point: With that change, the promise member now is typed as a pointer.

I would prefer to land the patches in the order D132624 (this review), D132735, D132815.

You could sort these revisions by 'Edit Related Revisions' automatically.

@labath @aprantl Thank you for your reviews on https://reviews.llvm.org/D132815 and https://reviews.llvm.org/D132735
I would appreciate if you also have time to review this change here, as it is a pre-requisite for the other two changes :)

aprantl accepted this revision.Nov 10 2022, 3:13 PM
aprantl added a reviewer: Michael137.

Mechanically this looks good to me.

This revision is now accepted and ready to land.Nov 10 2022, 3:15 PM
avogelsgesang reopened this revision.Nov 14 2022, 6:17 PM
This revision is now accepted and ready to land.Nov 14 2022, 6:17 PM

Fix ownership of CompilerType between "scratch typesystem" and "module type system"

@aprantl I still had to adjust the code slightly to copy the type from the ASTContext for '.../a.out' to the scratch ASTContext (without this, my code was triggering an assert).
See https://reviews.llvm.org/D132624?vs=474793&id=475327 for the related changes.
Because I have no prior experience with ClangASTImporter and the usage of multiple different TypeSystemClang instances, I would appreciate if you could take another quick look

I temporarily reverted this via

commit 2b2f2f66141d52dc0d3082ddd12805d36872a189
Author: Jason Molenda <jason@molenda.com>
Date:   Fri Nov 25 12:28:28 2022 -0800

    Revert "[LLDB] Recognize `std::noop_coroutine()` in `std::coroutine_handle` pretty printer"
    
    This reverts commit 4346318f5c700f4e85f866610fb8328fc429319b.
    
    This test case is failing on macOS, reverting until it can be
    looked at more closely to unblock the macOS CI bots.
    
      File "/Volumes/work/llvm/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/coroutine_handle/TestCoroutineHandle.py", line 121, in test_libcpp
        self.do_test(USE_LIBCPP)
      File "/Volumes/work/llvm/llvm-project/lldb/test/API/functionalities/data-formatter/data-formatter-stl/generic/coroutine_handle/TestCoroutineHandle.py", line 45, in do_test
        self.expect_expr("noop_hdl",
      File "/Volumes/work/llvm/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 2441, in expect_expr
        value_check.check_value(self, eval_result, str(eval_result))
      File "/Volumes/work/llvm/llvm-project/lldb/packages/Python/lldbsuite/test/lldbtest.py", line 306, in check_value
        test_base.assertEqual(self.expect_summary, val.GetSummary(),
    AssertionError: 'noop_coroutine' != 'coro frame = 0x100004058'
    - noop_coroutine+ coro frame = 0x100004058 : (std::coroutine_handle<void>) $1 = coro frame = 0x100004058 {
      resume = 0x0000000100003344 (a.out`___lldb_unnamed_symbol223)
      destroy = 0x0000000100003344 (a.out`___lldb_unnamed_symbol223)
    }
    Checking SBValue: (std::coroutine_handle<void>) $1 = coro frame = 0x100004058 {
      resume = 0x0000000100003344 (a.out`___lldb_unnamed_symbol223)
      destroy = 0x0000000100003344 (a.out`___lldb_unnamed_symbol223)
    }

    Those lldb_unnamed_symbols are synthetic names that ObjectFileMachO
    adds to the symbol table, most often seen with stripped binaries,
    based off of the function start addresses for all the functions -
    if a function has no symbol name, lldb adds one of these names.
    This change was originally landed via https://reviews.llvm.org/D132624

@jasonmolenda I am a bit confused about your revert.
Did you indeed revert this change here? Afaict, you reverted https://reviews.llvm.org/D132735 and https://reviews.llvm.org/D132815, but left this commit in place. I did not actually read through your code changes, though, I only looked at the commit messages so far. Is my understanding of which commits you actually reverted correct?

@jasonmolenda I am a bit confused about your revert.
Did you indeed revert this change here? Afaict, you reverted https://reviews.llvm.org/D132735 and https://reviews.llvm.org/D132815, but left this commit in place. I did not actually read through your code changes, though, I only looked at the commit messages so far. Is my understanding of which commits you actually reverted correct?

Ah sigh. :( I reverted both https://reviews.llvm.org/D132624 and https://reviews.llvm.org/D132735 -- failed to comment here in https://reviews.llvm.org/D132624 that I reverted the change, and put the comment about https://reviews.llvm.org/D132735 in the wrong phabracator. I don't know how I got the two mixed up in so many ways ;) but they are both backed out right now and we have a green status on the macOS Incremental CI bot https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/

The mixup I can see, but I don't know how I failed to comment in this one about it being reverted. I must not have hit a Submit button, apologies.