This is an archive of the discontinued LLVM Phabricator instance.

[RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2
ClosedPublic

Authored by ChuanqiXu on Feb 17 2021, 11:27 PM.

Details

Summary

Try to insert dbg.declare to entry.resume basic block in resume function. In this way, we could print alloca such as __promise in gdb/lldb under O2, which would be beneficial to debug coroutine program.

I am not sure whether this change is correct. I only test it by C++ programs locally. I would update the test case if needed. It could print __promise in all of the C++ coroutines I tested.

By the way, I think it is very beneficial to print the coroutine frame by keywords like __coro_frame. But I find it seems a little hard to implement.

Test-Plan: check-llvm

Diff Detail

Event Timeline

ChuanqiXu created this revision.Feb 17 2021, 11:27 PM
ChuanqiXu requested review of this revision.Feb 17 2021, 11:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2021, 11:27 PM

The document (https://llvm.org/docs/SourceLevelDebugging.html#llvm-dbg-declare) says that dbg.declare is deprecated since it may obscure the optimization.
I think it is OK to use dbg.declare here. Since once we built the frame, it is hard to optimize the coroutine frame from my view.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1223

If we search for the dbg.value uses for parameters here and insert dbg.declare uses for the parameters, we could debug some parameter under O2 (for example this). But I am not sure if this is a good idea.

This comment was removed by junparser.

Does this change keeps the debug behavior in O0? @aprantl any next step with coroutine debug info?

Does this change keeps the debug behavior in O0? @aprantl any next step with coroutine debug info?

In my local tests, it did keep debug behavior in O0.

Could you post an example that you are interested in at -O2 and show the diff of the LLVM IR for your example before/after this patch? I find it unintuitive that removing the shadow copy improved the debug info, but if don't need it and removing it doesn't pessimize the situation at -O0 we could remove it.

ChuanqiXu updated this revision to Diff 325973.Feb 23 2021, 8:59 PM

Update test example under O2.

Could you post an example that you are interested in at -O2 and show the diff of the LLVM IR for your example before/after this patch? I find it unintuitive that removing the shadow copy improved the debug info, but if don't need it and removing it doesn't pessimize the situation at -O0 we could remove it.

The diff at -O2 before and after this patch is as follows:

 define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 8 dereferenceable(40) %FramePtr) {
 entry.resume:
-   call void @llvm.dbg.value(metadata %f.Frame* undef, metadata !19, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 16)), !dbg !21
-   %vFrame = bitcast %f.Frame* %FramePtr to i8*, !dbg !21
+  %vFrame = bitcast %f.Frame* %FramePtr to i8*, !dbg !19
+  call void @llvm.dbg.declare(metadata %f.Frame* %FramePtr, metadata !21, metadata !DIExpression(DW_OP_plus_uconst, 16)), !dbg !19

Since the original version would emit the following codes in resume function:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 8 dereferenceable(40) %FramePtr) {
 entry.resume:
   %FramePtr.debug = alloca %f.Frame*, align 8, !dbg !19
  call void @llvm.dbg.declare(metadata %f.Frame** %FramePtr.debug, metadata !21, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 16)), !dbg !19

And %FramePtr.debug would be eliminated in SROA pass which would run after Coro-split pass at -O2. So the value for the corresponding declare would be undef finally.

I find it unintuitive that removing the shadow copy improved the debug info, but if don't need it and removing it doesn't pessimize the situation at -O0 we could remove it.

At O0, the difference before and after this patch would be:

define internal fastcc void @f.resume(%f.Frame* noalias nonnull align 8 dereferenceable(40) %FramePtr) {
 entry.resume:
-   %FramePtr.debug = alloca %f.Frame*, align 8, !dbg !19
-   call void @llvm.dbg.declare(metadata %f.Frame** %FramePtr.debug, metadata !21, metadata !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 16)), !dbg !19
+  call void @llvm.dbg.declare(metadata %f.Frame* %FramePtr, metadata !21, metadata !DIExpression(DW_OP_plus_uconst, 16)), !dbg !19

Due to the limits of my knowledge for debug information, I can't prove the semantics remain the same. I could only test for it in C++. And for now it looks like this patch keeps the debug behavior.

Thanks for posting your analysis @ChuanqiXu! I would very much prefer to fix the pipeline such that after LowerDbgDeclare does its work the resulting dbg.value can be salvaged. I we can pull that off it would benefit the entire LLVM optimized code debug info story. I'll try to debug what's happening to see how feasible that is.

Thanks for posting your analysis @ChuanqiXu! I would very much prefer to fix the pipeline such that after LowerDbgDeclare does its work the resulting dbg.value can be salvaged. I we can pull that off it would benefit the entire LLVM optimized code debug info story. I'll try to debug what's happening to see how feasible that is.

From my point of view, the root reason why original version can't work at -O2 is that when the optimization passes (may be not only SROA ) eliminate/transform the alloca, these passes simply replace the value in the related dbg.declare/dbg.value with undef. So LowerDbgDeclare may not be the problem to me. So it seems like we need fix a lot optimization passes. It looks like a large and hard work.

aprantl added inline comments.Mar 3 2021, 2:02 PM
llvm/test/Transforms/Coroutines/coro-debug-O2.ll
12

this is missing a !dbg !8

I looked at your example in the debugger and the problem seems to be that SROA is being run before LowerDbgDeclare (from Local.cpp). I believe reversing the order might work. Alternatively we could just not create an alloca in coro::salvageDebugInfo if optimizations are enabled, but I am not sure if this is known to the pass.

ChuanqiXu updated this revision to Diff 328362.Mar 4 2021, 7:34 PM

Don't emit temporary alloca for arguments if optimization is enabled.

I looked at your example in the debugger and the problem seems to be that SROA is being run before LowerDbgDeclare (from Local.cpp). I believe reversing the order might work. Alternatively we could just not create an alloca in coro::salvageDebugInfo if optimizations are enabled, but I am not sure if this is known to the pass.

It looks OK to skip the creation for the alloca if optimizations are enabled.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2193

This style looks really ugly. Is there any way to fix it? Can we use a goto in the if statement?

I think it would better to keep the orginal, more descriptive name ReuseFrameSlot and explain in a comment that this is only true if optimizations are enabled. With that change, this is fine.

Added one more question inline.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
666

I don't understand this change. Is this covered by your test? Are we even emitting dbg.declares when optimizations are turned on?

ChuanqiXu updated this revision to Diff 332950.Mar 24 2021, 5:54 AM

Edit as the comment.

ChuanqiXu added inline comments.Mar 24 2021, 6:00 AM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
2196

After salvage the debug information, if the Storage is Argument (which means the FramePtr in resume function), we should insert DDI just in the entry block of the resume function. I am not clear about whether the design is suitable for swift and mlir.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
666

I moved this change to SalvageDebugInfo at CoroFrame.cpp.

666

With the the optimization are turned on, the debug information for promise (which would be part of coroutine by the standard) is dbg.declare. I am not clear if this is just a miss match or by design. As a result, it makes it easy to print the promise. And I think it should be OK to emit dbg.declare for promise. Since it would be put in the coroutine frame, I think the chance to optimize it is rare.

aprantl accepted this revision.Mar 30 2021, 10:08 AM
This revision is now accepted and ready to land.Mar 30 2021, 10:08 AM