Page MenuHomePhabricator

ChuanqiXu (Chuanqi Xu)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 22 2020, 11:08 PM (41 w, 6 d)

Recent Activity

Today

ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Address the comments. Now we would only collect dbg.values who would't change the layout of the frame.

Mon, Apr 12, 4:31 AM · debug-info, Restricted Project

Yesterday

ChuanqiXu added inline comments to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.
Sun, Apr 11, 11:54 PM · debug-info, Restricted Project
ChuanqiXu updated the diff for D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Address the comments and enhance the test case.

Sun, Apr 11, 11:50 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

Ah, if the pass does more than just setting the attribute, then sure, it makes sense to keep it. But I do think we should be requiring the attribute to be added by frontends, since it's really an IR invariant that it's present on all unlowered coroutines.

By the way, it also sets these attributes for other types of coroutines (retcon and async). So the down-side would be then we need to do this for all front-ends (clang and swift).

Sun, Apr 11, 10:55 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass.

This patch looks OK to me.

Sun, Apr 11, 10:46 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

I am also a bit skeptical about this patch.
Specifically I agree that dbg info should not affect the coroutine frame.
From what I can tell, dbg intrinsics are location insensitive, i.e. they can be put at any location. (correct me if I am wrong)
So a use of any value by dbg intrinsics should not cause the value to be put on the frame.
Perhaps we could first move all dbg intrinsics to a dedicated location (e.g. right after corobegin) before creating the frame, and copy them during function cloning?

Sun, Apr 11, 8:41 PM · debug-info, Restricted Project

Thu, Apr 8

ChuanqiXu added inline comments to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.
Thu, Apr 8, 8:46 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

I would recommend to pick a name for -enhance-debug-with-coroutine that does not contain debug. The name should make clear that this changes codegen, so nobody gets the idea of turning it as part of a -g option. For example, something like -coroutine-spill-all-locals.

Thu, Apr 8, 8:33 PM · debug-info, Restricted Project
ChuanqiXu accepted D99693: Update the linkage name of coro-split functions where applicable.

LGTM.

Thu, Apr 8, 6:43 PM · Restricted Project
ChuanqiXu updated the diff for D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Rebase with trunk and address the comments.

Thu, Apr 8, 12:30 AM · debug-info, Restricted Project

Wed, Apr 7

ChuanqiXu added a comment to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Normally we don't want passes making educated guesses about things and codifying that into debug info, but in this specific instance i don't think it's unreasonable, since the debug frame is an artificial type. Ideally you would find the debug info that's present and use that to describe the frame layout as much as possible. The way frame layout works should allow that kind of reversal well enough.

The biggest conceptual problem is that, for the fields that you can't find debug info for, you'll have to either come up with a C type from the LLVM type or leave them out the layout entirely. The latter seems like it would make this much less useful, but the former is really problematic because the sizes of C's fundamental types are so unportable.

Wed, Apr 7, 10:50 PM · debug-info, Restricted Project
ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Rebase with trunk and add option to control whether we would collect variables used only by dbg.vaues.
Although we can use materializing process to reduce the extra space, we can't prove we could get the same result with -g or not.
Previously, I think if we can't get the same result, the materialization part should take responsibility. However, I find that a pattern recently:
State before:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
store %v to %a
dbg.value(metadata %v, dbg variable for %a, ...

Then after some optimization, the store in some path would be eliminated:

%a = alloca  ...
; ... some uses who wouldn't cross suspend points
call to coro.suspend()
; ... alternative path
dbg.value(metadata %v, dbg variable for %a, ...
Wed, Apr 7, 2:47 AM · debug-info, Restricted Project

Tue, Apr 6

ChuanqiXu added a comment to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Testing looks a bit better - could probably be more comprehensive.

General architecture of the change: Yeah, I dunno about creating new structures and things (& corresponding debug info) during optimizations, bit unfortunate/not something we've done before - but I don't know of much better, given my cursory understanding of the current architecture & lowering of the coroutines this seems understandable if unfortunate.

Happy for @aprantl to continue with review for the broader design, but I'll keep reading & chime in if I have further thoughts.

Perhaps if restructuring the passes is not practical - maybe the pass could be parameterized by the frontend that's creating the pass pipeline? insert some kind of callback... hmm, nope, that doesn't work because you might -disable-llvm-optzns to get LLVM IR out from clang then use llc (or disable frontend optimizations in an LTO build, etc) - yeah, not sure.

Tue, Apr 6, 7:16 PM · debug-info, Restricted Project
ChuanqiXu updated the diff for D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Enhance Test Cases as the comment address.

Tue, Apr 6, 4:26 AM · debug-info, Restricted Project

Thu, Apr 1

ChuanqiXu added inline comments to D93838: [LLVM] [SCCP] : Add Function Specialization pass.
Thu, Apr 1, 8:40 PM · Restricted Project
ChuanqiXu added a comment to D93838: [LLVM] [SCCP] : Add Function Specialization pass.

I have updated the full patch along with the cost model. I have also shared the perf numbers for SPEC CPU 2017 benchmarks in Graviton2 with various modes of optimization enabled :

This version of the patch contains the FunctionSpecializer implementation inside SCCP.cpp. It is not dependent on the D93762 refactoring of SCCPSolver just for review purpose.

Thu, Apr 1, 7:13 PM · Restricted Project
ChuanqiXu added a comment to D40477: Enable Partial Inlining by default.
In D40477#2664384, @vsk wrote:

I was referring to CodeExtractor. The concerns I had about using it were addressed in D53267, D72801, D72795, and some other follow ups.

Thu, Apr 1, 7:08 PM
ChuanqiXu added inline comments to D99693: Update the linkage name of coro-split functions where applicable.
Thu, Apr 1, 6:57 PM · Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

@aprantl gentle ping~

Thu, Apr 1, 4:12 AM · debug-info, Restricted Project

Wed, Mar 31

ChuanqiXu added a comment to D40477: Enable Partial Inlining by default.
In D40477#1302461, @vsk wrote:

Imho CodeExtractor needs a little more work before it's ready to be on-by-default. There are two main blockers: missing debug info support, and a missing whitelist of extract-able intrinsics. The latter poses a high risk for miscompiles (see llvm.org/PR39545, llvm.org/PR39671).

Wed, Mar 31, 11:06 PM
ChuanqiXu added a comment to D99693: Update the linkage name of coro-split functions where applicable.

I am curious about the test case for coro-debug.ll. The ABI used in this test case is Switch ABI. And this patch still affects it.

Wed, Mar 31, 7:04 PM · Restricted Project
Herald added a reviewer for D65169: [WIP] LLVM Optimization Remark for Attributes: aaron.ballman.
Wed, Mar 31, 4:28 AM · Restricted Project, Restricted Project
ChuanqiXu updated the diff for D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Only generating debug information for coroutine frame by searching DW_LANG_C_plus_plus derivatives.

Wed, Mar 31, 12:38 AM · debug-info, Restricted Project
ChuanqiXu updated the diff for D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Only generates debug information for coroutine frame by searching DW_LANG_C_plus_plusderivatives.

Wed, Mar 31, 12:36 AM · debug-info, Restricted Project

Tue, Mar 30

ChuanqiXu committed rGeb51dd719f34: [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the… (authored by ChuanqiXu).
[Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the…
Tue, Mar 30, 7:38 PM
ChuanqiXu closed D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.
Tue, Mar 30, 7:38 PM · Restricted Project

Thu, Mar 25

ChuanqiXu added a comment to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Maybe generating a C-language DWARF struct to describe the frame layout could be triggered by the presence of C-language debug info?

The code could check for the presence of one of the DW_LANG_C_plus_plus derivatives (cf. isCPlusPlus() in Dwarf.h in the containing DICompileUnit to check that this is indeed C++ and just not run in any other case.

Thu, Mar 25, 12:39 AM · debug-info, Restricted Project

Wed, Mar 24

ChuanqiXu committed rG20b4f484d16f: [Driver] Add -fno-split-stack (authored by ChuanqiXu).
[Driver] Add -fno-split-stack
Wed, Mar 24, 11:19 PM
ChuanqiXu closed D99245: [Driver] Add -fno-split-stack.
Wed, Mar 24, 11:19 PM · Restricted Project
ChuanqiXu added a comment to D99245: [Driver] Add -fno-split-stack.

If we want to improve test coverage, we can test -fno-split-stack -fsplit-stack passes -fsplit-stack. But doing this for every option may be excessive...
I have filed the libgcc morestack.S .init_array.0 feature request: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99759

Wed, Mar 24, 11:04 PM · Restricted Project
ChuanqiXu added inline comments to D99245: [Driver] Add -fno-split-stack.
Wed, Mar 24, 9:28 PM · Restricted Project
ChuanqiXu added inline comments to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.
Wed, Mar 24, 8:56 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

When looping over the debug users of instructions and replacing the dbg.values operands, you should now use the "replaceVariableLocationOp" helper instead -- this avoids re-creating a dbg.value intrinsic, and should handle variadic variable locations (the patches for which are 95% landed) seamlessly.

Overall the aim of the patch makes sense to me, it's storing values used by debug intrinsics in the coroutine frame for later retrieval right? Is there a risk that some later coroutine optimisation will try to optimise the frame and overwrite the stored values-for-debug-users -- this can happen with stack slot colouring at the other end of the compiler.

Wed, Mar 24, 8:55 PM · debug-info, Restricted Project
ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Use replaceVariableLocationOp and remove EnhanceDebugability option.

Wed, Mar 24, 8:53 PM · debug-info, Restricted Project
ChuanqiXu updated the diff for D99245: [Driver] Add -fno-split-stack.

Merge -fsplit-stack and -fno-split-stack into one in options.td. And rename -split-stacks into -fsplit-stack.

Wed, Mar 24, 7:16 PM · Restricted Project
ChuanqiXu added inline comments to D99245: [Driver] Add -fno-split-stack.
Wed, Mar 24, 6:03 AM · Restricted Project
ChuanqiXu added inline comments to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.
Wed, Mar 24, 6:00 AM · Restricted Project
ChuanqiXu updated the diff for D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Edit as the comment.

Wed, Mar 24, 5:54 AM · Restricted Project
ChuanqiXu added a comment to D98689: [X86] [split-stack] Add an option to allocate extra space for stack split functions.

gentle ping~

Wed, Mar 24, 2:53 AM · Restricted Project
ChuanqiXu abandoned D98698: [RFC] [lld] [split-stack] Rewrite the reserved space for __morestack_non_split.
Wed, Mar 24, 2:52 AM
ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Materialize bit cast for dbg.values as suggested.

Wed, Mar 24, 2:38 AM · debug-info, Restricted Project
ChuanqiXu updated the diff for D99245: [Driver] Add -fno-split-stack.

Edit as suggestion.

Wed, Mar 24, 2:36 AM · Restricted Project
ChuanqiXu added inline comments to D99245: [Driver] Add -fno-split-stack.
Wed, Mar 24, 2:35 AM · Restricted Project
ChuanqiXu requested review of D99245: [Driver] Add -fno-split-stack.
Wed, Mar 24, 1:57 AM · Restricted Project
ChuanqiXu added a comment to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

Trying to agree on frame layout across multiple passes seems really brittle to me, since nearly any change to the function could trigger a change in the frame layout.

Wed, Mar 24, 1:05 AM · debug-info, Restricted Project

Tue, Mar 23

ChuanqiXu added a comment to D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Only one problem I had for emitting lifetime markers even at O0 is that would action make allocas to be optimized even at O0? If so, I wonder if it confuses programmers since they may find some variables disappear surprisingly. Or there would be no optimization since every function would be marked with optnone attribute. I am not sure about this.

It will only cause variables to be put on the stack instead of on the frame, which shouldn't affect developer's view?

Tue, Mar 23, 11:12 PM · Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

I am not sure if %0 = bit cast %a to ... is the majority redundant fields of coroutine frame if we enable this patch. My original idea is to add a rematerialization process to reduce the coroutine frame like register allocation. However, it maybe a little hard and benefit less. It sounds more easy and beneficial to use original pointer by value tracking. Then I think this problem may be not introduced by this patch. Here is my example:

%0 = bitcast %a to ...
call @llvm.coro.suspend 
; ....
use of %a
use of %0

Then both %a and %0 would be put in the frame, which is totally redundant. We could see if there are examples other than bit cast in other patch.

We already do materialization though. I wonder why that doesn't cover the case? Can we take advantage of that?

Tue, Mar 23, 11:04 PM · debug-info, Restricted Project
ChuanqiXu committed rG3b83590cb25b: [NFC] [Support] Fix unconsistent comment with codes for ExtendSigned (authored by ChuanqiXu).
[NFC] [Support] Fix unconsistent comment with codes for ExtendSigned
Tue, Mar 23, 10:59 PM
ChuanqiXu added a comment to D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.

From a layering perspective, this is not something we would usually do. One immediate problem is that LLVM should be mostly language agnostic. For example the Swift compiler also uses to CoroSplit pass, and you wouldn't want to inject C++ type information into Swift functions. Maybe it would be cleaner for Clang to register an LLVM pass to run after CoroSplit that inserts the C++-specific debug info?

Tue, Mar 23, 8:24 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Curious, is %0 = bitcast %a to ... the majority of the cases we are missing? If so, could we turn all the dbg.value instructions to use their original pointer by stripping pointer cast?

Tue, Mar 23, 8:14 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines.

Only one problem I had for emitting lifetime markers even at O0 is that would action make allocas to be optimized even at O0? If so, I wonder if it confuses programmers since they may find some variables disappear surprisingly. Or there would be no optimization since every function would be marked with option attribute. I am not sure about this.

Tue, Mar 23, 7:58 PM · Restricted Project
ChuanqiXu requested review of D99179: [RFC] [Coroutines] Enable printing coroutine frame in debugger if program is compiled with -g.
Tue, Mar 23, 5:11 AM · debug-info, Restricted Project

Mon, Mar 22

ChuanqiXu requested review of D99067: [RFC] [Coroutines] Split Ramp Function.
Mon, Mar 22, 3:57 AM
ChuanqiXu added inline comments to D97533: [Clang][Coroutine][DebugInfo] remove useless debug info for coroutine parameters.
Mon, Mar 22, 12:58 AM · Restricted Project, debug-info

Sun, Mar 21

ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

@aprantl gentle ping~

Sun, Mar 21, 8:38 PM · Restricted Project
ChuanqiXu committed rG55486161fa0b: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords (authored by ChuanqiXu).
[ASTMatcher] Add AST Matcher support for C++20 coroutine keywords
Sun, Mar 21, 7:28 PM
ChuanqiXu closed D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.
Sun, Mar 21, 7:28 PM · Restricted Project

Fri, Mar 19

ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Looks like it's calling "llvm::dbgs()" directly.

btw is this patch based off main branch or some other internal patch? I wasn't able to arc patch this one. I would like to play with it locally.

Fri, Mar 19, 3:08 AM · debug-info, Restricted Project
ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

rebase with trunk

Fri, Mar 19, 3:02 AM · debug-info, Restricted Project

Tue, Mar 16

ChuanqiXu added inline comments to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 11:17 PM · Restricted Project, Restricted Project
ChuanqiXu added inline comments to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.
Tue, Mar 16, 11:03 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

That's a fair point. I agree that we have no guarantee these are the only two cases.
It does seem to me that coroutine implementation somewhat relies on proper lifetime markers so that data are being put correctly, which may be the fundamental problem we are trying to solve.

Tue, Mar 16, 10:58 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

What do you think is the fundamental problem, though?

Tue, Mar 16, 10:22 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

I am a little confused about the first problem. Would it cause the program to crash? (e.g., we access the fields of coroutine frame after the frame gets destroyed). Or it just wastes some storage?

This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66

Tue, Mar 16, 8:32 PM · Restricted Project, Restricted Project
ChuanqiXu requested review of D98698: [RFC] [lld] [split-stack] Rewrite the reserved space for __morestack_non_split.
Tue, Mar 16, 5:05 AM
ChuanqiXu updated the diff for D98689: [X86] [split-stack] Add an option to allocate extra space for stack split functions.

fix unit-tests.

Tue, Mar 16, 4:33 AM · Restricted Project
ChuanqiXu requested review of D98689: [X86] [split-stack] Add an option to allocate extra space for stack split functions.
Tue, Mar 16, 2:29 AM · Restricted Project
ChuanqiXu abandoned D98409: [lld] Recommit for implementing --ctors-in-init-array.
Tue, Mar 16, 1:45 AM

Mon, Mar 15

ChuanqiXu added a comment to D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call.

It looks like there are two things this patch wants to do:

  1. Don't put the temporary generated by symmetric-transfer on the coroutine frame.
  2. Offer a mechanism to force some values (it is easy to extend Alloca to Value) to put in the stack instead of the coroutine frame.
Mon, Mar 15, 10:27 PM · Restricted Project, Restricted Project

Mar 11 2021

ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

@aprantl gentle ping~

Mar 11 2021, 8:46 PM · Restricted Project
ChuanqiXu added a comment to D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

@aprantl gentle ping~

Mar 11 2021, 8:46 PM · debug-info, Restricted Project
ChuanqiXu added a comment to D98409: [lld] Recommit for implementing --ctors-in-init-array.

You can contribute a patch conditionally using .init_array if the target prefers SHT_INIT_ARRAY. They will assuredly accept such a patch.

Mar 11 2021, 7:09 PM
ChuanqiXu added a comment to D98409: [lld] Recommit for implementing --ctors-in-init-array.

BTW: why do you still use the deprecated .ctors/.dtors?

Since me want to use split-stack, which is a feature dependent on libgcc.a. And the corresponding initialization section in libgcc.a uses .ctors. Here is the codes moreStack. They uses assembly directly. But lld don't combine .ctors into .init_array, which would cause segmentation fault due to uninitialized code.

Mar 11 2021, 6:20 PM
ChuanqiXu updated the diff for D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

Re-run documentation with the nit suggested.

Mar 11 2021, 4:32 AM · Restricted Project
ChuanqiXu requested review of D98409: [lld] Recommit for implementing --ctors-in-init-array.
Mar 11 2021, 3:53 AM

Mar 7 2021

ChuanqiXu added a comment to D45508: Implement --ctors-in-init-array..

I rebased this patch, did some simple change to make it work at trunk and reverse the contents of .ctors after applying relocations as @ruiu mentioned in comments. May I ask is it ok for me to create a new diffusion? I am not familiar with the rule for the situation in the LLVM community convention. @MaskRay @grimar

Mar 7 2021, 10:15 PM

Mar 4 2021

ChuanqiXu added a comment to D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

@njames93 @sammccall gental ping~

Mar 4 2021, 9:06 PM · Restricted Project
ChuanqiXu added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

I am a little confusing about the problem. The example in the link tells the align of the promise instead of the frame. The address of promise and frame is not same. It looks like you're trying to do:

+               +-----------------------------------+
|               |                                   |
+---------------+          frame                    |
| pedding       |                                   |
+               +-----------------------------------+
                ^
                |
                |
                |
                |
                |
                +
Mar 4 2021, 8:04 PM · Restricted Project, Restricted Project
ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

I looked at your example in the debugger and the problem seems to be that SROA is being run before LowerDbgDeclare (from Local.cpp). I believe reversing the order might work. Alternatively we could just not create an alloca in coro::salvageDebugInfo if optimizations are enabled, but I am not sure if this is known to the pass.

Mar 4 2021, 7:36 PM · Restricted Project
ChuanqiXu updated the diff for D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Don't emit temporary alloca for arguments if optimization is enabled.

Mar 4 2021, 7:34 PM · Restricted Project
ChuanqiXu updated the diff for D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.

Add option to control the behavior.

Mar 4 2021, 6:12 PM · debug-info, Restricted Project

Mar 3 2021

ChuanqiXu accepted D97345: Improve the debug info for coro-split .resume functions.

This patch LGTM.

Mar 3 2021, 6:07 PM · Restricted Project

Mar 2 2021

Herald added a reviewer for D45508: Implement --ctors-in-init-array.: MaskRay.

I found that this patch isn't correct. Sorry guys. The issue is that we need to reverse the contents after applying relocations. This patch reverses it before applying relocations. I need to fix that. I'll update the patch.

Mar 2 2021, 7:07 PM

Mar 1 2021

ChuanqiXu requested review of D97673: [RFC] [[Coroutine] [Debug] Salvage dbg.value at O2.
Mar 1 2021, 3:18 AM · debug-info, Restricted Project

Feb 25 2021

ChuanqiXu added inline comments to D97345: Improve the debug info for coro-split .resume functions.
Feb 25 2021, 6:05 PM · Restricted Project
ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Thanks for posting your analysis @ChuanqiXu! I would very much prefer to fix the pipeline such that after LowerDbgDeclare does its work the resulting dbg.value can be salvaged. I we can pull that off it would benefit the entire LLVM optimized code debug info story. I'll try to debug what's happening to see how feasible that is.

Feb 25 2021, 6:01 PM · Restricted Project
ChuanqiXu added a comment to D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

So is it ok to omit the implementation like yields ?

Yes, that can be done in a follow up and there's no onus on you to do it.

Feb 25 2021, 5:46 PM · Restricted Project
ChuanqiXu added a comment to D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

He's talking about Traversal Matchers. Though they could go in a follow up patch.
Basically matchers that let one write:

coyieldExpr(yeilds(...))
coreturnStmt(returns(...))
coawaitExpr(hasOperand(...))

I am not familiar with ASTMatchers. And from the link you give, it looks like that coyieldExpr, coreturnStmt and coawaitExpr should be Node Matchers instead of Traversal Matchers.

Those should be Node matchers, the traversal matchers would be the yields(...), returns(...) and hasOperand(...) matchers.

Feb 25 2021, 2:55 AM · Restricted Project
ChuanqiXu updated the diff for D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

update c++20 with c++2a

Feb 25 2021, 2:54 AM · Restricted Project

Feb 24 2021

ChuanqiXu added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Well, even if we haven't seen any new bugs, a more robust solution would be welcome; it's hard to feel confident in what's happening there.

I agree. Although I am leaning towards a different approach than specializing allocations. I am thinking about introducing another ramp function, whose sole job is to allocate the frame, copy the parameters, and then just to the real ramp function. This will prevent any random instructions from getting moved before coro.begin, because now we have a function boundary.

Feb 24 2021, 10:49 PM · Restricted Project
ChuanqiXu accepted D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

A special alloca for coroutines will certainly help but it's not sufficient. A special alloca and a set of special rules around data allocation in coroutines can help resolve the escape-before-coro.begin issue (though I haven't seen any new bugs around that).
There are other problems that need to be solved. I need some more time to think about how to put them all together.
I think we are doing OK identifying variables that need to be on the heap (a conservative approach will do just that).
But it gets complex when we also need to be able to identify a set of variables that MUST be put on the stack. These are the variables that are being used after a coroutine is suspended (i.e. coro.suspend switch goes to the default destination), at which point the coroutine frame may have been destroyed.
In the example I had above, the coroutine return object is allocated and returned in the very end. Since it's the return object, it has to be on the stack. But the compiler isn't powerful enough to tell that for two reasons:

  1. It appears that it lives through coro.suspend, but it really only is used at the suspend path, not any other paths.
  2. It's constructed by calling a (sret) struct pointer, which suggests that it may capture, further complicating the problem.

I think a proper solution might require introducing a new IR instruction for coroutine suspension, instead of relying on intrinsics. I have similar conclusions in another bug that involves LICM and coroutine.

To make this patch fully correct, we need to first check PI.isEscaped() before relying on lifetime information.
Then for the return object case to work, we have two short-term solutions:

  1. We can treat sret as nocapture. This isn't 100% accurate. But the only case where C++ can capture a sret pointer is the constructor captures "this". But even that happens, in order for the "this" to really escape the object needs to be used again. So I think we should be safe there.
  2. We can special handle the return object variable, and make sure it's always on the stack. This should also work, I think.
Feb 24 2021, 6:01 PM · Restricted Project

Feb 23 2021

ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Could you post an example that you are interested in at -O2 and show the diff of the LLVM IR for your example before/after this patch? I find it unintuitive that removing the shadow copy improved the debug info, but if don't need it and removing it doesn't pessimize the situation at -O0 we could remove it.

Feb 23 2021, 9:11 PM · Restricted Project
ChuanqiXu updated the diff for D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Update test example under O2.

Feb 23 2021, 8:59 PM · Restricted Project
ChuanqiXu added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

Is there any way we can simplify this problem for ourselves by expecting IR to be emitted differently within coroutines? Because I think the true fundamental problem here is that LLVM's representation of stack allocations is not very good, and we're stuck trying to do an analysis that the representation isn't suited for at all. There's no real chance that LLVM in general is going to move away from alloca, but if we can recognize when we're emitting one of these allocations that needs to overlap coro.begin or coro.end and just do it with a special intrinsic that we treat differently in lowering, that might substantially reduce the difficulty of the problem. If necessary, we can even change the rules for inlining into unsplit coroutines.

Feb 23 2021, 6:24 PM · Restricted Project
ChuanqiXu added a comment to D97345: Improve the debug info for coro-split .resume functions.

Is this change only works for continuation and async ABIs? If so, what can we do for switch ABIs?

Feb 23 2021, 5:56 PM · Restricted Project

Feb 22 2021

ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

@aprantl @bruno gentle ping~

Feb 22 2021, 5:58 PM · Restricted Project

Feb 21 2021

ChuanqiXu added a comment to D96316: [ASTMatcher] Add AST Matcher support for C++20 coroutine keywords.

it seems like you'll want the ability to match the wrapped expression too, but I don't see a reason that needs to be in this patch.

I am confused about the statements? What does it mean?

He's talking about Traversal Matchers. Though they could go in a follow up patch.
Basically matchers that let one write:

coyieldExpr(yeilds(...))
coreturnStmt(returns(...))
coawaitExpr(hasOperand(...))
Feb 21 2021, 5:54 PM · Restricted Project

Feb 19 2021

ChuanqiXu added a comment to D96938: [RFC] [Coroutine] [Debug] Insert dbg.declare to entry.resume to print alloca in the coroutine frame under O2.

Does this change keeps the debug behavior in O0? @aprantl any next step with coroutine debug info?

Feb 19 2021, 6:35 PM · Restricted Project
ChuanqiXu added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

We don't need more information to do that; we need to use the information we have *correctly*. When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

The fundamental problem (and what makes this so hard) is that we need to be very accurate to make the program correct. Being conservative is insufficient for correctness. This is due to the fact that we often access data after the frame is destroyed, and anything accessed after the frame is destroyed cannot be on the frame. So for those data we need to be able to accurately tell if they should be on the stack or if the developer has a bug.

Below is some IR that troubled me for a while (removed some lines that are irrelevant to simplify it). The alloca in question is % coro_gro. Its lifetime started early, used after coro.end.
Since it's used after coro.end, at which point the coroutine frame is already destroyed,
coro_gro cannot be put on the frame and has to be put on the stack.
However technically it lives across suspension points and should be put on the frame.
The current implementation happens to work and put it on the stack because suspension checker stops propagating when it sees coro.end. So any use of data after coro.end is considered to not cross suspension point to anything before coro.suspend.
Also, I am not sure if we can prove that _ZN5folly15expected_detail7PromiseIiN12_GLOBALN_13ErrEE17get_return_objectEv never captures %coro_gro. Maybe a function cannot capture an argument that's sret? Not sure.

; Function Attrs: sanitize_address uwtable
define hidden i64 @_Z3foov() #3 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
entry:
  %__coro_gro = alloca %"struct.folly::expected_detail::PromiseReturn", align 8
  %__promise = alloca %"struct.folly::expected_detail::Promise", align 8
  %0 = bitcast %"struct.folly::expected_detail::Promise"* %__promise to i8*
  %1 = call token @llvm.coro.id(i32 16, i8* nonnull %0, i8* bitcast (i64 ()* @_Z3foov to i8*), i8* null)
  %2 = call i1 @llvm.coro.alloc(token %1)
  br i1 %2, label %coro.alloc, label %invoke.cont21

coro.alloc:                                       ; preds = %entry
  %3 = tail call i64 @llvm.coro.size.i64()
  %call = call i8* @_Znwm(i64 %3)
  br label %invoke.cont21

invoke.cont21:                                    ; preds = %coro.alloc, %entry
  %4 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
  %5 = call i8* @llvm.coro.begin(token %1, i8* %4) #13
  %6 = bitcast %"struct.folly::expected_detail::PromiseReturn"* %__coro_gro to i8*
  call void @llvm.lifetime.start.p0i8(i64 24, i8* nonnull %6) #6
  call fastcc void @_ZN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEE17get_return_objectEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull sret %__coro_gro, %"struct.folly::expected_detail::Promise"* nonnull %__promise) #6
  %call26 = call fastcc zeroext i1 @_ZNK5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE11await_readyEv(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast) #6
  br i1 %call26, label %cont37, label %await.suspend

await.suspend:                                    ; preds = %invoke.cont21
  %10 = call token @llvm.coro.save(i8* null)
  %11 = bitcast %"class.std::experimental::coroutines_v1::coroutine_handle.21"* %retval.i109 to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %11)
  call fastcc void @_ZNSt12experimental13coroutines_v116coroutine_handleIN5folly15expected_detail7PromiseIiN12_GLOBAL__N_13ErrEEEEC2Ev(%"class.std::experimental::coroutines_v1::coroutine_handle.21"* nonnull %retval.i109) #6
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %11)
  invoke fastcc void @_ZN5folly15expected_detail9AwaitableIiN12_GLOBAL__N_13ErrEE13await_suspendIiEEvNSt12experimental13coroutines_v116coroutine_handleINS0_7PromiseIT_S3_EEEE(%"struct.folly::expected_detail::Awaitable"* nonnull %tmpcast, i8* %5)
          to label %invoke.cont35 unwind label %lpad

invoke.cont35:                                    ; preds = %await.suspend
  %12 = call i8 @llvm.coro.suspend(token %10, i1 false)
  switch i8 %12, label %coro.ret [
    i8 0, label %cont37
    i8 1, label %cleanup49
  ]

...

cleanup83:                                        ; preds = %invoke.cont54, %cleanup49, %cleanup49.thread
  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #6
  %24 = call i8* @llvm.coro.free(token %1, i8* %5)
  %25 = icmp eq i8* %24, null
  br i1 %25, label %coro.ret, label %coro.free

coro.free:                                        ; preds = %cleanup83
  %26 = tail call i64 @llvm.coro.size.i64()
  call void @_ZdlPvm(i8* nonnull %24, i64 %26) #6
  br label %coro.ret

coro.ret:                                         ; preds = %coro.free, %cleanup83, %invoke.cont35
  %27 = call i1 @llvm.coro.end(i8* null, i1 false) #13
  %call95 = invoke fastcc i64 @_ZNR5folly15expected_detail13PromiseReturnIiN12_GLOBAL__N_13ErrEEcvNS_8ExpectedIiS3_EEEv(%"struct.folly::expected_detail::PromiseReturn"* nonnull %__coro_gro)
          to label %cleanup.action unwind label %lpad93
...
}
Feb 19 2021, 6:32 PM · Restricted Project

Feb 18 2021

ChuanqiXu added a comment to D96922: [Coroutine] Check indirect uses of alloca when checking lifetime info.

When you're analyzing uses of the alloca, you need to deal with uses that you don't fully understand by just giving up on the analysis and assuming that the object is used for its entire lifetime (as limited by lifetime intrinsics, if possible, but the entire function if not).

Feb 18 2021, 9:36 PM · Restricted Project