Page MenuHomePhabricator

ychen (Yuanfang Chen)
User

Projects

User Details

User Since
Nov 19 2013, 9:02 PM (394 w, 3 d)

Recent Activity

Thu, Jun 10

ychen added a comment to D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.

It looks like that other reviewers are not available at this time. I am curious about the alignment problem for coroutine frame since I met confusing regression problem due to alignment mismatch in the past. Although the condition to trigger the alignment mismatch problem is not common (it requires the user to specify the alignment for a variable with alignment large than 16), I am desired to complete the (series) patch(es). So again, it looks better for me to merge this with D97915 to review and test. How do you think about it @ychen ?

Thu, Jun 10, 11:10 PM · Restricted Project
ychen added a comment to D103938: Diagnose -Wunused-value in constant evaluation context.

This will diagnose unused values in unreachable code in constant-evaluated contexts; that doesn't seem quite right. For example, in:

void f() {
  new double[false ? (1, 2) : 3][false ? (1, 2) : 3];
}

... we'll diagnose that the 1 is unused in only one of the two array bounds, because one of them is a constant-evaluated context and the other is not.

I'd suggest we start by adding a separate DiagIfReachable function that does the same thing as DiagRuntimeBehavior except without the context check, with the intent to eventually refine it to check for reachability even in expressions for which we don't build a CFG (eg, in the initializer of a global or in a constant expression).

@rsmith Thanks for the pointer! Comments addressed. For the reachability check in contexts without CFG, it seems that some delayed processing is needed since constant evaluation happens later, however, I'm not sure about that.

Thu, Jun 10, 11:07 PM · Restricted Project
ychen updated the diff for D103938: Diagnose -Wunused-value in constant evaluation context.
  • Add DiagIfReachable and use it in Sema::DiagnoseUnusedExprResult.
  • Update tests.
Thu, Jun 10, 11:00 PM · Restricted Project

Tue, Jun 8

ychen added a comment to D102531: PR45881: Properly use CXXThisOverride for templated lambda.

ping

Tue, Jun 8, 5:51 PM · Restricted Project
ychen requested review of D103938: Diagnose -Wunused-value in constant evaluation context.
Tue, Jun 8, 5:51 PM · Restricted Project
ychen added a comment to D103885: [clang] Suppress warnings for tautological comparison in generated macro code.

We could use a table to avoid the warning and speed the build a little bit. https://reviews.llvm.org/D98110

That's an interesting approach, thank you for sharing it! I wonder how use of a table impacts performance given that the table is on the stack and there can be a lot of options (a bit less worried for things like OpenMP kinds or overloadable operators as those as much smaller)?

Tue, Jun 8, 11:12 AM · Restricted Project, Restricted Project, Restricted Project
ychen added a comment to D103885: [clang] Suppress warnings for tautological comparison in generated macro code.

We could use a table to avoid the warning and speed the build a little bit. https://reviews.llvm.org/D98110

Tue, Jun 8, 9:39 AM · Restricted Project, Restricted Project, Restricted Project

Sat, Jun 5

ychen added inline comments to D34803: [LTO] Remove values from non-prevailing comdats.
Sat, Jun 5, 3:13 PM · Restricted Project

Thu, Jun 3

ychen added inline comments to D34803: [LTO] Remove values from non-prevailing comdats.
Thu, Jun 3, 11:08 AM · Restricted Project
ychen added inline comments to D34803: [LTO] Remove values from non-prevailing comdats.
Thu, Jun 3, 10:10 AM · Restricted Project

Wed, Jun 2

Herald added a project to D34803: [LTO] Remove values from non-prevailing comdats: Restricted Project.
Wed, Jun 2, 11:52 PM · Restricted Project
ychen added a comment to D103495: [static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used..

Is there anything preventing us from using the existing priority field to define the order instead of introducing the order among initializers with the same priority? If we go with 1., there would be no way to say that "the order does not matter" right?

The priority is used mainly for inter-object initialization order. It moves the initializer into a .init_array.N section, for some number N. The programmer may be using existing numbers via __attribute__((init_priority(N))), so it isn't safe for the compiler to use anything other than the default initialization priority.

Wed, Jun 2, 2:47 PM · Restricted Project
ychen added a comment to D103495: [static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used..

The LLVM IR langref says that llvm.global_ctors are called in ascending priority order, and the order between initializers is not defined. That's not very helpful and often doesn't reflect reality. I wonder if we could do two things, perhaps separately:

  1. emit llvm.global_ctors so that they are called in order of appearance in the IR array (is this not already true?)
  2. define the order of initialization in LangRef
Wed, Jun 2, 2:18 PM · Restricted Project
ychen added a comment to D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.

ping?

Wed, Jun 2, 10:13 AM · Restricted Project

Tue, May 25

ychen added a comment to D99903: [Clang][Sema] better -Wcast-function-type diagnose for pointer parameters and parameters with cv-qualifiers.

ping

Tue, May 25, 5:40 PM · Restricted Project
ychen added a comment to D102531: PR45881: Properly use CXXThisOverride for templated lambda.

ping...

Tue, May 25, 5:40 PM · Restricted Project
ychen added inline comments to D97430: [SanitizerCoverage] Drop !associated on metadata sections.
Tue, May 25, 3:41 PM · Restricted Project

Mon, May 17

ychen added inline comments to D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
Mon, May 17, 5:33 PM · Restricted Project
ychen updated the diff for D102531: PR45881: Properly use CXXThisOverride for templated lambda.
  • Address comments.
Mon, May 17, 4:07 PM · Restricted Project

Sat, May 15

ychen added inline comments to D97735: [Globals] Treat nobuiltin fns as maybe-derefined..
Sat, May 15, 10:56 PM · Restricted Project

Fri, May 14

ychen requested review of D102531: PR45881: Properly use CXXThisOverride for templated lambda.
Fri, May 14, 1:51 PM · Restricted Project

May 13 2021

ychen accepted D102244: [llvm][AsmPrinter] Restore source location to register clobber warning.

LGTM. Thanks.

May 13 2021, 10:04 AM · Restricted Project, Restricted Project

May 12 2021

ychen added inline comments to D97915: [Coroutines] Handle overaligned frame allocation.
May 12 2021, 11:29 PM · Restricted Project, Restricted Project
ychen added inline comments to D97915: [Coroutines] Handle overaligned frame allocation.
May 12 2021, 11:22 PM · Restricted Project, Restricted Project
ychen added inline comments to D97915: [Coroutines] Handle overaligned frame allocation.
May 12 2021, 11:19 PM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Address feedbacks.
May 12 2021, 10:58 PM · Restricted Project, Restricted Project
ychen added inline comments to D97915: [Coroutines] Handle overaligned frame allocation.
May 12 2021, 9:59 PM · Restricted Project, Restricted Project
ychen added a comment to D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.

Thanks for the review!

May 12 2021, 5:05 PM · Restricted Project
ychen added a comment to D102244: [llvm][AsmPrinter] Restore source location to register clobber warning.

Looks good except for one suggestion and the patch need rebasing on an upstream commit.

May 12 2021, 10:59 AM · Restricted Project, Restricted Project
ychen added a comment to D101980: [RFC] [Coroutines] Put the contents of byval argument to the frame.

How is the situation with byval different from the situation with indirect arguments? It sounds like we generally need to perform this extra copy into the coroutine frame once it's allocated, and yes, that's something we need to emit in the frontend because it might be non-trivial.

Or is the problem that LLVM might be helpfully eliminating the copy when the source is a byval parameter? But that's also something it could be doing with indirect arguments, so again, treating byval specially seems suspect.

Sorry. May I ask what does indirect argument mean? I tried to copy all byval parameters since it looks like if the frontend needs to pass parameters by value for structs, it would emit byval attributes.

An indirect argument is one that's implicitly passed by reference. For example, most x86-64 ABIs will pass a struct like struct { char x[1024]; }; as a pointer to a temporary instead of on the stack. Normally, functions that take such arguments just bind the parameter variable directly to the pointer that's passed in; if that's a problem for coroutines, you're going to need to force a copy there as well.

Indirect arguments are also used (on pretty much every target except Windows) for non-trivially-copyable class types. That gets tricky because unfortunately you can't just emit a correct move-construction during coroutine lowering; we'll have to do it in the frontend, if we aren't already.

That brings up a related problem: there are LLVM optimizations that assume things about parameters during a function. For example, the pointers passed as indirect arguments are generally assumed not to alias anything, which means that we can generally eliminate redundant things like, say, copies out of them that are perfectly happy to just use the original memory. That's not a problem for copies emitted during coroutine lowering, since we'll immediately split the function and make the coroutine non-redundant from LLVM's perspective, but I am worried that we might trigger these optimizations on copies we emit earlier, and that could mess things up.

For example, consider code like this:

struct A {
  char buffer[32];
  A(const A &);
};

A::A(const A &other) {
  memcpy(buffer, other.buffer, sizeof(buffer));
}

my_coroutine foo(A a) {
  ...
}

If we emit the copy of a in the frontend — which I think we have no choice but to do — it'll generate IR something like this:

define ... @foo(%struct.A* %0) {
  %a = alloca %struct.A
  call void @llvm.memcpy(%a, %0, ...)
  ...
}

If the LLVM optimizer sees this, it very well might just remove the memcpy and use %0 everywhere instead of %a, and then we'll miscompile the coroutine. I'm not sure of the best way to fix this; maybe we just have to disable those optimizations on unlowered coroutines.

May 12 2021, 1:26 AM · Restricted Project

May 11 2021

ychen updated the diff for D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
  • Fix a test typo.
May 11 2021, 8:35 PM · Restricted Project
ychen updated the diff for D102147: [Clang][Coroutines] Implement P2014R0 Option 1 behind -fcoroutines-aligned-alloc.
  • rebase
May 11 2021, 5:25 PM · Restricted Project
ychen added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say we should comment it to avoid confusing.

May 11 2021, 5:23 PM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Rebase on updated D102145 (use llvm.coro.raw.frame.ptr.addr during allocation)
May 11 2021, 5:11 PM · Restricted Project, Restricted Project
ychen retitled D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics from [Coroutines] Add `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` intrinsics to [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
May 11 2021, 5:05 PM · Restricted Project
ychen updated the diff for D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
  • Add llvm.coro.raw.frame.ptr.addr to help storing raw frame address during allocation. llvm.coro.raw.frame.ptr.offset cannot be used during allocation because coroutine lowering would overwrite the raw frame address field with %alloc.frame.ptr alloca after llvm.coro.begin. The malloc returned address should be stored to %alloc.frame.ptr instead of the raw frame address field directly.
  • Add tests for added intrinsics.
May 11 2021, 5:03 PM · Restricted Project
ychen added a comment to D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.

The overall patch looks good to me. It seems like this patch did nothing without D97915. And it is different from normal NFC changes. I am not sure if it is more proper to merge this with D97915.

May 11 2021, 4:57 PM · Restricted Project
ychen added inline comments to D102244: [llvm][AsmPrinter] Restore source location to register clobber warning.
May 11 2021, 11:18 AM · Restricted Project, Restricted Project

May 10 2021

ychen added a comment to rG67b04465c00e: Fix cv-qualification of '*this' captures and nasty bug PR27507.

FYI, review link is https://reviews.llvm.org/D19783

May 10 2021, 3:02 PM
ychen added a comment to D99903: [Clang][Sema] better -Wcast-function-type diagnose for pointer parameters and parameters with cv-qualifiers.

ping

May 10 2021, 8:45 AM · Restricted Project
ychen added a comment to D102147: [Clang][Coroutines] Implement P2014R0 Option 1 behind -fcoroutines-aligned-alloc.

Since D97915 would fix the problem that the variables in the frame may not be aligned, I think this option fcoroutines-aligned-alloc won't affect normal programmers other than language lawyers. Do you think so?

May 10 2021, 8:28 AM · Restricted Project
ychen added a comment to D101980: [RFC] [Coroutines] Put the contents of byval argument to the frame.

How is the situation with byval different from the situation with indirect arguments? It sounds like we generally need to perform this extra copy into the coroutine frame once it's allocated, and yes, that's something we need to emit in the frontend because it might be non-trivial.

Or is the problem that LLVM might be helpfully eliminating the copy when the source is a byval parameter? But that's also something it could be doing with indirect arguments, so again, treating byval specially seems suspect.

May 10 2021, 12:15 AM · Restricted Project

May 9 2021

ychen requested review of D102147: [Clang][Coroutines] Implement P2014R0 Option 1 behind -fcoroutines-aligned-alloc.
May 9 2021, 8:34 PM · Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Rebase
May 9 2021, 8:25 PM · Restricted Project, Restricted Project
ychen added reviewers for D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics: rjmccall, lxfind, ChuanqiXu.
May 9 2021, 7:42 PM · Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Rebase on D102145.
  • Dynamically adjust the alignment for allocation and deallocation if the selected allocator does not have std::align_val_t argument. Otherwise, use the aligned allocation/deallocation function.
May 9 2021, 7:41 PM · Restricted Project, Restricted Project
ychen updated the diff for D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
  • Fix doc.
May 9 2021, 7:35 PM · Restricted Project
ychen retitled D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics from [Coroutines] Add `llvm.coro.align` and `llvm.coro.get.raw.frame.addr` intrinsics to [Coroutines] Add `llvm.coro.align` and `llvm.coro.raw.frame.ptr.offset` intrinsics.
May 9 2021, 7:31 PM · Restricted Project
ychen requested review of D102145: [Coroutines] Add `llvm.coro.align`, `llvm.coro.raw.frame.ptr.offset` and `llvm.coro.raw.frame.ptr.addr` intrinsics.
May 9 2021, 7:24 PM · Restricted Project
ychen committed rG9ffd4924e8e1: [NFC][Coroutines] Fix two tests by removing hardcoded SSA value. (authored by ychen).
[NFC][Coroutines] Fix two tests by removing hardcoded SSA value.
May 9 2021, 7:07 PM

May 7 2021

ychen accepted D101797: [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose.

LGTM. Need a test for pass managers debug output.

May 7 2021, 3:35 PM · Restricted Project, Restricted Project
ychen accepted D102093: [NewPM] Move analysis invalidation/clearing logging to instrumentation.

LGTM.

May 7 2021, 3:19 PM · Restricted Project, Restricted Project

May 5 2021

ychen planned changes to D97915: [Coroutines] Handle overaligned frame allocation.

Plan to rebase this together with the following patch for two lookups (aligned and non-aligned new/delete, and generate code accordingly)

May 5 2021, 5:54 PM · Restricted Project, Restricted Project

May 3 2021

ychen added a comment to D101797: [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose.

Looks like the precheck failures are real.

May 3 2021, 9:51 PM · Restricted Project, Restricted Project
ychen added a comment to D99903: [Clang][Sema] better -Wcast-function-type diagnose for pointer parameters and parameters with cv-qualifiers.

ping..

May 3 2021, 9:36 PM · Restricted Project
ychen added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

This code snippets confused me before:

coro.alloc.align:                                 ; preds = %coro.check.align
  %mask = sub i64 %11, 1
  %intptr = ptrtoint i8* %call to i64
  %over_boundary = add i64 %intptr, %mask
  %inverted_mask = xor i64 %mask, -1
  %aligned_intptr = and i64 %over_boundary, %inverted_mask
  %diff = sub i64 %aligned_intptr, %intptr
  %aligned_result = getelementptr inbounds i8, i8* %call, i64 %diff

This code implies that %diff > 0. Formally, given Align = 2^m, m > 4 and Address=16n, we need to prove that:

(Address + Align -16)&(~(Align-1)) >= Address

&(~Align-1) would make the lowest m bit to 0. And Align-16 equals to 2^m - 16, which is 16*(2^(m-4)-1). Then Address + Align -16 could be 16*(n+2^(m-4)-1).
Then we call X for the value of the lowest m bit of Address + Align -16.
Because X has m bit, so X <= 2^m - 1. Noticed that X should be 16 aligned, so the lowest 4 bit should be zero.
Now,

X <= 2^m - 1 -1 - 2 - 4 - 8 = 2^m - 16

So the inequality we need prove now should be:

16*(n+2^(m-4)-1) - X >= 16n

Given X has the largest value wouldn't affect the inequality, so:

16*(n+2^(m-4)-1) - 2^m + 16 >= 16n

which is very easy now.

The overall prove looks non-travel to me. I spent some time to figure it out. I guess there must be some other people who can't get it immediately. I strongly recommend to add comment and corresponding prove for this code.

May 3 2021, 9:51 AM · Restricted Project, Restricted Project

Apr 29 2021

ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Oh, right now C++ coroutine standard is written in the way that the aligned new is not searched by the frontend. The limitation will be lifted in C++23 (hopefully).

I see. I am curious about the relationship of compiler implementation and language standard now. For example, Clang/LLVM implement coroutine before it becomes standard. The point I curious about is that should Clang/LLVM implement based on proposals accepted only?

Not a C++ language expert myself. But I think a proposal does not have to be formally accepted to kick-start the implementation (as long as the overall design is decided and the proposal is very likely to be accepted).

I see. I am still prefer to use the aligned allocator to solve the problems, although you and other reviewers prefer to use the over aligned frame.

Apr 29 2021, 10:37 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Oh, right now C++ coroutine standard is written in the way that the aligned new is not searched by the frontend. The limitation will be lifted in C++23 (hopefully).

I see. I am curious about the relationship of compiler implementation and language standard now. For example, Clang/LLVM implement coroutine before it becomes standard. The point I curious about is that should Clang/LLVM implement based on proposals accepted only?

Apr 29 2021, 8:25 PM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Add missed Shape.CoroRawFramePtrOffsets.clear();
Apr 29 2021, 8:13 PM · Restricted Project, Restricted Project
ychen added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

May I ask a question may be too simple? What if the user specify the alignment for promise (or any other local variables) to 128 or even 256? Since it looks like all the discuss before assumes that the largest alignment requirement is 64.

Apr 29 2021, 8:11 PM · Restricted Project, Restricted Project
ychen added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

For coroutine f0 in test/CodeGenCoroutines/coro-alloc.cpp

Apr 29 2021, 11:53 AM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • fix typo
Apr 29 2021, 11:43 AM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • fix a bug.
Apr 29 2021, 11:41 AM · Restricted Project, Restricted Project
ychen planned changes to D97915: [Coroutines] Handle overaligned frame allocation.

Found a bug. Will fix.

Apr 29 2021, 10:21 AM · Restricted Project, Restricted Project
ychen accepted D100912: [docs][NewPM] Add section on analyses.

LGTM

Apr 29 2021, 12:12 AM · Restricted Project
ychen abandoned D100739: [Coroutines] Handle overaligned frame allocation (2).

Reopened D97915 to address the feedbacks. Close this one.

Apr 29 2021, 12:09 AM · Restricted Project, Restricted Project
ychen added a comment to D97915: [Coroutines] Handle overaligned frame allocation.

@rjmccall the patch is on the large side. I'll submit a separate patch for the Sema part about searching for two allocators.

Apr 29 2021, 12:08 AM · Restricted Project, Restricted Project
ychen updated the diff for D97915: [Coroutines] Handle overaligned frame allocation.
  • Handle deallocation.
  • Fix tests.
Apr 29 2021, 12:04 AM · Restricted Project, Restricted Project
ychen reclaimed D97915: [Coroutines] Handle overaligned frame allocation.
Apr 29 2021, 12:03 AM · Restricted Project, Restricted Project

Apr 26 2021

ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Here are the options I think the committee might take:

  1. Always select an unaligned allocator and force implementors to dynamically align. This seems unlikely to me.
  2. Allow an aligned allocator to be selected. The issue here is that we cannot know until coroutine splitting whether the frame has a new-extended alignment. So there are some sub-options:

2a. Always use an aligned allocator if available, even if the frame ends up not being aligned. I think this is unlikely.
2b. Use the correct allocator for the frame alignment, using the alignment inferred from the immediate coroutine body. This would force implementations to avoid doing anything prior to coroutine splitting that would increase the frame's alignment beyond the derived limit. This would be quite annoying for implementors, and it would have strange performance cliffs, but it's theoretically simple.
2c. Use the correct allocator for the frame alignment; only that allocator is formally ODR-used. This would force implementations to not ODR-use the right allocator until coroutine lowering, which is not really reasonable; we should be able to dissuade the committee from picking this option.
2d. Use the correct allocator for the frame alignment; both allocators are (allowed to be) ODR-used, but only one would be dynamically used. This is what would be necessary for the implementation I suggested above. In reality there won't be any dynamic overhead because we should always be able to fold the branch after allocation.

I think 2d is obvious enough that we could go ahead and implement it pending standardization. There's still a question of what to do if the promise class only provides an unaligned operator new, but the only reasonable behavior is to dynamically align: unlike with new expressions, there's no way the promise class could be expected to know/satisfy the appropriate alignment requirement in general, so assuming alignment is not acceptable. (Neither is rejecting the program if it doesn't provide both — this wouldn't be compatible with existing behavior.)

*(void) (frame + size) = rawFrame; this means we always need the extra space to store the raw frame ptr. If either doing what the patch is currently doing or add another intrinsic say "llvm.coro.raw.frame.ptr.index" to do *(void) (frame + llvm.coro.raw.frame.ptr.index()) = rawFrame;, it is likely that the extra pointer could reuse some existing paddings in the frame. There is an example of this in https://reviews.llvm.org/P8260. What do you think?

You're right that there might be space in the frame we could re-use. I was thinking that it would be a shame to always add storage to the frame for the raw frame pointer, but maybe the contract of llvm.coro.raw.frame.ptr.offset could be that it's only meaningful if the frame has extended alignment. Coroutine splitting would determine if any of the frame members was over-aligned and add a raw-pointer field if so. We'd be stuck allocating space in the frame even when allocation was elided, but stack space is cheap.

It does need to be an offset instead of a type index, though; the frontend will be emitting a GEP and will not know the frame type.

Apr 26 2021, 9:59 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

This is an alternative to D97915 which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.

If D97915 is not needed, we should abandon it.

For the example shows in D97915, it says:

#include <experimental/coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <cassert>

struct task{
  struct alignas(64) promise_type {
    task get_return_object() { return {}; }
    std::experimental::suspend_never initial_suspend() { return {}; }
    std::experimental::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
  using handle = std::experimental::coroutine_handle<promise_type>;
};

auto switch_to_new_thread() {
  struct awaitable {
    bool await_ready() { return false; }
    void await_suspend(task::handle h) {
      auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
      std::cout << i << std::endl;
      assert(i % 64 == 0);
    }
    void await_resume() {}
  };
  return awaitable{};
}

task resuming_on_new_thread() {
  co_await switch_to_new_thread();
}

int main() {
  resuming_on_new_thread();
}

The assertion would fail. If this is the root problem, I think we could adjust the align for the promise alloca like:

The problem is that any member of the coroutine frame could be overaligned (thus make the frame overaligned) including promise, local variables, spills. The problem is *not* specific to promise.

%promise = alloca %promise_type, align 8

into

%promise = alloca %promise_type, align 128

In other words, if this the problem we need to solve, I think we could make it in a simpler way.

This may not fix the problem.

Then I looked into the document you give in the summary. The issue#3 says the frontend can't do some work in the process of template instantiation due to the frontend doesn't know about the align and size of the coroutine. But from the implementation, it looks like not the problem this patch wants to solve.

I meant to use that as a reference to help describe the problem (but not the solution). The document itself includes both problem statements (issue#3) and solutions (frontend-based) which are totally unrelated to this patch. It looks like it is not that useful in this case so please disregard that.

I am really confused about the problem. Could you please restate your problem more in more detail? For example, would it make the alignment incorrect like the example above? Or does we want the frontend to get alignment information? Then what would be affected? From the title, I can guess the size of frame would get bigger. But how big would it be? Who would control and determine the final size?

understood.

There are two kinds of alignments: the alignment of a type/object at compile-time (ABI specified or user-specified), and the alignment the object of that type actually gets during runtime. The compiler assumes that the alignment of a struct is the maximal alignment of all its members. However, that assumption may not be true at runtime where the memory allocator may return a memory block that has insufficient alignment which causes some members aligned incorrectly.

For C++ coroutine, right now the default memory allocator could only return 16 bytes aligned memory block. When any member of the coroutine frame (promise, local variables, spills etc.) has alignment > 16, the frame becomes overaligned. This could only be fixed dynamically at runtime: by over-allocating memory and then adjust the frame start address so that it aligns correctly.

For example, suppose malloc returns 16 bytes aligned address 16, how do we make it 64 bytes aligned? align 16 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-16=48

Another similar example, suppose malloc returns 16 bytes aligned address 32, how do we make it 64 bytes aligned? align 32 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-32=32

Another similar example, suppose malloc returns 16 bytes aligned address 48, how do we make it 64 bytes aligned? align 48 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-48=16

Another similar example, suppose malloc returns 16 bytes aligned address 64, how do we make it 64 bytes aligned? align 64 up to an address that is 64 bytes aligned which is 64, so the adjustment amount is 64-64=0

So the mamximal adjustment amount is 64-16=48. We don't know until runtime if the malloc returned address X is (X % 64 == 0) or (X % 64 == 16) or (X % 64 == 32) or (X % 64 == 48), so we must emit extra code to deal with all cases (by bitwise operations).

Thanks for the explanation. I think I got the problem now. And my understanding for the solution of this patch is, if the align of the original frame is 64, then we allocate (64+48) space to the new frame now. And the original frame becomes an alloca with align 64 in the new frame. So the actually frame gets the right alignment now. Do I get the problem and solution right?

I am wondering if there is a simpler solution. For example, after we construct the frame, we can get the alignment requirement for the frame. Then if the alignment is bigger than 16, we could lower the coro.begin to aligned new instead of default new. It looks like that the implementation would be much simpler.

Apr 26 2021, 7:13 PM · Restricted Project, Restricted Project
ychen added reviewers for D95541: Support Os or Oz inside the LTO: tejohnson, pcc.
Apr 26 2021, 5:01 PM · Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Yes, if you can dynamically choose to use an aligned allocator, that's clearly just much better.

Right now:

Intrinsic::coro_size_aligned : overaligned frame: over-allocate, adjust start address; non-overaligned frame: no-op
Intrinsic::coro_size : overaligned frame: no-op; non-overaligned frame: no-op

Do you mean to remove Intrinsic::coro_size_aligned and make
Intrinsic::coro_size : overaligned frame: over-allocate, adjust start address; non-overaligned frame: no-op

that makes sense to me. Just want to confirm first.

That's not really what I meant, no. I meant it would be better to find a way to use an allocator that promises to return a well-aligned value when possible. We've talked about this before; that will require the C++ committee to update the design.

I think the cleanest design for coro_size/align would be that they reflect the unadjusted requirements, and the frontend is expected to emit code which satisfies those requirements. In the absence of an aligned allocator, that means generating code like:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    size += align - NEW_ALIGN + sizeof(void*);
  frame = operator new(size);
  if (align > NEW_ALIGN) {
    auto rawFrame = frame;
    frame = (frame + align - 1) & ~(align - 1);
    *(void**) (frame + size) = rawFrame;
  }
}

and then on the deallocation side:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    frame = *(void**) (frame + size);
  operator delete(frame);
}

That's all quite annoying, but it does extend quite nicely to cover the presence of an aligned allocator when the committee gets around to ratifying that this is what should happen:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    frame = operator new(std::align_val_t(align), size);
  else
    frame = operator new(size);
}

and then on the deallocation side:

if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN)
    operator delete(frame, std::align_val_t(align));
  else
    operator delete(frame);
}
Apr 26 2021, 3:56 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Yes, if you can dynamically choose to use an aligned allocator, that's clearly just much better.

Apr 26 2021, 10:23 AM · Restricted Project, Restricted Project

Apr 25 2021

ychen added inline comments to D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 25 2021, 6:55 PM · Restricted Project, Restricted Project

Apr 23 2021

ychen abandoned D97915: [Coroutines] Handle overaligned frame allocation.

Pursue D100739 instead.

Apr 23 2021, 3:12 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Sorry for the confusion. I think either overaligned or under-aligned could be used here to describe the problem: either "Handle overaligned frame" or "Fix under-aligned frame". Since c++ spec defines the former but not the later (https://en.cppreference.com/w/cpp/language/object#Alignment), my first intuition was to use the term "overalign". Under-aligned is the undesired outcome that should be fixed (probably too late to handle I assume). Also the overaligned is a static property whereas 'under-aligned" is a runtime property. From the compiler's perspective, I think overaligned should be preferred. With that said, I don't feel strongly about this. I could switch to use "overaligned" if that feels more intuitive.

Here "overaligned frame" doesn't already occur.

It does occur. FrameAlign > Shape.getSwitchCoroId()->getAlignment()) this check reflects the definition of C++ spec's definition of overalign.

Apr 23 2021, 2:44 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

This is an alternative to D97915 which missed proper deallocation of the over-allocated frame. This patch handles both allocations and deallocations.

If D97915 is not needed, we should abandon it.

For the example shows in D97915, it says:

#include <experimental/coroutine>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <cassert>

struct task{
  struct alignas(64) promise_type {
    task get_return_object() { return {}; }
    std::experimental::suspend_never initial_suspend() { return {}; }
    std::experimental::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
  using handle = std::experimental::coroutine_handle<promise_type>;
};

auto switch_to_new_thread() {
  struct awaitable {
    bool await_ready() { return false; }
    void await_suspend(task::handle h) {
      auto i = reinterpret_cast<std::uintptr_t>(&h.promise());
      std::cout << i << std::endl;
      assert(i % 64 == 0);
    }
    void await_resume() {}
  };
  return awaitable{};
}

task resuming_on_new_thread() {
  co_await switch_to_new_thread();
}

int main() {
  resuming_on_new_thread();
}

The assertion would fail. If this is the root problem, I think we could adjust the align for the promise alloca like:

Apr 23 2021, 2:36 PM · Restricted Project, Restricted Project
ychen updated the summary of D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 23 2021, 12:16 AM · Restricted Project, Restricted Project
ychen updated the diff for D100739: [Coroutines] Handle overaligned frame allocation (2).

fix typo.

Apr 23 2021, 12:14 AM · Restricted Project, Restricted Project
ychen updated the diff for D100739: [Coroutines] Handle overaligned frame allocation (2).

Fix typo.

Apr 23 2021, 12:00 AM · Restricted Project, Restricted Project

Apr 22 2021

ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

What is the purpose of the builtin? Where is it being used? Typically you *can't* change the signature of a builtin because the builtin is itself a language feature that's documented to have a particular signature. If you've made a builtin purely for use in generated AST, that's pretty unfortunate, and you should consider whether you actually have to do that instead of e.g. synthesizing a call to an allocation function the same way that we do in new expressions.

Apr 22 2021, 11:53 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

Thanks for working on this.
I am still having a bit hard time understanding the solution.
A few questions:

  1. I assume this patch is to solve the problem where the promise object is not aligned according to its alignof annotation, right? The title/wording is a bit misleading. Usually "handling XXX" means XXX is a situation/problem that wasn't handle properly before, and it's being handled here. I don't really understand what "handle overaligned frame allocation" means. Isn't frame allocation under-aligned being the problem?
Apr 22 2021, 11:50 PM · Restricted Project, Restricted Project
ychen updated the diff for D100739: [Coroutines] Handle overaligned frame allocation (2).
  • Address feebacks.
Apr 22 2021, 11:49 PM · Restricted Project, Restricted Project
ychen updated the summary of D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 22 2021, 11:48 PM · Restricted Project, Restricted Project
ychen updated the summary of D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 22 2021, 11:12 PM · Restricted Project, Restricted Project
ychen updated the summary of D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 22 2021, 11:11 PM · Restricted Project, Restricted Project
ychen added a comment to D99903: [Clang][Sema] better -Wcast-function-type diagnose for pointer parameters and parameters with cv-qualifiers.

ping

Apr 22 2021, 4:53 PM · Restricted Project
ychen added a comment to D101017: [NewPM] Make GlobalsAA available earlier in the pipeline.

That sounds like less powerful optimizations due to the GlobalsAA movement. If adding a GlobalsAA plus D100917 is a win overall, that should work.

Apr 22 2021, 2:33 PM · Restricted Project, Restricted Project
ychen accepted D101016: [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}.

LGTM

Apr 22 2021, 1:20 PM · Restricted Project, Restricted Project

Apr 21 2021

ychen added a comment to D100912: [docs][NewPM] Add section on analyses.

Thanks for writing this up. It is very helpful. Some comments inline.

Apr 21 2021, 5:37 PM · Restricted Project
ychen added a comment to D100917: [NewPM] Only invalidate modified functions' analyses in CGSCC passes.

It seems awkward to enhance PreservedAnalyses to only apply to specific functions when this patch is explicitly doing exactly what we want. If you have a clean way of modeling this inside PreservedAnalyses I'm happy to go with that, but so far I think this is the cleanest way.

Apr 21 2021, 4:57 PM · Restricted Project, Restricted Project

Apr 20 2021

ychen added a comment to D100917: [NewPM] Only invalidate modified functions' analyses in CGSCC passes.

I think we should be careful about letting passes invalidate analyses directly. IIUC the only interface for propagating invalidation from passes is PreservedAnalyses. Is there any way to enhance PreservedAnalyses to achieve the same effect?

Apr 20 2021, 10:56 PM · Restricted Project, Restricted Project
ychen added inline comments to D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 20 2021, 8:37 PM · Restricted Project, Restricted Project
ychen added inline comments to D100614: [Coroutine] Collect CoroBegin if all of terminators are dominated by one coro.destroy.
Apr 20 2021, 5:53 PM · Restricted Project
ychen added inline comments to D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 20 2021, 5:52 PM · Restricted Project, Restricted Project

Apr 19 2021

ychen updated the summary of D100739: [Coroutines] Handle overaligned frame allocation (2).
Apr 19 2021, 9:24 PM · Restricted Project, Restricted Project
ychen added a comment to D100739: [Coroutines] Handle overaligned frame allocation (2).

I hadn't looked into the details. I would try to make it.
But from my understanding to this problem, the correct solution should remain the previous behavior if the end user doesn't specify alignas for promise_type. I mean, it shouldn't make so many test cases fail.

Apr 19 2021, 9:21 PM · Restricted Project, Restricted Project
ychen added a comment to D98591: [CodeGen] Add extension points for TargetPassConfig::addMachinePasses.

@drti hi, could you please rebase the patch? The base commit seems not in the git history.

Apr 19 2021, 12:10 AM · Restricted Project

Apr 18 2021

ychen updated the diff for D100739: [Coroutines] Handle overaligned frame allocation (2).

Fix comment. Ready for review.

Apr 18 2021, 10:46 PM · Restricted Project, Restricted Project