Page MenuHomePhabricator

[Coroutines] Handle overaligned frame allocation
Needs ReviewPublic

Authored by ychen on Mar 3 2021, 11:21 PM.

Details

Summary

by over-allocating and emitting alignTo code to adjust the frame start address.

Motivation: on a lot of machines, malloc returns >=std::max_align_t (usually just 16) aligned heap regardless of the coro frame's preferred alignment (usually specified using alignas() on the promise or some local variables). For non-coroutine-related context, this is handled by calling overloaded operator new where an alignment could be specified. For coroutine, spec here https://eel.is/c++draft/dcl.fct.def.coroutine#9.1 suggested that the alignment argument is not considered during name lookup.

Mathias Stearn and @lewissbaker suggested this is the proper workaround before the issue is addressed by the spec.

One example showing the issue: https://gcc.godbolt.org/z/rGzaco

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ychen planned changes to this revision.May 5 2021, 5:54 PM

Plan to rebase this together with the following patch for two lookups (aligned and non-aligned new/delete, and generate code accordingly)

ychen updated this revision to Diff 343956.EditedMay 9 2021, 7:41 PM
  • Rebase on D102145.
  • Dynamically adjust the alignment for allocation and deallocation if the selected allocator does not have std::align_val_t argument. Otherwise, use the aligned allocation/deallocation function.
ychen updated this revision to Diff 343959.May 9 2021, 8:25 PM
  • Rebase

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say we should comment it to avoid confusing.

Since the patch could handle the case if the frontend tries to search ::operator new(size_t, align_val_t), this patch should be based on D102147.

ychen updated this revision to Diff 344602.May 11 2021, 5:11 PM
  • Rebase on updated D102145 (use llvm.coro.raw.frame.ptr.addr during allocation)
ychen added a comment.May 11 2021, 5:23 PM

which should be correct. It is implemented by CodeGenFunction::EmitBuiltinAlignTo.

I agree it is correct. I just want to say we should comment it to avoid confusing.

Happy to do it in a separate patch since this patch does not change the implementation of CodeGenFunction::EmitBuiltinAlignTo.

Since the patch could handle the case if the frontend tries to search ::operator new(size_t, align_val_t), this patch should be based on D102147.

This patch *could* handle both aligned and normal new/delete, so it doesn't need D102147 to work correctly?
D102147 depends on this patch since it may find a non-aligned new/delete for overaligned frame. In such a case, this patch is required.

ChuanqiXu added inline comments.May 12 2021, 5:00 AM
clang/include/clang/AST/StmtCXX.h
356–359 ↗(On Diff #344602)

Can't we merge these?

clang/lib/CodeGen/CGCoroutine.cpp
436–450

It looks like it would emit a deallocate first, and emit an alignedDeallocate, which is very odd. Although I can find that the second deallocate wouldn't be emitted due to the check LastCoroFreeUsedForDealloc, it is still very odd to me. If the second deallocate wouldn't come up all the way, what's the reason we need to write emit(deallocate) twice?

441–479

This code would only work if we use ::operator new(size_t, align_val_t), which is implemented in another patch. I would suggest to move this into that one.

593

Since hasAlignArg is called only once, I suggested to make it a lambda here which would make the code more easy to read.

595–597

I recommend to add a detailed comment here to tell the story why we need to over allocate the frame. It is really hard to understand for people who are new to this code. Otherwise, I think they need to use git blame to find the commit id and this review page to figure the reasons out.

599–621

It may be better to organize it as:

if (!HasAlignArg) {
   if (auto *RetOnAllocFailure = S.getReturnStmtOnAllocFailure()) {
       auto *Cond = Builder.CreateICmpNE(AlignedAllocateCall, NullPtr);
       AlignAllocBB2 = createBasicBlock("coro.alloc.align2");
       Builder.CreateCondBr(Cond, AlignAllocBB2, RetOnFailureBB);
       EmitBlock(AlignAllocBB2);
   }
   auto *CoroAlign = Builder.CreateCall(
        CGM.getIntrinsic(llvm::Intrinsic::coro_align, SizeTy));
   ...
}
733

Is it possible that it would return a nullptr value?

ychen marked an inline comment as done.May 12 2021, 9:59 PM
ychen added inline comments.
clang/include/clang/AST/StmtCXX.h
356–359 ↗(On Diff #344602)

I'm not sure about the "merge" here. Could you be more explicit?

clang/lib/CodeGen/CGCoroutine.cpp
436–450

Agree that LastCoroFreeUsedForDealloc is a bit confusing. It makes sure deallocation and aligned deallocation share one coro.free. Otherwise, AFAIK, there would be two coro.free get codegen'd.

%mem = llvm.coro.free()
br i1 <overalign> , label <aligend-dealloc>, label <dealloc>

aligend-dealloc:
    use %mem

dealloc:
    use %mem

what's the reason we need to write emit(deallocate) twice?

John wrote a code snippet here: https://reviews.llvm.org/D100739#2717582. I think it would be helpful to look at the changed tests below to see the patterns.

Basically, for allocation, it looks like below; for deallocation, it would be similar.

void *rawFrame =nullptr;
...
if (llvm.coro.alloc()) {
  size_t size = llvm.coro.size(), align = llvm.coro.align();
  if (align > NEW_ALIGN) {
#if <an allocation function without std::align_val_t argument is selected by Sema>
    size += align - NEW_ALIGN + sizeof(void*);
    frame = operator new(size);
    rawFrame = frame;
    frame = (frame + align - 1) & ~(align - 1);
#else
    // If an aligned allocation function is selected.
    frame = operator new(size, align);
#endif
  } else {
    frame = operator new(size);
  }
}

The true branch of the #if directive is equivalent to "coro.alloc.align" block (and "coro.alloc.align2" if get_return_object_on_allocation_failure is defined), the false branch is equivalent to "coro.alloc" block.
The above pattern handles both aligned/normal allocation/deallocation so it is independent of D102147.

441–479

It handles both aligned and normal new/delete.

593

will do

595–597

will do.

733

Not that I know of. Because there is an early return

if (!CoroFree) {
  CGF.CGM.Error(Deallocate->getBeginLoc(),
                "Deallocation expressoin does not refer to coro.free");
  return;
}
ychen updated this revision to Diff 345053.May 12 2021, 10:58 PM
ychen marked an inline comment as done.
  • Address feedbacks.
ychen marked 3 inline comments as done.May 12 2021, 10:58 PM
ChuanqiXu added inline comments.May 12 2021, 11:06 PM
clang/include/clang/AST/StmtCXX.h
356–359 ↗(On Diff #344602)

Sorry. I mean if we can merge Allocate with AlignedAllocate and merge Deallocate with AlignedDeallocate. Since from the implementation, it looks like the value of Allocate and AlignedAllocate (so as Deallocate and AlignedDeallocate) are the same.

clang/lib/CodeGen/CGCoroutine.cpp
436–450

Thanks. I get the reason why I am thinking the code isn't natural. Since I think ::operator new(size_t, align_val_t) shouldn't come up in this patch which should be available after D102147 applies. Here you said this patch is independent with D102147, I believe this patch could work without D102147. But it contains the codes which would work only if we applies the successor patch, so I think it is dependent on D102147.

The ideally relationship for me is to merge D102145 into this one (Otherwise it is weird for me that D102145 only introduces some intrinsics which wouldn't be used actually). Then this patch should handle the alignment for variables in coroutine frame without introducing ::new(size_t, align_val_t). Then the final patch could do the job that searching and generating code for ::new(size_t, align_val_t).

Maybe it is a little bit hard to rebase again and again. But I think it is better.

733

Do you think it is better to merge this check here?

 if (CurCoro.Data && CurCoro.Data->LastCoroFreeUsedForDealloc) {
     if (!CoroFree) {
          CGF.CGM.Error(Deallocate->getBeginLoc(),
                "Deallocation expressoin does not refer to coro.free");
          return something;
     }
    return RValue::get(CurCoro.Data->LastCoroFree);
}
ychen added inline comments.May 12 2021, 11:19 PM
clang/lib/CodeGen/CGCoroutine.cpp
436–450

I think I know where the confusion comes from. AlignedDeallocate is not guaranteed to be an aligned allocator. In this patch in SemaCoroutine.cpp, it is set to Deallocate in which case we always dynamically adjust frame alignment. Once D102147 is landed. AlignedDeallocate may or may not be an aligned allocator.

The ideally relationship for me is to merge D102145 into this one (Otherwise it is weird for me that D102145 only introduces some intrinsics which wouldn't be used actually). Then this patch should handle the alignment for variables in coroutine frame without introducing ::new(size_t, align_val_t). Then the final patch could do the job that searching and generating code for ::new(size_t, align_val_t).

I was worried about the size of the patch if this is merged with D102145 but if that is preferred by more than one reviewer, I'll go ahead and do that. D102145 is pretty self-contained in that it does not contain clients of the added intrinsics but the introduced test should cover the expected intrinsic lowering.

ychen added inline comments.May 12 2021, 11:22 PM
clang/include/clang/AST/StmtCXX.h
356–359 ↗(On Diff #344602)

Oh, this is to set the path for D102147 where Allocate and AlignedAllocate could be different. If I do this in D102147, it will also touch the CGCoroutine.cpp which I'm trying to avoid` since it is intended to be a Sema only patch.

ychen added inline comments.May 12 2021, 11:29 PM
clang/lib/CodeGen/CGCoroutine.cpp
436–450

Naming is hard. I had a hard time figuring out a better name. AlignedDeallocate/AlignedAllocate is intended to refer to allocator/deallocator used for handling overaligned frame. Not that they are referring to allocator/deallocator with std::align_val_t argument.

ChuanqiXu added inline comments.May 12 2021, 11:39 PM
clang/include/clang/AST/StmtCXX.h
356–359 ↗(On Diff #344602)

Yeah, this is the key different point between us. I think that D102147 could and should to touch the CodeGen part.

clang/lib/CodeGen/CGCoroutine.cpp
436–450

I think it is better for me to merge D102145 into this one to understand this patch. For example, the test cases in D102145 looks weird to me since it doesn't do over alignment at all like we discussed in that thread. Maybe my understanding is not right, but I think it isn't pretty self-contained. I am OK to wait for opinions from other reviewers.

ychen updated this revision to Diff 352189.Tue, Jun 15, 10:47 AM

Thanks a lot. Could you add me as a reviewer? So that I can see this patch in my main page.

It looks not easy to review completely. After a rough review, the first suggest is to remove the contents that should be in 'D102147', it makes the complex problem more complex I think.
I would try to do more detailed review and test it if possible.

clang/include/clang/AST/StmtCXX.h
331–335 ↗(On Diff #352189)

Minor issue: the intention of the comments should be the same.

356–359 ↗(On Diff #344602)

As we discussed before, I prefer to merge Allocate and AlignedAllocate (also Deallocate and AlignedDeallocate ) in this patch. It looks weird the are the same in one commit.

clang/lib/CodeGen/CGCoroutine.cpp
416

We should capitalize it as 'OverAllocateFrame'

602–607

if (HasAlignArg) should be the content of the next patch 'D102147', right? I don't think they should come here.

644–645

Maybe we could calculate it in place instead of trying to call a function which is not designed for llvm::value*. It looks like the calculation isn't too complex.

654

Does here miss a branch to InitBB?

clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗(On Diff #352189)

We shouldn't add this interface. The actual type for the first argument is BuiltinAlignArgs*, which defined in .cpp files. The signature is confusing.

ychen updated this revision to Diff 352617.Wed, Jun 16, 8:22 PM
ychen marked 3 inline comments as done.
  • Remove AlignedAllocator & AlignedDeallocator.
  • Use 'OverAllocateFrame'
  • Get rid of if (HasAlignArg)
clang/lib/CodeGen/CGCoroutine.cpp
644–645

I'm open to not calling EmitBuiltinAlignTo, which basically inline the useful parts of EmitBuiltinAlignTo. The initial intention is code sharing and easy readability. What's the benefit of not calling it?

654

EmitBlock would handle the case.

clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗(On Diff #352189)

This is a private function, supposedly only meaningful for the implementation. In that situation do you think it's bad?

It misses the part in llvm now.

clang/lib/CodeGen/CGCoroutine.cpp
433–471

We don't need this in this patch.

464–465

Do we still need this change?

473

Capitalize EmitCheckAlignBasicBlock

644–645

Reusing code is good. But my main concern is that the design for the interfaces. The current design smells bad to me. Another reason for implement it in place is I think it is not very complex and easy to understand.

Another option I got is to implement EmitBuitinAlign in LLVM (someplace like Local), then the CodeGenFunction:: EmitBuitinAlign and this function could use it.

clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗(On Diff #352189)

It makes no sense to me that we can add interfaces casually if it is private. For the users of Clang/LLVM, it may be OK since they wouldn't look into the details. But for the developers, it matters. For example, I must be very confused when I see this signature. Why is the type of Args is void*? What's the type should I passed in? The smell is really bad.

ychen updated this revision to Diff 352810.Thu, Jun 17, 11:43 AM
  • Rebase correctly
ychen updated this revision to Diff 352908.Thu, Jun 17, 8:40 PM
ychen marked 2 inline comments as done.
  • Not use void * in EmitBuiltinAlignTo signature.
clang/lib/CodeGen/CGCoroutine.cpp
433–471

Do you mean // Match size_t argument with the one used during allocation. or the function emitDynamicAlignedDealloc? I think either is needed here. Could you please elaborate?

464–465

Nope

644–645

Reusing code is good. But my main concern is that the design for the interfaces. The current design smells bad to me. Another reason for implement it in place is I think it is not very complex and easy to understand.

Another option I got is to implement EmitBuitinAlign in LLVM (someplace like Local), then the CodeGenFunction:: EmitBuitinAlign and this function could use it.

clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗(On Diff #352189)

It makes no sense to me that we can add interfaces casually if it is private. For the users of Clang/LLVM, it may be OK since they wouldn't look into the details. But for the developers, it matters. For example, I must be very confused when I see this signature. Why is the type of Args is void*? What's the type should I passed in? The smell is really bad.

A remained question.

  • what's the semantics if user specified their allocation/deallocation functions?

Previously, we discussed for the ::operator new and ::operator delete. But what would we do for user specified allocation/deallocation functions?
It looks like we would treat them just like ::operator new. And it makes sense at the first glance. But the problem comes from that we judge whether
or not to over allocate a frame by this condition:

coro.align > align of new

But if the user uses their new, it makes no sense to use the align of new as the condition. On the other hand, if user specified their new function and the
alignment requirement for their promise type, would it be better really that the compiler do the extra transformation?

May be we need to discuss with other guys who are more familiar with the C++ standard to answer this.

clang/lib/CodeGen/CGCoroutine.cpp
416

It looks like we'd better to add comment for this function.

417

CoroSizeIdx should be zero all the time in this patch.

427–429

In other comments, I find 'size += align - NEW_ALIGN + sizeof(void*);'. But I don't find sizeof(void*) in this function.

433–471

Sorry for that I misunderstand this function earlier.

448–457

We allocate overaligned-frame like:

| --- for align --- | --- true frame --- |

And here we set the argument to the address of true frame. Then I wonder how about the memory for the for align part. Would we still free them correctly? Maybe I missed something.

462–468

We don't need to handle this condition in this patch.

557

It may be better to rename AlignAllocBB2 as AlignAllocBBCont or something similar.

624

It looks better to add an assert for RetOnFailureBB. I think it may be nullptr at the first glance.

638–639

We remove this assignment and use AlignedUpAddr directly in the following.

644–645

I guess you forgot to reply what you want to say.

clang/lib/CodeGen/CodeGenFunction.h
1920–1921 ↗(On Diff #352189)

It looks like you missed something?

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1133–1137 ↗(On Diff #352908)

Why we move this snippet to the front? Although it is not defined, the layout for the current frame would be:

| resume func addr | destroy func addr | promise | other things needed |

Move this to the front may break this.

1141–1144 ↗(On Diff #352908)

We need to edit the comment too.