This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Refactor/Rewrite Spill and Alloca processing
ClosedPublic

Authored by lxfind on Oct 5 2020, 11:11 PM.

Details

Summary

This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill, is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always process differently than spills, hence it's quite confusing to put them together. It's a much cleaner to separate them and process them separately.
Doing so simplify lots of code and makes the logic more clear and easier to reason about.

Secondly, use-def relationship is insufficient to decide whether a value defined by AllocaInst needs to go to the heap.
There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across suspension point (when this happens, the existing implementation will spill the PHINode, a.k.a a stack adddress to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch.

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index.

In doing so, I also cleaned up a few things and also discovered a few other bugs.

Cleanups:

  1. Found out that PromiseFieldId is not used, delete it.
  2. Previously, SpillInfo is a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
  3. Previously, a frame Field struct contains a list of Spills that field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
  4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer has it. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Diff Detail

Event Timeline

lxfind created this revision.Oct 5 2020, 11:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2020, 11:11 PM
lxfind requested review of this revision.Oct 5 2020, 11:11 PM

Thanks for your patch! It mentions some bugs about allocas we need to handle in the future.

For this patch, I'm little confusing about why we need to separate alloca from spills. In my mind, a spill means something we need to put in the frame. And an alloca which would be in the frame is naturally a spill.
I think the patch benefits from replacing

using SpillInfo = SmallVector<Spill, 8>;

to

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1104

I'm confusing about the comment. It says it would migrate dbg.declare from alloca. But how could it know the CurrentValue must be alloca from the context ?

lxfind added a comment.Oct 9 2020, 8:32 AM

Thanks for your patch! It mentions some bugs about allocas we need to handle in the future.

For this patch, I'm little confusing about why we need to separate alloca from spills. In my mind, a spill means something we need to put in the frame. And an alloca which would be in the frame is naturally a spill.
I think the patch benefits from replacing

using SpillInfo = SmallVector<Spill, 8>;

to

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;

Thanks for taking a look at the patch. If you look at the implementation, the handling of "Spills" and always different from the handling of allocas. They do share the same concept that they need to go to the frame (which is why they both belong to FrameDataInfo).
The primary reason to separate them (and hence set up the code for future fixes), is this one primary difference between them: A Spill is a direct def-use relationship that crosses suspension points; while an alloca may not be exposed to a direct def-use relationship that crosses suspension points but still need to go to the frame. The condition for them to go to the frame is fundamentally different.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
1104

I think dbg.declare instructions were all generated for alloca instructions.

Thanks for your patch! It mentions some bugs about allocas we need to handle in the future.

For this patch, I'm little confusing about why we need to separate alloca from spills. In my mind, a spill means something we need to put in the frame. And an alloca which would be in the frame is naturally a spill.
I think the patch benefits from replacing

using SpillInfo = SmallVector<Spill, 8>;

to

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;

Thanks for taking a look at the patch. If you look at the implementation, the handling of "Spills" and always different from the handling of allocas. They do share the same concept that they need to go to the frame (which is why they both belong to FrameDataInfo).
The primary reason to separate them (and hence set up the code for future fixes), is this one primary difference between them: A Spill is a direct def-use relationship that crosses suspension points; while an alloca may not be exposed to a direct def-use relationship that crosses suspension points but still need to go to the frame. The condition for them to go to the frame is fundamentally different.

I agree with we can benefit from separating alloca from spills. At least we don't need extract allocas from spills by a redundant loop any more. After we separate allocas from spills, the name spills seems a little strange. But I think it doesn't really matter.

Here is a question about the bug mentioned in summary. I write a simple code like this:

void consuming(int* pa);

task escape_alloca() {
  int a;
  consuming(&a);
  co_await something();
}

But clang would still put a in the frame in O0 or O2. I guess it is because the def of a and the use of a is cross initial_suspend (in O0 mode) or the lifetime markers is cross the co_await something point. Is there anything wrong? Or what is the example about the bug?

lxfind added a comment.Oct 9 2020, 9:33 PM

Thanks for your patch! It mentions some bugs about allocas we need to handle in the future.

For this patch, I'm little confusing about why we need to separate alloca from spills. In my mind, a spill means something we need to put in the frame. And an alloca which would be in the frame is naturally a spill.
I think the patch benefits from replacing

using SpillInfo = SmallVector<Spill, 8>;

to

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;

Thanks for taking a look at the patch. If you look at the implementation, the handling of "Spills" and always different from the handling of allocas. They do share the same concept that they need to go to the frame (which is why they both belong to FrameDataInfo).
The primary reason to separate them (and hence set up the code for future fixes), is this one primary difference between them: A Spill is a direct def-use relationship that crosses suspension points; while an alloca may not be exposed to a direct def-use relationship that crosses suspension points but still need to go to the frame. The condition for them to go to the frame is fundamentally different.

I agree with we can benefit from separating alloca from spills. At least we don't need extract allocas from spills by a redundant loop any more. After we separate allocas from spills, the name spills seems a little strange. But I think it doesn't really matter.

Here is a question about the bug mentioned in summary. I write a simple code like this:

void consuming(int* pa);

task escape_alloca() {
  int a;
  consuming(&a);
  co_await something();
}

But clang would still put a in the frame in O0 or O2. I guess it is because the def of a and the use of a is cross initial_suspend (in O0 mode) or the lifetime markers is cross the co_await something point. Is there anything wrong? Or what is the example about the bug?

It's harder to generate an example from source code (it has to be quite complicated, I have some production code, but not small enough to share). But it's easy to see from IR examples.
Consider the following IR (you can run it through opt --coro-split:

define i8* @f(i1 %n) "coroutine.presplit"="1" {
entry:
  %a = alloca i32, align 8
  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
  %size = call i32 @llvm.coro.size.i32()
  %alloc = call i8* @malloc(i32 %size)
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)

  %flag = call i1 @check()
  br i1 %flag, label %flag_true, label %flag_false

flag_true:
  %b = bitcast i32* %a to i8*
  br label %merge

flag_false:
  %c = bitcast i32* %a to i8*
  br label %merge

merge:
  %d = phi i8* [ %b, %flag_true ], [ %c, %flag_false ]
  %sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
  switch i8 %sp1, label %suspend [i8 0, label %resume
                                  i8 1, label %cleanup]
resume:
  call void @print(i8* %d)
  br label %cleanup

cleanup:
  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
  call void @free(i8* %mem)
  br label %suspend
suspend:
  call i1 @llvm.coro.end(i8* %hdl, i1 0)
  ret i8* %hdl
}

declare i8* @llvm.coro.free(token, i8*)
declare i32 @llvm.coro.size.i32()
declare i8  @llvm.coro.suspend(token, i1)
declare void @llvm.coro.resume(i8*)
declare void @llvm.coro.destroy(i8*)

declare token @llvm.coro.id(i32, i8*, i8*, i8*)
declare i1 @llvm.coro.alloc(token)
declare i8* @llvm.coro.begin(token, i8*)
declare i1 @llvm.coro.end(i8*, i1)

declare noalias i8* @malloc(i32)
declare i1 @check()
declare void @print(i8*)
declare void @free(i8*)

Notice that %a's alias is stored in a PHI, which is used after suspend. When you run coro-split on it, the current implementation will think that only the PHI needs to be spilled, while %a can stay on stack.
So in the generated IR, %a will still be an alloca, but a pointer to it will be store to the frame! The generated IR looks like this:

define i8* @f(i1 %n) {
entry:
  %a = alloca i32, align 8
  ...
  %b = bitcast i32* %a to i8*
  %d.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
  store i8* %b, i8** %d.spill.addr, align 8
  ...
}

This is just one example, but you can also escape it by storing the address, or call a function.

rjmccall accepted this revision.Oct 9 2020, 10:48 PM

This is a nice cleanup, thank you. Just minor comments.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
721

Well, it needs to be explicitly added here because it's a header field which has to have a fixed offset based on its alignment, which is *why* it's not in FrameData.Allocas.

753

It's just "index" now.

1958

I see that this seems to be the existing algorithm, and I know this is intended to be a refactoring patch rather than a functional one, so I'll let this go without comment beyond expressing my eagerness to see a follow-up. :)

llvm/lib/Transforms/Coroutines/CoroInternal.h
125

Yeah, I think this is never used directly — clients expect to find it by offset, and within the function it's of course just referenced as a value.

This revision is now accepted and ready to land.Oct 9 2020, 10:48 PM

Thanks for your patch! It mentions some bugs about allocas we need to handle in the future.

For this patch, I'm little confusing about why we need to separate alloca from spills. In my mind, a spill means something we need to put in the frame. And an alloca which would be in the frame is naturally a spill.
I think the patch benefits from replacing

using SpillInfo = SmallVector<Spill, 8>;

to

using SpillInfo = SmallMapVector<Value *, SmallVector<Instruction *, 2>, 8>;

Thanks for taking a look at the patch. If you look at the implementation, the handling of "Spills" and always different from the handling of allocas. They do share the same concept that they need to go to the frame (which is why they both belong to FrameDataInfo).
The primary reason to separate them (and hence set up the code for future fixes), is this one primary difference between them: A Spill is a direct def-use relationship that crosses suspension points; while an alloca may not be exposed to a direct def-use relationship that crosses suspension points but still need to go to the frame. The condition for them to go to the frame is fundamentally different.

I agree with we can benefit from separating alloca from spills. At least we don't need extract allocas from spills by a redundant loop any more. After we separate allocas from spills, the name spills seems a little strange. But I think it doesn't really matter.

Here is a question about the bug mentioned in summary. I write a simple code like this:

void consuming(int* pa);

task escape_alloca() {
  int a;
  consuming(&a);
  co_await something();
}

But clang would still put a in the frame in O0 or O2. I guess it is because the def of a and the use of a is cross initial_suspend (in O0 mode) or the lifetime markers is cross the co_await something point. Is there anything wrong? Or what is the example about the bug?

It's harder to generate an example from source code (it has to be quite complicated, I have some production code, but not small enough to share). But it's easy to see from IR examples.
Consider the following IR (you can run it through opt --coro-split:

define i8* @f(i1 %n) "coroutine.presplit"="1" {
entry:
  %a = alloca i32, align 8
  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
  %size = call i32 @llvm.coro.size.i32()
  %alloc = call i8* @malloc(i32 %size)
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)

  %flag = call i1 @check()
  br i1 %flag, label %flag_true, label %flag_false

flag_true:
  %b = bitcast i32* %a to i8*
  br label %merge

flag_false:
  %c = bitcast i32* %a to i8*
  br label %merge

merge:
  %d = phi i8* [ %b, %flag_true ], [ %c, %flag_false ]
  %sp1 = call i8 @llvm.coro.suspend(token none, i1 false)
  switch i8 %sp1, label %suspend [i8 0, label %resume
                                  i8 1, label %cleanup]
resume:
  call void @print(i8* %d)
  br label %cleanup

cleanup:
  %mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
  call void @free(i8* %mem)
  br label %suspend
suspend:
  call i1 @llvm.coro.end(i8* %hdl, i1 0)
  ret i8* %hdl
}

declare i8* @llvm.coro.free(token, i8*)
declare i32 @llvm.coro.size.i32()
declare i8  @llvm.coro.suspend(token, i1)
declare void @llvm.coro.resume(i8*)
declare void @llvm.coro.destroy(i8*)

declare token @llvm.coro.id(i32, i8*, i8*, i8*)
declare i1 @llvm.coro.alloc(token)
declare i8* @llvm.coro.begin(token, i8*)
declare i1 @llvm.coro.end(i8*, i1)

declare noalias i8* @malloc(i32)
declare i1 @check()
declare void @print(i8*)
declare void @free(i8*)

Notice that %a's alias is stored in a PHI, which is used after suspend. When you run coro-split on it, the current implementation will think that only the PHI needs to be spilled, while %a can stay on stack.
So in the generated IR, %a will still be an alloca, but a pointer to it will be store to the frame! The generated IR looks like this:

define i8* @f(i1 %n) {
entry:
  %a = alloca i32, align 8
  ...
  %b = bitcast i32* %a to i8*
  %d.spill.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 2
  store i8* %b, i8** %d.spill.addr, align 8
  ...
}

This is just one example, but you can also escape it by storing the address, or call a function.

Good example! I got it. This patch looks good to me.

lxfind updated this revision to Diff 297439.Oct 10 2020, 9:24 PM

address comments; rebase

This revision was landed with ongoing or failed builds.Oct 10 2020, 10:23 PM
This revision was automatically updated to reflect the committed changes.