This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Add an O(n) algorithm for computing the cross suspend point information.
ClosedPublic

Authored by witstorm95 on Jul 7 2023, 2:15 AM.

Details

Summary

Fixed https://github.com/llvm/llvm-project/issues/62348

Propagate cross-suspend-point information by visiting the CFG.

With at most two passes over the CFG, all of the cross-suspend-point information can be collected.

Before the patch:

n: 20000
4.31user 0.11system 0:04.44elapsed 99%CPU (0avgtext+0avgdata 552352maxresident)k
0inputs+8848outputs (0major+126254minor)pagefaults 0swaps

n: 40000
11.24user 0.40system 0:11.66elapsed 99%CPU (0avgtext+0avgdata 1788404maxresident)k
0inputs+17600outputs (0major+431105minor)pagefaults 0swaps

n: 60000
21.65user 0.96system 0:22.62elapsed 99%CPU (0avgtext+0avgdata 3809836maxresident)k
0inputs+26352outputs (0major+934749minor)pagefaults 0swaps

n: 80000
37.05user 1.53system 0:38.58elapsed 99%CPU (0avgtext+0avgdata 6602396maxresident)k
0inputs+35096outputs (0major+1622584minor)pagefaults 0swaps

n: 100000
51.87user 2.67system 0:54.54elapsed 99%CPU (0avgtext+0avgdata 10210736maxresident)k
0inputs+43848outputs (0major+2518945minor)pagefaults 0swaps

After the patch:

n: 20000
3.17user 0.16system 0:03.33elapsed 100%CPU (0avgtext+0avgdata 551736maxresident)k
0inputs+8848outputs (0major+126192minor)pagefaults 0swaps

n: 40000
6.10user 0.42system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 1787848maxresident)k
0inputs+17600outputs (0major+432212minor)pagefaults 0swaps

n: 60000
9.13user 0.89system 0:10.03elapsed 99%CPU (0avgtext+0avgdata 3809108maxresident)k
0inputs+26352outputs (0major+931280minor)pagefaults 0swaps

n: 80000
12.44user 1.57system 0:14.02elapsed 99%CPU (0avgtext+0avgdata 6603432maxresident)k
0inputs+35096outputs (0major+1624635minor)pagefaults 0swaps

n: 100000
16.29user 2.28system 0:18.59elapsed 99%CPU (0avgtext+0avgdata 10212808maxresident)k
0inputs+43848outputs (0major+2522200minor)pagefaults 0swaps

Diff Detail

Event Timeline

witstorm95 created this revision.Jul 7 2023, 2:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2023, 2:15 AM
witstorm95 requested review of this revision.Jul 7 2023, 2:15 AM
witstorm95 edited the summary of this revision. (Show Details)Jul 7 2023, 2:37 AM
witstorm95 removed a project: Restricted Project.Jul 7 2023, 5:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2023, 5:16 PM

Thanks for working on this.

While I haven't looked into the details, would you like to give the new algorithm an abstract explanation in the comments? Also, I'd suggest avoiding names like IndegreeTmp, VisitingTmp, and Tmp; more informative names would be better.

witstorm95 updated this revision to Diff 538502.Jul 9 2023, 8:23 PM

Add algorithm comments.

Sorry, I still can't understand the algorithm. E.g., why is it sufficient to iterate over the graph 3 times? How do we identify loops in the algorithm? Could you elaborate on it in the code?

Thanks for your suggestions; it is clearer now.

I roughly got your point, and it makes sense to me. But the patch itself is still not very readable; it would be better to refactor it to make it more informative.

For example,

  1. VisitingSave is not really different from VisitingTmp.
  2. We can't tell the meaning of names like TopVisiting. If it is hard to give a variable a meaningful name, we should add a comment describing what it means.
  3. It is confusing that variables with names like Indegree and HasLoop change during the iteration: such names suggest they describe attributes of the graph, and those shouldn't change if the graph doesn't change.
  4. The name J and the first/second/third iterations are really confusing. We should split them into phases with proper names.
  5. Also, it is bad to put everything in one big loop. We can split it by functionality, e.g., one phase to collect the consume information and another to calculate the kill information.

Also, for correctness, it is recommended to run this against the folly library with coroutines enabled to make sure we don't get things wrong.

RKSimon resigned from this revision.Jul 10 2023, 2:55 AM

Sorry but I don't know this part of the codebase well enough to review

@ChuanqiXu I have refactored it per your suggestions. Do you think it is better now?

witstorm95 added a comment.EditedJul 10 2023, 10:13 PM

Also, for correctness, it is recommended to run this against the folly library with coroutines enabled to make sure we don't get things wrong.

I have tested the folly library. The results are the same before and after this patch, but some tests still fail; maybe it's because I'm using WSL2. Here are the failed cases:
The following tests FAILED:

245 - heap_vector_types_test.HeapVectorTypes.GrowthPolicy (Failed)
1362 - HHWheelTimerTest.HHWheelTimerTest.FireOnce (Failed)
1366 - HHWheelTimerTest.HHWheelTimerTest.CancelTimeout (Failed)
1368 - HHWheelTimerTest.HHWheelTimerTest.SlowFast (Failed)
1369 - HHWheelTimerTest.HHWheelTimerTest.ReschedTest (Failed)
1370 - HHWheelTimerTest.HHWheelTimerTest.DeleteWheelInTimeout (Failed)
1371 - HHWheelTimerTest.HHWheelTimerTest.DefaultTimeout (Failed)
1375 - HHWheelTimerTest.HHWheelTimerTest.IntrusivePtr (Failed)
1487 - lang_exception_test.ExceptionTest.terminate_with_direct (Failed)
1488 - lang_exception_test.ExceptionTest.terminate_with_variadic (Failed)

The failure reason is 'unable to determine jiffies/second: failed to parse kernel release string "5.15.90.1-microsoft-standard-WSL2"'.

To make sure coroutines are enabled, I wrote a test case to check. Here is the code (test.cpp):

#include <folly/experimental/coro/Task.h>
#include <folly/experimental/coro/BlockingWait.h>
#include <folly/futures/Future.h>
#include <folly/executors/GlobalExecutor.h>
#include <folly/init/Init.h>
#include <iostream>

folly::coro::Task<int> slow() {
  std::cout << "before sleep" << std::endl;
  co_await folly::futures::sleep(std::chrono::seconds{1});
  std::cout << "after sleep" << std::endl;
  co_return 1;
}

int main(int argc, char** argv) {
  std::cout << FOLLY_HAS_COROUTINES << std::endl;
  folly::init(&argc, &argv);
  folly::coro::blockingWait(
      slow().scheduleOn(folly::getGlobalCPUExecutor().get()));
  return 0;
}

And the build script b.sh is:

INSTALL_PATH=/tmp/fbcode_builder_getdeps-ZhomeZwitstormZgithubZfollyZbuildZfbcode_builder/installed/
clang++ test.cpp \
  -I $INSTALL_PATH/folly/include/ \
  -I $INSTALL_PATH/glog-*/include/ \
  -I $INSTALL_PATH/gflags-*/include \
  -I $INSTALL_PATH/boost-*/include/ \
  -I $INSTALL_PATH/fmt-*/include \
  -I $INSTALL_PATH/double-conversion-*/include \
  -I $INSTALL_PATH/libevent-*/include \
  -L $INSTALL_PATH/folly/lib/ \
  -L $INSTALL_PATH/glog-*/lib/ \
  -L $INSTALL_PATH/gflags-*/lib \
  -L $INSTALL_PATH/boost-*/lib/ \
  -L $INSTALL_PATH/fmt-*/lib \
  -L $INSTALL_PATH/double-conversion-*/lib \
  -L $INSTALL_PATH/libevent-*/lib \
  -std=c++20 \
  -stdlib=libc++ \
  -lpthread -lm -ldl -lfolly -lfolly_test_util -lfollybenchmark \
  -lglog -lgflags -levent -ldouble-conversion \
  -lfmt -lunwind -lboost_context

export LD_LIBRARY_PATH=$INSTALL_PATH/glog-6R0Ow7ztX3g6RnwHaeTiDLzaaoxXwt87lAWO5PSwHzU/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$INSTALL_PATH/gflags-AZEDFvV8PiCcmWJd_mLRhEQ1irNZvVX2HbvgCsFWJ_Q/lib/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$INSTALL_PATH/libevent-nN5usJQWSTX11BbXrm3tORQxwAjvOLlXhgNZrK8zRS4/lib/:$LD_LIBRARY_PATH
./a.out

And the result is:
1
before sleep
after sleep

Thanks for running the out-of-tree tests; I know it is not easy. The failing tests look unrelated to coroutines. Another way to test this is to add an assert(false) in your code and re-run the tests; that way you can confirm the tests actually exercise your new code.

Then let's focus on readability in this review. It looks better now, but there are still confusing points, like:

if constexpr (Iteration > 1)

Also, we'd better not use the term Indegree, so that we can avoid things like:

Indegrees[I]--;

I suggest replacing the current computeBlockData with two functions: collectConsumingBlock<bool SearchBackEdges> and collectKillingBlock. Then the semantics should be much clearer.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
123

In the algorithm, I prefer the term back edge over loop. Loop has many more meanings and attributes, so back edge is more precise.

Also, with the term back edge, we can describe the algorithm much more simply:

  1. In the initial iteration, all the consume information along the forward edges is collected.
  2. (If there are any back edges,) we iterate again to get all the consume information along the back edges.
  3. We compute the kill information from the consume information.

In this way, it is much easier to understand too.

I suggest replacing the current computeBlockData with two functions: collectConsumingBlock<bool SearchBackEdges> and collectKillingBlock. Then the semantics should be much clearer.

For code reuse, I don't really think splitting computeBlockData into two functions is a good idea.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
123

@ChuanqiXu Thanks for your comment.

It took me some time to figure out what terms like back edge and forward edge mean; see DFS Edge Classification. Is it really suitable for a topological sorting algorithm?

If we use those terms, I have some questions about your description of the algorithm:

In the initial iteration, all the consume information along the forward edges is collected.

The Consume info along the tree edges and back edges (if any) is collected, and at the same time the Kill info has to be collected too, as we don't know whether back edges exist during the initial iteration.

(If there are any back edges,) we iterate again to get all the consume information along the back edges.

The Consume info along the tree edges and back edges is collected again, and the Kill info has to be collected too, as you can't compute all the Kill information in the third iteration alone.

For code reuse, I don't really think splitting computeBlockData into two functions is a good idea.

My major concern is readability. The so-called 1st/2nd/3rd iterations are confusing: they are magic numbers, and readers may have a hard time following them.

Then, out of curiosity, is the algorithm capable of handling the following case?

J -> I
I -> S
S -> I

S is a suspend block, and I.Kill[J] should be false, since any path from J to I that contains the suspend block must repeat node I, so it doesn't fit the definition of Kill. I feel the algorithm doesn't get this right, but I am not sure. It is better to give it a test case too.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
123

It took me some time to figure out what terms like back edge and forward edge mean; see DFS Edge Classification. Is it really suitable for a topological sorting algorithm?

Sorry that it is confusing. This is a suggestion rather than a requirement; different areas have different terminology conventions. For example, I hadn't seen the definitions of tree edge and cross edge before the link you gave.

In compiler engineering, the term 'loop' has many more meanings than in pure algorithms. So my instinctive reaction when I saw your patch was that you had done something wrong, since I felt that if you wanted loop information, you should query it from the existing components. But then I realized all you need is the back edges.

you can't compute all the Kill information in the third iteration alone.

It is the third step, and it is not necessarily an iteration; for example, in your current implementation, it may still visit a node multiple times if the node has many predecessors.

For code reuse, I don't really think splitting computeBlockData into two functions is a good idea.

My major concern is readability. The so-called 1st/2nd/3rd iterations are confusing: they are magic numbers, and readers may have a hard time following them.

Then, out of curiosity, is the algorithm capable of handling the following case?

J -> I
I -> S
S -> I

S is a suspend block, and I.Kill[J] should be false, since any path from J to I that contains the suspend block must repeat node I, so it doesn't fit the definition of Kill. I feel the algorithm doesn't get this right, but I am not sure. It is better to give it a test case too.

Thanks, this is a good example. But this problem exists in the previous algorithm too (before this patch). Let me illustrate it for you.

Pick a visiting order: J, I, S

Initial status:

J Consumes: J         Kills:           Suspend: false   End: false     KillLoop: false    Changed: true
I Consumes: I         Kills:           Suspend: false   End: false     KillLoop: false    Changed: true
S Consumes: S         Kills: S         Suspend: true    End: false     KillLoop: false    Changed: true

Executing computeBlockData</*Initialize=*/true>():

Visit J, J has no predecessors.           Status of J Consumes: J         Kills:           Suspend: false   End: false     KillLoop: false    Changed: true
Visit I, I has predecessors J, S. 
  So propagate J to I. After propagation, status of I Consumes: J, I         Kills:           Suspend: false   End: false     KillLoop: false    Changed: true
  So propagate S to I. After propagation, status of I Consumes: J, I, S         Kills: S         Suspend: false   End: false     KillLoop: false    Changed: true
                           After visit I, status of I Consumes: J, I, S         Kills: S         Suspend: false   End: false     KillLoop: false    Changed: true
Visit S, S has predecessors I.
   So propagate I to S. After propagation status of S Consumes: J, I, S         Kills: J, I, S         Suspend: true    End: false     KillLoop: false    Changed: true

After executing computeBlockData</*Initialize=*/true>(), the status is:

J Consumes: J               Kills:           Suspend: false   End: false     KillLoop: false    Changed: true
I Consumes: J, I, S         Kills: S         Suspend: false   End: false     KillLoop: false    Changed: true
S Consumes: J, I, S         Kills: J, I, S   Suspend: true    End: false     KillLoop: false    Changed: true

Executing while (computeBlockData());

Visit J, J has no predecessors.            Status of J Consumes: J         Kills:           Suspend: false   End: false     KillLoop: false    Changed: false
Visit I, I has predecessors J, S, and its predecessors have changed.
   So propagate J to I. After propagation, status of I Consumes: J, I, S         Kills: S          Suspend: false   End: false     KillLoop: false    Changed: true
   So propagate S to I. After propagation, status of I Consumes: J, I, S         Kills: J, S         Suspend: false   End: false     KillLoop: true    Changed: true
                            After visit I, status of I Consumes: J, I, S         Kills: J, S         Suspend: false   End: false     KillLoop: true    Changed: true
Visit S, S has predecessor I, and its predecessor has changed.
   So propagate I to S, after propagation status of S Consumes: J, I, S         Kills: J, I, S         Suspend: true    End: false     KillLoop: false    Changed: false

After the first iteration of while (computeBlockData()), the status is:

J Consumes: J               Kills:              Suspend: false   End: false     KillLoop: false    Changed: false
I Consumes: J, I, S         Kills: J, S         Suspend: false   End: false     KillLoop: true     Changed: true
S Consumes: J, I, S         Kills: J, I, S      Suspend: true    End: false     KillLoop: false    Changed: false

S is a suspend block, and I.Kill[J] should be false, since any path from J to I that contains the suspend block must repeat node I, so it doesn't fit the definition of Kill.

I think the trace above is enough to demonstrate that, unless there is something wrong with my reasoning.

@ChuanqiXu It took me some time to figure out where the cross-suspend-point information is used and what it does. It is used to determine whether a stack variable needs to be placed on the coroutine frame. So the current problem does not affect the correctness of coroutine programs.

If we stick to the definition of Kills, we may get the wrong result in some cases. For example (llvm/test/Transforms/Coroutines/ArgAddr.ll):

define nonnull ptr @f(i32 %n) presplitcoroutine {
; CHECK-LABEL: @f(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[ID:%.*]] = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr @f.resumers)
; CHECK-NEXT:    [[N_ADDR:%.*]] = alloca i32, align 4
; CHECK-NEXT:    store i32 [[N:%.*]], ptr [[N_ADDR]], align 4
; CHECK-NEXT:    [[CALL:%.*]] = tail call ptr @malloc(i32 24)
; CHECK-NEXT:    [[TMP0:%.*]] = tail call noalias nonnull ptr @llvm.coro.begin(token [[ID]], ptr [[CALL]])
; CHECK-NEXT:    store ptr @f.resume, ptr [[TMP0]], align 8
; CHECK-NEXT:    [[DESTROY_ADDR:%.*]] = getelementptr inbounds [[F_FRAME:%.*]], ptr [[TMP0]], i32 0, i32 1
; CHECK-NEXT:    store ptr @f.destroy, ptr [[DESTROY_ADDR]], align 8
; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds [[F_FRAME]], ptr [[TMP0]], i32 0, i32 2
; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr [[N_ADDR]], align 4
; CHECK-NEXT:    store i32 [[TMP2]], ptr [[TMP1]], align 4
;
entry:
  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null);
  %n.addr = alloca i32
  store i32 %n, ptr %n.addr ; this needs to go after coro.begin
  %0 = tail call i32 @llvm.coro.size.i32()
  %call = tail call ptr @malloc(i32 %0)
  %1 = tail call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %call)
  call void @ctor(ptr %n.addr)
  br label %for.cond

for.cond:
  %2 = load i32, ptr %n.addr
  %dec = add nsw i32 %2, -1
  store i32 %dec, ptr %n.addr
  call void @print(i32 %2)
  %3 = call i8 @llvm.coro.suspend(token none, i1 false)
  %conv = sext i8 %3 to i32
  switch i32 %conv, label %coro_Suspend [
  i32 0, label %for.cond
  i32 1, label %coro_Cleanup
  ]

coro_Cleanup:
  %4 = call ptr @llvm.coro.free(token %id, ptr nonnull %1)
  call void @free(ptr %4)
  br label %coro_Suspend

coro_Suspend:
  call i1 @llvm.coro.end(ptr null, i1 false)
  ret ptr %1
}

Apparently, the stack variable %n.addr should be placed on the coroutine frame, as it is used in the for.cond block, which contains a suspend point. Right? But if we stick to the definition of Kills, we can't get this result. Why? Because splitAround splits for.cond into 4 blocks (for.cond, CoroSave, CoroSuspend, AfterCoroSuspend) before the cross-suspend-point information is computed, and this loses some information. Here is the CFG of function f; the code of each block is as follows:

entry:
  %id = call token @llvm.coro.id(i32 0, ptr null, ptr null, ptr null)
  %n.addr = alloca i32, align 4
  store i32 %n, ptr %n.addr, align 4
  %0 = tail call i32 @llvm.coro.size.i32()
  %call = tail call ptr @malloc(i32 %0)
  %1 = tail call noalias nonnull ptr @llvm.coro.begin(token %id, ptr %call)
  call void @ctor(ptr %n.addr)
  br label %for.cond

for.cond:                                         ; preds = %AfterCoroSuspend, %entry
  %2 = load i32, ptr %n.addr, align 4
  %dec = add nsw i32 %2, -1
  store i32 %dec, ptr %n.addr, align 4
  call void @print(i32 %2)
  br label %CoroSave

coro_Suspend:                                     ; preds = %coro_Cleanup, %AfterCoroSuspend
  br label %CoroEnd

coro_Cleanup:                                     ; preds = %AfterCoroSuspend
  %5 = call ptr @llvm.coro.free(token %id, ptr nonnull %1)
  call void @free(ptr %5)
  br label %coro_Suspend

CoroSave:                                         ; preds = %for.cond
  %3 = call token @llvm.coro.save(ptr %1)
  br label %CoroSuspend

CoroSuspend:                                      ; preds = %CoroSave
  %4 = call i8 @llvm.coro.suspend(token %3, i1 false)
  br label %AfterCoroSuspend

AfterCoroSuspend:                                 ; preds = %CoroSuspend
  %conv = sext i8 %4 to i32
  switch i32 %conv, label %coro_Suspend [
    i32 0, label %for.cond
    i32 1, label %coro_Cleanup
  ]

CoroEnd:                                          ; preds = %coro_Suspend
  %6 = call i1 @llvm.coro.end(ptr null, i1 false)
  br label %AfterCoroEnd

AfterCoroEnd:                                     ; preds = %CoroEnd
  ret ptr %1

Apparently, for.cond.Kills[Entry] is false, which means %n.addr is treated as not crossing a suspend point, so %n.addr will not be placed on the coroutine frame. But it actually should be.

If I don't understand this correctly, please point it out.

Apparently, for.cond.Kills[Entry] is false, which means %n.addr is treated as not crossing a suspend point.

IIUC, for.cond.Kills[Entry] should be true since there is a path from Entry to for.cond without repeating Entry, right?

@ChuanqiXu It took me some time to figure out where the cross-suspend-point information is used and what it does. It is used to determine whether a stack variable needs to be placed on the coroutine frame. So the current problem does not affect the correctness of coroutine programs.

Yes, it only affects optimization. So it would always be correct to mark all blocks as killed, and the analysis would be extremely fast that way.

For the patch itself, I think it would be better to improve the readability, and also to have some (maybe not so formal) proof of its correctness and precision.

@ChuanqiXu Thanks for your comments. I will improve it.

IIUC, for.cond.Kills[Entry] should be true since there is a path from Entry to for.cond without repeating Entry, right?

The definition of Kills is:

//   Kills: a bit vector which contains a set of indices of blocks that can
//          reach block 'i', but there is a path crossing a suspend point
//          not repeating 'i' (a path to 'i' without cycles containing 'i').

So for.cond.Kills[Entry] means whether there exists a path from Entry to for.cond that crosses a suspend point without repeating for.cond.

Oh, you're right. There are some problems with the definition. Let's correct it in other patches.

@ChuanqiXu I have refactored it again per your suggestions. Some redundant operations have been removed.

I have tested the folly library again with the coro tests enabled (experimental/coro/test). The results are the same before and after this patch, but some tests still fail. Here are the failed cases:

245 - heap_vector_types_test.HeapVectorTypes.GrowthPolicy (Failed)
1362 - HHWheelTimerTest.HHWheelTimerTest.FireOnce (Failed)
1366 - HHWheelTimerTest.HHWheelTimerTest.CancelTimeout (Failed)
1368 - HHWheelTimerTest.HHWheelTimerTest.SlowFast (Failed)
1369 - HHWheelTimerTest.HHWheelTimerTest.ReschedTest (Failed)
1371 - HHWheelTimerTest.HHWheelTimerTest.DefaultTimeout (Failed)
1375 - HHWheelTimerTest.HHWheelTimerTest.IntrusivePtr (Failed)
1376 - HHWheelTimerTest.HHWheelTimerTest.GetTimeRemaining (Failed)
1378 - HHWheelTimerTest.HHWheelTimerTest.Level1 (Failed)
1487 - lang_exception_test.ExceptionTest.terminate_with_direct (Failed)
1488 - lang_exception_test.ExceptionTest.terminate_with_variadic (Failed)
3016 - coro_asyncscope_test.AsyncScopeTest.DontThrowOnJoin (Failed)
3078 - coro_async_scope_test.AsyncScopeTest.DontThrowOnJoin (Failed)
3090 - coro_async_stack_test.AsyncStackTest.MixedStackWalk (SEGFAULT)

As you can see, some coro tests failed. Do you know what the problem is?

BTW, I feel the grammar of the comment reads slightly off. Could you try to improve it? I can't help much since I am not a native speaker.

As you can see, some coro tests failed. Do you know what the problem is?

I have no idea; folly is pretty complex. Maybe you can try another stable version of folly.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
43

Generally we prefer llvm::DenseSet over std::unordered_set; you can refer to the LLVM Programmer's Manual.

149

It is better to explain the meaning of the parameters too.

152

nit: we generally prefer the style.

330

nit:

331

Such a comment is meaningless.

333

So it is better to explain the meaning of *indegree*. I know the definition of in-degree, but I feel it has different semantics in this algorithm. For example, what information does an indegree of 0 represent? Why does the indegree of a node change while we don't touch the graph? So I think you'd better give it a different name and a corresponding explanation.

337–354

Same with above.

350–364

This is pretty clear. We don't need such comments.

375
389

nit

412

@ChuanqiXu Thanks a lot; these suggestions are very helpful. Sorry for the variable-name problems; I have no talent for naming.

For the folly library, I got the same result as before with llvm-project tags llvmorg-16.0.0 and llvmorg-16.0.6.

witstorm95 marked 11 inline comments as done.Jul 26 2023, 5:03 AM

Accept these suggestions.

ChuanqiXu added inline comments.Jul 26 2023, 7:22 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
151

ditto in other places.

331

Now we can understand it is a dynamic attribute instead of a static one. Then it is not odd that it may change.

393–406
407–408

@ChuanqiXu Thank you for your suggestions. I've already fixed it.

witstorm95 marked 4 inline comments as done.Jul 26 2023, 10:57 PM

Accept these suggestions.

ChuanqiXu accepted this revision.Jul 27 2023, 12:30 AM

LGTM. Thanks for your effort!

I noticed that this is your first time contributing to LLVM. If you leave your name and email address, I'll land it for you.

This revision is now accepted and ready to land.Jul 27 2023, 12:30 AM

Thanks for your review and comment.

My name and email are as follows:
Name: witstorm95
Email: witstorm@163.com

witstorm95 edited the summary of this revision. (Show Details)Jul 27 2023, 12:57 AM

I forgot to format it earlier. Sorry about that.

This revision was landed with ongoing or failed builds.Jul 27 2023, 2:29 AM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Jul 31 2023, 5:25 PM

We're seeing crashes in our builds where clang -flto=thin seems to produce IR that cannot be loaded anymore:

Stderr: Instruction does not dominate all uses!
  %47 = getelementptr inbounds %_ZN5folly8channels6detail22TransformProcessorBaseIN8facebook6falcon6client18FalconNotificationENS5_10BlobEntityENS5_29FalconClientEntityTransformerIS7_NS5_25FalconTransformerSettingsILb0ELb0ELb0EEEEEE13processValuesENS1_5QueueINS_3TryIS6_EEEE.Frame, ptr %0, i64 0, i32 3, i32 0, i32 1
  store i64 %70, ptr %47, align 8, !dbg !220318, !tbaa !81281
Instruction does not dominate all uses!

Unfortunately it seems to be nondeterministic (so it's hard to blame this change with 100% certainty, but given that all our errors happen with loads/stores of xxx.Frame types, it seemed likely). Unfortunately, I'm also not seeing any ASAN reports that would explain the nondeterminism.

Can't you use a reverse post-order (see PostOrderIterator.h / ReversePostOrderTraversal) to get a topological sorting of the CFG blocks? That is a simpler algorithm than the one here, and it already exists as a helper in LLVM!

With current trunk I see invalid IR produced on every 2nd or 3rd build. With this change reverted I was able to do >10 builds in a row without invalid IR. I'd like to revert this.

Unfortunately creating a reproducer is tricky given the nondeterminism, so llvm-reduce doesn't work reliably. Would it be acceptable to revert this for the time being, even without a reproducer?

I don't think I grasped the full algorithm employed here. But given that this is just a standard dataflow problem, it seems too complicated to me. I can't shake the feeling that the code would be simpler with just visiting every block once in RPOT to deal with those artificial inputs from the task, and then keep running the normal worklist algorithm until the fixpoint (= the code before this change).


So the problem arose when you applied this patch? If so, then it's a problem with this patch, and I need to find out why. Could you provide more info about this?

ChuanqiXu added a comment.EditedJul 31 2023, 6:45 PM

Would it be acceptable to revert this for the time being, even without a reproducer?

Reverted. @witstorm95, it is common to revert patches in LLVM under post-commit review (or on a crash report). Let's reland this one after we figure it out. @MatzeB, given that the patch matters for compilation speed, it would be very helpful to provide a reproducer.

So the problem arose when you applied this patch? If so, then it's a problem with this patch, and I need to find out why. Could you provide more info about this?

Unfortunately creating a reproducer will require more time, as currently I only see it happening for internal code (and reducing big codebases takes time). I should be able to give you a reproducer in the next few days.

@witstorm95, it is common to revert patches in LLVM under post-commit review (or on a crash report).

I mean that reverting is always allowed when a reproducer exists. I don't yet have a real reproducer that I can publish, so this is a judgement call / a favor to ask here. So thanks for reverting; this does help our testing! I'll see to providing a reproducer or a fix for the diff.

@MatzeB Thanks for your report. I have reproduced it by compiling the folly library with experimental/channels/test enabled. The algorithm in this patch cannot handle very complex CFGs with many loops.

Can't you use a reverse post-order (see PostOrderIterator.h / ReversePostOrderTraversal) to get a topological sorting of the CFG blocks? That is a simpler algorithm than the one here, and it already exists as a helper in LLVM!

Thanks for the reminder. I will try this.

MatzeB added a comment.Aug 1 2023, 3:28 PM

I did spend some more time on the patch yesterday. I believe the nondeterminism was triggered by the existing code in BlockToIndexMapping: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Coroutines/CoroFrame.cpp#L63 — it sorts the blocks by their memory address, and as memory addresses can fluctuate between runs (caused by ASLR etc.), iterating over the mapping results in a nondeterministic order. This doesn't matter when the algorithm always reaches the same solution regardless of order, but that seems not to have been the case after the rewrite.

I also prepared a proposal patch that uses an initial RPO pass + fixpoint worklist algorithm. I'm cleaning it up right now for submission / further discussion...

MatzeB added a comment.Aug 1 2023, 4:10 PM

Tried to show the RPO variant here: https://reviews.llvm.org/D156835

Tried to show the RPO variant here: https://reviews.llvm.org/D156835

It seems better than this patch and removes many redundant operations.