Page MenuHomePhabricator

[IR][coroutine] make final and non-final llvm.coro.saves unique
AbandonedPublic

Authored by ychen on Jul 6 2022, 2:59 PM.

Details

Summary

An alternative to D129025.

For

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  br i1 %cmp, label %await.suspend, label %final.suspend

await.suspend:
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

final.suspend:
  %2 = call token @llvm.coro.save(ptr null)
  %3 = call i8 @llvm.coro.suspend(token %2, i1 true)
  br label %coro.ret

coro.ret:
  ret void
}

SimplifyCFG would hoist llvm.coro.save into the entry block. However,
since one is for the final suspend and the other is not, the transformation
is not desired.

I proposed D129025, which requires a update to the token type definition
in LangRef to make merging two token-returning instructions illegal
(which I didn't do yet and I think it may affect other users of the token
type like WinEH, GC intrinsics). D129025 also excludes some valid
transformation: for this test case, if both branches of the if statement
contains non-final suspend, it is ok to hoisting llvm.coro.save.

So this patch, I propose to directly change the IR equality between two
llvm.coro.save to disable the invalid hoisting. I don't like the
special casing of llvm.coro.save in the llvm/IR space however, the
only other choice would be to change llvm.coro.save in some way like:

  1. add new flag parameter to llvm.coro.save
  2. add llvm.coro.save.final

I'm not a fan of both.

Any thoughts?

Diff Detail

Unit TestsFailed

TimeTest
250 msx64 debian > BOLT.runtime/X86::user-func-reorder.c
Script: -- : 'RUN: at line 30'; /usr/bin/clang --target=x86_64-linux -fuse-ld=lld -Wl,--unresolved-symbols=ignore-all -no-pie /var/lib/buildkite-agent/builds/llvm-project/bolt/test/runtime/X86/user-func-reorder.c -o /var/lib/buildkite-agent/builds/llvm-project/build/tools/bolt/test/runtime/X86/Output/user-func-reorder.c.tmp.exe -Wl,-q
60,030 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; /usr/bin/cmake /var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld
60,030 msx64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest

Event Timeline

ychen created this revision.Jul 6 2022, 2:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 2:59 PM
ychen requested review of this revision.Jul 6 2022, 2:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 2:59 PM
ychen edited the summary of this revision. (Show Details)Jul 6 2022, 3:03 PM
ychen added a subscriber: nikic.
ChuanqiXu added a comment.EditedJul 6 2022, 11:14 PM

D129025 also excludes some valid
transformation: for this test case, if both branches of the if statement
contains non-final suspend, it is ok to hoisting llvm.coro.save.

It looks not valid to me to hoist non-final suspend llvm.coro.save. From https://github.com/llvm/llvm-project/blob/011d2bf86487520c3515f16e0b1d32994bf2b48f/llvm/lib/Transforms/Coroutines/CoroSplit.cpp#L388-L405 we could know a llvm.coro.save would be lowered into:

%index.addr = getelementptr %f.frame... (index field number)
store i32 %IndexVal, i32* %index.addr1

So the following code:

if.then:
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %merge

if.else:
  %2 = call token @llvm.coro.save(ptr null)
  %3 = call i8 @llvm.coro.suspend(token %2, i1 false)
  br label %merge

would be lowered into:

if.then:
  %0 = getelementptr %f.frame... (index field number)
  store i32 0, i32* %0 ; assume this is the first suspend points
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %merge

if.else:
  %2 = getelementptr %f.frame... (index field number)
  store i32 1, i32* %2
  %3 = call i8 @llvm.coro.suspend(token %2, i1 false)
  br label %merge

so it clearly makes no sense to merge the non-final coro saves. D129025 looks good enough to me temporarily.

however, the only other choice would be to change llvm.coro.save in some way like:

1.add new flag parameter to llvm.coro.save
2.add llvm.coro.save.final

I'm not a fan of both.

So option2 is not valid. But I am a fan of option 1. Since it looks naturally to merge 2 @llvm.coro.save(ptr null) but it wouldn't be naturally to merge @llvm.coro.save(ptr null, i32 0) and @llvm.coro.save(ptr null, i32 1). So we solve the problem in the IR level.

nikic added a comment.Jul 7 2022, 12:11 AM

Another possibility is to mark llvm.coro.save as IntrNoMerge, though that would prevent any merging (similar to the SimplifyCFG patch, just only for this intrinsic and not other token-returning intrinsics).

I'm not familiar with these coro intrinsics, but if these have two different ways of being lowered, then adding a flag to distinguish them generally makes sense to me.

ychen added a comment.EditedJul 7 2022, 12:12 AM

so it clearly makes no sense to merge the non-final coro saves. D129025 looks good enough to me temporarily.

Agreed.

however, the only other choice would be to change llvm.coro.save in some way like:

1.add new flag parameter to llvm.coro.save
2.add llvm.coro.save.final

I'm not a fan of both.

So option2 is not valid. But I am a fan of option 1. Since it looks naturally to merge 2 @llvm.coro.save(ptr null) but it wouldn't be naturally to merge @llvm.coro.save(ptr null, i32 0) and @llvm.coro.save(ptr null, i32 1). So we solve the problem in the IR level.

it looks naturally to merge 2 @llvm.coro.save(ptr null)

Hmmm, I don't quite understand this. Do you please give an example?

One more complication is that for

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  br i1 %cmp, label %await.suspend, label %final.suspend

await.suspend:
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

final.suspend:
  %2 = call token @llvm.coro.save(ptr null)
  %3 = call i8 @llvm.coro.suspend(token %2, i1 false)
  br label %coro.ret

coro.ret:
  ret void
}

this is not correct (SimplifyCFG only hoists llvm.coro.save):

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  %0 = call token @llvm.coro.save(ptr null)
  br i1 %cmp, label %await.suspend, label %final.suspend

await.suspend:
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

final.suspend:
  %3 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

coro.ret:
  ret void
}

but this is correct (SimplifyCFG hoists llvm.coro.suspend too):

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  ret void
}

I think what we need is that: only allow merging llvm.coro.save when its user llvm.coro.suspend is also merged. But I don't know a good way to express that with IR.

so it clearly makes no sense to merge the non-final coro saves. D129025 looks good enough to me temporarily.

Agreed.

however, the only other choice would be to change llvm.coro.save in some way like:

1.add new flag parameter to llvm.coro.save
2.add llvm.coro.save.final

I'm not a fan of both.

So option2 is not valid. But I am a fan of option 1. Since it looks naturally to merge 2 @llvm.coro.save(ptr null) but it wouldn't be naturally to merge @llvm.coro.save(ptr null, i32 0) and @llvm.coro.save(ptr null, i32 1). So we solve the problem in the IR level.

it looks naturally to merge 2 @llvm.coro.save(ptr null)

Hmmm, I don't quite understand this. Do you please give an example?

I mean it might be a little bit surprising for people (who understand compiler but don't understand the coro intrinsics) that the optimizer wouldn't merge @llvm.coro.save in the example. But if the @llvm.coro.save intrinsic have another parameter, they won't be surprised.

One more complication is that for

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  br i1 %cmp, label %await.suspend, label %final.suspend

await.suspend:
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

final.suspend:
  %2 = call token @llvm.coro.save(ptr null)
  %3 = call i8 @llvm.coro.suspend(token %2, i1 false)
  br label %coro.ret

coro.ret:
  ret void
}

this is not correct (SimplifyCFG only hoists llvm.coro.save):

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  %0 = call token @llvm.coro.save(ptr null)
  br i1 %cmp, label %await.suspend, label %final.suspend

await.suspend:
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

final.suspend:
  %3 = call i8 @llvm.coro.suspend(token %0, i1 false)
  br label %coro.ret

coro.ret:
  ret void
}

but this is correct (SimplifyCFG hoists llvm.coro.suspend too):

define void @f(i32 %x) {
entry:
  %cmp = icmp slt i32 %x, 0
  %0 = call token @llvm.coro.save(ptr null)
  %1 = call i8 @llvm.coro.suspend(token %0, i1 false)
  ret void
}

Oh, you're right. It won't be a problem if the @llvm.coro.suspend intrinsic get merged too.

I think what we need is that: only allow merging llvm.coro.save when its user llvm.coro.suspend is also merged. But I don't know a good way to express that with IR.

I think you're right. I don't have a good idea too. Before we get a solution, I think the suggestion from @nikic is good enough in practice.

ychen added a comment.Jul 7 2022, 12:29 AM

Another possibility is to mark llvm.coro.save as IntrNoMerge, though that would prevent any merging (similar to the SimplifyCFG patch, just only for this intrinsic and not other token-returning intrinsics).

I'm not familiar with these coro intrinsics, but if these have two different ways of being lowered, then adding a flag to distinguish them generally makes sense to me.

Ah, nomerge function attribute looks very useful. Although it could not cover all the cases perfectly, I wouldn't be too worried since these cases should be relatively rare in practice.

@ChuanqiXu @nikic Thanks for the discussion. I'll create a patch to obsolete D129025.

ychen abandoned this revision.Jul 12 2022, 11:43 AM

Obsoleted by D129530