This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Enhance symmetric transfer for constant CmpInst
ClosedPublic

Authored by ChuanqiXu on Dec 27 2021, 9:17 PM.

Details

Summary

This fixes bug52896: https://github.com/llvm/llvm-project/issues/52896.

Simply, some symmetric transfer optimization chances get invalided due to we delete some inlined optimization passes in https://github.com/llvm/llvm-project/commit/822b92aae439c4ba2946980c8a27bd2c8a62d90c#diff-8cf21bda84c593733aa099f89fe7d197fd83203c6bc4f6fbdcd94f8bb4256d23L1138-L1147 . This would cause stack-overflow in some situations which should be avoided by the design of coroutine: https://godbolt.org/z/E55h9Pv9f

This patch tries to fix this by transforming the constant CmpInst instruction which was done in the deleted passes.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Dec 27 2021, 9:17 PM
ChuanqiXu requested review of this revision.Dec 27 2021, 9:17 PM

This is general LGTM with some comments

llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1234

Instead simplify each instructions at pieces,would you use SimplifyInstructionsInBlock to handle each basic block at once?

ChuanqiXu added inline comments.Jan 10 2022, 8:23 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1234

What this function do is slightly different from SimplifyInstructionsInBlock. The function is trying to simplify the sequence of terminators to ret. So what the function do is intra-BB optimization which seems not be able to be handled by SimplifyInstructionsInBlock. And this function is more effective than running SimplifyInstructionsInBlock for each basic blocks.

rjmccall added inline comments.Jan 10 2022, 9:00 PM
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1253

I liked the old comment that was here that explained why we bother handling this case.

1256

Should we do the same lookup in ResolvedValues here?

I would suggest having a general helper that you can put at the top of this function:

auto tryResolveConstant = [&](Value *V) -> Constant* {
  auto it = ResolvedValues.find(V));
  if (it != ResolvedValues.end())
   V = it->second;
  return dyn_cast<Constant>(V);
};

You don't need to check specifically for ConstantInt; LLVM can constant-fold some conditions on other kinds of constants.

Address comments.

ChuanqiXu marked 2 inline comments as done.Jan 10 2022, 11:07 PM
ChuanqiXu added inline comments.
llvm/lib/Transforms/Coroutines/CoroSplit.cpp
1256

Should we do the same lookup in ResolvedValues here?

According to the comment and transformation detail, the second operand should be literal constant, so we shouldn't call TryResolveConstant for it. I've added this to comment.

I would suggest having a general helper that you can put at the top of this function:

Done. Thanks for suggestion!

You don't need to check specifically for ConstantInt; LLVM can constant-fold some conditions on other kinds of constants.

If we check for Constant instead of ConstantInt, then we must write auto *Cond = dyn_cast<ConstantInt>(TryResolveConstant(...)); on line 1277 since the argument type of findCaseValue is ConstantInt, it looks odd to me. And we wouldn't lose anything if we check for ConstantInt only, so I chose to check specifically for ConstantInt in this revision still.

rjmccall accepted this revision.Jan 11 2022, 11:01 AM

Hmm, alright. I really don't love that high-level semantics are dependent on exactly the right transforms happening in exactly the right way, but I guess that's what the C++ committee has stuck us with.

This revision is now accepted and ready to land.Jan 11 2022, 11:01 AM
ChuanqiXu marked an inline comment as done.Jan 11 2022, 5:52 PM

Hmm, alright. I really don't love that high-level semantics are dependent on exactly the right transforms happening in exactly the right way, but I guess that's what the C++ committee has stuck us with.

Yeah, agreed. BTW, the symmetric transfer is not part of the C++ standard document like tail call is not in the standard too. It is just a compiler optimization that end-users are familiar with. So it becomes a somehow de facto standard so that users would feel very odd if the compiler fails to do so.

This revision was landed with ongoing or failed builds.Jan 11 2022, 6:15 PM
This revision was automatically updated to reflect the committed changes.
sidorovd added a subscriber: sidorovd.EditedJan 14 2022, 5:05 AM

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

Yeah, it is intentional. I added two regression tests in the following commit. But only one is related to this revision, while the other one you mentioned is related to another optimization for coroutine. The optimization is called Coroutine Elision. You could find the original design in: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html. For the more detailed implementation, you might need to take a look at CoroElide pass. For the coroutine intrinsic, it is good to visit https://llvm.org/docs/Coroutines.html. It is not completed now. For example, I found that it lacks a ABI document just now...

erichkeane added a subscriber: erichkeane.EditedJan 26 2022, 10:25 AM

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

Yeah, it is intentional. I added two regression tests in the following commit. But only one is related to this revision, while the other one you mentioned is related to another optimization for coroutine. The optimization is called Coroutine Elision. You could find the original design in: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html. For the more detailed implementation, you might need to take a look at CoroElide pass. For the coroutine intrinsic, it is good to visit https://llvm.org/docs/Coroutines.html. It is not completed now. For example, I found that it lacks a ABI document just now...

So the problem I'm seeing in our downstream (which might be behind) is that the _Z5task1v.Frame is:

%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", i2 }

Note that this does NOT match the 'CHECK' line.

HOWEVER, in upstream, I see:
%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", %_Z5task0v.Frame, i2 }

Note the %_Z5task0v.Frame before the 'i2'.

Do you perhaps have any hints as to where that could have gone?

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

Yeah, it is intentional. I added two regression tests in the following commit. But only one is related to this revision, while the other one you mentioned is related to another optimization for coroutine. The optimization is called Coroutine Elision. You could find the original design in: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html. For the more detailed implementation, you might need to take a look at CoroElide pass. For the coroutine intrinsic, it is good to visit https://llvm.org/docs/Coroutines.html. It is not completed now. For example, I found that it lacks a ABI document just now...

So the problem I'm seeing in our downstream (which might be behind) is that the _Z5task1v.Frame is:

%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", i2 }

Note that this does NOT match the 'CHECK' line.

HOWEVER, in upstream, I see:
%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", %_Z5task0v.Frame, i2 }

Note the %_Z5task0v.Frame before the 'i2'.

Do you perhaps have any hints as to where that could have gone?

Do you talk about https://github.com/llvm/llvm-project/commit/bf5f2354fa6e3f31a1acea75a229fee54359e279#diff-0b5d68fb7c4bb0c5aad6c883444d80715cf6879e3d27e24e38c22a61c6971730?

This is a regression test for an optimization called CoroElide, which going to eliminate the heap allocation.

For example, without the optimization, when we call task1, it would call std::new to generate task1v.Frame and call std::new again to generate task0v.Frame. So here are 2 calls to allocate on the heap. But with the optimization, it would call std::new only once when we call task1. So we decrease the time to allocate on heap.

If this tests fails, it implies the optimization fails in the case. hmmm since it fails in the downstream, I am not sure how could I help you. Any thoughts?

BTW, if coroutine is not important to you, I think you could delete the tests in downstream. It shouldn't do something harm to the semantics. It is just an optimization. It doesn't matter about the semantics in the standard. (The idea of CoroElide comes from a language proposal. But it doesn't show up in the standard. I guess the reason may be that the standard don't care optimization like this one)

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

Yeah, it is intentional. I added two regression tests in the following commit. But only one is related to this revision, while the other one you mentioned is related to another optimization for coroutine. The optimization is called Coroutine Elision. You could find the original design in: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html. For the more detailed implementation, you might need to take a look at CoroElide pass. For the coroutine intrinsic, it is good to visit https://llvm.org/docs/Coroutines.html. It is not completed now. For example, I found that it lacks a ABI document just now...

So the problem I'm seeing in our downstream (which might be behind) is that the _Z5task1v.Frame is:

%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", i2 }

Note that this does NOT match the 'CHECK' line.

HOWEVER, in upstream, I see:
%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", %_Z5task0v.Frame, i2 }

Note the %_Z5task0v.Frame before the 'i2'.

Do you perhaps have any hints as to where that could have gone?

Do you talk about https://github.com/llvm/llvm-project/commit/bf5f2354fa6e3f31a1acea75a229fee54359e279#diff-0b5d68fb7c4bb0c5aad6c883444d80715cf6879e3d27e24e38c22a61c6971730?

This is a regression test for an optimization called CoroElide, which going to eliminate the heap allocation.

For example, without the optimization, when we call task1, it would call std::new to generate task1v.Frame and call std::new again to generate task0v.Frame. So here are 2 calls to allocate on the heap. But with the optimization, it would call std::new only once when we call task1. So we decrease the time to allocate on heap.

If this tests fails, it implies the optimization fails in the case. hmmm since it fails in the downstream, I am not sure how could I help you. Any thoughts?

BTW, if coroutine is not important to you, I think you could delete the tests in downstream. It shouldn't do something harm to the semantics. It is just an optimization. It doesn't matter about the semantics in the standard. (The idea of CoroElide comes from a language proposal. But it doesn't show up in the standard. I guess the reason may be that the standard don't care optimization like this one)

Yep, thats the file.

I'm just shocked to see a CFE test that depends on optimization/does optimization checks like this. This is pretty far from typical, and of course causes confusion. I presume one of our optimizations is working a bit differently (another reason we typically don't have lit tests like this in the CFE) from community.

We LIKE to keep as many of the tests in downstream, as it helps with correctness, but I'm beginning to doubt the validity of this test; it seems it'll be quite fragile.

@ChuanqiXu hi, with this patch the generated LLVM IR snippet is looking like:

%_Z5task0v.Frame = type { void (%_Z5task0v.Frame*)*, void (%_Z5task0v.Frame*)*, %struct.TaskPromise, i1 }

(see https://godbolt.org/z/b3s9a4xnW - it's the same example from the patch's headed, but with -S -emit-llvm).

In the following patch you add a regression test clang/test/CodeGenCoroutines/coro-elide.cpp , which checks an opposite, which is a bit confusing

// CHECK: %_Z5task1v.Frame = type {{.*}}%_Z5task0v.Frame

Could you please tell if it's a typo or it was done intentional? If intentional, I keen to know, where could I learn more about the coroutine frame jump, thanks :)

Yeah, it is intentional. I added two regression tests in the following commit. But only one is related to this revision, while the other one you mentioned is related to another optimization for coroutine. The optimization is called Coroutine Elision. You could find the original design in: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html. For the more detailed implementation, you might need to take a look at CoroElide pass. For the coroutine intrinsic, it is good to visit https://llvm.org/docs/Coroutines.html. It is not completed now. For example, I found that it lacks a ABI document just now...

So the problem I'm seeing in our downstream (which might be behind) is that the _Z5task1v.Frame is:

%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", i2 }

Note that this does NOT match the 'CHECK' line.

HOWEVER, in upstream, I see:
%_Z5task1v.Frame = type { void (%_Z5task1v.Frame*)*, void (%_Z5task1v.Frame*)*, %"struct.Task::promise_type", %_Z5task0v.Frame, i2 }

Note the %_Z5task0v.Frame before the 'i2'.

Do you perhaps have any hints as to where that could have gone?

Do you talk about https://github.com/llvm/llvm-project/commit/bf5f2354fa6e3f31a1acea75a229fee54359e279#diff-0b5d68fb7c4bb0c5aad6c883444d80715cf6879e3d27e24e38c22a61c6971730?

This is a regression test for an optimization called CoroElide, which going to eliminate the heap allocation.

For example, without the optimization, when we call task1, it would call std::new to generate task1v.Frame and call std::new again to generate task0v.Frame. So here are 2 calls to allocate on the heap. But with the optimization, it would call std::new only once when we call task1. So we decrease the time to allocate on heap.

If this tests fails, it implies the optimization fails in the case. hmmm since it fails in the downstream, I am not sure how could I help you. Any thoughts?

BTW, if coroutine is not important to you, I think you could delete the tests in downstream. It shouldn't do something harm to the semantics. It is just an optimization. It doesn't matter about the semantics in the standard. (The idea of CoroElide comes from a language proposal. But it doesn't show up in the standard. I guess the reason may be that the standard don't care optimization like this one)

Yep, thats the file.

I'm just shocked to see a CFE test that depends on optimization/does optimization checks like this. This is pretty far from typical, and of course causes confusion. I presume one of our optimizations is working a bit differently (another reason we typically don't have lit tests like this in the CFE) from community.

We LIKE to keep as many of the tests in downstream, as it helps with correctness, but I'm beginning to doubt the validity of this test; it seems it'll be quite fragile.

Yeah, before I created the file, I searched O2/O3 in clang/test/CodeGen* directory and I found many results. So I feel it might not be bad to have test in clang to depend on optimization in middle end directly. The best practice maybe we check the pattern generated in the frontend and we check the transformation for the pattern in the middle end. I would like to do this but I am a little bit busying now and I am going to take a vacation next week so it might wait for a longer time.