When a coroutine's initial_suspend isn't suspend_always, the ramp function may be very large. A large ramp function is hard to inline, which in turn blocks the Coro-elide optimization. Splitting the ramp function also normalizes the form of all ramp functions: after the split, every ramp function looks the same and only does some simple initialization.
This patch splits the ramp function by inserting a coro.suspend before initial_suspend and a call to the resume function just after coro.end. The inserted coro.suspend makes sure that the ramp function stays small, and the inserted resume call makes sure that the control flow remains the same.
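To make the intended effect concrete, here is a minimal hand-written sketch of the idea in plain C++. It is only a model under stated assumptions, not what LLVM emits: `Frame`, `resume`, `ramp_unsplit`, `ramp_split` and `check_equivalent` are hypothetical names, and the coroutine body is reduced to two trace entries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a coroutine frame; not what LLVM actually emits.
struct Frame {
    int resume_index = 0;            // next part of the body to run
    std::vector<std::string> trace;  // order in which body parts ran
};

// Resume function: the coroutine body as a small state machine.
void resume(Frame &f) {
    switch (f.resume_index) {
    case 0:
        f.trace.push_back("part1");  // code before the first real suspend
        f.resume_index = 1;
        break;
    case 1:
        f.trace.push_back("part2");  // code after the first real suspend
        f.resume_index = 2;
        break;
    default:
        break;                       // already finished
    }
}

// Ramp without the split: initial_suspend does not suspend, so the code
// up to the first real suspend point is part of the ramp itself.
Frame ramp_unsplit() {
    Frame f;
    f.trace.push_back("part1");      // body code inlined into the ramp
    f.resume_index = 1;
    return f;
}

// Ramp with the split: only initialization, then the inserted resume call.
Frame ramp_split() {
    Frame f;                         // simple initialization only
    resume(f);                       // inserted call: still starts eagerly
    return f;
}

// Both ramps must leave the coroutine in the same observable state.
void check_equivalent() {
    Frame a = ramp_unsplit();
    Frame b = ramp_split();
    assert(a.trace == b.trace && a.resume_index == b.resume_index);
}
```

In this model `ramp_split` stays the same size no matter how much code sits before the first real suspend point, while `ramp_unsplit` grows with it.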
But there are still some issues:
- It seems like we shouldn't do this for coroutines whose initial_suspend is suspend_always. But where should we add this check: in the front end or in the middle end?
- If the ramp function returns the gro through an argument, then the call to resume at the end of the ramp would be a tail call, which is nice. But if the gro has to be returned directly, how much does the inserted call cost?
- If we compile a coroutine program with -g, the debug information for the coroutine is copied 3 times. The final binary then becomes very large, which is a real problem for CI/CD. I had imagined that this patch could help with this problem. However, the debug information is copied before the compiler tries to do the splitting, so the size of a binary compiled with -g doesn't shrink with this patch.
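For the gro question in the list above, the difference between the two return styles can be sketched at the source level roughly as follows. This is an illustrative model only: `Frame`, `Task`, `resume`, `ramp_gro_by_argument` and `ramp_gro_direct` are hypothetical stand-ins, and whether a real tail call is emitted depends on the optimizer and the ABI.

```cpp
#include <cassert>

// Hypothetical stand-ins; not LLVM's real frame or return object.
struct Frame { bool started = false; };
struct Task { Frame *frame = nullptr; };   // plays the role of the gro

void resume(Frame *f) { f->started = true; }

// gro returned through an argument: nothing is left to do after the call,
// so the inserted call is in tail position.
void ramp_gro_by_argument(Task *out, Frame *f) {
    out->frame = f;
    resume(f);
}

// gro returned directly: the return value must be produced after the call,
// so the inserted call is an ordinary call.
Task ramp_gro_direct(Frame *f) {
    Task t;
    t.frame = f;
    resume(f);
    return t;
}
```

Both variants start the coroutine and hand back the same object; they differ only in whether anything remains to be done in the ramp after `resume` returns.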
This patch isn't intended for committing. I just want to ask for your opinions.