This is an archive of the discontinued LLVM Phabricator instance.

[Coroutine] Allocas used by StoreInst does not always escape
ClosedPublic

Authored by lxfind on Nov 11 2020, 3:50 PM.

Details

Summary

In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.

This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.

Diff Detail

Event Timeline

lxfind created this revision.Nov 11 2020, 3:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2020, 3:50 PM
lxfind requested review of this revision.Nov 11 2020, 3:50 PM

Good. The intuition and implementation make sense.

This revision was not accepted when it landed; it landed in state Needs Review.Nov 11 2020, 8:54 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
lxfind reopened this revision.Nov 11 2020, 9:10 PM

Landed by accident. Reverted and reopening for review.

I'm not sure whether we should fix this case by case. Maybe we should use memoryssa to handle this.

I'm not sure whether we should fix this case by case. Maybe we should use memoryssa to handle this.

In general yes I agree with you that we don't want to handle case by case and make this over-complicated. However there are good reasons that this patch is quite necessary.

  1. When optimizations are enabled, there are usually pretty accurate lifetime information, and hence we don't usually even need the alloca visitor to track them. So the alloca visitor is crucial when optimizations are turned off and there is no lifetime information. When optimizations are turned off, because there isn't mem2reg, almost all data accesses are done through the patterns of stores followed with loads. What I found is that in such case, when optimizations are turned off, almost all allocas are put on the frame because they are all used in store instructions, making this a bit useless. This patch is a key to make the alloca tracking effective and so that majority of allocas can be moved out of frame. Hence the effect of this patch is quite significant comparing to the effort. It's unlikely that I will add any more logic to track other cases.
  2. This is also a continuation of D90990, which is to avoid frame-use-after-destroy. What I found is that, even I fixed the front-end such that the use of temporaries after a call to await_suspend() are contained locally that does not go across suspension points, because they are all involved in this store/load pattern, they are still being put on the frame, causing memory corruptions in non-optimized mode. This patch can fix that issue completely.

I'm not sure whether we should fix this case by case. Maybe we should use memoryssa to handle this.

In general yes I agree with you that we don't want to handle case by case and make this over-complicated. However there are good reasons that this patch is quite necessary.

  1. When optimizations are enabled, there are usually pretty accurate lifetime information, and hence we don't usually even need the alloca visitor to track them. So the alloca visitor is crucial when optimizations are turned off and there is no lifetime information. When optimizations are turned off, because there isn't mem2reg, almost all data accesses are done through the patterns of stores followed with loads. What I found is that in such case, when optimizations are turned off, almost all allocas are put on the frame because they are all used in store instructions, making this a bit useless. This patch is a key to make the alloca tracking effective and so that majority of allocas can be moved out of frame. Hence the effect of this patch is quite significant comparing to the effort. It's unlikely that I will add any more logic to track other cases.
  2. This is also a continuation of D90990, which is to avoid frame-use-after-destroy. What I found is that, even I fixed the front-end such that the use of temporaries after a call to await_suspend() are contained locally that does not go across suspension points, because they are all involved in this store/load pattern, they are still being put on the frame, causing memory corruptions in non-optimized mode. This patch can fix that issue completely.

Do we really need handle this in non-optimized mode? @bruno I'm afraid that this patch may affect the debugging quality fixed in D90772, @lxfind, have you ever checked this?

Do we really need handle this in non-optimized mode? @bruno I'm afraid that this patch may affect the debugging quality fixed in D90772, @lxfind, have you ever checked this?

I believe we do. As I mentioned in the second reason above, there should be absolutely no frame access after the call to await_suspend() until the symmetric transfer, since the frame may have been destroyed in await_suspend(). However based on the current way of determining whether an alloca should live in the frame, almost all allocas will be put on the frame in O0 because of the common use of store instructions. Such issue leads to TSAN failures since TSAN usually runs in O0 without lifetime intrinsics.

This patch doesn't affects debugability. In general, whether or not an alloca should be put on the frame should never affect debugability. If so that would be a problem that needs to be fixed alone.

junparser accepted this revision.Nov 14 2020, 5:58 PM

Thanks for the patch!

This revision is now accepted and ready to land.Nov 14 2020, 5:58 PM

Thanks for the patch!

Thank you for the review!