This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Make capture check during call slot optimization more precise
ClosedPublic

Authored by nikic on Dec 13 2021, 2:18 AM.

Details

Summary

Call slot optimization is currently supposed to be prevented if the call can capture the source pointer. Due to an implementation bug, this check currently doesn't trigger if a bitcast of the source pointer is passed instead. I'm somewhat afraid of the fallout of fixing this bug (due to heavy reliance on call slot optimization in rust), so I'd like to strengthen the capture reasoning a bit first.

In particular, I believe that the capture is fine as long as there is no possible use of the captured pointer before the lifetime of the source alloca ends, either due to lifetime.end or a return from a function. At that point the potentially captured pointer becomes dangling. (I tried checking this in alive2, but it also accepts the transform in cases it clearly shouldn't, so I assume it just doesn't model captures yet.)

Diff Detail

Event Timeline

nikic created this revision.Dec 13 2021, 2:18 AM
nikic requested review of this revision.Dec 13 2021, 2:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2021, 2:18 AM

I don't see the original bug your fixing in the test cases, and your explanation isn't clear to me. Can you expand on that point a bit?

On your extension, I think there might be a useful generalization here. As near as I can tell, your bit of code is effectively a local no-capture analysis. Your reasoning could be phrased as "if the captured memory can't be accessed in a well defined manner before the end of the lifetime of the captured storage, it can't actually have been captured". Right?

Assuming that's correct, what about doing this in DSE and annotating the call param no capture directly? Shouldn't the backwards walk DSE does on dead allocas be enough to annotate these cases? If so, MemCpyOpt could then simply fix the bug, and we could get generally strongly nocapture reasoning everywhere.

Your modref check is the only bit I'm not sure ports over naturally. It depends on how important that case is to you.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
966

I believe you need to check the size of the end here as well. You could have a zero sized end, which I believe is a noop.

Hm, I did think of one concerning case. What about well defined synchronization inside the callee? I think your change is still correct since the capture can't outlive the call (i.e. the callee must arrange for the other thread to be done working with storage before returning), but that point is a bit subtle.

I don't see the original bug your fixing in the test cases, and your explanation isn't clear to me. Can you expand on that point a bit?

The referenced bug is the TODO on the @test_bitcast testcase.

On your extension, I think there might be a useful generalization here. As near as I can tell, your bit of code is effectively a local no-capture analysis. Your reasoning could be phrased as "if the captured memory can't be accessed in a well defined manner before the end of the lifetime of the captured storage, it can't actually have been captured". Right?

Assuming that's correct, what about doing this in DSE and annotating the call param no capture directly? Shouldn't the backwards walk DSE does on dead allocas be enough to annotate these cases? If so, MemCpyOpt could then simply fix the bug, and we could get generally strongly nocapture reasoning everywhere.

Yeah, that sounds about right. However, I don't think that this would allow us to place a nocapture attribute: After all, the pointer may indeed be captured, it's just that the capture ends up not being used before the object goes out of scope. The capture in the call may be relevant independently of effects it may allow after the call.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
966

Yeah, you're right.

nikic updated this revision to Diff 395813.Dec 22 2021, 1:50 AM

Check size in lifetime intrinsic.

fhahn added inline comments.Dec 22 2021, 2:24 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
951

could be any_of

983

do we have a test for the terminator case?

nikic updated this revision to Diff 395825.Dec 22 2021, 3:13 AM
nikic marked 3 inline comments as done.

Use any_of, add terminator test, make clang codegen test more robust.

nikic updated this revision to Diff 395833.Dec 22 2021, 3:29 AM

Rebase over pre-commit of clang test change.

reames added a comment.Jan 3 2022, 7:57 AM

Yeah, that sounds about right. However, I don't think that this would allow us to place a nocapture attribute: After all, the pointer may indeed be captured, it's just that the capture ends up not being used before the object goes out of scope. The capture in the call may be relevant independently of effects it may allow after the call.

I'm not sure this is true. Or at least, we seem inconsistent about how we model this today. I won't push for this now, but I do think we need to clarify the meaning of nocapture on this point.

However, I think I found a counter example to this transform.

Consider the following case:
%dest = &g
%src = alloca
foo(%src)
memcpy src to dest

foo(i8* %a) {

if (%a == &g) throw();

}

That is, if we allow foo to capture it's argument, one of the things it's allowed to do is to check the address of address of that argument against an already captured address (such as a global). In the original program, that comparison will always be false, but if dest is the global, we've changed program behavior.

Do you see something I'm missing which disallows this case?

nikic added a comment.Jan 3 2022, 8:14 AM

However, I think I found a counter example to this transform.

Consider the following case:
%dest = &g
%src = alloca
foo(%src)
memcpy src to dest

foo(i8* %a) {

if (%a == &g) throw();

}

That is, if we allow foo to capture it's argument, one of the things it's allowed to do is to check the address of address of that argument against an already captured address (such as a global). In the original program, that comparison will always be false, but if dest is the global, we've changed program behavior.

Do you see something I'm missing which disallows this case?

This is a bit tricky. We do a related check in the code at https://github.com/llvm/llvm-project/blob/4435d1819efec06e11461799fe83d6f148b098f4/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp#L966-L975, because generally this transform is already illegal for other reasons if dest may be captured by the call. This is an AA based check though, so a sufficiently smart GlobalsModRef might decide that the call cannot mod/ref the global if the global is only used in a comparison (haven't checked whether it is actually that smart). To be more conservative, I could add a check that the dest is "identified function local" to preclude the possibility that it has already escaped before the function (like a global).

nikic updated this revision to Diff 397070.Jan 3 2022, 8:35 AM

Check that dest is identified function local. Note that the added @test_global test already passes before this change, because GlobalsModRef is not actually smart enough to cause problems. Still, it might be in the future.

reames added a comment.Jan 3 2022, 8:56 AM

isIdentifiedFunctionLocal isn't the right check. I think you need is not captured before.

Take my prior example, replace the global with an alloca, and add a capturing store of that alloca into some global location. Foo reads back the global, and checks the address.

nikic updated this revision to Diff 397284.Jan 4 2022, 6:48 AM

Do a captured before check on dest. I initially thought this is already covered by the callCapturesBefore check, but it isn't always. The test_dest_captured_before_alloca test was miscompiled before the additional check.

Thanks for catching!

reames accepted this revision.Jan 4 2022, 8:33 AM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
994

Aside: We have this slightly error prone pattern repeated in a few places, maybe it's time to add a matcher? Or helper routine?

(Entirely optional follow up.)

This revision is now accepted and ready to land.Jan 4 2022, 8:33 AM
reames added inline comments.Jan 4 2022, 8:35 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
964

Oh, please add a comment about how this depends on source not being captured *before* the call as well. Captured during the call is fine, but we depend on their not being a capture before for the same reason as dest.

This revision was landed with ongoing or failed builds.Jan 5 2022, 12:42 AM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Jan 14 2022, 8:31 AM

Just an early heads up that we bisected a test regression in V8 to this: https://bugs.chromium.org/p/chromium/issues/detail?id=1287437
It still needs more investigation, but it would be interesting to know if others have hit issues too.

hans added a comment.Jan 18 2022, 8:42 AM

Just an early heads up that we bisected a test regression in V8 to this: https://bugs.chromium.org/p/chromium/issues/detail?id=1287437
It still needs more investigation, but it would be interesting to know if others have hit issues too.

https://bugs.chromium.org/p/chromium/issues/detail?id=1287437#c15 has the root cause. The problem is that the call slot optimization replaces a call argument without considering the call's !noalias metadata, introducing an aliasing violation. (Probably the optimization could be conservative and just drop the metadata if it exists).

I suspect this patch didn't actually introduce the problem, but rather made the optimization more likely to fire and so uncovered the pre-existing problem. In any case, I'll revert back to green until this can be fixed.