This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Fix thread-safety of call slot opimization
Needs ReviewPublic

Authored by nikic on Oct 6 2020, 1:20 PM.

Details

Summary

This is a followup to D88799 to address the thread-safety issue raised there. If the destination is captured, we need to check that the copy instruction is executed, to exclude concurrent access from another thread.

It turns out that this doesn't really change much: If the destination is captured, then the call itself needs to be argmemonly to not potentially modify the destination, and argmemonly currently (incorrectly...) implies willreturn. Thus the only requirement that actually gets added in practice is nounwind (even for alloca dest).

Diff Detail

Event Timeline

nikic created this revision.Oct 6 2020, 1:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2020, 1:20 PM
nikic requested review of this revision.Oct 6 2020, 1:20 PM
efriedma added inline comments.Oct 6 2020, 1:55 PM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

callCapturesBefore only checks the call C itself. Don't we also care if the pointer is captured somewhere before the call?

It turns out that this doesn't really change much: If the destination is captured, then the call itself needs to be argmemonly to not potentially modify the destination, and argmemonly currently (incorrectly...) implies willreturn. Thus the only requirement that actually gets added in practice is nounwind (even for alloca dest).

I did not get this :(


If the caller is nosync there is no need for all this thread safety checking.


The isGuaranteedToTransferExecutionToSuccessor loop is not necessary for nounwind + willreturn functions.


I feel we probably need the same logic in different places and this might even deserve a helper.

If the caller is nosync there is no need for all this thread safety checking.

I'm not sure nosync helps here? The issue this patch is trying to solve is the case where the memcpy is dynamically dead due to an infinite loop; there isn't any synchronization involved.

nikic added a comment.Oct 7 2020, 12:48 AM

If the caller is nosync there is no need for all this thread safety checking.

I'm not sure nosync helps here? The issue this patch is trying to solve is the case where the memcpy is dynamically dead due to an infinite loop; there isn't any synchronization involved.

That's right. If we synchronized between the copy and the memcpy, then we'd see that as the memcpy dependency. This only applies to the unsynchronized, dynamically dead case.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

I find this method to be somewhat confusingly named, but what it actually does is to a) check for captures up to and including the call (-> ModRef) and b) a potential access via a non-capturing argument in the call itself.

See https://github.com/llvm/llvm-project/blob/334ec6f807fa65e09571fa42a0c3be0eb39e7c0f/llvm/lib/Analysis/AliasAnalysis.cpp#L652.

efriedma added inline comments.Oct 7 2020, 1:02 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

That makes sense.

That said, I don't think the handling of noalias arguments here is correct. In general, noalias only affects memory accesses that dynamically execute. So here, if the write to destLoc is dynamically dead, you can't do anything useful with the noalias attribute; you have to treat it like any other captured pointer.

Not sure if that's a bug in callCapturesBefore(), or the way it's used here.

nikic added inline comments.Oct 7 2020, 1:08 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

That said, I don't think the handling of noalias arguments here is correct. In general, noalias only affects memory accesses that dynamically execute. So here, if the write to destLoc is dynamically dead, you can't do anything useful with the noalias attribute; you have to treat it like any other captured pointer.

But doesn't (a non-captured) noalias imply that a write also cannot happen from a different thread, as that means that a write to the noalias object occurred without going through the noalias pointer?

efriedma added inline comments.Oct 7 2020, 9:26 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

The description of noalias in LangRef starts with "This indicates that memory locations accessed via pointer values based on the argument [...]". If that set of memory locations is empty, the rest of the description doesn't guarantee anything.

nikic updated this revision to Diff 296726.Oct 7 2020, 10:10 AM

Handle noalias arguments the same as other captures. Now this change does have a practical effect, and I had to annotate some existing tests with willreturn.

nikic added inline comments.Oct 7 2020, 10:12 AM
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
810

I see, that's unfortunate :( I've updated the implementation to treat noalias arguments the same as other captures now.

Any opinion on whether we should change callCapturesBefore()? Given it only has three existing callers, probably worth looking into.

nikic added a comment.Oct 7 2020, 12:29 PM

Any opinion on whether we should change callCapturesBefore()? Given it only has three existing callers, probably worth looking into.

There's actually only one other caller in MemDepAnalysis (rest is comments or trivial forwarding), where it is used to determine whether a call can mod/ref a location: https://github.com/llvm/llvm-project/blob/9c09e2055ee4d4e3b26e393ab460635825a79538/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp#L622 The current noalias handling seems reasonable to me, mostly in the sense that it matches how noalias would be treated for non-calls. (If I understand the full implication here, whether the result of an AA/MD query is "correct" actually depends on whether you are querying a location that corresponds to a real IR access, and then it is only valid at that access.)

Oh, I see, the unmodified callCapturesBefore result is correct if you're trying to determine whether the the call might modify a value that can be consumed by a load, but not if you're trying to determine if a store clobbers memory that might be used in another thread.

I think I'd prefer to sink the conditional into callCapturesBefore itself, so anyone who modifies it in the future has the necessary context. I think the isa<Argument> check is too tightly coupled to the implementation of callCapturesBefore, as opposed to its abstract semantics.

nikic added a comment.Oct 9 2020, 2:47 PM

Oh, I see, the unmodified callCapturesBefore result is correct if you're trying to determine whether the the call might modify a value that can be consumed by a load, but not if you're trying to determine if a store clobbers memory that might be used in another thread.

I think I'd prefer to sink the conditional into callCapturesBefore itself, so anyone who modifies it in the future has the necessary context. I think the isa<Argument> check is too tightly coupled to the implementation of callCapturesBefore, as opposed to its abstract semantics.

To clarify, the suggestion here is to add a boolean flag to callCapturesBefore() that determines whether arguments should always be considered capturing?

One thing to note is that this would not allow us to treat the call modref check (which does not care about noalias arguments) and the willreturn check (which does) separately. That is, the following code would no longer fold:

define void @test(i8* noalias dereferenceable(16) %dest.i8) {
  %src = alloca [16 x i8]
  %src.i8 = bitcast [16 x i8]* %src to i8*
  call void @accept_ptr(i8* %src.i8) nounwind willreturn
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dest.i8, i8* %src.i8, i64 16, i1 false)
  call void @accept_ptr(i8* %dest.i8) ; capture
  ret void
}

Of course, it's not a big loss.

To clarify, the suggestion here is to add a boolean flag to callCapturesBefore() that determines whether arguments should always be considered capturing?

Something like that.

One thing to note is that this would not allow us to treat the call modref check (which does not care about noalias arguments) and the willreturn check (which does) separately. That is, the following code would no longer fold:

I guess if we cared, we could make callCapturesBefore() return two results, instead of adding a boolean argument.

Commit message:

If the destination is captured, we need to check that the copy instruction is executed, to exclude concurrent access from another thread.

If the caller is nosync there is no need for all this thread safety checking.

I'm not sure nosync helps here? The issue this patch is trying to solve is the case where the memcpy is dynamically dead due to an infinite loop; there isn't any synchronization involved.

If it is nosync there is no concurrent access without it being a race and therefore UB.