
Details

Reviewers

asbirlea
fhahn
jdoerfert
reames

Commits

rG655a7024dbbc: Reapply [MemCpyOpt] Make capture check during call slot optimization more…
rG487a34ed9d7d: [MemCpyOpt] Make capture check during call slot optimization more precise

Summary

Call slot optimization is currently supposed to be prevented if the call can capture the source pointer. Due to an implementation bug, this check doesn't trigger if a bitcast of the source pointer is passed instead. I'm somewhat afraid of the fallout from fixing this bug (due to Rust's heavy reliance on call slot optimization), so I'd like to strengthen the capture reasoning a bit first.
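To make the bitcast bug concrete, here is a small self-contained C++ model. It is not LLVM's API: Value, stripCasts, and both check functions are hypothetical names. The identity-based check compares each call argument to the source pointer directly, as the loop over C->getArgOperand(ArgI) == cpySrc does, so a cast of the source slips through. The cast-aware variant looks through casts first, loosely in the spirit of stripPointerCasts(). This only illustrates the failure mode and is not necessarily how the bug was eventually fixed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not LLVM's API: a Value is either an original
// pointer (CastOf == nullptr) or a pointer cast of another Value, much like a
// bitcast of an alloca is a distinct Value from the alloca itself.
struct Value {
  explicit Value(const Value *CastOf = nullptr) : CastOf(CastOf) {}
  const Value *CastOf;
};

// Look through any chain of casts, in the spirit of stripPointerCasts().
const Value *stripCasts(const Value *V) {
  while (V->CastOf)
    V = V->CastOf;
  return V;
}

// Identity-based check: only an argument that *is* Src counts, so passing a
// cast of Src is missed.
bool mayCaptureSrcByIdentity(const std::vector<const Value *> &Args,
                             const std::vector<bool> &NoCapture,
                             const Value *Src) {
  for (std::size_t I = 0; I < Args.size(); ++I)
    if (Args[I] == Src && !NoCapture[I])
      return true;
  return false;
}

// Cast-aware check: an argument that is Src after stripping casts counts too.
bool mayCaptureSrcThroughCasts(const std::vector<const Value *> &Args,
                               const std::vector<bool> &NoCapture,
                               const Value *Src) {
  for (std::size_t I = 0; I < Args.size(); ++I)
    if (stripCasts(Args[I]) == Src && !NoCapture[I])
      return true;
  return false;
}
```

With a single capturing argument that is a cast of Src, the identity check reports no capture while the cast-aware check reports one.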

In particular, I believe that the capture is fine as long as there is no possible use of the captured pointer before the lifetime of the source alloca ends, either due to lifetime.end or a return from the function. At that point the potentially captured pointer becomes dangling. (I tried checking this in Alive2, but it also accepts the transform in cases where it clearly shouldn't, so I assume it just doesn't model captures yet.)
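The reasoning above amounts to a forward scan from the call to the end of its block. The following C++ sketch models that scan over a toy instruction list in which a boolean stands in for the alias-analysis mod/ref query. Kind, Inst, and captureIsHarmless are hypothetical names; the actual patch operates on LLVM instructions in MemCpyOptimizer.cpp. Following the review feedback, a lifetime.end only counts if its size covers all of src, so a zero-sized one does not end the lifetime.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction kinds (not LLVM classes).
enum class Kind { LifetimeEnd, Return, Branch, Other };

struct Inst {
  Kind K = Kind::Other;
  bool OnSrc = false;        // LifetimeEnd only: does it apply to the src alloca?
  std::uint64_t Size = 0;    // LifetimeEnd only: its size operand
  bool MayModRefSrc = false; // stand-in for an alias-analysis mod/ref query
  bool IsCopyLoad = false;   // the copy's own direct read of src
};

// Scans the instructions that follow the capturing call in the same block.
// Returns true if src's lifetime provably ends before anything could use the
// captured pointer, and false as soon as that cannot be shown.
bool captureIsHarmless(const std::vector<Inst> &After, std::uint64_t SrcSize) {
  for (const Inst &I : After) {
    // A lifetime.end that covers all of src ends its lifetime; a smaller one
    // (for example a zero-sized lifetime.end) does not.
    if (I.K == Kind::LifetimeEnd && I.OnSrc && I.Size >= SrcSize)
      return true;
    // Returning from the function also ends the alloca's lifetime.
    if (I.K == Kind::Return)
      return true;
    // The copy's direct read of src is expected and already accounted for.
    if (I.IsCopyLoad)
      continue;
    // A possible access to src, or leaving the block, makes us give up.
    if (I.MayModRefSrc || I.K == Kind::Branch)
      return false;
  }
  // Ran out of instructions without seeing the lifetime end: be conservative.
  return false;
}
```

Returning false when the list runs out is a conservative choice of this sketch; in real IR every block ends in a terminator, so the scan always either reaches a lifetime end or bails.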

Diff Detail

Unit Tests Failed

Time     Test
170 ms   x64 debian > Clang.CodeGen::aggregate-assign-call.c
100 ms   x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

nikic created this revision. Dec 13 2021, 2:18 AM

Herald added subscribers: JDevlieghere, hiraditya. Dec 13 2021, 2:18 AM

nikic requested review of this revision. Dec 13 2021, 2:18 AM

Herald added a project: Restricted Project. Dec 13 2021, 2:18 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B138915: Diff 393817. Dec 13 2021, 2:59 AM

nikic added a reviewer: reames. Dec 17 2021, 12:59 PM

ping :)

I don't see the original bug you're fixing in the test cases, and your explanation isn't clear to me. Can you expand on that point a bit?

On your extension, I think there might be a useful generalization here. As near as I can tell, your bit of code is effectively a local no-capture analysis. Your reasoning could be phrased as "if the captured memory can't be accessed in a well defined manner before the end of the lifetime of the captured storage, it can't actually have been captured". Right?

Assuming that's correct, what about doing this in DSE and annotating the call param nocapture directly? Shouldn't the backwards walk DSE does on dead allocas be enough to annotate these cases? If so, MemCpyOpt could then simply fix the bug, and we could get generally stronger nocapture reasoning everywhere.

Your modref check is the only bit I'm not sure ports over naturally. It depends on how important that case is to you.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Line 966: I believe you need to check the size of the lifetime.end here as well. You could have a zero-sized lifetime.end, which I believe is a no-op.

Hm, I did think of one concerning case. What about well-defined synchronization inside the callee? I think your change is still correct since the capture can't outlive the call (i.e. the callee must arrange for the other thread to be done working with the storage before returning), but that point is a bit subtle.
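A minimal C++ illustration of the synchronization case, assuming a callee that hands the pointer to a worker thread. Because the callee joins the thread before returning, every access through the captured pointer happens before the call returns, which is the property the argument above relies on. calleeWithSync and caller are hypothetical names; Src plays the role of the source alloca and Dest the memcpy destination.

```cpp
#include <cassert>
#include <thread>

// Hypothetical callee: it "captures" P by handing it to another thread, but
// joins that thread before returning, so no access through P can happen
// after the call.
void calleeWithSync(int *P) {
  std::thread Worker([P] { *P += 1; });
  Worker.join(); // all uses of the captured pointer finish here
}

int caller() {
  int Src = 41;         // stands in for the source alloca
  calleeWithSync(&Src); // the call that captures Src
  int Dest = Src;       // stands in for the memcpy from src to dest
  return Dest;
}
```

If the callee detached the worker instead, a later access to Src by that thread would race with the rest of caller(), which is already undefined behavior in the original program.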

In D115615#3205077, @reames wrote:

I don't see the original bug you're fixing in the test cases, and your explanation isn't clear to me. Can you expand on that point a bit?

The referenced bug is the TODO on the @test_bitcast testcase.

On your extension, I think there might be a useful generalization here. As near as I can tell, your bit of code is effectively a local no-capture analysis. Your reasoning could be phrased as "if the captured memory can't be accessed in a well defined manner before the end of the lifetime of the captured storage, it can't actually have been captured". Right?

Assuming that's correct, what about doing this in DSE and annotating the call param nocapture directly? Shouldn't the backwards walk DSE does on dead allocas be enough to annotate these cases? If so, MemCpyOpt could then simply fix the bug, and we could get generally stronger nocapture reasoning everywhere.

Yeah, that sounds about right. However, I don't think that this would allow us to place a nocapture attribute: After all, the pointer may indeed be captured, it's just that the capture ends up not being used before the object goes out of scope. The capture in the call may be relevant independently of effects it may allow after the call.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Line 966: Yeah, you're right.

Check size in lifetime intrinsic.

fhahn added inline comments. Dec 22 2021, 2:24 AM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Line 951: could be any_of
Line 983: do we have a test for the terminator case?

Harbormaster completed remote builds in B140353: Diff 395813. Dec 22 2021, 2:46 AM

Use any_of, add terminator test, make clang codegen test more robust.

Rebase over pre-commit of clang test change.

Harbormaster completed remote builds in B140371: Diff 395833. Dec 22 2021, 4:22 AM

Ping

In D115615#3205364, @nikic wrote:

Yeah, that sounds about right. However, I don't think that this would allow us to place a nocapture attribute: After all, the pointer may indeed be captured, it's just that the capture ends up not being used before the object goes out of scope. The capture in the call may be relevant independently of effects it may allow after the call.

I'm not sure this is true. Or at least, we seem inconsistent about how we model this today. I won't push for this now, but I do think we need to clarify the meaning of nocapture on this point.

However, I think I found a counterexample to this transform.

Consider the following case:
%dest = &g
%src = alloca
foo(%src)
memcpy src to dest

foo(i8* %a) {
  if (%a == &g) throw();
}

That is, if we allow foo to capture its argument, one of the things it's allowed to do is to check the address of that argument against an already captured address (such as a global). In the original program, that comparison will always be false, but after the transform foo is passed %dest instead of %src, so if dest is the global the comparison becomes true and we've changed program behavior.

Do you see something I'm missing which disallows this case?
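The pseudocode counterexample can be transliterated into C++ to see the behavior change directly. In this sketch all names are hypothetical and the throw is replaced by a boolean result so it is easy to check: original() passes a fresh local to the callee and then copies it into the global, while afterCallSlot() models what call slot optimization would produce by handing the callee the global directly.

```cpp
#include <cassert>

char G; // plays the role of the global @g that is the copy destination

// Stand-in for foo(): returns true exactly where the pseudocode would throw.
bool fooWouldThrow(const char *A) { return A == &G; }

// Original program: foo sees a fresh local, which is afterwards copied to G.
bool original() {
  char Src = 0;
  bool Threw = fooWouldThrow(&Src); // distinct object, so never equal to &G
  G = Src;                          // the memcpy from src to dest
  return Threw;
}

// After call slot optimization: the call receives the destination directly.
bool afterCallSlot() { return fooWouldThrow(&G); }
```

original() always returns false while afterCallSlot() returns true, which is exactly the observable difference described above.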

In D115615#3217559, @reames wrote:
However, I think I found a counterexample to this transform.

Consider the following case:
%dest = &g
%src = alloca
foo(%src)
memcpy src to dest

foo(i8* %a) {
  if (%a == &g) throw();
}

That is, if we allow foo to capture its argument, one of the things it's allowed to do is to check the address of that argument against an already captured address (such as a global). In the original program, that comparison will always be false, but after the transform foo is passed %dest instead of %src, so if dest is the global the comparison becomes true and we've changed program behavior.

Do you see something I'm missing which disallows this case?

This is a bit tricky. We do a related check in the code at https://github.com/llvm/llvm-project/blob/4435d1819efec06e11461799fe83d6f148b098f4/llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp#L966-L975, because generally this transform is already illegal for other reasons if dest may be captured by the call. This is an AA based check though, so a sufficiently smart GlobalsModRef might decide that the call cannot mod/ref the global if the global is only used in a comparison (haven't checked whether it is actually that smart). To be more conservative, I could add a check that the dest is "identified function local" to preclude the possibility that it has already escaped before the function (like a global).

Check that dest is identified function local. Note that the added @test_global test already passes before this change, because GlobalsModRef is not actually smart enough to cause problems. Still, it might be in the future.

isIdentifiedFunctionLocal isn't the right check. I think you need a "not captured before" check.

Take my prior example, replace the global with an alloca, and add a capturing store of that alloca into some global location. foo then reads back the global and checks the address.
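Following the same pattern, this variant (again with hypothetical names) makes the destination a local, so it is identified function local, but stores its address into a global before the call. The callee reads the global back and compares, and the result still differs between the original and the transformed program, illustrating why a captured-before check on dest is needed.

```cpp
#include <cassert>

char *EscapedDest = nullptr; // global that dest's address escapes into

// Stand-in for foo(): compares its argument against the escaped address.
bool fooSeesDest(const char *A) { return A == EscapedDest; }

bool originalEscaped() {
  char Dest = 0;
  EscapedDest = &Dest;           // dest is captured before the call
  char Src = 0;
  bool Same = fooSeesDest(&Src); // Src is a different object: false
  Dest = Src;                    // the memcpy from src to dest
  EscapedDest = nullptr;         // don't leave a dangling pointer behind
  return Same;
}

bool transformedEscaped() {
  char Dest = 0;
  EscapedDest = &Dest;
  bool Same = fooSeesDest(&Dest); // src replaced by dest: true
  EscapedDest = nullptr;
  return Same;
}
```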

Harbormaster completed remote builds in B141333: Diff 397070. Jan 3 2022, 10:01 AM

Do a captured-before check on dest. I initially thought this was already covered by the callCapturesBefore check, but it isn't always. The test_dest_captured_before_alloca test was miscompiled before the additional check.

Thanks for catching!

Harbormaster completed remote builds in B141486: Diff 397284. Jan 4 2022, 7:28 AM

LGTM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Line 994: Aside: We have this slightly error-prone pattern repeated in a few places; maybe it's time to add a matcher? Or a helper routine? (Entirely optional follow-up.)

This revision is now accepted and ready to land. Jan 4 2022, 8:33 AM

reames added inline comments. Jan 4 2022, 8:35 AM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
Line 964: Oh, please add a comment about how this depends on source not being captured before the call as well. Captured during the call is fine, but we depend on there not being a capture before for the same reason as dest.

This revision was landed with ongoing or failed builds. Jan 5 2022, 12:42 AM

Closed by commit rG487a34ed9d7d: [MemCpyOpt] Make capture check during call slot optimization more precise (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG487a34ed9d7d: [MemCpyOpt] Make capture check during call slot optimization more precise.

Just an early heads up that we bisected a test regression in V8 to this: https://bugs.chromium.org/p/chromium/issues/detail?id=1287437
It still needs more investigation, but it would be interesting to know if others have hit issues too.

In D115615#3243958, @hans wrote:

Just an early heads up that we bisected a test regression in V8 to this: https://bugs.chromium.org/p/chromium/issues/detail?id=1287437
It still needs more investigation, but it would be interesting to know if others have hit issues too.

https://bugs.chromium.org/p/chromium/issues/detail?id=1287437#c15 has the root cause. The problem is that call slot optimization replaces a call argument without considering the call's !noalias metadata, introducing an aliasing violation. (Probably the optimization could be conservative and just drop the metadata if it exists.)
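A toy C++ sketch of the conservative option mentioned above: when a call's argument is rewritten, any !noalias metadata on the call is dropped rather than trusted for the new argument, since dropping it can only lose optimization information. ToyCall and replaceArgConservatively are hypothetical, metadata is modelled as a map from kind name to an opaque payload, and this is not necessarily what the follow-up fix in D117679 does.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy call: argument names plus metadata (kind -> payload).
struct ToyCall {
  std::vector<std::string> Args;
  std::map<std::string, std::string> Metadata;
};

// Replace every occurrence of OldArg with NewArg. The !noalias metadata was
// valid for the old arguments only, so it is dropped once any argument changes.
void replaceArgConservatively(ToyCall &C, const std::string &OldArg,
                              const std::string &NewArg) {
  bool Changed = false;
  for (std::string &A : C.Args) {
    if (A == OldArg) {
      A = NewArg;
      Changed = true;
    }
  }
  if (Changed)
    C.Metadata.erase("noalias"); // unrelated metadata kinds are left alone
}
```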

I suspect this patch didn't actually introduce the problem, but rather made the optimization more likely to fire and so uncovered the pre-existing problem. In any case, I'll revert back to green until this can be fixed.

hans added a reverting change: rG53a51acc361a: Revert "[MemCpyOpt] Make capture check during call slot optimization more…. Jan 18 2022, 8:42 AM

uabelho added a subscriber: uabelho. Jan 18 2022, 10:19 PM

nikic mentioned this in D117679: [MemCpyOpt] Fix metadata merging during call slot optimization. Jan 19 2022, 6:59 AM

nikic mentioned this in rGd7bff2e9d2e4: [MemCpyOpt] Fix metadata merging during call slot optimization. Jan 20 2022, 12:26 AM

nikic added a commit: rG655a7024dbbc: Reapply [MemCpyOpt] Make capture check during call slot optimization more…. Jan 20 2022, 12:30 AM

Diff 395813

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

[939 lines not shown] while (!srcUseList.empty()) {
     if (const IntrinsicInst *IT = dyn_cast<IntrinsicInst>(U))
       if (IT->isLifetimeStartOrEnd())
         continue;
 
     if (U != C && U != cpyLoad)
       return false;
   }
 
-  // Check that src isn't captured by the called function since the
-  // transformation can cause aliasing issues in that case.
+  // Check whether src is captured by the called function, in which case there
+  // may be further indirect uses of src.
+  bool SrcIsCaptured = false;
   for (unsigned ArgI = 0, E = C->arg_size(); ArgI != E; ++ArgI)
     if (C->getArgOperand(ArgI) == cpySrc && !C->doesNotCapture(ArgI))
-      return false;
+      SrcIsCaptured = true;
+
+  // If src is captured, then check whether there are any potential uses of
+  // src through the captured pointer before the lifetime of src ends, either
+  // due to a lifetime.end or a return from the function.
+  if (SrcIsCaptured) {
+    MemoryLocation SrcLoc =
+        MemoryLocation(srcAlloca, LocationSize::precise(srcSize));
+    for (Instruction &I :
+         make_range(++C->getIterator(), C->getParent()->end())) {
+      // Lifetime of srcAlloca ends at lifetime.end.
+      if (auto *II = dyn_cast<IntrinsicInst>(&I)) {
+        if (II->getIntrinsicID() == Intrinsic::lifetime_end &&
+            II->getArgOperand(1)->stripPointerCasts() == srcAlloca &&
+            cast<ConstantInt>(II->getArgOperand(0))->uge(srcSize))
+          break;
+      }
+
+      // Lifetime of srcAlloca ends at return.
+      if (isa<ReturnInst>(&I))
+        break;
+
+      // Ignore the direct read of src in the load.
+      if (&I == cpyLoad)
+        continue;
+
+      // Check whether this instruction may mod/ref src through the captured
+      // pointer (we have already any direct mod/refs in the loop above).
+      // Also bail if we hit a terminator, as we don't want to scan into other
+      // blocks.
+      if (isModOrRefSet(AA->getModRefInfo(&I, SrcLoc)) || I.isTerminator())
+        return false;
+    }
+  }
 
   // Since we're changing the parameter to the callsite, we need to make sure
   // that what would be the new parameter dominates the callsite.
   if (!DT->dominates(cpyDest, C)) {
     // Support moving a constant index GEP before the call.
     auto *GEP = dyn_cast<GetElementPtrInst>(cpyDest);
     if (GEP && GEP->hasAllConstantIndices() &&
         DT->dominates(GEP->getPointerOperand(), C))
       GEP->moveBefore(C);
     else
       return false;
   }
 
   // In addition to knowing that the call does not access src in some
   // unexpected manner, for example via a global, which we deduce from
   // the use analysis, we also need to know that it does not sneakily
[654 lines not shown]

llvm/test/Transforms/MemCpyOpt/capturing-func.ll

[51 lines not shown]
 ; Lifetime of %ptr2 ends before the potential use of the capture in the second
 ; call.
 define void @test_lifetime_end() {
 ; CHECK-LABEL: @test_lifetime_end(
 ; CHECK-NEXT: [[PTR1:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: [[PTR2:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 1, i8* [[PTR2]])
-; CHECK-NEXT: call void @foo(i8* [[PTR2]])
-; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* [[PTR1]], i8* [[PTR2]], i32 1, i1 false)
+; CHECK-NEXT: call void @foo(i8* [[PTR1]])
 ; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 1, i8* [[PTR2]])
 ; CHECK-NEXT: call void @foo(i8* [[PTR1]])
 ; CHECK-NEXT: ret void
 ;
   %ptr1 = alloca i8
   %ptr2 = alloca i8
   call void @llvm.lifetime.start.p0i8(i64 1, i8* %ptr2)
   call void @foo(i8* %ptr2)
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr1, i8* %ptr2, i32 1, i1 false)
   call void @llvm.lifetime.end.p0i8(i64 1, i8* %ptr2)
   call void @foo(i8* %ptr1)
   ret void
 }
 
+; Lifetime of %ptr2 does not end, because of size mismatch.
+define void @test_lifetime_not_end() {
+; CHECK-LABEL: @test_lifetime_not_end(
+; CHECK-NEXT: [[PTR1:%.*]] = alloca i8, align 1
+; CHECK-NEXT: [[PTR2:%.*]] = alloca i8, align 1
+; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 1, i8* [[PTR2]])
+; CHECK-NEXT: call void @foo(i8* [[PTR2]])
+; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* [[PTR1]], i8* [[PTR2]], i32 1, i1 false)
+; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 0, i8* [[PTR2]])
+; CHECK-NEXT: call void @foo(i8* [[PTR1]])
+; CHECK-NEXT: ret void
+;
+  %ptr1 = alloca i8
+  %ptr2 = alloca i8
+  call void @llvm.lifetime.start.p0i8(i64 1, i8* %ptr2)
+  call void @foo(i8* %ptr2)
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr1, i8* %ptr2, i32 1, i1 false)
+  call void @llvm.lifetime.end.p0i8(i64 0, i8* %ptr2)
+  call void @foo(i8* %ptr1)
+  ret void
+}
+
 ; Lifetime of %ptr2 ends before any potential use of the capture because we
 ; return from the function.
 define void @test_function_end() {
 ; CHECK-LABEL: @test_function_end(
 ; CHECK-NEXT: [[PTR1:%.*]] = alloca i8, align 1
 ; CHECK-NEXT: [[PTR2:%.*]] = alloca i8, align 1
-; CHECK-NEXT: call void @foo(i8* [[PTR2]])
-; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i32(i8* [[PTR1]], i8* [[PTR2]], i32 1, i1 false)
+; CHECK-NEXT: call void @foo(i8* [[PTR1]])
 ; CHECK-NEXT: ret void
 ;
   %ptr1 = alloca i8
   %ptr2 = alloca i8
   call void @foo(i8* %ptr2)
   call void @llvm.memcpy.p0i8.p0i8.i32(i8* %ptr1, i8* %ptr2, i32 1, i1 false)
   ret void
 }

This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Make capture check during call slot optimization more precise
Closed, Public
