This is an archive of the discontinued LLVM Phabricator instance.

Differential D89623

[MemCpyOpt] Move GEP during call slot optimization
Closed, Public

Authored by nikic on Oct 17 2020, 6:59 AM.

Details

Reviewers

efriedma
jdoerfert

Commits

rG3e37543111f4: [MemCpyOpt] Move GEP during call slot optimization

Summary

When the destination of a call slot optimization is a GEP, the optimization currently usually does not apply, because the GEP sits directly before the memcpy and therefore does not dominate the call. We should move the GEP above the call if doing so satisfies the domination requirement. I think a constant-index GEP is the only useful thing to move here, since isDereferenceablePointer couldn't look through any other kind of GEP anyway. As such I'm not trying to generalize this further.
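The rule described in the summary can be sketched with a toy model. This is a minimal illustration, not LLVM code: `Inst`, `Block`, `dominatesInBlock`, and `makeDestDominateCall` are hypothetical names, the model covers a single basic block only (so "dominates" reduces to "appears earlier"), and base pointers that are arguments or globals are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR instructions; these are NOT LLVM classes.
enum class Kind { Alloca, Call, GEP, Memcpy };

struct Inst {
  Kind kind;
  std::string name;      // result name, e.g. "dest.i8"
  std::string operand;   // for a GEP: the name of its base pointer
  bool constantIndices;  // for a GEP: are all indices constants?
};

using Block = std::vector<Inst>;

// Position of the instruction named `name`, or block.size() if absent.
std::size_t indexOf(const Block &block, const std::string &name) {
  auto it = std::find_if(block.begin(), block.end(),
                         [&](const Inst &i) { return i.name == name; });
  return static_cast<std::size_t>(it - block.begin());
}

// Within one block, a definition dominates every later instruction.
bool dominatesInBlock(const Block &block, const std::string &def,
                      std::size_t user) {
  return indexOf(block, def) < user;
}

// Returns true if `dest` dominates `call` afterwards, moving a
// constant-index GEP in front of the call when that is safe.
bool makeDestDominateCall(Block &block, const std::string &dest,
                          const std::string &call) {
  std::size_t callIdx = indexOf(block, call);
  std::size_t destIdx = indexOf(block, dest);
  if (callIdx == block.size() || destIdx == block.size())
    return false;
  if (destIdx < callIdx)
    return true;  // already dominates, nothing to move
  const Inst &gep = block[destIdx];
  if (gep.kind != Kind::GEP || !gep.constantIndices ||
      !dominatesInBlock(block, gep.operand, callIdx))
    return false;
  // Move the GEP so that it sits immediately before the call.
  std::rotate(block.begin() + callIdx, block.begin() + destIdx,
              block.begin() + destIdx + 1);
  return true;
}

// Mirrors the shape of the @dest_is_gep_requires_movement test input.
Block exampleBlock() {
  return {
      {Kind::Alloca, "dest", "", false},
      {Kind::Alloca, "src", "", false},
      {Kind::Call, "call", "src", false},
      {Kind::GEP, "dest.i8", "dest", true},
      {Kind::Memcpy, "memcpy", "dest.i8", false},
  };
}
```

Calling `makeDestDominateCall(block, "dest.i8", "call")` on `exampleBlock()` moves the GEP in front of the call. If the GEP had a non-constant index, or its base pointer were defined after the call, the function would return false and leave the block untouched.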

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. Oct 17 2020, 6:59 AM

Herald added a project: Restricted Project. Oct 17 2020, 6:59 AM

Herald added subscribers: llvm-commits, hiraditya.

nikic requested review of this revision. Oct 17 2020, 6:59 AM

nikic added inline comments.

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
920	I'm wondering if it would make sense to extend the `dominates()` API to accept a Value* instead of Instruction* as first argument. Handling "arguments, constants and globals dominate everything" seems like something the DominatorTree API should do, not the caller.

Harbormaster completed remote builds in B75424: Diff 298826. Oct 17 2020, 7:37 AM

MSxDOS added a subscriber: MSxDOS. Oct 17 2020, 7:51 AM

jdoerfert added inline comments. Oct 17 2020, 9:32 AM

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp
920	agreed.
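The API idea in this inline thread can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyValue`, `dominatesInst`, and `dominatesValue` are hypothetical names and not the DominatorTree interface, and instructions are assumed to live in a single block, so dominance reduces to comparing positions.

```cpp
#include <cassert>

// Toy value kinds; hypothetical, not LLVM's class hierarchy.
enum class ValueKind { Argument, Constant, Global, Instruction };

struct ToyValue {
  ValueKind kind;
  int position;  // only meaningful for instructions: index in one block
};

// Instruction-only query: callers must special-case every other kind.
bool dominatesInst(const ToyValue &def, const ToyValue &user) {
  return def.position < user.position;
}

// Value-accepting query: the "arguments, constants and globals dominate
// everything" rule is handled here instead of at every call site.
bool dominatesValue(const ToyValue &def, const ToyValue &user) {
  if (def.kind != ValueKind::Instruction)
    return true;
  return dominatesInst(def, user);
}
```

With the value-accepting form, a caller can pass any operand without first checking whether it is an instruction.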

nikic mentioned this in D89632: [DomTree] Accept Value as Def (NFC). Oct 17 2020, 12:01 PM

nikic mentioned this in rG32b6e9a450ff: [DomTree] Accept Value as Def (NFC). Oct 22 2020, 9:37 AM

Rebase over generalized dominates() API.

LGTM.

This revision is now accepted and ready to land. Oct 22 2020, 10:11 AM

Closed by commit rG3e37543111f4: [MemCpyOpt] Move GEP during call slot optimization (authored by nikic). Oct 22 2020, 11:41 AM

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG3e37543111f4: [MemCpyOpt] Move GEP during call slot optimization.

Hi, we have a regression that started with this commit. The problem happens when GEP %43 is moved before the call on line 49: https://paste.mozilla.org/gYqBnKes . Can you see anything obviously wrong with that move?

If not, our repro is quite large and I fear it will take me a very long time to fully understand what went wrong and/or get a runnable reduced test case. At this point I'm not even sure whether this transformation was itself the culprit or merely uncovered a problem elsewhere. Hoping you might be able to quickly spot something and save me lots of time!

I thiiiink I'm starting to see this a little better, and the problem is that the memcpy clobbers the optimized call slot.

Before performCallSlotOptzn

call func(..., X, ...)
load from X
GEP
store to Y

After performCallSlotOptzn

GEP
call func(..., Y, ...)
load from X
store to Y  ; oops

Am I interpreting that correctly?

@dmajor If the call slot optimization applies, then that load/store pair should also get dropped as part of the optimization. I don't immediately see how we can end up both changing the call argument and not dropping the load/store, if I understand right what is happening in your case.
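The point that the copy disappears together with the argument change can be sketched at the source level. This is a hedged analogy, not the pass itself: `fill`, `beforeOptimization`, and `afterOptimization` are hypothetical names, and `fill` is assumed to write exactly 8 bytes without capturing its pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical callee: writes 8 bytes through `out`, does not capture it.
void fill(unsigned char *out) {
  for (std::size_t i = 0; i < 8; ++i)
    out[i] = static_cast<unsigned char>(i + 1);
}

// Shape before the optimization: call into a temporary, then copy.
std::array<unsigned char, 16> beforeOptimization() {
  std::array<unsigned char, 16> dest{};
  unsigned char tmp[8];
  fill(tmp);
  std::memcpy(dest.data() + 8, tmp, sizeof(tmp));
  return dest;
}

// Shape after the optimization: the address computation (the "GEP")
// happens before the call, the call writes straight into the slot,
// and the copy is gone.
std::array<unsigned char, 16> afterOptimization() {
  std::array<unsigned char, 16> dest{};
  unsigned char *slot = dest.data() + 8;
  fill(slot);
  return dest;
}
```

Both functions produce the same bytes. Keeping the copy after redirecting the call would instead overwrite the slot with the contents of a temporary the call never wrote to, which is the situation the question above was worried about.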

You're right, I was looking too narrowly at performCallSlotOptzn. The IR is sane after MemCpyOptPass as a whole. Our badness actually happens later in the pipeline, sorry for the noise.

dsprenkels added a subscriber: dsprenkels. Nov 9 2020, 1:15 AM

Revision Contents

Path                                              Size
llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp    11 lines
llvm/test/Transforms/MemCpyOpt/callslot.ll        5 lines

Diff 300060

llvm/lib/Transforms/Scalar/MemCpyOptimizer.cpp

@@ bool MemCpyOptPass::performCallSlotOptzn(Instruction *cpyLoad, @@
   // Check that src isn't captured by the called function since the
   // transformation can cause aliasing issues in that case.
   for (unsigned ArgI = 0, E = C->arg_size(); ArgI != E; ++ArgI)
     if (C->getArgOperand(ArgI) == cpySrc && !C->doesNotCapture(ArgI))
       return false;

   // Since we're changing the parameter to the callsite, we need to make sure
   // that what would be the new parameter dominates the callsite.
-  // TODO: Support moving instructions like GEPs upwards.
-  if (Instruction *cpyDestInst = dyn_cast<Instruction>(cpyDest))
-    if (!DT->dominates(cpyDestInst, C))
-      return false;
+  if (!DT->dominates(cpyDest, C)) {
+    // Support moving a constant index GEP before the call.
+    auto *GEP = dyn_cast<GetElementPtrInst>(cpyDest);
+    if (GEP && GEP->hasAllConstantIndices() &&
+        DT->dominates(GEP->getPointerOperand(), C))
+      GEP->moveBefore(C);
+    else
+      return false;
+  }

   // In addition to knowing that the call does not access src in some
   // unexpected manner, for example via a global, which we deduce from
   // the use analysis, we also need to know that it does not sneakily
   // access dest. We rely on AA to figure this out for us.
   ModRefInfo MR = AA->getModRefInfo(C, cpyDest, LocationSize::precise(srcSize));
   // If necessary, perform additional analysis.
   if (isModOrRefSet(MR))

llvm/test/Transforms/MemCpyOpt/callslot.ll

 ;
   ret void
 }

 define void @dest_is_gep_requires_movement() {
 ; CHECK-LABEL: @dest_is_gep_requires_movement(
 ; CHECK-NEXT: [[DEST:%.*]] = alloca [16 x i8], align 1
 ; CHECK-NEXT: [[SRC:%.*]] = alloca [8 x i8], align 1
 ; CHECK-NEXT: [[SRC_I8:%.*]] = bitcast [8 x i8]* [[SRC]] to i8*
-; CHECK-NEXT: call void @accept_ptr(i8* [[SRC_I8]]) [[ATTR3]]
 ; CHECK-NEXT: [[DEST_I8:%.*]] = getelementptr [16 x i8], [16 x i8]* [[DEST]], i64 0, i64 8
-; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* [[DEST_I8]], i8* [[SRC_I8]], i64 8, i1 false)
+; CHECK-NEXT: [[DEST_I81:%.*]] = bitcast i8* [[DEST_I8]] to [8 x i8]*
+; CHECK-NEXT: [[DEST_I812:%.*]] = bitcast [8 x i8]* [[DEST_I81]] to i8*
+; CHECK-NEXT: call void @accept_ptr(i8* [[DEST_I812]]) [[ATTR3]]
 ; CHECK-NEXT: ret void
 ;
   %dest = alloca [16 x i8]
   %src = alloca [8 x i8]
   %src.i8 = bitcast [8 x i8]* %src to i8*
   call void @accept_ptr(i8* %src.i8) nounwind
   %dest.i8 = getelementptr [16 x i8], [16 x i8]* %dest, i64 0, i64 8
   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dest.i8, i8* %src.i8, i64 8, i1 false)