This is an archive of the discontinued LLVM Phabricator instance.

Differential D23937

[MemCpyOpt] Return value `memcpy` elision.
Needs Review
Public

Authored by bryant on Aug 26 2016, 11:14 AM.

Details

Reviewers

aaron.ballman
majnemer
eli.friedman
rnk

Summary

Clang (as opposed to clang++, which never exhibits this behavior due to NRVO)
tends to generate IR of the form (GEPs redacted for clarity),

%large = type { ... }
define void @f(%large* noalias sret, [remaining args]) {
  %retval = alloca %large
  [do stuff that stores to retval, possibly across several bb]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %retval, i64 ..., i32 ..., i1 false)
  ret void
}

It's my understanding that if the sret is noalias, the memcpy can be elided,
the alloca removed, and all operations involving %retval and the pointers
computed from it replaced with the sret (and pointers computed from it):

define void @f(%large* noalias sret, [remaining args]) {
  [do stuff that stores directly to %0]
  ret void
}

This patch augments MemCpyOptPass to do just that.
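At the source level, the rewrite described above can be pictured roughly as follows. This is only an illustrative sketch: `Large`, `fill_via_temporary`, `fill_directly`, and `same_result` are hypothetical names that do not appear in the patch, and `out` merely stands in for the noalias sret argument.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the %large type in the summary.
struct Large {
  int data[50];
};

// Before: fill a local temporary, then copy it into the caller's return
// slot. `out` plays the role of the noalias sret argument.
void fill_via_temporary(Large *out, int seed) {
  Large retval;
  for (int i = 0; i < 50; ++i)
    retval.data[i] = seed + i;
  std::memcpy(out, &retval, sizeof(Large));
}

// After: the temporary is gone and every store targets the return slot.
void fill_directly(Large *out, int seed) {
  for (int i = 0; i < 50; ++i)
    out->data[i] = seed + i;
}

// Returns true when both versions leave the same bytes in the return slot.
bool same_result(int seed) {
  Large a, b;
  fill_via_temporary(&a, seed);
  fill_directly(&b, seed);
  return std::memcmp(&a, &b, sizeof(Large)) == 0;
}

// Small self-check for the sketch.
void demo_equivalence() {
  assert(same_result(0));
  assert(same_result(7));
}
```

Calling `demo_equivalence()` exercises both versions. The equivalence holds here only because nothing else observes or aliases the return slot while it is being filled.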

Fixes http://llvm.org/PR2218 .

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 69402. Aug 26 2016, 11:14 AM

bryant retitled this revision to [MemCpyOpt] Return value `memcpy` elision.

bryant updated this object.

bryant added reviewers: eli.friedman, majnemer, rnk, aaron.ballman.

bryant set the repository for this revision to rL LLVM.

bryant added a subscriber: llvm-commits.

This isn't safe. Consider:

S f(bool b) {
  S x, y;
  g(&x, &y);
  if (b) {
    return x; // memcpy to sret
  } else {
    return y; // memcpy to sret
  }
}

With your transform, the two arguments to g become the same pointer.
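The problem can be reproduced in plain C++ by modelling the return slot as an explicit pointer. This is a sketch under assumptions: the body of `g` is hypothetical (the comment does not say what it does), and `f_original`, `f_transformed`, and `run` are illustrative names.

```cpp
#include <cassert>

struct S {
  int v;
};

// Hypothetical body for g: it writes distinct values through its two
// pointers, so it can tell whether they refer to the same object.
void g(S *a, S *b) {
  a->v = 1;
  b->v = 2;
}

// Original semantics: x and y are distinct objects, and the selected one
// is copied into the return slot (the assignment models the memcpy).
void f_original(S *sret, bool b) {
  S x, y;
  g(&x, &y);
  *sret = b ? x : y;
}

// Result of replacing both allocas with the return slot: g now receives
// the same pointer twice, and the branch no longer selects anything.
void f_transformed(S *sret, bool b) {
  (void)b;
  g(sret, sret);
}

// Runs one variant and returns the value the caller sees in its slot.
int run(void (*fn)(S *, bool), bool b) {
  S slot{0};
  fn(&slot, b);
  return slot.v;
}

// Small self-check for the sketch.
void demo_aliasing() {
  assert(run(f_original, true) == 1);
  assert(run(f_transformed, true) == 2);
}
```

With this particular `g`, the two versions agree for `b == false` and disagree for `b == true`, which is enough to show that the rewrite changes observable behaviour.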

In reply to @efriedma's comment above (D23937#526679):

Is this the only case in which it would be unsafe? That is, in the general case, is it always safe as long as the sret depends on a single alloca? Would that be the correct rule, or is it perhaps too restrictive?

Well, in the general case, you could treat it like a sort of coloring problem: if you can prove x and y aren't live at the same time, you can allocate them both on top of the sret parameter. But it's probably best to stick with the simple case for now, where the only use of the sret is the memcpy you're examining.
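Both rules can be sketched on a toy model. Nothing below is LLVM's API: `UseKind`, `Use`, `LiveRange`, and the function names are hypothetical, and live ranges are simplified to half-open integer intervals.

```cpp
#include <cassert>
#include <vector>

// Toy description of one use of the sret argument (not LLVM's Use class).
enum class UseKind { MemCpyDest, Store, Load, CallArg };

struct Use {
  UseKind kind;
  int inst_id; // hypothetical instruction identifier
};

// Simple rule: the sret's only use is the memcpy being examined.
bool only_use_is_memcpy(const std::vector<Use> &sret_uses, int memcpy_id) {
  return sret_uses.size() == 1 &&
         sret_uses[0].kind == UseKind::MemCpyDest &&
         sret_uses[0].inst_id == memcpy_id;
}

// Half-open live range [begin, end) of an alloca, in instruction order.
struct LiveRange {
  int begin;
  int end;
};

// Coloring-style rule: two allocas may share the return slot only if
// they are never live at the same time.
bool can_share_slot(LiveRange a, LiveRange b) {
  return a.end <= b.begin || b.end <= a.begin;
}

// The two-return-path counterexample: the sret is the destination of two
// different memcpys, so the simple rule rejects it.
bool counterexample_is_rejected() {
  std::vector<Use> uses = {{UseKind::MemCpyDest, 1}, {UseKind::MemCpyDest, 2}};
  return !only_use_is_memcpy(uses, 1);
}

// Small self-check for the sketch.
void demo_rules() {
  assert(counterexample_is_rejected());
  assert(can_share_slot(LiveRange{0, 4}, LiveRange{4, 9}));
  assert(!can_share_slot(LiveRange{0, 5}, LiveRange{3, 9}));
}
```

A real implementation would have to derive uses and liveness from the IR and the dominator tree; the sketch only shows the shape of the two conditions.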

Beyond that, in theory there's also a problem if the memcpy writes beyond the end of the sret (it's only undefined behavior if the memcpy actually executes), but I'm not sure if you can realistically trigger that issue.

I am no longer as certain about the usefulness of this transformation. For
starters, it's only valid for nounwind functions. If the memcpy is elided in
a function that's allowed to throw, it's possible for control to leave that
function after part of the sret is written (contrast with the non-elided
version of the function, which would leave the sret untouched).
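The difference is observable from the caller once an exception is involved. The following is a sketch in plain C++ rather than IR; `Pair`, `compute_second`, `with_temporary`, `with_elision`, and `observe` are hypothetical names, and the explicit pointer again stands in for the sret argument.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

struct Pair {
  int first;
  int second;
};

// Hypothetical helper that may unwind partway through building the result.
int compute_second(bool fail) {
  if (fail)
    throw std::runtime_error("unwind");
  return 2;
}

// Non-elided version: the return slot is written only by the final memcpy.
void with_temporary(Pair *sret, bool fail) {
  Pair retval;
  retval.first = 1;
  retval.second = compute_second(fail);
  std::memcpy(sret, &retval, sizeof(Pair));
}

// Elided version: stores go straight to the return slot, so an unwind can
// leave it partially written.
void with_elision(Pair *sret, bool fail) {
  sret->first = 1;
  sret->second = compute_second(fail);
}

// Runs fn on a zero-filled slot and returns what the caller sees
// afterwards, whether or not fn unwound.
Pair observe(void (*fn)(Pair *, bool), bool fail) {
  Pair slot{0, 0};
  try {
    fn(&slot, fail);
  } catch (const std::runtime_error &) {
    // Swallow the exception so the slot can be inspected.
  }
  return slot;
}

// Small self-check for the sketch.
void demo_unwind() {
  assert(observe(with_temporary, true).first == 0);
  assert(observe(with_elision, true).first == 1);
}
```

When nothing throws, both versions produce `{1, 2}`. When `compute_second` throws, the version with the temporary leaves the slot at `{0, 0}` while the elided version leaves `{1, 0}`.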

Furthermore, the memcpy would (theoretically) be elided anyway after inlining,
so it seems sensible to redirect effort to fixing elision in the inlined cases.
Since the elision is imperfect in actual inlining cases, I've created
https://reviews.llvm.org/D25175 .

Revision Contents

Path                                         Size
lib/Transforms/Scalar/MemCpyOptimizer.cpp    43 lines
test/Transforms/MemCpyOpt/memcpy.ll          47 lines

Diff 69402

lib/Transforms/Scalar/MemCpyOptimizer.cpp

[692 lines hidden; context: if (LI->isSimple() && LI->hasOneUse() &&]
           MD->removeInstruction(SI);
           SI->eraseFromParent();
           MD->removeInstruction(LI);
           LI->eraseFromParent();
           ++NumMemCpyInstr;
           return true;
         }
       }

+      if (AllocaInst *AI = dyn_cast<AllocaInst>(
+              LI->getPointerOperand()->stripPointerCasts())) {
+        if (Argument *Arg = dyn_cast<Argument>(
+                SI->getPointerOperand()->stripPointerCasts())) {
+          if (Arg->hasStructRetAttr() && Arg->hasNoAliasAttr() &&
+              Arg->getType() == AI->getType()) {
+            AI->replaceAllUsesWith(Arg);
+
+            MD->removeInstruction(SI);
+            SI->eraseFromParent();
+
+            MD->removeInstruction(LI);
+            LI->eraseFromParent();
+
+            MD->removeInstruction(AI);
+            AI->eraseFromParent();
+
+            ++NumMemCpyInstr;
+            return true;
+          }
+        }
+      }
     }
   }

   // There are two cases that are interesting for this code to handle: memcpy
   // and memset. Right now we only handle memset.

   // Ensure that the value being stored is something that can be memset'able a
   // byte at a time like "0" or "-1" or any width, as well as things like
[463 lines hidden; context: bool MemCpyOptPass::processMemCpy(MemCpyInst *M) {]

   // There are four possible optimizations we can do for memcpy:
   //   a) memcpy-memcpy xform which exposes redundance for DSE.
   //   b) call-memcpy xform for return slot optimization.
   //   c) memcpy from freshly alloca'd space or space that has just started its
   //      lifetime copies undefined data, and we can therefore eliminate the
   //      memcpy in favor of the data that was already at the destination.
   //   d) memcpy from a just-memset'd source can be turned into memset.
+  //   e) memcpy from an alloca to a noalias sret can be converted into direct
+  //      access to the sret.
   if (DepInfo.isClobber()) {
     if (CallInst *C = dyn_cast<CallInst>(DepInfo.getInst())) {
       if (performCallSlotOptzn(M, M->getDest(), M->getSource(),
                                CopySize->getZExtValue(), M->getAlignment(),
                                C)) {
         MD->removeInstruction(M);
         M->eraseFromParent();
         return true;
[33 lines hidden; context: if (SrcDepInfo.isClobber())]
     if (MemSetInst *MDep = dyn_cast<MemSetInst>(SrcDepInfo.getInst()))
       if (performMemCpyToMemSetOptzn(M, MDep)) {
         MD->removeInstruction(M);
         M->eraseFromParent();
         ++NumCpyToSet;
         return true;
       }

+  if (AllocaInst *AI = dyn_cast<AllocaInst>(M->getSource())) {
+    if (Argument *Arg = dyn_cast<Argument>(M->getDest())) {
+      if (Arg->hasStructRetAttr() && Arg->hasNoAliasAttr() &&
+          Arg->getType() == AI->getType()) {
+        AI->replaceAllUsesWith(Arg);
+
+        MD->removeInstruction(M);
+        M->eraseFromParent();
+
+        MD->removeInstruction(AI);
+        AI->eraseFromParent();
+
+        ++NumMemCpyInstr;
+        return true;
+      }
+    }
+  }
+
   return false;
 }

 /// Transforms memmove calls to memcpy calls when the src/dst are guaranteed
 /// not to alias.
 bool MemCpyOptPass::processMemMove(MemMoveInst *M) {
   AliasAnalysis &AA = LookupAliasAnalysis();

[208 lines hidden]

test/Transforms/MemCpyOpt/memcpy.ll

[196 lines hidden; context: define void @test10(%opaque* noalias nocapture sret %x, i32 %y) {]
   store i32 %y, i32* %a
   call void @foo(i32* noalias nocapture %a)
   %c = load i32, i32* %a
   %d = bitcast %opaque* %x to i32*
   store i32 %c, i32* %d
   ret void
 }

+define void @elide_memcpy_to_noalias_sret(%struct.big* noalias nocapture sret,
+                                          %struct.big* nocapture readonly,
+                                          %struct.big* nocapture readonly) {
+; CHECK-LABEL: @elide_memcpy_to_noalias_sret
+; CHECK-NOT: memcpy
+  %4 = alloca %struct.big, align 4
+  %5 = bitcast %struct.big* %4 to i8*
+  call void @llvm.lifetime.start(i64 200, i8* %5)
+  br label %8
+
+; <label>:6: ; preds = %8
+  %7 = bitcast %struct.big* %0 to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %7, i8* nonnull %5, i64 200, i32 4, i1 false)
+  call void @llvm.lifetime.end(i64 200, i8* nonnull %5)
+  ret void
+
+; <label>:8: ; preds = %8, %3
+  %9 = phi i64 [ 0, %3 ], [ %16, %8 ]
+  %10 = getelementptr inbounds %struct.big, %struct.big* %1, i64 0, i32 0, i64 %9
+  %11 = load i32, i32* %10, align 4
+  %12 = getelementptr inbounds %struct.big, %struct.big* %2, i64 0, i32 0, i64 %9
+  %13 = load i32, i32* %12, align 4
+  %14 = xor i32 %13, %11
+  %15 = getelementptr inbounds %struct.big, %struct.big* %4, i64 0, i32 0, i64 %9
+  store i32 %14, i32* %15, align 4
+  %16 = add nuw nsw i64 %9, 1
+  %17 = icmp eq i64 %16, 50
+  br i1 %17, label %6, label %8
+}
+
+define void @pr2218(i8* noalias sret %result) {
+; CHECK-LABEL: @pr2218
+; CHECK: call void @initialize(i8* noalias sret %result)
+; CHECK: ret void
+entry:
+  %temporary = alloca i8 ; <i8*> [#uses=3]
+  %pointless = alloca i8 ; <i8*> [#uses=1]
+  call void @initialize(i8* noalias sret %temporary)
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %pointless, i8* %temporary, i64 1, i32 4, i1 false)
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %result, i8* %temporary, i64 1, i32 4, i1 false)
+  ret void
+}
+
+declare void @initialize(i8* noalias sret)
+declare void @llvm.lifetime.start(i64, i8* nocapture)
+declare void @llvm.lifetime.end(i64, i8* nocapture)
+
 declare void @f1(%struct.big* nocapture sret)
 declare void @f2(%struct.big*)

 ; CHECK: attributes [[NUW]] = { nounwind }
 ; CHECK: attributes #1 = { argmemonly nounwind }
 ; CHECK: attributes #2 = { nounwind ssp }
 ; CHECK: attributes #3 = { nounwind ssp uwtable }