This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Return value `memcpy` elision.
Needs ReviewPublic

Authored by bryant on Aug 26 2016, 11:14 AM.

Details

Summary

Clang (as opposed to clang++, which never exhibits this behavior due to NRVO)
tends to generate IR of the form (GEPs redacted for clarity),

%large = type { ... }
define void @f(%large* noalias sret, [remaining args]) {
  %retval = alloca %large
  [do stuff that stores to retval, possibly across several bb]
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %retval, i64 ..., i32 ..., i1 false)
  ret void
}

It's my understanding that if the sret is noalias: the memcpy could be elided,
the alloca removed, and all operations involving %retval and its computed
pointers replaced with the sret (and pointers computed thereof):

define void @f(%large* noalias sret, [remaining args]) {
  [do stuff that stores directly to %0]
  ret void
}

This patch augments MemCpyOptPass to do just that.

Fixes http://llvm.org/PR2218 .

Diff Detail

Repository
rL LLVM

Event Timeline

bryant updated this revision to Diff 69402.Aug 26 2016, 11:14 AM
bryant retitled this revision from to [MemCpyOpt] Return value `memcpy` elision..
bryant updated this object.
bryant set the repository for this revision to rL LLVM.
bryant added a subscriber: llvm-commits.

This isn't safe. Consider:

void f(bool b) {
  S x, y;
  g(&x, &y);
  if (b) {
    return x; // memcpy to sret
  } else {
    return y; // memcpy to sret
  }
}

With your transform, the two arguments to g become the same pointer.

This isn't safe. Consider:

void f(bool b) {
  S x, y;
  g(&x, &y);
  if (b) {
    return x; // memcpy to sret
  } else {
    return y; // memcpy to sret
  }
}

With your transform, the two arguments to g become the same pointer.

Is this the only case it would be unsafe, i.e., in the general case, it's always safe as long as sret depends on a single alloca. Would that be the correct rule, or perhaps too restrictive?

Well, in the general case, you could treat it like a sort of coloring problem (if you can prove x and y aren't live at the same time, you can allocate them both on top of the sret parameter), but probably best to stick with the simple case for now, where the only use of the sret is the memcpy you're examining.

Beyond that, in theory there's also a problem if the memcpy writes beyond the end of the sret (it's only undefined behavior if the memcpy actually executes), but I'm not sure if you can realistically trigger that issue.

bryant added a comment.Oct 4 2016, 9:33 AM

I am no longer as certain about the usefulness of this transformation. For
starters, it's only valid for nounwind functions. If the memcpy is elided in
function that's allowed to throw, it's possible for control to leave that
function after part of sret is written (contrast with the non-elided version of
the function that would leave sret untouched).

Furthermore, the memcpy would (theoretically) be elided anyway after inlining,
so it seems sensible to redirect effort to fixing elision in the inlined cases.
Since the elision is imperfect in actual inlining cases, I've created
https://reviews.llvm.org/D25175 .