This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Perform call slot optimizations through GEPs
Needs ReviewPublic

Authored by dotdash on Dec 19 2017, 8:57 AM.

Details

Summary

Call slot optimization needs to know that writing to the destination of the
memcpy won't trap, so it has to know the number of dereferenceable bytes of the
destination. Currently it can only handle this for allocas and function
arguments that are directly written to. We can improve upon this by handling
simple GEPs with constant offsets.

This allows call slot optimizations to happen in code like this:

struct Foo {
  int o[200];
};

struct Bar {
  char p;
  Foo f;
};

  __attribute__((noinline))
Foo moo()
{
  return {0};
}

Bar goo()
{
  Bar f;
  f.f = moo();
  return f;
}

Which is a slightly modified version of the code in PR35134 that works around
the missing alias information in the original C++ code.

It also makes the optimization happen in the Rust code that originally lead to
the above PR.

Event Timeline

dotdash created this revision.Dec 19 2017, 8:57 AM
efriedma added inline comments.Dec 19 2017, 12:24 PM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
836

You probably want to clarify why an alloca is special here.

Does it matter if cpyDest could be accessed from another thread?

Do you need to check whether any calls between "C" and "cpy" could throw?

848

Maybe this code belongs in getPointerDereferenceableBytes, rather than here?

915

This check is kind of fragile, in that it assumes the other operands to the GEP are all ConstantInts.

dotdash added inline comments.Dec 21 2017, 6:46 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
836

You probably want to clarify why an alloca is special here.

I basically just kept the logic as it was here, but looks like that comment is a bit outdated, and I already made call slot opt a bit too aggressive back when I introduced handling of arguments other than sret ones.

Do you need to check whether any calls between "C" and "cpy" could throw?

I guess so, already to properly handle arguments that are merely dereferenceable and not sret, to avoid stores from showing up that should not be visible after a throw. Would a check for postdominance be sufficient here?

Does it matter if cpyDest could be accessed from another thread?

I'm not completely sure, but I'd say that for multithreaded code to behave correctly here, one would need a memory barrier or something similar here anyway, which should break the dependency between the call and the memcpy, so we should not end up here in the first place. Am I missing something here?

848

I had some problems where moving that code broke the original user of getPointerDereferenceableBytes. I can try to see if I can fix that.

915

That's a prerequisite to get here, but I see what you mean. I guess I add an explicit hasAllConstantIndices() check here?

efriedma added inline comments.Dec 21 2017, 11:36 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
836

Would a check for postdominance be sufficient here?

LLVM's PostDominatorTree isn't usable for this, if that's what you're asking.

for multithreaded code to behave correctly here, one would need a memory barrier or something similar here anyway, which should break the dependency between the call and the memcpy

Oh, okay, that's right. (There's a similar transform in LICM I was comparing this to, but LICM is more complicated because we promote conditional stores in some cases.)

874

Sort of orthogonal to the patch, but the check here ("Check that src is not accessed except via the call and the memcpy") doesn't provide the intended guarantee. If the call is in a loop, src doesn't hold undefined values after the first iteration of the loop.

915

Sure. (You could make it an assert if you're confident this code isn't reachable otherwise.)

sanjoy resigned from this revision.Jan 29 2022, 5:32 PM