This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] memset->memcpy forwarding with undef tail
ClosedPublic

Authored by nikic on Nov 30 2018, 5:18 AM.

Details

Summary

Currently memcpyopt optimizes cases like

memset(a, byte, N);
memcpy(b, a, M);
// to
memset(a, byte, N);
memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef).

Diff Detail

Repository
rL LLVM

Event Timeline

nikic created this revision.Nov 30 2018, 5:18 AM
nikic added a comment.Nov 30 2018, 6:47 AM

An alternative way to handle this would be to tell call slot optimization that if the call happens to be a memset, it does not need to check that the destination is non-trapping. This would drop the memcpy entirely right away. However, it is also less general, because it will not handle cases where the source value is still used afterwards (the malloc/free example here).

efriedma added inline comments.Dec 4 2018, 1:08 PM
lib/Analysis/MemoryDependenceAnalysis.cpp
173 ↗(On Diff #176095)

This probably makes sense, but it looks like it'll impact other transforms; would it be possible to test separately? Or does it actually not have any impact on other transforms for some reason?

Looks fine otherwise.

nikic marked an inline comment as done.Dec 6 2018, 4:27 AM
nikic added inline comments.
lib/Analysis/MemoryDependenceAnalysis.cpp
173 ↗(On Diff #176095)

It seems that it indeed does not impact other transforms, at least not in any way that I can see. The main consumer of GetLocation() is getDependency(), for which:

  • GVN only deals with loads.
  • DSE has it's own mechanism for determining location.
  • MemCpyOpt does not call getDependency on memsets (prior to this change).

The other consumer is getCallSiteDependencyFrom(), where previously memset would have been treated as a normal call-site. However in both cases (known location or call-site) the result will ultimately be determinded by AA.


What I'm unsure about is whether this is correct wrt handling of volatile. I see in the code above that an ordered (and non-monotonic) StoreInst will return ModRef with empty location. Maybe memset needs the same treatment for volatile?

efriedma added inline comments.Dec 6 2018, 11:13 AM
lib/Analysis/MemoryDependenceAnalysis.cpp
173 ↗(On Diff #176095)

We don't really use volatile memsets anywhere important, but yes, probably best to be conservative and treat them as ModRef.

nikic updated this revision to Diff 177213.Dec 7 2018, 7:33 AM

Conversatively return ModRef for volatile memsets. Add a few more tests.

Forwarding happens for the case of volatile memset + non-volatile memcpy and doesn't for non-volatile memset + volatile memcpy, which I *think* is correct. (In either case this is existing behavior, this patch doesn't change it.)

This revision is now accepted and ready to land.Dec 7 2018, 12:50 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Dec 11 2018, 12:09 PM

Heads-up that this causes crashes during static initialization in one binary in the chrome/win build, https://bugs.chromium.org/p/chromium/issues/detail?id=913423 I don't have a reduced repro yet, but my feeling so far is that this will be a compiler bug, not a code bug. (Nothing to do here yet until I have a repro.)

nikic reopened this revision.Dec 13 2018, 3:27 AM

Reopening, as this was reverted by rL349002, with a reduced test case showing miscompilation at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html.

This revision is now accepted and ready to land.Dec 13 2018, 3:27 AM
nikic planned changes to this revision.Dec 13 2018, 4:27 AM
nikic updated this revision to Diff 178076.Dec 13 2018, 9:00 AM
nikic edited the summary of this revision. (Show Details)

Fix memory dependence query. We need to query at the location of the memset, but with the size of the memcpy. Otherwise we miss possible writes in the region between the end of the memset and the end of the memcpy. Ideally we'd only query for locations in that region, but as there is no easy way to do so we just use the whole memcpy source region.

Add additional tests which have writes prior to the memset, either in the memset region (legal to optimize, but we currently don't), as well as after the memset region, or overlapping both (not legal to optimize).

This revision is now accepted and ready to land.Dec 13 2018, 9:00 AM
nikic updated this revision to Diff 178078.Dec 13 2018, 9:02 AM

Fix typos in comment.

LGTM. Sorry, I should have considered the dependency check a little more carefully the first time around.

This revision was automatically updated to reflect the committed changes.