Page MenuHomePhabricator

[MemCpyOpt] Don't sink LoadInst below possible clobber.
ClosedPublic

Authored by bryant on Nov 17 2016, 12:34 PM.

Details

Summary

Currently, MemCpyOpt::processStore will convert this

val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D

into

store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A

and then into

store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

This is incorrect, since C and A may alias and therefore cannot be re-ordered.

Adding Amaury since this was introduced in http://reviews.llvm.org/D16523 .

Diff Detail

Repository
rL LLVM

Event Timeline

bryant updated this revision to Diff 78398.Nov 17 2016, 12:34 PM
bryant retitled this revision from to [MemCpyOpt] Don't sink LoadInst below possible clobber..
bryant updated this object.
bryant added reviewers: deadalnix, efriedma, mehdi_amini.
bryant set the repository for this revision to rL LLVM.
bryant added a subscriber: llvm-commits.
bryant added a comment.EditedNov 17 2016, 12:38 PM

For the sake of clarity, I've broken down the precise sequence of
transformations.

  1. Original:
val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D
  1. Lift store val, D and its dependents and clobbers above the store to B:
val = load A
store _, C      ; may-alias A + D
store val, D
store _, B      ; may-alias A
  1. Implicitly lower the load to its counterpart store (this is the bug):
store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A
  1. Replace the load and store with a memcpy:
store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

EDIT: Formatting.
EDIT 2: Ordering.

efriedma requested changes to this revision.Nov 21 2016, 4:13 PM
efriedma edited edge metadata.
efriedma added inline comments.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗(On Diff #78398)

I'm not really following what this patch is doing. Whether it's legal to sink the load to location P isn't related to whether it's legal to hoist the store to location P.

On a side-note, this whole block of code will never run in a normal pass pipeline; "T->isAggregateType()" will never be true because instcombine breaks up aggregate loads and stores.

test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
2 ↗(On Diff #78398)

-aa-eval doesn't do anything useful here.

This revision now requires changes to proceed.Nov 21 2016, 4:13 PM
bryant updated this revision to Diff 78857.Nov 22 2016, 5:26 AM
bryant edited edge metadata.

better test:

  • inject some padding into the agg to thwart instcombine from exploding load/store.
  • remove -aa-eval.
bryant marked an inline comment as done.Nov 22 2016, 6:05 AM
bryant added inline comments.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗(On Diff #78398)

I'm not really following what this patch is doing. Whether it's legal to sink
the load to location P isn't related to whether it's legal to hoist the store
to location P.

The store isn't necessarily the only thing to be hoisted to P. Instructions that
the store 1) depends on or 2) aliases also need to be hoisted. That's moveUp's
job: To figure out which instructions, in addition to the store, need to be
hoisted.

Once all the hoisting is done, a memcpy/memmove is created to replace the load
and store. However, this implies sinking the load past the hoisted stuff to just
before the store, which may not be legal.

So this patch ensures that it's legal for the load to sink past the hoisted
stuff, even though no explicit sinking is needed.

Does that make sense?

On a side-note, this whole block of code will never run in a normal pass
pipeline; "T->isAggregateType()" will never be true because instcombine
breaks up aggregate loads and stores.

InstCombine refrains from exploding aggregate loads and stores when there's
padding. I've updated the test case accordingly.

bryant updated this revision to Diff 78863.Nov 22 2016, 6:08 AM
bryant edited edge metadata.

Update FileChecks to the new aggregate.

bryant added inline comments.Nov 22 2016, 6:12 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗(On Diff #78398)

Here's an example, in case the explanation wasn't clear:

%x = load
P   ; clobbers the load
X   ; clobbers the load and store
store %x

MCO will turn this into

X   ; clobbers the load and store
%x = load
store %x       ; merged with the load into a memcpy
P   ; clobbers the load

which is bad, since the load is moved past X.

efriedma accepted this revision.Nov 22 2016, 1:29 PM
efriedma edited edge metadata.

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗(On Diff #78398)

Oh, wait, I see, I was misunderstanding how the algorithm works (I somehow thought one of the loops was iterating the wrong way). Yes, your approach makes sense.

test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
2 ↗(On Diff #78863)

evaluate-aa-metadata doesn't do anything here.

This revision is now accepted and ready to land.Nov 22 2016, 1:29 PM

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

Not sure that I understand. Are you suggesting that we leave the bug unfixed?

deadalnix edited edge metadata.Nov 22 2016, 8:41 PM

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

What makes you think this isn't useful and should be deleted ? This is the only thing that can do load/store forwarding in the codebase when aggregates contains padding, so this ends up being important. I think it is very much worth committing.

However, if you have a better alternative to suggest, please do, I can even probably spend some time on it. Improving aggregates support is important to me a several other non clang front end devs.

Oh, we special-case stores of aggregate types which specifically contain padding? Ugh, I missed that... that's *really* subtle. Also, there's probably some better way to set padding bits to undef, but that's an argument for another time.

@efriedma load/store of aggregate cannot be optimized the same way other load/store are done for various reasons.

If the aggregate do not contains padding, then it is possible to deaggregate in a serie of load/store and recompose the integrate. But in the case the aggregate has some padding, this transformation is lossy. However, doing it is memcpy and alike is not lossy so here you go. This is also useful for large aggregates. This transformation is already gated on aggregates types.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
598 ↗(On Diff #78863)

This is where the transformation is gated on aggregates.

bryant updated this revision to Diff 79340.Nov 26 2016, 11:35 AM
bryant edited edge metadata.

Removed -evaluate-aa-metadata.

bryant marked an inline comment as done.Nov 26 2016, 11:35 AM
bryant added inline comments.Dec 27 2016, 9:24 AM
test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
6 ↗(On Diff #79340)

fix typo.

bryant updated this revision to Diff 82546.Dec 27 2016, 10:05 AM
  • "hould" to "should."
  • remove unneeded stderr redirect.
bryant marked an inline comment as done.Dec 27 2016, 10:06 AM
This revision was automatically updated to reflect the committed changes.