Download Raw Diff

Details

Reviewers

deadalnix
mehdi_amini
efriedma

Commits

rG7cb744621b28: [MemCpyOpt] Don't sink LoadInst below possible clobber.
rL290611: [MemCpyOpt] Don't sink LoadInst below possible clobber.

Summary

Currently, MemCpyOpt::processStore will convert this

val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D

into

store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A

and then into

store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

This is incorrect, since C and A may alias and therefore cannot be re-ordered.

Adding Amaury since this was introduced in http://reviews.llvm.org/D16523 .

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 78398.Nov 17 2016, 12:34 PM

bryant retitled this revision from to [MemCpyOpt] Don't sink LoadInst below possible clobber..

bryant updated this object.

bryant added reviewers: deadalnix, efriedma, mehdi_amini.

bryant set the repository for this revision to rL LLVM.

bryant added a subscriber: llvm-commits.

For the sake of clarity, I've broken down the precise sequence of
transformations.

Original:

val = load A
store _, B      ; may-alias A
store _, C      ; may-alias A + D
store val, D

Lift store val, D and its dependents and clobbers above the store to B:

val = load A
store _, C      ; may-alias A + D
store val, D
store _, B      ; may-alias A

Implicitly lower the load to its counterpart store (this is the bug):

store _, C      ; may-alias A + D
val = load A
store val, D
store _, B      ; may-alias A

Replace the load and store with a memcpy:

store _, C      ; may-alias A + D
memcpy(D, A)
store _, B      ; may-alias A

EDIT: Formatting.
EDIT 2: Ordering.

efriedma requested changes to this revision.Nov 21 2016, 4:13 PM

efriedma edited edge metadata.

efriedma added inline comments.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗	(On Diff #78398)	I'm not really following what this patch is doing. Whether it's legal to sink the load to location P isn't related to whether it's legal to hoist the store to location P. On a side-note, this whole block of code will never run in a normal pass pipeline; "T->isAggregateType()" will never be true because instcombine breaks up aggregate loads and stores.
test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
2 ↗	(On Diff #78398)	-aa-eval doesn't do anything useful here.

This revision now requires changes to proceed.Nov 21 2016, 4:13 PM

better test:

inject some padding into the agg to thwart instcombine from exploding load/store.
remove -aa-eval.

bryant marked an inline comment as done.Nov 22 2016, 6:05 AM

bryant added inline comments.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗	(On Diff #78398)	I'm not really following what this patch is doing. Whether it's legal to sink the load to location P isn't related to whether it's legal to hoist the store to location P. The store isn't necessarily the only thing to be hoisted to P. Instructions that the store 1) depends on or 2) aliases also need to be hoisted. That's `moveUp`'s job: To figure out which instructions, in addition to the store, need to be hoisted. Once all the hoisting is done, a memcpy/memmove is created to replace the load and store. However, this implies sinking the load past the hoisted stuff to just before the store, which may not be legal. So this patch ensures that it's legal for the load to sink past the hoisted stuff, even though no explicit sinking is needed. Does that make sense? On a side-note, this whole block of code will never run in a normal pass pipeline; "T->isAggregateType()" will never be true because instcombine breaks up aggregate loads and stores. InstCombine refrains from exploding aggregate loads and stores when there's padding. I've updated the test case accordingly.

Update FileChecks to the new aggregate.

bryant added inline comments.Nov 22 2016, 6:12 AM

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗	(On Diff #78398)	Here's an example, in case the explanation wasn't clear: %x = load P ; clobbers the load X ; clobbers the load and store store %x MCO will turn this into X ; clobbers the load and store %x = load store %x ; merged with the load into a memcpy P ; clobbers the load which is bad, since the load is moved past X.

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

lib/Transforms/Scalar/MemCpyOptimizer.cpp
619 ↗	(On Diff #78398)	Oh, wait, I see, I was misunderstanding how the algorithm works (I somehow thought one of the loops was iterating the wrong way). Yes, your approach makes sense.
test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
2 ↗	(On Diff #78863)	evaluate-aa-metadata doesn't do anything here.

This revision is now accepted and ready to land.Nov 22 2016, 1:29 PM

In D26811#603131, @efriedma wrote:

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

Not sure that I understand. Are you suggesting that we leave the bug unfixed?

In D26811#603131, @efriedma wrote:

LGTM with the testcase fixed... but please don't commit changes to this code unless you're intending to write a followup which actually makes it useful. (If you're going to leave it in its current state, we might as well just delete it.)

What makes you think this isn't useful and should be deleted ? This is the only thing that can do load/store forwarding in the codebase when aggregates contains padding, so this ends up being important. I think it is very much worth committing.

However, if you have a better alternative to suggest, please do, I can even probably spend some time on it. Improving aggregates support is important to me a several other non clang front end devs.

Oh, we special-case stores of aggregate types which specifically contain padding? Ugh, I missed that... that's *really* subtle. Also, there's probably some better way to set padding bits to undef, but that's an argument for another time.

@efriedma load/store of aggregate cannot be optimized the same way other load/store are done for various reasons.

If the aggregate do not contains padding, then it is possible to deaggregate in a serie of load/store and recompose the integrate. But in the case the aggregate has some padding, this transformation is lossy. However, doing it is memcpy and alike is not lossy so here you go. This is also useful for large aggregates. This transformation is already gated on aggregates types.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
598 ↗	(On Diff #78863)	This is where the transformation is gated on aggregates.

Removed -evaluate-aa-metadata.

bryant marked an inline comment as done.Nov 26 2016, 11:35 AM

bryant added inline comments.Dec 27 2016, 9:24 AM

test/Transforms/MemCpyOpt/load-store-to-memcpy.ll
6 ↗	(On Diff #79340)	fix typo.

"hould" to "should."
remove unneeded stderr redirect.

bryant marked an inline comment as done.Dec 27 2016, 10:06 AM

Closed by commit rL290611: [MemCpyOpt] Don't sink LoadInst below possible clobber. (authored by bryant). · Explain WhyDec 27 2016, 10:08 AM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Don't sink LoadInst below possible clobber.
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 82547

llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp

llvm/trunk/test/Transforms/MemCpyOpt/load-store-to-memcpy.ll

This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Don't sink LoadInst below possible clobber.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 82547

llvm/trunk/lib/Transforms/Scalar/MemCpyOptimizer.cpp

llvm/trunk/test/Transforms/MemCpyOpt/load-store-to-memcpy.ll

[MemCpyOpt] Don't sink LoadInst below possible clobber.
ClosedPublic