This is an archive of the discontinued LLVM Phabricator instance.

[EarlyCSE] DSE of stores which write back loaded values
ClosedPublic

Authored by reames on Dec 9 2015, 3:25 PM.

Details

Summary

Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.

I'd really appreciate someone taking a close look at my "dse5" test case. I want to make sure the logic involved holds up. For normal loads and stores it clearly does - for the store to change the value, there must be a race. I believe it also holds for unordered atomics, but would really like a second opinion.

I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)

Diff Detail

Repository
rL LLVM

Event Timeline

reames updated this revision to Diff 42350.Dec 9 2015, 3:25 PM
reames retitled this revision from to [EarlyCSE] DSE of stores which write back loaded values.
reames updated this object.
reames added a subscriber: llvm-commits.
hfinkel accepted this revision.Dec 10 2015, 1:43 AM
hfinkel edited edge metadata.

LGTM.

lib/Transforms/Scalar/EarlyCSE.cpp
711 ↗(On Diff #42350)

Comments should be sentences (start with a capital, end with a period).

test/Transforms/EarlyCSE/basic.ll
252 ↗(On Diff #42350)

I agree this makes sense. Regardless of what else happens (for %P == %Q), we're always free to pick an ordering in which %v == %a and where we always observe our own store, implying %a == %b.

This revision is now accepted and ready to land.Dec 10 2015, 1:43 AM
jfb edited edge metadata.Dec 10 2015, 12:23 PM

Technically you could even remove seq_cst load/store pairs if they're not synchronizing with anything in the middle. You'd still have to preserve the effect with a fence, but the access is dead since it must have raced if it didn't synchronize with anything else.

Just to confirm (even though this optimization happens elsewhere): we don't care about memory accesses which normalize FP values, e.g. canonicalizing NaNs or flushing denormals to zero?

test/Transforms/EarlyCSE/basic.ll
248 ↗(On Diff #42350)

Clarify: "that's okay because we're using relaxed memory ordering".

In D15397#307456, @jfb wrote:

Technically you could even remove seq_cst load/store pairs if they're not synchronizing with anything in the middle. You'd still have to preserve the effect with a fence, but the access is dead since it must have raced if it didn't synchronize with anything else.

Not going to argue this. I don't care to optimize this case and reasoning about it in enough detail to agree or disagree is more investment than I want to make right now.

Just to confirm (even though this optimization happens elsewhere): we don't care about memory accesses which normalize FP values, e.g. canonicalizing NaNs or flushing denormals to zero?

Hm. Good question. I don't know the answer to this. In practice, we side step the question by not allowing atomic stores of floating point at all - the verifier will reject - but given I'm planning on changing that soon, that's an interesting question.

jfb added a comment.Dec 10 2015, 11:53 PM
In D15397#307456, @jfb wrote:

Technically you could even remove seq_cst load/store pairs if they're not synchronizing with anything in the middle. You'd still have to preserve the effect with a fence, but the access is dead since it must have raced if it didn't synchronize with anything else.

Not going to argue this. I don't care to optimize this case and reasoning about it in enough detail to agree or disagree is more investment than I want to make right now.

Sounds good.

Just to confirm (even though this optimization happens elsewhere): we don't care about memory accesses which normalize FP values, e.g. canonicalizing NaNs or flushing denormals to zero?

Hm. Good question. I don't know the answer to this. In practice, we side step the question by not allowing atomic stores of floating point at all - the verifier will reject - but given I'm planning on changing that soon, that's an interesting question.

Dead non-atomic FP stores get eliminated, so I'm assuming this is fine.

This revision was automatically updated to reflect the committed changes.