This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Fix wrong merging of adjacent nontemporal stores into memset calls.
ClosedPublic

Authored by andreadb on Oct 7 2015, 10:34 AM.

Details

Summary

Pass MemCpyOpt doesn't check if a store instructions is nontemporal.
As a consequence, adjacent nontemporal stores may be wrongly merged into memset calls.

Example:

define void @foo(<4 x float>* nocapture %dst) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %dst, align 16, !nontemporal !0
  %ptr1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %ptr1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}

In this example, the two nontemporal stores are combined to a memset of zero which does not preserve the nontemporal hint. Later on the backend (tested on my x86-64 corei7) expands that memset call into a sequence of two normal 16-byte aligned vector stores.

opt -memcpyopt foo.ll -S -o - | llc -mcpu=corei7 -o -

Before:

xorps  %xmm0, %xmm0
movaps  %xmm0, 16(%rdi)
movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:

xorps  %xmm0, %xmm0
movntps %xmm0, 16(%rdi)
movntps  %xmm0, (%rdi)

Please let me know if okay to submit.

Thanks,
Andrea

Diff Detail

Repository
rL LLVM

Event Timeline

andreadb updated this revision to Diff 36761.Oct 7 2015, 10:34 AM
andreadb retitled this revision from to [MemCpyOpt] Fix wrong merging of adjacent nontemporal stores into memset calls..
andreadb updated this object.
andreadb added reviewers: ab, qcolombet, hfinkel.
andreadb added a subscriber: llvm-commits.
qcolombet added inline comments.Oct 7 2015, 12:25 PM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
495 ↗(On Diff #36761)

Would it make sense tp extend the metadata to allow this propagation to happen?
I believe the problem here is that we need to differentiate between the load or the store being non-temporal.

andreadb added inline comments.Oct 8 2015, 6:53 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
495 ↗(On Diff #36761)

Hi Quentin,

Are you suggesting to extend the usage of !nontemporal metadata to memcpy/memset calls as well? If my understanding is correct then yes, that would definitely enable the merging of nontemporal stores.

However - correct me if I am wrong - I am under the impression that a change like that would only have the effect of forcing the backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have the same semantic of a builtin !nontemporal memset call. So, SelectionDAG will have to conservatively expand it back to a sequence of !nontemporal stores (effectively undoing the merging).
So, unless I got it completely wrong (which may be the case :-) ), I find it difficult to justify a change like that one.

What do you think?

qcolombet accepted this revision.Oct 8 2015, 9:53 AM
qcolombet edited edge metadata.

Hi Andrea,

I am under the impression that a change like that would only have the effect of forcing the backend to expand !nontemporal memsets back to sequences of store instructions.

That’s a good point.
Add that in the comment so that we remember why we made this design decision :).

LGTM.

Thanks,
Q.

This revision is now accepted and ready to land.Oct 8 2015, 9:53 AM
This revision was automatically updated to reflect the committed changes.