Page MenuHomePhabricator

Redundant vmov instruction generated with vcvtph2ps

Authored by rob.lougher on Jan 11 2016, 7:50 AM.



Since revision 248784 the following code generates a redundant vmov:

m128 test1(m128i const *src) {

return _mm_cvtph_ps(_mm_loadl_epi64(src));



vmovq (%rdi), %xmm0 # xmm0 = mem[0],zero
vcvtph2ps %xmm0, %xmm0

The regression was caused by a change made in revision 247504 which teaches the instruction combiner that only the lower 64 bits of a 128-bit vcvtph2ps are used. The IR for the above code is:

define <4 x float> @_Z4testPKDv2_x(<2 x i64>* nocapture readonly %src) #0 {

%__u.i = getelementptr inbounds <2 x i64>, <2 x i64>* %src, i64 0, i64 0
%0 = load i64, i64* %__u.i, align 1, !tbaa !1
%vecinit.i = insertelement <2 x i64> undef, i64 %0, i32 0
%vecinit1.i = insertelement <2 x i64> %vecinit.i, i64 0, i32 1
%1 = bitcast <2 x i64> %vecinit1.i to <8 x i16>
%2 = call <4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16> %1) #2
ret <4 x float> %2


After r247504 the instruction combiner can see that the second insertelement is redundant and it is deleted. While this is correct, it interferes with the isel patterns that select the memory-register form of VCVTPH2PS. Previously, the load followed by the two insertelements would have been recognized by the pattern fragment 'vzmovl_v2i64'. This would then have selected a VCVTPH2PSrm via the following pattern:

// Pattern match vcvtph2ps of a scalar i64 load.
def : Pat<(int_x86_vcvtph2ps_128 (vzmovl_v2i64 addr:$src)),
          (VCVTPH2PSrm addr:$src)>;

However, after r247504 we only have a single insertelement and the vzmovl_v2i64 is no longer matched. The reason the problem only occurs after r248784 is that prior to this the combiner was unable to peek through the bitcast operation.

Diff Detail


Event Timeline

rob.lougher retitled this revision from to Redundant vmov instruction generated with vcvtph2ps.
rob.lougher updated this object.
rob.lougher added reviewers: RKSimon, qcolombet.
rob.lougher added a subscriber: llvm-commits.
qcolombet accepted this revision.Jan 11 2016, 2:06 PM
qcolombet edited edge metadata.



Should we have something similar for the 256bit variant in a following patch?


This revision is now accepted and ready to land.Jan 11 2016, 2:06 PM

Hi Quentin,

Thanks for the review! I've done a little investigation into the 256-bit variant and I don't think we have an issue there. In this case we're converting from a full XMM register to a YMM register so we don't have the zero-entension as in the 64-bit case.


This revision was automatically updated to reflect the committed changes.