This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix bug: Scalar FMA intrinsics generate wrong result
Abandoned · Public

Authored by LiuChen3 on Mar 3 2020, 1:25 AM.

Details

Summary

For example, _mm_maskz_fmadd_sd would generate the following assembly:

vmovapd 48(%rsp), %xmm1
vmovapd 32(%rsp), %xmm2
vmovapd 16(%rsp), %xmm0
kmovw %eax, %k1
vfmadd231sd %xmm2, %xmm1, %xmm0 {%k1} {z} # xmm0 = (xmm1 * xmm2) + xmm0

In some cases it will be optimized as follows:

vmovapd 48(%rsp), %xmm0
vmovapd 32(%rsp), %xmm1
vmovapd 16(%rsp), %xmm2
kmovw %eax, %k1
vfmadd213sd %xmm2, %xmm1, %xmm0 {%k1} {z} # xmm0 = (xmm1 * xmm0) + xmm2

The upper 64 bits of the result are wrong in the second sequence: they are taken from 48(%rsp) instead of 16(%rsp).
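
To make the difference concrete, here is a minimal C sketch that models the two sequences in software. It is not the patch and does not use the real intrinsics. The type `vec2d` and the functions `model_vfmadd231sd_maskz`, `model_vfmadd213sd_maskz`, `sequence_231`, and `sequence_213` are hypothetical names made up for illustration. The model assumes that the mask and {z} affect only the low 64 bits and that the upper 64 bits are kept from the destination register. It also uses a separate multiply and add instead of a true fused operation, which only matters for rounding.

```c
/* Hypothetical software model of a 128-bit XMM register holding two doubles. */
typedef struct {
    double lo; /* bits 63:0   */
    double hi; /* bits 127:64 */
} vec2d;

/* Model of `vfmadd231sd %src2, %src1, %dst {%k1} {z}` (AT&T operand order).
 * Low element: dst = src1 * src2 + dst, or 0.0 when mask bit 0 is clear.
 * High element: assumed to be kept from dst, untouched by {z}. */
static vec2d model_vfmadd231sd_maskz(vec2d dst, vec2d src1, vec2d src2, unsigned k)
{
    vec2d r;
    r.lo = (k & 1u) ? src1.lo * src2.lo + dst.lo : 0.0;
    r.hi = dst.hi;
    return r;
}

/* Model of `vfmadd213sd %src2, %src1, %dst {%k1} {z}`.
 * Low element: dst = src1 * dst + src2, or 0.0 when mask bit 0 is clear.
 * High element: assumed to be kept from dst, untouched by {z}. */
static vec2d model_vfmadd213sd_maskz(vec2d dst, vec2d src1, vec2d src2, unsigned k)
{
    vec2d r;
    r.lo = (k & 1u) ? src1.lo * dst.lo + src2.lo : 0.0;
    r.hi = dst.hi;
    return r;
}

/* First listing: xmm1 = 48(%rsp), xmm2 = 32(%rsp), xmm0 = 16(%rsp). */
static vec2d sequence_231(vec2d s16, vec2d s32, vec2d s48, unsigned k)
{
    return model_vfmadd231sd_maskz(s16, s48, s32, k);
}

/* Second listing: xmm0 = 48(%rsp), xmm1 = 32(%rsp), xmm2 = 16(%rsp). */
static vec2d sequence_213(vec2d s16, vec2d s32, vec2d s48, unsigned k)
{
    return model_vfmadd213sd_maskz(s48, s32, s16, k);
}
```

With distinct values in the three stack slots, both sequences compute the same low element, but the first keeps the upper element of 16(%rsp) while the second keeps the upper element of 48(%rsp).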

Event Timeline

LiuChen3 created this revision. Mar 3 2020, 1:25 AM
Herald added a project: Restricted Project. Mar 3 2020, 1:25 AM

Doesn't the modifier {z} clean the upper 64 bits?

DEST[127:64] should be unchanged. In the example, the upper 64 bits of the result should be the upper 64 bits of 16(%rsp).

Test case?

LiuChen3 abandoned this revision. Mar 4 2020, 5:39 PM

This has been correctly fixed by D75526.