This is an archive of the discontinued LLVM Phabricator instance.

[X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands
ClosedPublic

Authored by mkuper on May 21 2015, 5:38 AM.

Details

Summary

The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source, e.g. (from the Intel manual):

__m128 _mm_fmadd_ss (__m128 a, __m128 b, __m128 c)
Operation:
dst[31:0] := (a[31:0] * b[31:0]) + c[31:0]
dst[127:32] := a[127:32]
dst[MAX:128] := 0

The current pattern switches src1 and src2 around (I guess to match the "213" order), which ends up tying the original src2 to the dest.
Since the actual scalar FMA3 instructions copy the high elements from the dest register, the high elements of the result end up coming from the original src2 instead of src1.

This modifies the pattern to leave src1 and src2 in their original order.
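
To make the difference concrete, here is a small C model of the two operand orders. It is only a sketch: Vec4, model_fmadd_ss, model_vfmadd213ss, lower_swapped and lower_in_order are hypothetical names invented for illustration and are not taken from the patch or from LLVM. It also assumes the 213 form computes src2 * dst + src3 in the low element and leaves the other elements of dst untouched, and it uses a plain multiply and add, so the single rounding of a real FMA is not modeled.

```c
#include <assert.h>

/* Hypothetical stand-in for a 128-bit XMM register; lane[0] is bits [31:0]. */
typedef struct { float lane[4]; } Vec4;

/* Intrinsic semantics from the summary: low lane is a*b+c, high lanes come from a. */
static Vec4 model_fmadd_ss(Vec4 a, Vec4 b, Vec4 c) {
    Vec4 dst = a;
    dst.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return dst;
}

/* Assumed model of a 213-form scalar instruction: the low lane becomes
   src2*dst + src3, and lanes 1..3 of dst keep their previous contents. */
static Vec4 model_vfmadd213ss(Vec4 dst, Vec4 src2, Vec4 src3) {
    dst.lane[0] = src2.lane[0] * dst.lane[0] + src3.lane[0];
    return dst;
}

/* Swapped order: the original src2 (b) is tied to the dest. */
static Vec4 lower_swapped(Vec4 a, Vec4 b, Vec4 c) {
    return model_vfmadd213ss(b, a, c);
}

/* Original order: src1 (a) is tied to the dest. */
static Vec4 lower_in_order(Vec4 a, Vec4 b, Vec4 c) {
    return model_vfmadd213ss(a, b, c);
}

/* Returns 1 when all four lanes are equal. */
static int same_vec(Vec4 x, Vec4 y) {
    for (int i = 0; i < 4; ++i)
        if (x.lane[i] != y.lane[i]) return 0;
    return 1;
}

/* Self-check with arbitrary small values that are exact in float. */
void fma_operand_order_demo(void) {
    Vec4 a = {{2.0f, 10.0f, 11.0f, 12.0f}};
    Vec4 b = {{3.0f, 20.0f, 21.0f, 22.0f}};
    Vec4 c = {{4.0f, 30.0f, 31.0f, 32.0f}};
    Vec4 want = model_fmadd_ss(a, b, c);
    Vec4 swapped = lower_swapped(a, b, c);
    Vec4 in_order = lower_in_order(a, b, c);

    assert(same_vec(in_order, want));             /* high lanes come from a */
    assert(!same_vec(swapped, want));             /* high lanes come from b */
    assert(swapped.lane[0] == want.lane[0]);      /* low lane agrees either way */
}
```

Because multiplication is commutative, both orders agree on the low element, so a test that only inspects element 0 would not notice the swap. Only the upper elements differ.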

Diff Detail

Repository
rL LLVM

Event Timeline

mkuper updated this revision to Diff 26222. May 21 2015, 5:38 AM
mkuper retitled this revision to [X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands.
mkuper updated this object.
mkuper edited the test plan for this revision.
mkuper added reviewers: delena, lhames, craig.topper.
mkuper added a subscriber: Unknown Object (MLST).
delena edited edge metadata. May 25 2015, 1:05 AM

LGTM

lib/Target/X86/X86InstrFMA.td
Line 190 (On Diff #26222)

Please add a comment explaining that you use 1-2-3 instead of 2-1-3 because src1 is tied to dest.

Will do.
Thanks, Elena!

This revision was automatically updated to reflect the committed changes.