This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics
ClosedPublic

Authored by craig.topper on Nov 26 2016, 4:34 PM.

Details

Summary

Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.

This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.

We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.

We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.

This fixes PR30913.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper retitled this revision from to [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics.
craig.topper updated this object.
craig.topper added reviewers: delena, zvi, v_klochkov.
craig.topper added a subscriber: llvm-commits.
zvi edited edge metadata.Dec 7 2016, 12:06 PM

I started reviewing the patch, but will need some more time to complete. So far LGTM.

v_klochkov edited edge metadata.Dec 7 2016, 3:46 PM

Hi Craig,

It looks good to me, I had only 2 minor comments below.

Also, in the process of the review I had the impression that not all needed forms of intrinsics are implemented.
Clearly, that is not your problem, I just wanted to mention it somewhere.
For example, I can see 3 forms {mask, mask3, maskz} for FMADD, but only 1 form {mask3} for FMSUB, etc.

15:35:49 > grep fmsub_sd X86IntrinsicsInfo.h

X86_INTRINSIC_DATA(avx512_mask3_vfmsub_sd, FMA_OP_SCALAR_MASK3, X86ISD::FMSUB_RND, 0),

15:36:08 > grep fmadd_sd X86IntrinsicsInfo.h

X86_INTRINSIC_DATA(avx512_mask_vfmadd_sd, FMA_OP_SCALAR_MASK, X86ISD::FMADD_RND, 0),
X86_INTRINSIC_DATA(avx512_mask3_vfmadd_sd, FMA_OP_SCALAR_MASK3, X86ISD::FMADD_RND, 0),
X86_INTRINSIC_DATA(avx512_maskz_vfmadd_sd, FMA_OP_SCALAR_MASKZ, X86ISD::FMADD_RND, 0),

Thank you,
Slava

lib/Target/X86/X86ISelLowering.cpp
31294 ↗(On Diff #79342)

Can you please change: 'a FMA node' to 'an FMA node'?

31307 ↗(On Diff #79342)

FNEG negates all elements of the input vector, while fixing the FMA intrinsic would negate only 1 element. Nothing can be done with that here, right? If so, then this comment should be just a regular comment (not a FIXME-comment).
Consider removing the 'FIXME'-word here. Or alternatively you could write here that we could have a special version of FNEG that negates only the lowest element of the FNEG argument.

v_klochkov accepted this revision.Dec 7 2016, 4:02 PM
v_klochkov edited edge metadata.

LGTM from me.
I think it also makes sense to wait for Zvi to complete his code-review of this patch.

This revision is now accepted and ready to land.Dec 7 2016, 4:02 PM

Slava, you are correct not all possible intrinsics are implemented. Clang uses negation operators in the intrinsic header file to implement the rest. This is why fneg combining is important. We have the mask3 intrinsics on fmsub/fnmsub because the passthru argument for masking is the operand that needs to be negated.

See this review for the addition of the scalar versions of mask3 fmsub/fnmsub https://reviews.llvm.org/D26019

delena added inline comments.Dec 8 2016, 2:20 AM
lib/Target/X86/X86ISelLowering.cpp
31801 ↗(On Diff #79342)

I remember that I did not check isScalarMaskedMode() in the first version and it was a bug.
Then I fixed it. Now you are removing it again..

I'm removing it because I'm checking for it being a scalar opcode and which src operand the scalar opcode expects to be preserved. The SELECTS should be created at the same time as the scalar opcode and the opcode already conveys which source might be bound to the SELECTS. So it should be unneeded now.

This revision was automatically updated to reflect the committed changes.