This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add separate intrinsics for scalar FMA4 instructions.
Closed, Public

Authored by craig.topper on Nov 9 2017, 10:34 AM.

Details

Summary

These instructions zero the non-scalar part of the lower 128 bits, which makes them different from the FMA3 instructions, which pass the non-scalar part of the lower 128 bits through unchanged.
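To make the difference concrete, here is a minimal portable sketch in C. It is not the patch itself and does not use the real intrinsics; the `vec4f` type and the `model_*` names are hypothetical and only model the lane behavior described above. It assumes the FMA3 form takes its upper lanes from the first operand, as `_mm_fmadd_ss` does.

```c
#include <assert.h>

/* Hypothetical portable model of a 128-bit vector of four floats.
   lane[0] is the scalar lane; lane[1..3] are the "non-scalar part". */
typedef struct { float lane[4]; } vec4f;

/* FMA3-style scalar fmadd: lanes 1..3 pass through from the first operand.
   Plain a*b + c is used, so the single rounding of a real fused
   multiply-add is not modeled here. */
static vec4f model_fma3_fmadd_ss(vec4f a, vec4f b, vec4f c) {
    vec4f r = a;
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return r;
}

/* FMA4-style scalar fmadd: lanes 1..3 of the result are zeroed. */
static vec4f model_fma4_fmadd_ss(vec4f a, vec4f b, vec4f c) {
    vec4f r = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    r.lane[0] = a.lane[0] * b.lane[0] + c.lane[0];
    return r;
}

/* Same scalar result, different upper lanes. */
static void model_upper_lanes_check(void) {
    vec4f a = { { 2.0f, 10.0f, 20.0f, 30.0f } };
    vec4f b = { { 3.0f, 11.0f, 21.0f, 31.0f } };
    vec4f c = { { 4.0f, 12.0f, 22.0f, 32.0f } };
    vec4f r3 = model_fma3_fmadd_ss(a, b, c);
    vec4f r4 = model_fma4_fmadd_ss(a, b, c);
    assert(r3.lane[0] == r4.lane[0]); /* both compute 2*3 + 4 */
    assert(r3.lane[1] == 10.0f);      /* FMA3 model: passed through from a */
    assert(r4.lane[1] == 0.0f);       /* FMA4 model: zeroed */
}
```

Since the two forms agree only in lane 0, a single shared scalar intrinsic cannot describe both instructions, which is the motivation for the separate intrinsics.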

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header, like we do for AVX512.
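The derivation by operand negation can be sketched on plain floats as follows. This is only an illustration of the arithmetic identities, not the actual intrinsic header; the `model_*` function names are hypothetical, and plain `a*b + c` stands in for the fused operation.

```c
#include <assert.h>

/* The one primitive: a*b + c (fusion and rounding are not modeled). */
static float model_fmadd(float a, float b, float c) {
    return a * b + c;
}

/* fmsub:  a*b - c    == fmadd(a, b, -c) */
static float model_fmsub(float a, float b, float c) {
    return model_fmadd(a, b, -c);
}

/* fnmadd: -(a*b) + c == fmadd(-a, b, c) */
static float model_fnmadd(float a, float b, float c) {
    return model_fmadd(-a, b, c);
}

/* fnmsub: -(a*b) - c == fmadd(-a, b, -c) */
static float model_fnmsub(float a, float b, float c) {
    return model_fmadd(-a, b, -c);
}

/* Each derived variant matches its direct formula. */
static void model_negation_check(void) {
    float a = 2.0f, b = 3.0f, c = 4.0f;
    assert(model_fmsub(a, b, c)  == a * b - c);
    assert(model_fnmadd(a, b, c) == -(a * b) + c);
    assert(model_fnmsub(a, b, c) == -(a * b) - c);
}
```

With only fmadd exposed, the backend has to fold these negations back into the fmsub/fnmadd/fnmsub instruction forms, which is where the negate-folding opportunities come in.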

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly, though there are a couple of TODOs in it.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Nov 9 2017, 10:34 AM
craig.topper edited the summary of this revision. Nov 9 2017, 10:35 AM
craig.topper added reviewers: RKSimon, spatel.
  • Gather optimization

Remove accidental update

RKSimon added inline comments. Nov 25 2017, 9:08 AM
lib/Target/X86/X86Subtarget.h
466 ↗(On Diff #122286)

This change concerns me - bdver2/bdver3 both support FMA3 as well as FMA4 but via a microcoding hack that costs extra cycles to perform, hence the preference for FMA4.

test/CodeGen/X86/fma4-fneg-combine.ll
2 ↗(On Diff #122286)

Add -mattr=+fma4,+fma tests as well?

test/CodeGen/X86/fma4-intrinsics-x86.ll
2 ↗(On Diff #122286)

Add -mattr=+fma4,+fma tests as well?

craig.topper added inline comments. Nov 25 2017, 9:57 AM
lib/Target/X86/X86Subtarget.h
466 ↗(On Diff #122286)

I'm still giving priority to FMA4 for the generic fma intrinsic and the packed x86 intrinsics; I'm just doing it by including NoFMA4 in the "Requires" line in X86InstrFormats.td now.

RKSimon accepted this revision. Nov 25 2017, 10:02 AM

LGTM - please still add +fma4,+fma tests.

This revision is now accepted and ready to land. Nov 25 2017, 10:02 AM
This revision was automatically updated to reflect the committed changes.