Optimize FMA intrinsic + FNEG patterns, such as -(a*b+c), and FNEG + FMA patterns, such as a*b-c or (-a)*b+c.
Legalization of ISD::FNEG is delayed so that it can first be combined with X86ISD::FMADD, FMSUB, FNMADD, or FNMSUB.
7 of the 8 test cases are now optimized. The remaining one needs further investigation and probably changes to the clang header.
It's good to generalize this, but you should also rename the Mask* variables to something generic, since the code is no longer specific to shuffles. Please make the rename ahead of the fneg changes to reduce the size of this patch.