If the DAG contains a*b+c*d, it can be folded into either fma(a, b, c*d) or fma(c, d, a*b). https://reviews.llvm.org/D11855 improved this by taking the uses of a*b and c*d into account when choosing between the two.
Likewise, a*b-c*d can be folded into either fma(a, b, -(c*d)) or fma(-c, d, a*b). This patch takes the uses of a*b and c*d into account so that the better fold is chosen for the fsub case as well.
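To make the idea concrete, here is a minimal standalone sketch of a use-count-based choice between the two folds. It is not the DAGCombiner code from this patch: the Mul struct, the Fold enum, and the chooseFold/evalSub helpers are hypothetical names, and the heuristic (fold the multiply with fewer users, since a multiply with other users has to be computed anyway) is an assumption about the intent described above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an fmul node in the DAG.
// numUses counts how many nodes consume the product.
struct Mul {
  double lhs;
  double rhs;
  unsigned numUses;
};

// Which operand of x - y gets folded into the fma.
enum class Fold { First, Second };

// Assumed heuristic: if x = a*b has more users than y = c*d, x must be
// materialized anyway, so fold y instead: fma(-c, d, a*b).
// Otherwise keep the default form: fma(a, b, -(c*d)).
Fold chooseFold(const Mul &x, const Mul &y) {
  assert(x.numUses > 0 && y.numUses > 0 && "both products feed the fsub");
  return x.numUses > y.numUses ? Fold::Second : Fold::First;
}

// Evaluate x - y with the chosen fold; both forms compute a*b - c*d.
double evalSub(const Mul &x, const Mul &y) {
  if (chooseFold(x, y) == Fold::Second)
    return std::fma(-y.lhs, y.rhs, x.lhs * x.rhs); // fma(-c, d, a*b)
  return std::fma(x.lhs, x.rhs, -(y.lhs * y.rhs)); // fma(a, b, -(c*d))
}
```

When a*b has two users and c*d has one, chooseFold picks the second form, so the product that is needed elsewhere is computed once and the single-use multiply disappears into the fma.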
This is the motivating case:
define double @fsub1(double %a, double %b, double %c, double %d)  {
entry:
  %mul = fmul fast double %b, %a
  %mul1 = fmul fast double %d, %c
  %sub = fsub fast double %mul, %mul1
  %mul3 = fmul fast double %mul, %sub
  ret double %mul3
}

 define double @fsub1(double %a, double %b, double %c, double %d)  {
 ; CHECK-LABEL: fsub1:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsmuldp 3, 4, 3
 ; CHECK-NEXT:    xsmuldp 0, 2, 1
-; CHECK-NEXT:    xsmsubadp 3, 2, 1
-; CHECK-NEXT:    xsmuldp 1, 0, 3
+; CHECK-NEXT:    fmr 1, 0
+; CHECK-NEXT:    xsnmsubadp 1, 4, 3
+; CHECK-NEXT:    xsmuldp 1, 0, 1

I'm not sure why the Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1) check is here.
That is already checked in the lambdas.