fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond what is allowed by "contract" FMF (-ffp-contract=fast). That is because we are fusing the trailing add of 'E' with a multiply, but without "reassoc", the code mandates that the products A*B and C*D are added together before adding in 'E'.
I've added this example to the LangRef to try to clarify the meaning of "contract". If that seems reasonable, we should probably do something similar for the clang docs because there does not appear to be any formal spec for the behavior of -ffp-contract=fast.
Do we need to check Aggressive here? For a hypothetical target with 2 FMUL/FADD ports and 1 FMA port, assuming slow FMAs, this could be a performance loss.
It shouldn't be a problem for modern chips that I care about, so just picking nits.