Start with an assumption that FMA is faster than Fmul+FAdd. If thats not true on some particular implementation we can add a tuning parameter in the future.
I've update the fmuladd test cases and added new test cases for fast math flag based contraction.