The current fast-math implementation is based on DAGCombiner. One disadvantage of that approach is that it ignores any cost model (throughput and/or code length). Another is that target-specific optimizations are implemented at the wrong level of the transformation pipeline. This patch moves the implementation down into MachineCombiner. As a result, we reuse the existing transformation mechanism (getMachineCombinerPatterns and genAlternativeCodeSequence), and both the throughput and length cost models are applied automatically.
This patch is only an initial step to demonstrate the intent. It implements just one transformation: emitting the reciprocal-estimate code sequence instead of the vdivss instruction. I plan to support the other kinds of reciprocal optimizations as well, but I'd first like to get comments on this work. The code compiles cleanly and produces working output.