Hi Tim and other reviewers,
This patch tries to combine multiple FDIVs that share the same divisor into one FDIV and multiple FMULs. This can improve performance because an FMUL is much faster than an FDIV.
E.g. we combine:
a / D; b / D; c / D;
into:
recip = 1.0 / D; a * recip; b * recip; c * recip;
This is not always beneficial: the critical path grows from one FDIV to one FDIV plus one FMUL, which may cause regressions. So this patch only performs the combine when more than 2 FDIVs share the same divisor.
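To make the trade-off concrete, here is a minimal source-level sketch of the same idea in plain C++. It is not the DAG combine from the patch: the function names and the kMinDivsToCombine constant are hypothetical, chosen only to mirror the "more than 2 FDIVs" rule above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical threshold: combine only when at least 3 divisions share the
// divisor, i.e. "more than 2 FDIVs" as described above.
constexpr std::size_t kMinDivsToCombine = 3;

// Baseline: one division per numerator.
std::vector<double> divideEach(const std::vector<double> &nums, double d) {
  std::vector<double> out;
  out.reserve(nums.size());
  for (double n : nums)
    out.push_back(n / d);
  return out;
}

// Combined form: one division to compute the reciprocal, then one
// multiplication per numerator. Falls back to plain division when there
// are too few numerators to pay for the extra multiply on the critical path.
std::vector<double> divideViaReciprocal(const std::vector<double> &nums,
                                        double d) {
  if (nums.size() < kMinDivsToCombine)
    return divideEach(nums, d);

  const double recip = 1.0 / d;
  std::vector<double> out;
  out.reserve(nums.size());
  for (double n : nums)
    out.push_back(n * recip);
  return out;
}
```

Note that a * (1.0 / D) is not guaranteed to round the same way as a / D unless 1.0 / D is exactly representable (for example when D is a power of two), so a rewrite like this is normally only valid under relaxed floating-point settings.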
This patch only benefits a few specific benchmarks.
Our performance tests on Cortex-A57 show that only the SPEC2000 benchmark 188.ammp improves, by 2.5%-3.0%.
Please review.
Thanks,
-Hao
Don't do the comparison by creating new nodes. You can:
Create the FPOne later if you need it.