For the case A * B + A * C, the reassociate pass (function OptimizeAdd) currently rewrites the expression as A * (B + C), saving one multiply because A is a common factor.
However, this transformation brings no benefit on the PowerPC target. In fact, if the target prefers fma, it produces worse IR.
This is because, on PowerPC:
A * B + A * C can be generated as fma(A, B, fmul(A, C));
A * (B + C) can be generated as fmul(A, fadd(B, C));
fma, fmul, fadd and fsub all have the same latency on PowerPC, and both forms need two dependent instructions, so the rewrite saves no CPU cycles.
Reducing the number of multiplies also reduces the number of fma opportunities, so this transformation is not beneficial on PowerPC.
This patch bails out of this optimization early on PowerPC, exposing more fma-folding opportunities and saving some compile time.