This patch extends the existing FMA form switching algorithm (X86TargetLowering::emitFMA3Instr) with 3 more heuristics for rearranging operands:
- an operand defined by an instruction with canFoldAsLoad()==true is moved to the 3rd position, which helps memory folding and saves a phys register.
- if the FMA result is written into a phys register and one of the operands is defined by a copy from the same phys register, then that operand is made 1st to eliminate the redundant COPY.
- prefer to make a <kill> operand 1st, as that can help to re-use phys registers.
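
To make the three heuristics concrete, here is a minimal standalone sketch of how they could pick an operand order and an FMA3 form. It is not the code from the patch and uses no LLVM types: FmaOperand, FmaChoice and chooseFmaForm are hypothetical names, and the priority between the rules (foldable load first, then COPY from the destination phys register, then <kill>) is an assumption made for illustration. The mapping from addend position to form follows the usual FMA3 semantics (132: op1*op3+op2, 213: op2*op1+op3, 231: op2*op3+op1).

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical, simplified model of one FMA source operand (not an LLVM type).
struct FmaOperand {
  bool isAddend;          // true for c in a*b+c, false for a and b
  bool foldableLoad;      // defining instruction has canFoldAsLoad()==true
  bool copyOfDestPhysReg; // defined by a COPY from the phys reg that receives the result
  bool isKill;            // this use is the last use of the register
};

struct FmaChoice {
  std::array<int, 3> order; // order[slot] = index of the input operand placed in that slot
  std::string form;         // "132", "213" or "231"
};

// Assumed priority: foldable load -> 3rd slot, then COPY source -> 1st slot,
// then <kill> operand -> 1st slot. Remaining slots keep the original order.
FmaChoice chooseFmaForm(const std::array<FmaOperand, 3> &ops) {
  int addends = 0;
  for (const FmaOperand &op : ops)
    addends += op.isAddend ? 1 : 0;
  assert(addends == 1 && "expected exactly one addend");

  // Heuristic 1: a foldable load goes to the 3rd (memory-capable) slot.
  int third = -1;
  for (int i = 0; i < 3 && third < 0; ++i)
    if (ops[i].foldableLoad)
      third = i;

  // Heuristic 2: an operand copied from the destination phys reg goes 1st.
  int first = -1;
  for (int i = 0; i < 3 && first < 0; ++i)
    if (i != third && ops[i].copyOfDestPhysReg)
      first = i;

  // Heuristic 3: otherwise prefer a <kill> operand in the 1st (tied) slot.
  for (int i = 0; i < 3 && first < 0; ++i)
    if (i != third && ops[i].isKill)
      first = i;

  // Defaults: lowest free index goes 1st, highest free index goes 3rd.
  for (int i = 0; i < 3 && first < 0; ++i)
    if (i != third)
      first = i;
  for (int i = 2; i >= 0 && third < 0; --i)
    if (i != first)
      third = i;
  int second = 3 - first - third;

  // The form is determined by the slot that holds the addend.
  std::string form = ops[first].isAddend ? "231"
                     : ops[second].isAddend ? "132"
                                            : "213";
  return FmaChoice{{first, second, third}, form};
}
```

For example, with a foldable-load multiplicand, a killed multiplicand and a plain addend, this sketch puts the killed operand 1st, the load 3rd, and selects the 132 form.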
This patch fixes cases from http://llvm.org/bugs/show_bug.cgi?id=17229
The issue from http://llvm.org/bugs/show_bug.cgi?id=20043 is also fixed by this patch, but TwoAddressInstructionPass later decides
to commute the operands I arranged and re-creates the redundant COPY. That needs to be investigated and fixed separately.
I haven't seen a large performance gain from this patch on the internal benchmarks we use (Elena D. in http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/69035 was right :-) ), but lots of MOVAPS instructions disappeared and the number of stack operations also decreased. I also have not seen noticeable regressions on the benchmarks I ran.