Hello,
Please review the patch that implements commute transformations for the
X86 FMA3 FMA*_Int opcodes (i.e. opcodes generated only for scalar FMA intrinsics).
Previously, the commute transformation was implemented for all FMA3 instructions
except FMA*_Int. Please see D13269 for details.
This change-set is mostly a minor tuning/update of the optimization introduced in D13269.
X86InstrFMA.td:
Set the 'isCommutable' attribute to 1 for FMA*_Int opcodes.
X86InstrInfo.cpp:
Added FMA*_Int opcodes to the isFMA3() routine.
Added a table of FMA*_Int opcodes in groups of three, one opcode per form (132, 213, 231).
fma-commute-x86.ll:
Tightened the checks.
Changed the FMA opcode generated for scalar intrinsics on Windows.
The generated code is different now because previously the FMA*_Int instructions
were not commutable. Now they are.
PeepholeOptimizer tries to fold memory operands, starting from the 1st operand.
The 1st operand cannot be commuted with the 3rd (foldable) operand because that changes
the intrinsic result.
The 2nd operand can be commuted with the 3rd, so that is what happens now: the 2nd operand
becomes the 3rd (which also requires changing the opcode form from 213 to 132), and the
operand is then folded with the load.
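
To illustrate why swapping the 2nd and 3rd operands forces the 213 -> 132 change,
here is a minimal standalone sketch. It is not code from the patch: FmaForm,
evalFma, formAfterSwap23 and swap23PreservesResult are hypothetical names used
only for this example. It models a single scalar lane with a plain (non-fused)
multiply-add. It deliberately offers no swap involving the 1st operand, since
for the _Int forms that operand also supplies the remaining elements of the
result.

```cpp
// Hypothetical model of the three FMA3 forms; not taken from X86InstrInfo.cpp.
// The digits name the operand roles: in form "213" the product is
// operand 2 * operand 1 and operand 3 is added.
enum class FmaForm { F132, F213, F231 };

// Reference value of the low scalar lane for each form.
// Uses an ordinary multiply and add, so it only models the operand roles,
// not the single rounding of a real fused operation.
double evalFma(FmaForm form, double op1, double op2, double op3) {
  switch (form) {
  case FmaForm::F132: return op1 * op3 + op2;
  case FmaForm::F213: return op2 * op1 + op3;
  case FmaForm::F231: return op2 * op3 + op1;
  }
  return 0.0; // not reached for valid enum values
}

// Form that must be used after operands 2 and 3 trade places so that the
// computed value stays the same. 132 and 213 map to each other; 231 is
// unchanged because operands 2 and 3 are both multiplicands there.
FmaForm formAfterSwap23(FmaForm form) {
  switch (form) {
  case FmaForm::F132: return FmaForm::F213;
  case FmaForm::F213: return FmaForm::F132;
  case FmaForm::F231: return FmaForm::F231;
  }
  return form; // not reached for valid enum values
}

// Checks that swapping operands 2 and 3 together with the form change
// yields the same value as the original instruction.
bool swap23PreservesResult(FmaForm form, double op1, double op2, double op3) {
  return evalFma(form, op1, op2, op3) ==
         evalFma(formAfterSwap23(form), op1, op3, op2);
}
```

For example, swap23PreservesResult(FmaForm::F213, 2.0, 3.0, 5.0) compares
3*2+5 computed as form 213 against the same value computed as form 132 with
operands 2 and 3 exchanged.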
fma-commute-x86.ll:
Added some test cases for scalar intrinsics.
Thank you,
Slava