This change-set is one in the series of change-sets improving X86-FMA3 optimizations.
Please see (D11370) and (D13269) for more details.
This change-set adds new X86-FMA3 opcodes that must be used for scalar FMA intrinsics.
The new FMA*_Int opcodes are similar to existing ADD*_Int, SUB*_Int, MUL*_Int opcodes.
The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled
more conservatively. For example, it is illegal to commute the 1st and 2nd operands of FMA*_Int,
because that would change the upper bits of the intrinsic result, which must be taken
from the 1st operand of the FMA intrinsic.
This patch fixes an existing problem in LLVM X86 Code-Gen, where the 1st and 2nd operands of scalar FMA intrinsics could be commuted.
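To illustrate why commuting the 1st and 2nd operands is unsafe for the scalar intrinsics, here is a minimal C++ sketch. It is only a model of the semantics described above, not code from the patch: Vec4, scalar_fmadd_model and commute_is_safe are hypothetical names, the register is modeled as four floats, and the single-rounding behavior of a real fused multiply-add is ignored.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4 x float vector standing in for an XMM register.
using Vec4 = std::array<float, 4>;

// Models the scalar FMA intrinsic semantics (assumption: element 0 is the
// "low" element): element 0 is a*b+c, elements 1..3 are copied from the
// 1st operand.
Vec4 scalar_fmadd_model(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r = a;                 // upper elements come from operand 1
  r[0] = a[0] * b[0] + c[0];  // only the low element is computed
  return r;
}

// Returns true when swapping operands 1 and 2 leaves the whole result unchanged.
bool commute_is_safe(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  return scalar_fmadd_model(a, b, c) == scalar_fmadd_model(b, a, c);
}
```

In this model the low element is the same for either operand order, but the upper elements follow whichever operand is passed first, so commute_is_safe returns false unless both operands have the same upper elements.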
The definitions of X86-FMA3 opcodes were simplified a lot.
Unused or unnecessary template parameters were removed.
Now the definitions look quite similar to definitions of ADD/SUB/MUL opcodes.
Temporarily, the FMA*_Int opcodes are defined as non-commutable.
This constraint was added to reduce the size of the current patch and will be removed
in a follow-up change.
X86InstrFMA.td:
- Simplified the definitions of scalar FMA3 opcodes by removing the template parameters that were unused or not really necessary.
- Added definitions for FMA3 opcodes generated for scalar FMA instructions.
X86InstrInfo.cpp:
- Added the new FMA*_Int opcodes to MemoryFoldTable3 to enable memory-folding optimization.
fma-intrinsics-ph-213-to-231.ll:
- Added tests for scalar FMA intrinsics. PHI-213-to-231 optimization should not be used for scalar intrinsics.
- Added tests for FNMADD and FNMSUB intrinsics to make the test more complete.
fma-intrinsics-x86.ll:
- Added more test cases to check that the 1st and 2nd operands of scalar FMAs generated for intrinsics are no longer commuted.
Do you have a test that checks memory folding of the intrinsics?