Instead of separate sub and mul instructions, use v_mad, v_mac or v_fma when
the fused instructions are faster and legal for the given architecture.
This adds a combiner for the simple case of exactly one subtraction and one
multiplication instruction; it transforms the pair into one of the fma
instructions, depending on the architecture.
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
4085–4086 | This should probably allow vectors we can break down later too | |
4088 | Don't see where isFMADLegal is defined | |
4107–4119 | I'm not sure I follow this heuristic, or what SwapPriority means | |
4160–4161 | The types are all identical, there's no reason to query every type | |
4164 | You can directly use the type and avoid the explicit createGenericVirtualRegister with | |
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll | ||
118–119 | Why did we fail to fold the modifier here? |
Thanks for the suggestions!
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
4088 | In the other revision (the parent): D93305 | |
4107–4119 | If SwapPriority is 0, the first and second operands are not both fmul instructions. If it is 2, both operands are fmul and the second has fewer uses, so we pick it for folding; if it is 1, the reverse holds. I will simplify this in the next version. |
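The heuristic described in this reply can be sketched roughly as follows. This is only an illustration of the three values as explained above, not the code from the patch: `OperandInfo` and `computeSwapPriority` are hypothetical names, and the tie-breaking rule (equal use counts favour the first operand) is an assumption, since the reply does not say what happens in that case.

```cpp
#include <cassert>

// Hypothetical stand-in for one operand of the fsub: whether it is
// defined by an fmul, and how many uses that definition has.
struct OperandInfo {
  bool IsFMul;
  unsigned NumUses;
};

// Returns 0 if the operands are not both fmul,
//         2 if both are fmul and the second has fewer uses,
//         1 if both are fmul and the first is preferred for folding.
// Assumption: on a tie in use counts, the first operand is preferred.
inline int computeSwapPriority(const OperandInfo &First,
                               const OperandInfo &Second) {
  if (!First.IsFMul || !Second.IsFMul)
    return 0;
  return Second.NumUses < First.NumUses ? 2 : 1;
}
```

Folding the fmul with fewer uses is the more attractive choice because a multiply with other users has to stay around anyway, so fusing it saves nothing.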
Restore the combiner (load_or_combine) that was accidentally deleted from the list of combiners.
Similar to SelectionDAG, which negates an SDValue:
fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
DAG.getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0), XY.getOperand(1),
            DAG.getNode(ISD::FNEG, SL, VT, Z));
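To make the arithmetic behind this fold concrete, here is a small standalone sketch using `std::fma` from the C++ standard library rather than any LLVM API. The function names are made up for illustration. The second fused form, for the multiply on the right-hand side, follows the companion fold in SelectionDAG's DAGCombiner, (fsub x, (fmul y, z)) -> (fma (fneg y), z, x); whether the patch under review handles it in exactly the same way is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Unfused reference: the product is rounded to float before the subtract.
// The volatile temporary keeps the compiler from contracting this into an fma.
inline float subOfMulUnfused(float X, float Y, float Z) {
  volatile float Prod = X * Y;
  return Prod - Z;
}

// (fsub (fmul x, y), z) -> (fma x, y, (fneg z)): one rounding step.
inline float subOfMulFused(float X, float Y, float Z) {
  return std::fma(X, Y, -Z);
}

// (fsub x, (fmul y, z)) -> (fma (fneg y), z, x): multiply on the right.
inline float subFromMulFused(float X, float Y, float Z) {
  return std::fma(-Y, Z, X);
}
```

For inputs whose product is exactly representable, the fused and unfused forms agree. In general they can differ in the last bit because the fused form rounds only once, which is why the combine has to be gated on the target and on the fast-math and denormal settings.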
Use m_MInstr instead of m_Reg in matching patterns (mi_match).
A few minor bug fixes.
Formatting and refactoring.
clang-format for CombinerHelper.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll | ||
---|---|---|
203 | %a and %z should be swapped here, otherwise this is the same test as the one above. Also, the combiner fails for this test with -mcpu=gfx900 --denormal-fp-math=preserve-sign. The same applies to the test above (test_half_sub_mul). It produces the correct result only because fsub is replaced by fadd + fneg in the legalizer and is then probably matched by one of the other combiners that start from fadd. | |
680 | Same here, swap %a and %z. |
This should probably allow vectors we can break down later too