Instead of separate sub and mul instructions, use v_mad, v_mac or v_fma when
fused multiply-add instructions are faster and legal for the given architecture.
This adds a combiner for the simple case of exactly one subtraction and one
multiplication instruction, and transforms them into one of the fma
instructions, depending on the architecture.
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

Lines | Comment
---|---
4085–4086 | This should probably allow vectors we can break down later too
4088 | Don't see where isFMADLegal is defined
4107–4119 | I'm not sure I follow this heuristic, or what SwapPriority means
4160–4161 | The types are all identical, there's no reason to query every type
4164 | You can directly use the type and avoid the explicit createGenericVirtualRegister with

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll

Lines | Comment
---|---
118–119 | Why did we fail to fold the modifier here?

Thanks for the suggestions!
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

Lines | Comment
---|---
4088 | It is defined in the other revision (the parent): D93305
4107–4119 | SwapPriority is 0 when the first and second operands aren't both fmul instructions. It is 2 when both operands are fmul and the second has fewer uses, so we pick the second one for folding; it is 1 in the opposite case. I will simplify this in the next version.

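The SwapPriority explanation above can be sketched in isolation. This is only an illustration of the rule as described in the reply, not the code from the patch: `Operand` and `computeSwapPriority` are hypothetical names, and resolving a tie in favor of the first operand is an assumption, since the reply does not say what happens when both operands have the same number of uses.

```cpp
// Hypothetical stand-in for one operand of the fsub; not an LLVM type.
struct Operand {
  bool IsFMul;      // true if the operand is defined by an fmul
  unsigned NumUses; // number of uses of the operand's result
};

// Returns 0 when the operands are not both fmul,
// 2 when both are fmul and the second has fewer uses (fold the second),
// 1 when both are fmul otherwise (fold the first).
// Assumption: a tie in use counts falls back to the first operand.
unsigned computeSwapPriority(const Operand &First, const Operand &Second) {
  if (!First.IsFMul || !Second.IsFMul)
    return 0;
  return Second.NumUses < First.NumUses ? 2 : 1;
}
```

Preferring the fmul with fewer uses makes it more likely that the multiplication becomes dead after folding, so the fused instruction replaces it rather than duplicating it.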
Restored the combiner that was accidentally deleted from the list of combiners (load_or_combine).
Similar to SelectionDAG, which negates an SDValue:
// fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
DAG.getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0), XY.getOperand(1), DAG.getNode(ISD::FNEG, SL, VT, Z));
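As a plain arithmetic illustration of the fold, the two forms can be written as ordinary C++ functions. This is a sketch outside of LLVM: `subOfMul` and `fusedForm` are hypothetical names, and `std::fma` stands in for the target's fused instruction.

```cpp
#include <cmath>

// Unfused form: (fsub (fmul x, y), z). The product is rounded before the
// subtraction, unless the compiler itself contracts the expression.
float subOfMul(float x, float y, float z) { return x * y - z; }

// Fused form: (fma x, y, (fneg z)). The product and the addition of -z are
// computed with a single rounding at the end.
float fusedForm(float x, float y, float z) { return std::fma(x, y, -z); }
```

For inputs whose product is exactly representable, such as 3.0f, 4.0f and 5.0f, both functions return 7.0f. In general the fused form rounds once instead of twice, so the results can differ in the last bit, which is one reason the combine has to check that fusing is permitted and legal for the target.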
Use m_MInstr instead of m_Reg in matching patterns (mi_match).
A few minor bug fixes.
Formatting and refactoring.
clang-format for CombinerHelper.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll

Lines | Comment
---|---
203 | %a and %z should be swapped here, otherwise this is the same test as the one above. Also, the combiner fails on this test with -mcpu=gfx900 --denormal-fp-math=preserve-sign. The same applies to the test above (test_half_sub_mul). It produces the correct result only because the legalizer replaces fsub with fadd + fneg, which is then probably matched by one of the other combiners that start from fadd.
680 | Same here, swap %a and %z.
