This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Transform (fsub (fmul x, y), z) -> (fma x, y, -z)
ClosedPublic

Authored by matejam on Feb 12 2021, 9:29 AM.

Details

Summary

Instead of sub and mul instructions, use v_mad, v_mac or v_fma if fma
instructions are faster and are legal for the given architecture.
Combiner for a simple case that has only one subtraction and one
multiplication instruction and transforms them into some of the fma
instructions depending on the architecture.

Diff Detail

Event Timeline

matejam created this revision.Feb 12 2021, 9:29 AM
matejam requested review of this revision.Feb 12 2021, 9:29 AM
arsenm added inline comments.Feb 12 2021, 9:39 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
4085–4086

This should probably allow vectors we can break down later too

4088

Don't see where isFMADLegal is fedined

4107–4119

I'm not sure I follow this heuristic, or what SwapPriority means

4160–4161

The types are all identical, there's no reason to query every type

4164

You can directly use the type and avoid the explicit createGenericVirtualRegister with
auto Neg = B.buildFNeg(Ty, X)

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll
118–119

Why did we fail to fold the modifier here?

matejam updated this revision to Diff 327459.Mar 2 2021, 8:05 AM

Added support for vector types and some refactoring.

Thanks for the suggestions!

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
4088

In the other revision (the parent): D93305

4107–4119

If SwapPriority is equal to 0 that means that the first and second operands aren't both fmul instructions, if it's equal to 2 it means that both of the arguments are fmul and that the second arg has fewer uses so we pick him for folding, vice versa if it's equal to 1. I will make it more simple in the next version.

matejam updated this revision to Diff 328111.Mar 4 2021, 3:45 AM
This comment was removed by matejam.
matejam updated this revision to Diff 329335.Mar 9 2021, 7:18 AM

Put back the accidentally deleted combiner from the list of combiners (load_or_combine).

matejam updated this revision to Diff 343374.May 6 2021, 5:47 AM

Minor changes in CombinerHelper.cpp and in the tests.

matejam updated this revision to Diff 346787.May 20 2021, 10:27 AM

Use mi_match for comparing instructions instead of comparing the opcodes.

foad added a comment.May 28 2021, 7:51 AM

Does this assume that all targets can do the fneg for free? Or can a target choose to fold only (fadd (fmul x, y), z), not (fsub (fmul x, y), z) ?

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
4541

Typo "refers", and they're called MI0 and MI1.

4546

Use hasMoreUses() from the previous patch.

Does this assume that all targets can do the fneg for free? Or can a target choose to fold only (fadd (fmul x, y), z), not (fsub (fmul x, y), z) ?

Similar to SelectionDAG which negates an SDValue:

fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
DAG.getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0),
            XY.getOperand(1), DAG.getNode(ISD::FNEG, SL, VT, Z));
matejam updated this revision to Diff 349856.Jun 4 2021, 7:10 AM

Typos and refactoring.

matejam updated this revision to Diff 375873.Sep 29 2021, 7:05 AM

Use m_MInstr instead of m_Reg in matching patterns (mi_match).
A few minor bug fixes.
Formatting and refactoring.

matejam updated this revision to Diff 376206.Sep 30 2021, 7:09 AM

Use applyBuildFn instead of writing my own apply.

mbrkusanin added inline comments.Sep 30 2021, 9:02 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
4687–4691

Rename Src3 to NegZ so it matches the comment above.
Use B not Builder.

4700–4704

Rename Src1 to NegY so it matches the comment above.
Use B not Builder.

matejam updated this revision to Diff 376520.Oct 1 2021, 7:23 AM

Changes in tests.

clang-format for CombinerHelper.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-sub-mul.ll
203

%a and %z should be swapped here, otherwise this is the same test as the one above.

Also combiner fails for this test for -mcpu=gfx900 --denormal-fp-math=preserve-sign.

Same for test above (test_half_sub_mul). It produces correct result only because fsub is replaced by fadd + fneg in legalizer and then is probably matched by one of other combiners that start from fadd.

680

Same here, swap %a and %z.

matejam updated this revision to Diff 384763.Nov 4 2021, 8:27 AM
matejam marked 2 inline comments as done.

Minor changes in tests.

This revision is now accepted and ready to land.Nov 17 2021, 9:58 AM
matejam updated this revision to Diff 390331.Nov 29 2021, 6:20 AM

Added missing comments and an FMA combiner group in table gen.

This revision was landed with ongoing or failed builds.Nov 29 2021, 7:28 AM
This revision was automatically updated to reflect the committed changes.