This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Allow FMA combine with both FMA and FMAD
ClosedPublic

Authored by critson on Aug 24 2021, 1:44 AM.

Details

Summary

Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.

Diff Detail

Event Timeline

critson created this revision.Aug 24 2021, 1:44 AM
critson requested review of this revision.Aug 24 2021, 1:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2021, 1:44 AM
foad added a comment.Aug 24 2021, 2:02 AM

s/FMAC/FMAD/g in the commit message, since that's what SelectionDAG calls it.

foad added a comment.Aug 24 2021, 2:09 AM

will not be merged post legalization as FMAC becomes
the preferred fusion opcode

I'm not sure wha you mean about FMAD "becoming the preferred fusion opcode". Is that just an AMDGPU quirk? Or is there something in generic code that prefers FMA before legalization and FMAD after??

llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
5

Could use GCN-COUNT-13: v_fma.

But maybe it would be better to generate the checks for this file? It's not immediately obvious (to me) why 13 is the optimal number of fmas. There are more than 13 fmuls and fadds and fsubs in the IR.

critson retitled this revision from [DAGCombine] Allow FMA combine with both FMA and FMAC to [DAGCombine] Allow FMA combine with both FMA and FMAD.Aug 24 2021, 7:27 PM
critson edited the summary of this revision. (Show Details)
critson marked an inline comment as done.Aug 24 2021, 7:42 PM

will not be merged post legalization as FMAC becomes
the preferred fusion opcode

I'm not sure wha you mean about FMAD "becoming the preferred fusion opcode". Is that just an AMDGPU quirk? Or is there something in generic code that prefers FMA before legalization and FMAD after??

bool HasFMAD = (LegalOperations && TLI.isFMADLegal(DAG, N));

bool HasFMA =
    TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&
    (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::FMA, VT));

unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;

This is most probably an AMDGPU quirk.
FMAD is only available when LegalOperations is true, i.e. post legalize vector ops. Where as FMA is always legal if backend sets it to be Legal or Custom. I don't know the origin of this behaviour; however, the only backend that marks ISD::FMAD legal appears to be AMDGPU (SI and R600).
So preferred fusion opcode will only be FMAD on AMDGPU.

llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
5

I'll generate the checks for this file and precommit.
This example was specifically reduced to take the code paths changed by this patch.
With the patch the number of instructions falls, number of FMAs goes from 14 to 13.

critson updated this revision to Diff 368544.Aug 24 2021, 8:19 PM
critson marked an inline comment as done.
  • Rebase onto pre-committed test
foad added inline comments.Aug 25 2021, 3:08 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13056

Why do you need the HasFMA and HasFMAD checks here? I can see that you don't want to create new FMA instructions if FMA is not legal, but I can't see why you wouldn't want to combine existing FMAs.

In other words can this just be return Opcode == ISD::FMA || Opcode == ISD::FMAD;?

critson marked an inline comment as done.Aug 25 2021, 4:20 AM
critson added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13056

Sure, this yields some test changes (instruction count reductions).

critson updated this revision to Diff 368604.Aug 25 2021, 4:20 AM
critson marked an inline comment as done.
  • Incorporate reviewer comments
foad accepted this revision.Aug 25 2021, 6:19 AM

Looks fine to me, with or without the tweak suggested inline.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13107

Could use FMA.getOpcode() instead of PreferredFusedOpcode here, to try to preserve the original opcode of the A*B+something part of the expression. But that's getting pretty subtle and I'm not sure if it will make any practical difference.

The same goes for any other parts of combines which preserve an fma(d) instead of creating a new one from scratch.

This revision is now accepted and ready to land.Aug 25 2021, 6:19 AM
critson updated this revision to Diff 369022.Aug 26 2021, 7:59 PM
  • Tweak to preserve original fusion opcode when forming new FMAs
critson marked an inline comment as done.Aug 26 2021, 7:59 PM
foad added inline comments.Aug 27 2021, 1:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13107

What I meant was, use FMA.getOpcode() for A*B+something because it's preserving an existing fma(d) from the input, but still use PreferredFusedOpcode for C*D+something because it is a new fma(d) that we are creating by fusing an fmul and and fadd from the input.

I realise this is pretty subtle and I'm not sure it will make any practical difference.

foad added inline comments.Aug 27 2021, 2:56 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13108

On second thoughts, perhaps there's no point trying to preserve the original opcode for the A*B+something part, because the something is different, so it's impossible to retain the (fused vs unfused) rounding behaviour of the original opcode.

So perhaps your original approach of using DAG.getNode(PreferredFusedOpcode, ...) everywhere is the best we can do. Sorry for the noise.

critson marked 2 inline comments as done.Aug 27 2021, 3:04 AM
critson added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13108

No worries, I have spent a while thinking about it too.
I tested both versions so far on image output tests and could not detect and meaningful difference.
I also implemented the mixed preservation intended with your comment.
However, let's just go with the original.

critson updated this revision to Diff 369055.Aug 27 2021, 3:06 AM
critson marked an inline comment as done.
  • Revert opcode preservation changes
foad added a comment.Aug 27 2021, 3:29 AM

Still LGTM, thanks!

This revision was landed with ongoing or failed builds.Aug 27 2021, 3:49 AM
This revision was automatically updated to reflect the committed changes.