This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][VP] Add DAGCombine for merging VP_FADD and VP_FMUL to VP_FMA.
Needs ReviewPublic

Authored by fakepaper56 on May 23 2022, 9:24 PM.

Details

Summary

The patch does two DAGcombines:
fold (vp_fadd a, (vp_mul b, c)) to (vp_fma b, c, a)
fold (vp_fadd (vp_mul a, b), c) to (vp_fma a, b, c)

Diff Detail

Event Timeline

fakepaper56 created this revision.May 23 2022, 9:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 9:24 PM
fakepaper56 requested review of this revision.May 23 2022, 9:24 PM

Using clang-format.

craig.topper added inline comments.May 23 2022, 9:39 PM
llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll
1

Test should probably have -vp- in it's name

When I test tsvc.

The IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

Not

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure, we need merge vp.fmul and vp.fadd to vp.fma. maybe vp.fma is enough?

Rename fold-fadd-and-fmul.ll to fold-vp-fadd-and-vp-fmul.ll.

When I test tsvc.

The IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

Not

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure, we need merge vp.fmul and vp.fadd to vp.fma. maybe vp.fma is enough?

I am not sure what is tscv. Could you tell its full name?
But I think the case may happen when loop vectorizer generates vector prediction intrinsics.

When I test tsvc.

The IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

Not

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure, we need merge vp.fmul and vp.fadd to vp.fma. maybe vp.fma is enough?

I am not sure what is tscv. Could you tell its full name?
But I think the case may happen when loop vectorizer generates vector prediction intrinsics.

tsvc:
https://github.com/llvm/llvm-test-suite/blob/main/MultiSource/Benchmarks/TSVC/tsc.inc

RKSimon added inline comments.May 24 2022, 6:04 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23061

It'd be very useful if we pulled stuff like this out and shared it between all the various FMA generating combines.

simoll added inline comments.May 24 2022, 6:36 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23061

Yes. It is possible to lift the existing DAGCombiner Patterns to work on VP SDNodes as well as on regular SDNodes.
I've implemented this in the LLVM stack for SX-Aurora under the LLVM license, ie it could be upstreamed:

https://github.com/sx-aurora-dev/llvm-project/blob/feature/generalized_pattern_rewriting/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L13859

You could rephrase this patch using the generalized pattern-rewriting technique. I'd happy to help with that!

The same applies to https://reviews.llvm.org/D121187

simoll added inline comments.May 24 2022, 6:47 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23061

Some more info on the generalized pattern-matching thing:

The visitFADDForFMACombine function is templatized. The template parameter abstracts away the actual matching and SDNode creation. The flow of the code is the same (SDNode matching and creation is re-directed through the matcher class, that's all).

The templated function is instantiated twice, once with the EmptyMatchContext for regular SDNodes and once with the VPMatchContext for VP-SDNodes.

When I test tsvc.

The IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

Not

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure, we need merge vp.fmul and vp.fadd to vp.fma. maybe vp.fma is enough?

fmuladd is use with -ffp-contract=on(the default). fmul+fadd is used with -ffast-math or -ffp-contract=fast or -Ofast.

When I test tsvc.

The IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

Not

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure, we need merge vp.fmul and vp.fadd to vp.fma. maybe vp.fma is enough?

fmuladd is use with -ffp-contract=on(the default). fmul+fadd is used with -ffast-math or -ffp-contract=fast or -Ofast.

Got it. Thank you very much.

craig.topper added inline comments.May 24 2022, 9:42 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
23055

What about these early exits from the original function

// If the addition is not contractable, do not combine.                        
if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())                 
  return SDValue();                                                            
                                                                               
if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))                           
  return SDValue();