The patch adds two DAG combines:
fold (vp_fadd a, (vp_fmul b, c)) to (vp_fma b, c, a)
fold (vp_fadd (vp_fmul a, b), c) to (vp_fma a, b, c)
Repository: rG LLVM Github Monorepo
llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll | ||
---|---|---|
2 | Test should probably have -vp- in its name |
When I test tsvc, the IR is:
@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....
not
@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ...... @llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......
So I am not sure we need to merge vp.fmul and vp.fadd into vp.fma. Maybe vp.fma alone is enough?
I am not sure what tsvc is. Could you tell me its full name?
But I think the case may happen when the loop vectorizer generates vector predication intrinsics.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
23056 | It'd be very useful if we pulled stuff like this out and shared it between all the various FMA-generating combines. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
23056 | Yes. It is possible to lift the existing DAGCombiner patterns to work on VP SDNodes as well as on regular SDNodes. You could rephrase this patch using the generalized pattern-rewriting technique. I'd be happy to help with that! The same applies to https://reviews.llvm.org/D121187 |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
23056 | Some more info on the generalized pattern-matching thing: The visitFADDForFMACombine function is templatized. The template parameter abstracts away the actual matching and SDNode creation. The flow of the code is the same (SDNode matching and creation is re-directed through the matcher class, that's all). The templated function is instantiated twice, once with the EmptyMatchContext for regular SDNodes and once with the VPMatchContext for VP-SDNodes. |
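To make the match-context idea concrete, here is a minimal standalone sketch. It is not LLVM code: `Node`, `Graph`, `ToyEmptyContext`, `ToyVPContext`, and `foldFAddToFMA` are hypothetical stand-ins, and the sketch assumes VP nodes carry the mask and EVL as their last two operands. It also leaves out the legality, flag, and one-use checks a real combine would need. The point is only that one templated function serves both node kinds, with matching and node creation routed through the context.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for SelectionDAG nodes; none of these are LLVM types.
enum class Opcode { Leaf, FADD, FMUL, FMA, VP_FADD, VP_FMUL, VP_FMA };

struct Node {
  Opcode Op;
  std::vector<const Node *> Ops; // VP nodes (assumed): ..., mask, EVL
};

// Owns every node so raw pointers stay valid for the lifetime of the graph.
struct Graph {
  std::vector<std::unique_ptr<Node>> Nodes;
  const Node *make(Opcode Op, std::vector<const Node *> Ops = {}) {
    Nodes.push_back(std::make_unique<Node>(Node{Op, std::move(Ops)}));
    return Nodes.back().get();
  }
};

// Context for regular nodes: match the opcode as-is.
struct ToyEmptyContext {
  bool match(const Node *N, Opcode Generic) const { return N->Op == Generic; }
  const Node *getFMA(Graph &G, const Node *A, const Node *B,
                     const Node *C) const {
    return G.make(Opcode::FMA, {A, B, C});
  }
};

// Context for VP nodes: translate the opcode and require the same mask/EVL
// as the root node. Root must be a binary VP node with four operands.
struct ToyVPContext {
  const Node *Mask;
  const Node *EVL;
  explicit ToyVPContext(const Node *Root)
      : Mask(Root->Ops[2]), EVL(Root->Ops[3]) {}

  static Opcode toVP(Opcode Generic) {
    switch (Generic) {
    case Opcode::FADD: return Opcode::VP_FADD;
    case Opcode::FMUL: return Opcode::VP_FMUL;
    case Opcode::FMA:  return Opcode::VP_FMA;
    default:           return Generic;
    }
  }
  bool match(const Node *N, Opcode Generic) const {
    return N->Op == toVP(Generic) && N->Ops.size() == 4 &&
           N->Ops[2] == Mask && N->Ops[3] == EVL;
  }
  const Node *getFMA(Graph &G, const Node *A, const Node *B,
                     const Node *C) const {
    return G.make(Opcode::VP_FMA, {A, B, C, Mask, EVL});
  }
};

// One pattern, written once. Returns nullptr when nothing folds.
template <typename MatchContext>
const Node *foldFAddToFMA(Graph &G, const Node *N, const MatchContext &Ctx) {
  if (!Ctx.match(N, Opcode::FADD))
    return nullptr;
  const Node *L = N->Ops[0];
  const Node *R = N->Ops[1];
  // fold (fadd (fmul a, b), c) -> (fma a, b, c)
  if (Ctx.match(L, Opcode::FMUL))
    return Ctx.getFMA(G, L->Ops[0], L->Ops[1], R);
  // fold (fadd a, (fmul b, c)) -> (fma b, c, a)
  if (Ctx.match(R, Opcode::FMUL))
    return Ctx.getFMA(G, R->Ops[0], R->Ops[1], L);
  return nullptr;
}
```

Calling `foldFAddToFMA(G, N, ToyEmptyContext{})` handles the regular pattern, and `foldFAddToFMA(G, N, ToyVPContext(N))` handles the VP pattern. In this sketch the VP context refuses to fold when the multiply and the add use different masks or EVLs, which is one plausible reason to route matching through a context instead of comparing opcodes directly.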
fmuladd is used with -ffp-contract=on (the default). fmul+fadd is used with -ffast-math, -ffp-contract=fast, or -Ofast.
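As a small illustration of that distinction, the function below is a hypothetical test input (`mulAdd` is not from the patch). Compiling it with clang and inspecting the IR, for example with `-O1 -S -emit-llvm`, should show the difference. The exact IR depends on the clang version and target, so treat the comments as expectations, not guarantees.

```cpp
#include <cassert>

// With -ffp-contract=on, the multiply and add sit in one expression, so
// clang is expected to emit a call to llvm.fmuladd here.
// With -ffp-contract=fast, -ffast-math, or -Ofast, clang is expected to emit
// a separate fmul and fadd whose fast-math flags permit contraction, which
// is the shape an fadd+fmul -> fma combine looks for.
float mulAdd(float a, float b, float c) { return a * b + c; }
```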
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
23050 | What about these early exits from the original function? // If the addition is not contractable, do not combine. if (!AllowFusionGlobally && !N->getFlags().hasAllowContract()) return SDValue(); if (TLI.generateFMAsInMachineCombiner(VT, OptLevel)) return SDValue(); |