This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (4 inline comments)
llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll (1 inline comment)

Differential D126273

[DAGCombiner][VP] Add DAGCombine for merging VP_FADD and VP_FMUL to VP_FMA.
Needs Review · Public

Authored by fakepaper56 on May 23 2022, 9:24 PM.


Details

Reviewers

craig.topper
frasercrmck
RKSimon
simoll
spatel
rogfer01

Summary

The patch adds two DAG combines:
fold (vp_fadd a, (vp_fmul b, c)) to (vp_fma b, c, a)
fold (vp_fadd (vp_fmul a, b), c) to (vp_fma a, b, c)
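To make the operand order of the two folds concrete, here is a minimal C++ sketch on a hypothetical toy expression tree. `Node`, `op`, `leaf`, `print` and `foldVPFAddToVPFMA` are illustrative names, not SelectionDAG API. The sketch models only the contractability check and the operand reordering; the one-use checks and the Mask/EVL operands of the real patch are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node (not SelectionDAG); Mask/EVL operands are omitted.
struct Node {
  std::string Op;                         // "vp_fadd", "vp_fmul", "vp_fma" or a leaf name
  std::vector<std::shared_ptr<Node>> Ops; // empty for leaves
  bool AllowContract = false;             // models the 'contract' fast-math flag
};
using NodePtr = std::shared_ptr<Node>;

NodePtr op(const std::string &Op, std::vector<NodePtr> Ops,
           bool AllowContract = false) {
  auto N = std::make_shared<Node>();
  N->Op = Op;
  N->Ops = std::move(Ops);
  N->AllowContract = AllowContract;
  return N;
}

NodePtr leaf(const std::string &Name) { return op(Name, {}); }

// Prints "(op a, b, ...)" for operations and the bare name for leaves.
std::string print(const NodePtr &N) {
  if (N->Ops.empty())
    return N->Op;
  std::string S = "(" + N->Op;
  for (std::size_t I = 0; I < N->Ops.size(); ++I)
    S += (I == 0 ? " " : ", ") + print(N->Ops[I]);
  return S + ")";
}

// A vp_fmul may be fused if fusion is allowed globally or by its own flag.
bool isContractableVPFMul(const NodePtr &N, bool AllowFusionGlobally) {
  return N->Op == "vp_fmul" && (AllowFusionGlobally || N->AllowContract);
}

// Applies the two folds from the summary; returns nullptr if neither applies.
NodePtr foldVPFAddToVPFMA(const NodePtr &N, bool AllowFusionGlobally) {
  if (N->Op != "vp_fadd")
    return nullptr;
  const NodePtr &N0 = N->Ops[0];
  const NodePtr &N1 = N->Ops[1];
  // (vp_fadd (vp_fmul a, b), c) -> (vp_fma a, b, c)
  if (isContractableVPFMul(N0, AllowFusionGlobally))
    return op("vp_fma", {N0->Ops[0], N0->Ops[1], N1});
  // (vp_fadd a, (vp_fmul b, c)) -> (vp_fma b, c, a): the addend moves last.
  if (isContractableVPFMul(N1, AllowFusionGlobally))
    return op("vp_fma", {N1->Ops[0], N1->Ops[1], N0});
  return nullptr;
}
```

For example, `print(foldVPFAddToVPFMA(op("vp_fadd", {a, op("vp_fmul", {b, c}, true)}), false))` yields `(vp_fma b, c, a)`, while an fmul without the flag is left alone unless fusion is allowed globally.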

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fakepaper56 created this revision. May 23 2022, 9:24 PM

Herald added a project: Restricted Project. May 23 2022, 9:24 PM

Herald added subscribers: luke957, StephenFan, ecnelises and 22 others.

fakepaper56 requested review of this revision. May 23 2022, 9:24 PM

Herald added a project: Restricted Project. May 23 2022, 9:24 PM

Herald added subscribers: llvm-commits, pcwang-thead, MaskRay.

Using clang-format.

craig.topper added reviewers: spatel, rogfer01. May 23 2022, 9:36 PM

craig.topper added inline comments. May 23 2022, 9:39 PM

llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll
Line 2: Test should probably have `-vp-` in its name.

Harbormaster completed remote builds in B165994: Diff 431586. May 23 2022, 10:05 PM

When I test TSVC, the IR is:

@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....

not:

@llvm.riscv.vfmul.nxv2f32.nxv2f32(<vscale x 2 x float> ......
@llvm.riscv.vfadd.nxv2f32.nxv2f32(<vscale x 2 x float> ......

So I am not sure we need to merge vp.fmul and vp.fadd into vp.fma. Maybe vp.fma is enough?

Rename fold-fadd-and-fmul.ll to fold-vp-fadd-and-vp-fmul.ll.

In D126273#3533580, @liaolucy wrote:
When I test TSVC, the IR is:
@llvm.fmuladd.nxv2f32(<vscale x 2 x float>.....
[...]
So I am not sure we need to merge vp.fmul and vp.fadd into vp.fma. Maybe vp.fma is enough?

I am not sure what TSVC is. Could you tell me its full name?
But I think this case can happen when the loop vectorizer generates vector predication (VP) intrinsics.

Harbormaster completed remote builds in B166031: Diff 431644. May 24 2022, 5:39 AM

In D126273#3534004, @fakepaper56 wrote:
I am not sure what TSVC is. Could you tell me its full name?
But I think this case can happen when the loop vectorizer generates vector predication (VP) intrinsics.

TSVC:
https://github.com/llvm/llvm-test-suite/blob/main/MultiSource/Benchmarks/TSVC/tsc.inc

RKSimon added inline comments. May 24 2022, 6:04 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 23056: It'd be very useful if we pulled stuff like this out and shared it between all the various FMA-generating combines.

simoll added inline comments. May 24 2022, 6:36 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 23056: Yes. It is possible to lift the existing DAGCombiner patterns to work on VP SDNodes as well as on regular SDNodes. I've implemented this in the LLVM stack for SX-Aurora under the LLVM license, i.e., it could be upstreamed: https://github.com/sx-aurora-dev/llvm-project/blob/feature/generalized_pattern_rewriting/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L13859 You could rephrase this patch using the generalized pattern-rewriting technique. I'd be happy to help with that! The same applies to https://reviews.llvm.org/D121187

simoll added inline comments. May 24 2022, 6:47 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 23056: Some more info on the generalized pattern-matching approach: the visitFADDForFMACombine function is templatized. The template parameter abstracts away the actual matching and SDNode creation. The flow of the code is the same (SDNode matching and creation are redirected through the matcher class; that's all). The templated function is instantiated twice, once with the EmptyMatchContext for regular SDNodes and once with the VPMatchContext for VP SDNodes.
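A minimal sketch of the templated approach described above, under stated assumptions: `ToyNode`, `ToyDAG` and the `match`/`getNode` interface are hypothetical stand-ins invented for illustration, and only the names `EmptyMatchContext` and `VPMatchContext` come from the comment. The combine body is written once and instantiated with each context. Requiring the matched VP operand to share the root's mask and EVL is a choice made for this sketch, not a description of the SX-Aurora code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node: opcode, operands, and (for VP nodes) ids standing in
// for the mask and EVL operands; -1 means "no mask/EVL".
struct ToyNode {
  std::string Opc;
  std::vector<const ToyNode *> Ops;
  int Mask = -1;
  int EVL = -1;
};

// Owns all nodes; std::deque keeps element addresses stable on emplace_back.
struct ToyDAG {
  std::deque<ToyNode> Pool;
  const ToyNode *create(std::string Opc, std::vector<const ToyNode *> Ops = {},
                        int Mask = -1, int EVL = -1) {
    Pool.emplace_back();
    ToyNode &N = Pool.back();
    N.Opc = std::move(Opc);
    N.Ops = std::move(Ops);
    N.Mask = Mask;
    N.EVL = EVL;
    return &N;
  }
};

// Context for regular nodes: match and create plain opcodes.
struct EmptyMatchContext {
  ToyDAG &DAG;
  bool match(const ToyNode *N, const std::string &Opc) const {
    return N->Opc == Opc;
  }
  const ToyNode *getNode(const std::string &Opc,
                         std::vector<const ToyNode *> Ops) {
    return DAG.create(Opc, std::move(Ops));
  }
};

// Context for VP nodes: "fadd" matches "vp_fadd" etc. with the root's
// mask/EVL (assumption of this sketch); new nodes inherit the root's mask/EVL.
struct VPMatchContext {
  ToyDAG &DAG;
  const ToyNode *Root;
  bool match(const ToyNode *N, const std::string &Opc) const {
    return N->Opc == "vp_" + Opc && N->Mask == Root->Mask &&
           N->EVL == Root->EVL;
  }
  const ToyNode *getNode(const std::string &Opc,
                         std::vector<const ToyNode *> Ops) {
    return DAG.create("vp_" + Opc, std::move(Ops), Root->Mask, Root->EVL);
  }
};

// The combine is written once; matching and node creation go through Ctx.
template <typename MatchContext>
const ToyNode *combineFAddToFMA(const ToyNode *N, MatchContext &Ctx) {
  if (!Ctx.match(N, "fadd"))
    return nullptr;
  const ToyNode *N0 = N->Ops[0];
  const ToyNode *N1 = N->Ops[1];
  if (Ctx.match(N0, "fmul")) // (fadd (fmul x, y), z) -> (fma x, y, z)
    return Ctx.getNode("fma", {N0->Ops[0], N0->Ops[1], N1});
  if (Ctx.match(N1, "fmul")) // (fadd x, (fmul y, z)) -> (fma y, z, x)
    return Ctx.getNode("fma", {N1->Ops[0], N1->Ops[1], N0});
  return nullptr;
}
```

The point of the design is that `combineFAddToFMA` contains no VP-specific logic; adding VP support means writing a second context rather than a second copy of the combine.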

In D126273#3533580, @liaolucy wrote:
So I am not sure we need to merge vp.fmul and vp.fadd into vp.fma. Maybe vp.fma is enough?

fmuladd is used with -ffp-contract=on (the default). fmul+fadd is used with -ffast-math, -ffp-contract=fast, or -Ofast.
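The combine needs permission to fuse (either global fusion or the per-node `contract` flag) because an FMA rounds once while fmul followed by fadd rounds twice, so the results can differ. A small C++ illustration using `std::fma` from `<cmath>`; the function names and input values are chosen for this example only:

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to double before the add. Storing it in a
// volatile forces that rounding and keeps the compiler from contracting the
// expression into an FMA on its own (e.g. under -ffp-contract=fast).
double mulThenAdd(double A, double B, double C) {
  volatile double Prod = A * B; // first rounding
  return Prod + C;              // second rounding
}

// Fused: std::fma computes A * B + C with a single rounding.
double fusedMulAdd(double A, double B, double C) { return std::fma(A, B, C); }
```

With A = B = 1 + 2^-30 and C = -(1 + 2^-29), the exact product 1 + 2^-29 + 2^-60 loses its 2^-60 term when rounded to double, so `mulThenAdd` returns 0.0 while `fusedMulAdd` returns 2^-60.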

In D126273#3534429, @craig.topper wrote:
fmuladd is used with -ffp-contract=on (the default). fmul+fadd is used with -ffast-math, -ffp-contract=fast, or -Ofast.

Got it. Thank you very much.

craig.topper added inline comments. May 24 2022, 9:42 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 23050: What about these early exits from the original function?

// If the addition is not contractable, do not combine.
if (!AllowFusionGlobally && !N->getFlags().hasAllowContract())
  return SDValue();

if (TLI.generateFMAsInMachineCombiner(VT, OptLevel))
  return SDValue();
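One possible placement of those two early exits in the new VP combine, modeled as a standalone predicate. `FMACombineInputs` and `mayFormVPFMA` are hypothetical; each field stands for the LLVM query named in its comment, and the ordering follows the quoted code rather than any committed version of this patch.

```cpp
#include <cassert>

// Hypothetical summary of the facts the combine consults; not LLVM API.
struct FMACombineInputs {
  bool FMAFasterThanFMulAndFAdd;      // TLI.isFMAFasterThanFMulAndFAdd(MF, VT)
  bool VPFMALegalOrCustom;            // TLI.isOperationLegalOrCustom(ISD::VP_FMA, VT)
  bool LegalOperations;               // combine runs after operation legalization
  bool AllowFusionGlobally;           // FPOpFusion::Fast or UnsafeFPMath
  bool FAddHasAllowContract;          // 'contract' flag on the VP_FADD itself
  bool GenerateFMAsInMachineCombiner; // TLI.generateFMAsInMachineCombiner(VT, OptLevel)
};

// True if the VP_FADD may be considered for VP_FMA formation: the patch's
// HasFMA check followed by the two early exits from visitFADDForFMACombine.
bool mayFormVPFMA(const FMACombineInputs &In) {
  bool HasFMA = In.FMAFasterThanFMulAndFAdd &&
                (!In.LegalOperations || In.VPFMALegalOrCustom);
  if (!HasFMA)
    return false;
  // If the addition is not contractable, do not combine.
  if (!In.AllowFusionGlobally && !In.FAddHasAllowContract)
    return false;
  // Leave FMA formation to the MachineCombiner if the target prefers that.
  if (In.GenerateFMAsInMachineCombiner)
    return false;
  return true;
}
```

Note that the per-operand check in the patch (`isContractableVPFMUL`) would still apply after these exits; the predicate above only gates whether the VP_FADD is considered at all.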

Revision Contents

Path                                                 Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp        52 lines
llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll    33 lines

Diff 431584

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


 //===- DAGCombiner.cpp - Implement a DAG node combiner --------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This pass combines dag nodes to form fewer, simpler DAG nodes. It can be run
... 502 unchanged lines not shown (context: private:) ...
     SDValue visitFP16_TO_FP(SDNode *N);
     SDValue visitVECREDUCE(SDNode *N);
     SDValue visitVPOp(SDNode *N);
 
     SDValue visitFADDForFMACombine(SDNode *N);
     SDValue visitFSUBForFMACombine(SDNode *N);
     SDValue visitFMULForFMADistributiveCombine(SDNode *N);
 
+    SDValue visitVPFADDForVPFMACombine(SDNode *N);
+
     SDValue XformToShuffleWithZero(SDNode *N);
     bool reassociationCanBreakAddressingModePattern(unsigned Opc,
                                                     const SDLoc &DL,
                                                     SDNode *N,
                                                     SDValue N0,
                                                     SDValue N1);
     SDValue reassociateOpsCommutative(unsigned Opc, const SDLoc &DL, SDValue N0,
                                       SDValue N1);
... 22,492 unchanged lines not shown (context: if ((Opcode == ISD::VECREDUCE_OR &&) ...
         (Opcode == ISD::VECREDUCE_AND &&
          (N0.getOperand(0).isUndef() || isAllOnesOrAllOnesSplat(Vec))))
       return DAG.getNode(Opcode, SDLoc(N), N->getValueType(0), Subvec);
   }
 
   return SDValue();
 }
 
+/// Try to perform VP_FMA combining on a given VP_FADD node.
+SDValue DAGCombiner::visitVPFADDForVPFMACombine(SDNode *N) {
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  SDValue Mask = N->getOperand(2);
+  SDValue VL = N->getOperand(3);
+  EVT VT = N->getValueType(0);
+  SDLoc SL(N);
+
+  const TargetOptions &Options = DAG.getTarget().Options;
+
+  bool HasFMA =
+      TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT) &&
+      (!LegalOperations || TLI.isOperationLegalOrCustom(ISD::VP_FMA, VT));
+
+  if (!HasFMA)
+    return SDValue();
+
+  bool AllowFusionGlobally = (Options.AllowFPOpFusion == FPOpFusion::Fast ||
+                              Options.UnsafeFPMath);
+
+  // Is the node an VP_FMUL and contractable either due to global flags or
+  // SDNodeFlags.
+  auto isContractableVPFMUL = [AllowFusionGlobally](SDValue N) {
+    if (N.getOpcode() != ISD::VP_FMUL)
+      return false;
+    return AllowFusionGlobally || N->getFlags().hasAllowContract();
+  };
+
+  // fold (vp_fadd (vp_fmul x, y), z) -> (vp_fma x, y, z)
+  if (isContractableVPFMUL(N0) && N0->hasOneUse())
+    return DAG.getNode(ISD::VP_FMA, SL, VT, N0.getOperand(0),
+                       N0.getOperand(1), N1, Mask, VL);
+
+  // fold (vp_fadd x, (vp_fmul y, z)) -> (vp_fma y, z, x)
+  // Note: Commutes VP_FADD operands.
+  if (isContractableVPFMUL(N1) && N1->hasOneUse())
+    return DAG.getNode(ISD::VP_FMA, SL, VT, N1.getOperand(0),
+                       N1.getOperand(1), N0, Mask, VL);
+
+  return SDValue();
+}
+
 SDValue DAGCombiner::visitVPOp(SDNode *N) {
   // VP operations in which all vector elements are disabled - either by
   // determining that the mask is all false or that the EVL is 0 - can be
   // eliminated.
   bool AreAllEltsDisabled = false;
   if (auto EVLIdx = ISD::getVPExplicitVectorLengthIdx(N->getOpcode()))
     AreAllEltsDisabled |= isNullConstant(N->getOperand(*EVLIdx));
   if (auto MaskIdx = ISD::getVPMaskIdx(N->getOpcode()))
     AreAllEltsDisabled |=
         ISD::isConstantSplatVectorAllZeros(N->getOperand(*MaskIdx).getNode());
 
   // This is the only generic VP combine we support for now.
-  if (!AreAllEltsDisabled)
+  if (!AreAllEltsDisabled) {
+    switch (N->getOpcode()) {
+    case ISD::VP_FADD:
+      return visitVPFADDForVPFMACombine(N);
+    }
     return SDValue();
+  }
 
   // Binary operations can be replaced by UNDEF.
   if (ISD::isVPBinaryOp(N->getOpcode()))
     return DAG.getUNDEF(N->getValueType(0));
 
   // VP Memory operations can be replaced by either the chain (stores) or the
   // chain + undef (loads).
   if (const auto *MemSD = dyn_cast<MemSDNode>(N)) {
... 1,622 unchanged lines not shown ...
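The VP_FMA node built in visitVPFADDForVPFMACombine reuses the VP_FADD's Mask and VL operands. A reference model of the lane semantics, assuming (as in both tests below, which pass the same %m and %vl to vp.fmul and vp.fadd) that the two operations share one mask and EVL: a lane is computed only if its index is below EVL and its mask bit is set. The names `vpFMul`, `vpFAdd`, `vpFMA` and the use of NaN as a placeholder for unspecified lanes are illustrative assumptions, not LLVM code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;
using MaskVec = std::vector<bool>;

// A lane takes part in a VP operation only if its index is below the explicit
// vector length (EVL) and its mask bit is set.
bool laneEnabled(const MaskVec &M, std::size_t EVL, std::size_t I) {
  return I < EVL && M[I];
}

// NaN is used here as a placeholder for the unspecified disabled lanes.
const double Unspecified = std::numeric_limits<double>::quiet_NaN();

Vec vpFMul(const Vec &X, const Vec &Y, const MaskVec &M, std::size_t EVL) {
  Vec R(X.size(), Unspecified);
  for (std::size_t I = 0; I < X.size(); ++I)
    if (laneEnabled(M, EVL, I))
      R[I] = X[I] * Y[I];
  return R;
}

Vec vpFAdd(const Vec &X, const Vec &Y, const MaskVec &M, std::size_t EVL) {
  Vec R(X.size(), Unspecified);
  for (std::size_t I = 0; I < X.size(); ++I)
    if (laneEnabled(M, EVL, I))
      R[I] = X[I] + Y[I];
  return R;
}

// The fused operation takes the VP_FADD's mask and EVL.
Vec vpFMA(const Vec &X, const Vec &Y, const Vec &Z, const MaskVec &M,
          std::size_t EVL) {
  Vec R(X.size(), Unspecified);
  for (std::size_t I = 0; I < X.size(); ++I)
    if (laneEnabled(M, EVL, I))
      R[I] = std::fma(X[I], Y[I], Z[I]);
  return R;
}
```

With X = {1, 2, 3, 4}, Y = {5, 6, 7, 8}, Z = {0.5, 0.25, 0.125, 1}, mask {1, 0, 1, 1} and EVL = 3, lanes 0 and 2 produce 5.5 and 21.125 in both the unfused and the fused form, while lane 1 (masked off) and lane 3 (beyond EVL) are not computed.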

llvm/test/CodeGen/RISCV/rvv/fold-fadd-and-fmul.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv64 -mattr=+v -target-abi=lp64d -verify-machineinstrs < %s | FileCheck %s

declare <vscale x 1 x double> @llvm.vp.fmul.nxv1f64(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x i1> %m, i32 %vl)
declare <vscale x 1 x double> @llvm.vp.fadd.nxv1f64(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x i1> %m, i32 %vl)

define <vscale x 1 x double> @test1(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x double> %z, <vscale x 1 x i1> %m, i32 %vl) {
; CHECK-LABEL: test1:
; CHECK: # %bb.0:
; CHECK-NEXT: slli a0, a0, 32
; CHECK-NEXT: srli a0, a0, 32
; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, mu
; CHECK-NEXT: vfmadd.vv v9, v8, v10, v0.t
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
  %1 = call fast <vscale x 1 x double> @llvm.vp.fmul.nxv1f64(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x i1> %m, i32 %vl)
  %2 = call fast <vscale x 1 x double> @llvm.vp.fadd.nxv1f64(<vscale x 1 x double> %1, <vscale x 1 x double> %z, <vscale x 1 x i1> %m, i32 %vl)
  ret <vscale x 1 x double> %2
}

define <vscale x 1 x double> @test2(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x double> %z, <vscale x 1 x i1> %m, i32 %vl) {
; CHECK-LABEL: test2:
; CHECK: # %bb.0:
; CHECK-NEXT: slli a0, a0, 32
; CHECK-NEXT: srli a0, a0, 32
; CHECK-NEXT: vsetvli zero, a0, e64, m1, tu, mu
; CHECK-NEXT: vfmadd.vv v9, v8, v10, v0.t
; CHECK-NEXT: vmv1r.v v8, v9
; CHECK-NEXT: ret
  %1 = call fast <vscale x 1 x double> @llvm.vp.fmul.nxv1f64(<vscale x 1 x double> %x, <vscale x 1 x double> %y, <vscale x 1 x i1> %m, i32 %vl)
  %2 = call fast <vscale x 1 x double> @llvm.vp.fadd.nxv1f64(<vscale x 1 x double> %z, <vscale x 1 x double> %1, <vscale x 1 x i1> %m, i32 %vl)
  ret <vscale x 1 x double> %2
}
