
[DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B)).
ClosedPublic

Authored by mcrosier on Apr 27 2017, 9:19 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

mcrosier created this revision.Apr 27 2017, 9:19 AM
efriedma edited edge metadata.Apr 27 2017, 11:34 AM

I would guess we should prefer fmul as the canonical form, for the same reason we prefer "shl %x, 1" over "add %x, %x": we optimize instructions where hasOneUse() is true more aggressively. I guess it doesn't make a big difference either way, though.

Either way, please add a testcase for "fadd %x, %x", to confirm that we canonicalize both fadd and fmul to the same form.

mcrosier updated this revision to Diff 96982.Apr 27 2017, 12:32 PM

-Add test case to check fadd X, X is canonical form, per Eli's request.

mcrosier added a comment.EditedApr 27 2017, 12:49 PM

FWIW, I could also fix this in the DAG combiner. The particular case I care about looks like 'a * b - 2.0 * c', but the expression is transformed to 'a * b + -2.0 * c' before we hit the DAG combine of interest (which only expects a +2.0).

> Another data-point: reassociate currently prefers to canonicalize "fadd %x, %x", to "fmul %x, 2". We don't want to fight back and forth in IR.

Thanks, Eli. I'll work on extending the DAG combine in that case. Does that sound reasonable?

mcrosier updated this revision to Diff 97131.Apr 28 2017, 11:32 AM
mcrosier retitled this revision from [InstCombine] Transform fmul X, +2.0 --> fadd X, X. to [DAGCombine] Transform (fmul X, -2.0) --> (fneg (fadd X, X))..
mcrosier edited the summary of this revision. (Show Details)

-Rewrite as a DAG combine.

Not sure if we need a target hook for this...? Replacing one instruction with two might not always be a good idea.

Otherwise looks fine.

arsenm edited edge metadata.May 2 2017, 12:59 PM

This is worse for AMDGPU because it requires a larger instruction encoding for f16/f32. For f64 this is better.

arsenm added a comment.May 2 2017, 1:05 PM

> This is worse for AMDGPU because it requires a larger instruction encoding for f16/f32. For f64 this is better.

Specifically, if the user has a neg source modifier. Otherwise it is always worse.

spatel added a subscriber: spatel.May 2 2017, 3:12 PM
RKSimon added inline comments.May 2 2017, 3:20 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9583 ↗(On Diff #97131)

Only do this if isFNegFree()?

spatel added inline comments.May 2 2017, 3:21 PM
test/CodeGen/X86/fmul-combines.ll
21–22 ↗(On Diff #97131)

What we don't see in this check, but you probably know or can infer: x86 doesn't have an 'fneg' op for SSE/AVX (they ran out of transistors?).

So we load the 128-bit sign-bit mask from memory:

xorps	LCPI1_0(%rip), %xmm0

It's also true that the mul version would load a '2.0', but this adds an extra op, and I don't think that's good for any x86 target.

There's one other reason this may not be good: there are actually CPUs (hello, Jaguar!) that have faster FP multiplies than FP adds.

mcrosier updated this revision to Diff 97667.EditedMay 3 2017, 9:39 AM
mcrosier retitled this revision from [DAGCombine] Transform (fmul X, -2.0) --> (fneg (fadd X, X)). to [DAGCombine] Transform (fadd A, (fmul B, -2.0)) -> (fsub A, (fadd B, B))..

Address reviewers' feedback by narrowing the transform so that the negation is "free".

@efriedma: This transform now replaces 2 instructions with 2 instructions in the worst case. For AArch64 it actually replaces 3 with 2, because we now avoid materializing the -2.0. That's probably true for other targets as well.
@arsenm: I don't believe this new version causes the instruction encoding to change in size, but please correct me if I'm wrong.
@spatel: This should address your comment w.r.t. X86. If you wish, I can add a target hook to predicate this transform on whether the target has fast multiplication. Please let me know if that's still a concern.

Thank you everyone for your feedback. Very much appreciated.

arsenm accepted this revision.May 3 2017, 10:14 AM

> @arsenm: I don't believe this new version causes the instruction encoding to change in size, but please correct me if I'm wrong.

Yes, this is always better if it's really an fsub, since the user then doesn't matter.

This revision is now accepted and ready to land.May 3 2017, 10:14 AM

Great. I'll wait for others' feedback before committing.

efriedma added inline comments.May 3 2017, 12:20 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9468 ↗(On Diff #97667)

hasOneUse()?

mcrosier added inline comments.May 3 2017, 12:48 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9468 ↗(On Diff #97667)

Very good point! One second.

mcrosier updated this revision to Diff 97713.May 3 2017, 12:50 PM

-Check for a single use, so that we know the fmul will be folded away.

efriedma accepted this revision.May 3 2017, 1:12 PM

LGTM.

gberry edited edge metadata.May 3 2017, 1:15 PM

My only comment is that you may be missing additional cases where the fneg could be folded away, but perhaps those can be fixed in a follow-up change.

> My only comment is that you may be missing additional cases where the fneg could be folded away, but perhaps those can be fixed in a follow-up change.

Yes, I'll investigate once this patch lands.

spatel added a comment.May 3 2017, 2:29 PM

Seems like isNegatibleForFree() would be the place to recognize patterns like this, and then we'd have a corresponding special case for -2.0 in GetNegatedExpression().

As written, this transform should be good for all x86 because it removes a constant load, so no objections, but...
I'm confused about our handling of FP folds. We're saying that this is a universally good (all targets and no relaxed FP needed) codegen fold, but we don't want it in IR/InstCombine because we prefer constants there.

Some tests to think about below. We'll fold the first 3 in DAGCombiner after this patch (universally afaict), but InstCombine does nothing with those. Should InstCombine fold fnegs into constants and fsub -> fadd?

The last case is transformed partially in InstCombine (div -> mul), but DAGCombiner does nothing with that. It's ok to not have a DAG fold for that because nothing this late is producing an fdiv?

define float @add_mul_neg2(float %a, float %b) {
  %mul = fmul float %b, -2.0
  %add = fadd float %a, %mul
  ret float %add
}

define float @sub_mul_neg2(float %a, float %b) {
  %mul = fmul float %b, -2.0
  %sub = fsub float %a, %mul
  ret float %sub
}

define float @mul_mul_neg2(float %a, float %b) {
  %mul = fmul float %b, -2.0
  %neg = fsub float -0.0, %a
  %mul2 = fmul float %neg, %mul
  ret float %mul2
}

define float @sub_div_neghalf(float %a, float %b) {
  %div = fdiv float %b, -0.5
  %sub = fsub float %a, %div
  ret float %sub
}
This revision was automatically updated to reflect the committed changes.

> Seems like isNegatibleForFree() would be the place to recognize patterns like this, and then we'd have a corresponding special case for -2.0 in GetNegatedExpression().

I'll investigate this suggestion. Thanks.

> As written, this transform should be good for all x86 because it removes a constant load, so no objections, but...

Okay, good.

> I'm confused about our handling of FP folds. We're saying that this is a universally good (all targets and no relaxed FP needed) codegen fold, but we don't want it in IR/InstCombine because we prefer constants there.

As Eli pointed out, the reassociation pass prefers the fmul with constant (to expose additional factoring opportunities, IIRC). He also pointed out that we should prefer fmul X, 2.0 as the canonical form, for the same reason we prefer "shl %x, 1" over "add %x, %x": we optimize instructions where hasOneUse() is true more aggressively. Given these two implications I went ahead and implemented this as a DAG combine. However, I think it might make sense to have InstCombine canonicalize to the mul with constant form as well.

> Some tests to think about below. We'll fold the first 3 in DAGCombiner after this patch (universally afaict), but InstCombine does nothing with those. Should InstCombine fold fnegs into constants and fsub -> fadd?

> The last case is transformed partially in InstCombine (div -> mul), but DAGCombiner does nothing with that. It's ok to not have a DAG fold for that because nothing this late is producing an fdiv?

While InstCombine does nothing, the reassociation pass will canonicalize to

(fsub A, (fmul/fdiv B, -C)) -> (fadd A, (fmul/fdiv B, C))
(fadd A, (fmul/fdiv B, -C)) -> (fsub A, (fmul/fdiv B, C))

where C is a constant and

(fsub A, (fdiv b, -0.5)) -> (fadd A, (fmul b, 2.0))

for the last test.

We could make InstCombine prefer these forms as well and that might make a difference, but it might not. My guess is it probably doesn't matter, but I'll experiment.

> While InstCombine does nothing, the reassociation pass will canonicalize to
>
> (fsub A, (fmul/fdiv B, -C)) -> (fadd A, (fmul/fdiv B, C))
> (fadd A, (fmul/fdiv B, -C)) -> (fsub A, (fmul/fdiv B, C))

Thanks for checking those out. I haven't looked at the reassociation pass very much. I'm surprised to see it flip a constant's sign and create an fsub rather than fadd. Any ideas why that is a good thing to do?

> Thanks for checking those out. I haven't looked at the reassociation pass very much. I'm surprised to see it flip a constant's sign and create an fsub rather than fadd. Any ideas why that is a good thing to do?

AFAICT, reassociation is trying to force all constants to be positive so it can increase the opportunities for factorization. This should also allow CSE and GVN to eliminate more duplicate expressions (per D4904 and D5363).

> My only comment is that you may be missing additional cases where the fneg could be folded away, but perhaps those can be fixed in a follow-up change.

Here's at least one case we're missing: https://bugs.llvm.org/show_bug.cgi?id=32939