This is an archive of the discontinued LLVM Phabricator instance.

Fast-math fold: x / (y * sqrt(z)) -> x * rsqrt(z) / y
ClosedPublic

Authored by spatel on Oct 6 2014, 11:11 AM.

Details

Reviewers

willschm
wschmidt
hfinkel

Commits

rG7bc9185ab58e: Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
rL219139: Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)

Summary

This patch only affects PPC at the moment because no other target has enabled reciprocal sqrt estimate or reciprocal estimate optimizations yet.

The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);
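As a rough numerical check of the algebra behind the fold, the sketch below compares the original expression with the rewritten form x * (rsqrt(z) / y), taking x = dt, y = distance * distance, and z = dx*dx + dy*dy + dz*dz. The function names are hypothetical, and rsqrt_ref uses an exact 1/sqrt as a stand-in for the hardware estimate plus refinement that the backend would actually emit.

```cpp
#include <cmath>

// Original form from the snippet: sqrt, then a divide by the cubed distance.
float mag_original(float dx, float dy, float dz, float dt) {
  float distance = std::sqrt(dx * dx + dy * dy + dz * dz);
  return dt / (distance * distance * distance);
}

// Hypothetical stand-in for "reciprocal sqrt estimate + refinement".
// Here it is simply computed exactly.
float rsqrt_ref(float z) { return 1.0f / std::sqrt(z); }

// Rewritten form: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
// with x = dt, y = distance * distance, z = dx*dx + dy*dy + dz*dz.
float mag_folded(float dx, float dy, float dz, float dt) {
  float z = dx * dx + dy * dy + dz * dz;
  float distance = std::sqrt(z);
  float y = distance * distance;
  return dt * (rsqrt_ref(z) / y);
}
```

The two functions agree only up to rounding, which is why a rewrite like this is restricted to fast-math mode.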

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

   frsqrtes 0, 1
   addis 3, 2, .LCPI4_1@toc@ha
   fres 5, 2
   lfs 4, .LCPI4_1@toc@l(3)
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 7, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 4, 1
   fmuls 6, 0, 0
   fnmsubs 2, 2, 5, 7
   fmadds 1, 13, 6, 4
   fmadds 2, 5, 2, 5
   fmuls 0, 0, 1
   fmuls 0, 0, 2
   fmuls 1, 3, 0
   blr

I don't have any PPC hardware to measure this patch on (still no reply from gcc's CompileFarm), but I think it should be noticeably faster based on the flops saved: the new sequence has no compare-and-branch and needs one fewer reciprocal estimate (fres) and refinement.

There should be a measurable perf win using the n-body program from test-suite or here:
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
or using the test loop/program from:
http://llvm.org/bugs/show_bug.cgi?id=20900

Event Timeline

spatel updated this revision to Diff 14462. · Oct 6 2014, 11:11 AM

spatel retitled this revision to "Fast-math fold: x / (y * sqrt(z)) -> x * rsqrt(z) / y".

spatel updated this object.

spatel edited the test plan for this revision.

spatel added reviewers: hfinkel, wschmidt, willschm.

spatel added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. · Oct 6 2014, 11:11 AM

Sorry to hear about nonresponsiveness from the compile farm. You should usually be able to find Laurent Guerby on irc.oftc.net #gcc as "guerby" -- you may want to ping him directly that way.

LGTM, thanks!

This revision is now accepted and ready to land. · Oct 6 2014, 11:52 AM

Patch looks fine to me as well!

Closed by commit rL219139 (authored by @spatel).

Thanks for the quick review - checked in with r219139.

tycho added a subscriber: tycho. · Oct 9 2014, 9:14 AM

Revision Contents

Path                                        Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp    22 lines
test/CodeGen/PowerPC/recipest.ll            28 lines

Diff 14462

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

   if (Options.UnsafeFPMath) {
     ...
     } else if (N1.getOpcode() == ISD::FP_ROUND &&
                N1.getOperand(0).getOpcode() == ISD::FSQRT) {
       if (SDValue RV = BuildRsqrtEstimate(N1.getOperand(0).getOperand(0))) {
         AddToWorklist(RV.getNode());
         RV = DAG.getNode(ISD::FP_ROUND, SDLoc(N1), VT, RV, N1.getOperand(1));
         AddToWorklist(RV.getNode());
         return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
       }
+    } else if (N1.getOpcode() == ISD::FMUL) {
+      // Look through an FMUL. Even though this won't remove the FDIV directly,
+      // it's still worthwhile to get rid of the FSQRT if possible.
+      SDValue SqrtOp;
+      SDValue OtherOp;
+      if (N1.getOperand(0).getOpcode() == ISD::FSQRT) {
+        SqrtOp = N1.getOperand(0);
+        OtherOp = N1.getOperand(1);
+      } else if (N1.getOperand(1).getOpcode() == ISD::FSQRT) {
+        SqrtOp = N1.getOperand(1);
+        OtherOp = N1.getOperand(0);
+      }
+      if (SqrtOp.getNode()) {
+        // We found a FSQRT, so try to make this fold:
+        // x / (y * sqrt(z)) -> x * rsqrt(z) / y
+        if (SDValue RV = BuildRsqrtEstimate(SqrtOp.getOperand(0))) {
+          AddToWorklist(RV.getNode());
+          RV = DAG.getNode(ISD::FDIV, SDLoc(N1), VT, RV, OtherOp);
+          AddToWorklist(RV.getNode());
+          return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
+        }
+      }
     }

     // Fold into a reciprocal estimate and multiply instead of a real divide.
     if (SDValue RV = BuildReciprocalEstimate(N1)) {
       AddToWorklist(RV.getNode());
       return DAG.getNode(ISD::FMUL, DL, VT, N0, RV);
     }
   }
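The matching logic in the diff above can be sketched outside of LLVM with a toy expression tree. Everything in this example (Node, Op, foldDivByMulSqrt, show) is hypothetical and is not the SelectionDAG API; it only illustrates the "look through an FMUL and check both operands for an FSQRT" idea.

```cpp
#include <memory>
#include <string>
#include <utility>

enum class Op { Leaf, FSqrt, RSqrt, FMul, FDiv };

struct Node {
  Op op;
  std::string name;  // used only by leaves
  std::shared_ptr<Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(std::string n) {
  return std::make_shared<Node>(Node{Op::Leaf, std::move(n), nullptr, nullptr});
}
NodePtr unary(Op op, NodePtr a) {
  return std::make_shared<Node>(Node{op, "", std::move(a), nullptr});
}
NodePtr binary(Op op, NodePtr a, NodePtr b) {
  return std::make_shared<Node>(Node{op, "", std::move(a), std::move(b)});
}

// Try x / (y * sqrt(z)) -> x * (rsqrt(z) / y).
// Returns nullptr when the pattern does not match.
NodePtr foldDivByMulSqrt(const NodePtr &n) {
  if (n->op != Op::FDiv || n->rhs->op != Op::FMul)
    return nullptr;
  const NodePtr &mul = n->rhs;
  NodePtr sqrtOp, otherOp;
  // FMUL is commutative, so the sqrt may be either operand.
  if (mul->lhs->op == Op::FSqrt) {
    sqrtOp = mul->lhs;
    otherOp = mul->rhs;
  } else if (mul->rhs->op == Op::FSqrt) {
    sqrtOp = mul->rhs;
    otherOp = mul->lhs;
  }
  if (!sqrtOp)
    return nullptr;
  NodePtr rv = unary(Op::RSqrt, sqrtOp->lhs);  // rsqrt(z)
  rv = binary(Op::FDiv, rv, otherOp);          // rsqrt(z) / y
  return binary(Op::FMul, n->lhs, rv);         // x * (rsqrt(z) / y)
}

// Print a tree in infix form for inspection.
std::string show(const NodePtr &n) {
  if (n->op == Op::Leaf) return n->name;
  if (n->op == Op::FSqrt) return "sqrt(" + show(n->lhs) + ")";
  if (n->op == Op::RSqrt) return "rsqrt(" + show(n->lhs) + ")";
  std::string sym = (n->op == Op::FMul) ? " * " : " / ";
  return "(" + show(n->lhs) + sym + show(n->rhs) + ")";
}
```

In this model the remaining division rsqrt(z) / y is left in the tree. In the patch, the new FDIV node is added to the worklist, so the combiner gets another chance to turn it into a reciprocal estimate and multiply.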

test/CodeGen/PowerPC/recipest.ll

 ; CHECK-NEXT: blr

 ; CHECK-SAFE: @goo
 ; CHECK-SAFE: fsqrts
 ; CHECK-SAFE: fdivs
 ; CHECK-SAFE: blr
 }

+; Recognize that this is rsqrt(a) * rcp(b) * c,
+; not 1 / ( 1 / sqrt(a)) * rcp(b) * c.
+define float @rsqrt_fmul(float %a, float %b, float %c) {
+  %x = call float @llvm.sqrt.f32(float %a)
+  %y = fmul float %x, %b
+  %z = fdiv float %c, %y
+  ret float %z
+
+; CHECK: @rsqrt_fmul
+; CHECK-DAG: frsqrtes
+; CHECK-DAG: fres
+; CHECK-DAG: fnmsubs
+; CHECK-DAG: fmuls
+; CHECK-DAG: fnmsubs
+; CHECK-DAG: fmadds
+; CHECK-DAG: fmadds
+; CHECK: fmuls
+; CHECK-NEXT: fmuls
+; CHECK-NEXT: fmuls
+; CHECK-NEXT: blr
+
+; CHECK-SAFE: @rsqrt_fmul
+; CHECK-SAFE: fsqrts
+; CHECK-SAFE: fmuls
+; CHECK-SAFE: fdivs
+; CHECK-SAFE: blr
+}
+
 define <4 x float> @hoo(<4 x float> %a, <4 x float> %b) nounwind {
   %x = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %b)
   %r = fdiv <4 x float> %a, %x
   ret <4 x float> %r

 ; CHECK: @hoo
 ; CHECK: vrsqrtefp