This is an archive of the discontinued LLVM Phabricator instance.

Fast-math fold: x / (y * sqrt(z)) -> x * rsqrt(z) / y
ClosedPublic

Authored by spatel on Oct 6 2014, 11:11 AM.

Details

Summary

This patch only affects PPC at the moment because no other target has enabled reciprocal sqrt estimate or reciprocal estimate optimizations yet.

The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

   frsqrtes 0, 1
   addis 3, 2, .LCPI4_1@toc@ha
   fres 5, 2
   lfs 4, .LCPI4_1@toc@l(3)
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 7, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 4, 1
   fmuls 6, 0, 0
   fnmsubs 2, 2, 5, 7
   fmadds 1, 13, 6, 4
   fmadds 2, 5, 2, 5
   fmuls 0, 0, 1
   fmuls 0, 0, 2
   fmuls 1, 3, 0
   blr

I don't have any PPC hardware to measure this patch on (still no reply from GCC's Compile Farm), but based on the number of floating-point operations saved, I expect it to be noticeably faster.

There should be a measurable perf win using the n-body program from test-suite or here:
http://benchmarksgame.alioth.debian.org/u32/performance.php?test=nbody
or using the test loop/program from:
http://llvm.org/bugs/show_bug.cgi?id=20900

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 14462. Oct 6 2014, 11:11 AM
spatel retitled this revision to Fast-math fold: x / (y * sqrt(z)) -> x * rsqrt(z) / y.
spatel updated this object.
spatel edited the test plan for this revision.
spatel added reviewers: hfinkel, wschmidt, willschm.
spatel added a subscriber: Unknown Object (MLST).
wschmidt edited edge metadata. Oct 6 2014, 11:46 AM

Sorry to hear about nonresponsiveness from the compile farm. You should usually be able to find Laurent Guerby on irc.oftc.net #gcc as "guerby" -- you may want to ping him directly that way.

hfinkel accepted this revision. Oct 6 2014, 11:52 AM
hfinkel edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land. Oct 6 2014, 11:52 AM

Patch looks fine to me as well!

spatel closed this revision. Oct 6 2014, 12:41 PM
spatel updated this revision to Diff 14470.

Closed by commit rL219139 (authored by @spatel).

Thanks for the quick review - checked in with r219139.

tycho added a subscriber: tycho. Oct 9 2014, 9:14 AM