This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] optionally filter out denorms when using frsqrte to calculate sqrt
AbandonedPublic

Authored by spatel on Feb 1 2018, 10:25 AM.

Details

Summary

This is a follow-up to D42323 where we noticed that AArch64 did not pick up the change for filtering out denorm inputs to a sqrt estimate. That's because it wasn't using the generic DAGCombiner code to mask off a 0.0 sqrt input.

Note: This patch is only for the (odd?) case where we're ok with using a sqrt estimate, but are not using flush-to-zero mode to kill denorms.

I don't have a system to test the HW behavior (should confirm that frsqrte returns INF with a denorm input?), and my reading of AArch asm isn't good, so please make sure the new code is actually correct.

Diff Detail

Event Timeline

spatel created this revision.Feb 1 2018, 10:25 AM

frsqrte produces a finite estimate for denormal numbers. From the ARM ARM:

if exp == 0 then
    while fraction<51> == 0 do
        fraction = fraction<50:0> : '0';
        exp = exp - 1;
    fraction = fraction<50:0> : '0';

frsqrte produces a finite estimate for denormal numbers. From the ARM ARM:

if exp == 0 then
    while fraction<51> == 0 do
        fraction = fraction<50:0> : '0';
        exp = exp - 1;
    fraction = fraction<50:0> : '0';

Ah, great - so we don't need this patch. Should still change the select operand to FPZero to produce the 'bic' rather than 'bsl' though?

Should still change the select operand to FPZero to produce the 'bic' rather than 'bsl' though?

I think that changes the result for sqrt(-0.0)? I guess we could still do it under appropriate fast-math flags.

spatel abandoned this revision.Feb 1 2018, 11:38 AM

Should still change the select operand to FPZero to produce the 'bic' rather than 'bsl' though?

I think that changes the result for sqrt(-0.0)? I guess we could still do it under appropriate fast-math flags.

Good point. The DAGCombiner estimate generation is only guarded by Options.UnsafeFPMath (doesn't check nodes' flags or NoSignedZerosFPMath), so it's probably wrong currently.