Page MenuHomePhabricator

[DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
ClosedPublic

Authored by steven.zhang on Jun 1 2020, 9:06 PM.

Details

Summary

For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will have the impact the precision. As the fsqrt added belong to the cold path of the cmp+branch, it won't impact the performance for normal inputs for PowerPC. Besides, it removes the xxlxor of the hot path.

clang without this patch

sqrt(2.2250738585072014e-308) = 1.4916681462400413e-154
sqrt(2.2250738585072009e-308) = 0
sqrt(4.9406564584124654e-324) = 0

With this patch:

sqrt(2.2250738585072014e-308) = 1.4916681462400413e-154
sqrt(2.2250738585072009e-308) = 1.4916681462400412e-154
sqrt(4.9406564584124654e-324) = 2.2227587494850775e-162

Diff Detail

Event Timeline

steven.zhang created this revision.Jun 1 2020, 9:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2020, 9:06 PM
spatel added a comment.Jun 3 2020, 6:09 AM

Not sure about other targets, but x86 is mostly expected to flush denorms (FTZ/DAZ) when using fast-math, so this should not matter even if we decide to override the default setting.

llvm/include/llvm/CodeGen/TargetLowering.h
4293

Can we have this default to:

return DAG.getConstantFP(0.0, SDLoc(Operand), Operand.getValueType());

and save some code in the calling function?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22079

"value that provided" -> "value provided"

steven.zhang planned changes to this revision.Sep 7 2020, 6:56 PM

Address comments.

spatel added a comment.Nov 4 2020, 1:21 PM

As with the other patch, this LGTM in general, so if someone can verify that the PPC changes are as expected, we should be good.

shchenz added a subscriber: shchenz.Nov 4 2020, 7:36 PM
shchenz added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771

Target independent code supports vector type and we also have vector sqrt instruction on Powerpc target. Can we use them for vector types like v4f32 or v2f64?

qiucf added a subscriber: qiucf.Nov 4 2020, 7:42 PM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771

Yes. Supported types:

  • VSX off: f32, f64
  • VSX on: f32, f64, f128, v4f32, v2f64

Address comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12771

Yes, I will post another patch for the vector type as what I have mentioned in the comments of D80706.

shchenz accepted this revision.Nov 10 2020, 8:16 PM
shchenz added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22084

On Powerpc target, we use 0.0 as denormal float point sqrt result for a long time. Changing the result to hardware sqrt instructions will improve the precision for sure, but it also degrades the runtime performance. Is it possible to do it like: if we concern about performance, we use 0.0, if we concern about precision, we use hardware sqrt instruction. Maybe -Ofast is an indicatation for this?

This revision is now accepted and ready to land.Nov 10 2020, 8:16 PM

oops, accept by mistake...

shchenz requested changes to this revision.Nov 10 2020, 8:17 PM
This revision now requires changes to proceed.Nov 10 2020, 8:17 PM
steven.zhang requested review of this revision.Nov 10 2020, 9:05 PM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22084

It won't deg the runtime performance if the input is not denormal which is the usual case, as the select is expanded as cmp + branch later and hw will predict to the normal code path. And it indeed slows down the performance if the input is denormal, but it is expected as the precision is improved. All the optimization is done under -Ofast.

Considering that we only have impact on the denormal input code path, which usually cares about the precision, not the performance, I tend to keep it this way. Does it make sense ?

shchenz accepted this revision.Nov 10 2020, 9:37 PM

LGTM

This revision is now accepted and ready to land.Nov 10 2020, 9:37 PM
This revision was landed with ongoing or failed builds.Nov 26 2020, 6:13 PM
This revision was automatically updated to reflect the committed changes.