This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Add hook to allow target specific test for sqrt input
ClosedPublic

Authored by steven.zhang on May 28 2020, 3:12 AM.

Details

Summary

PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions.

Diff Detail

Event Timeline

steven.zhang created this revision.May 28 2020, 3:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 3:12 AM
steven.zhang marked an inline comment as done.May 28 2020, 3:13 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

Question here: do we miss to propagate the SDFlags for SETCC which might affect how we lower the select_cc ?

nemanjai added inline comments.May 28 2020, 4:49 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

See inline for a minor improvement, otherwise LGTM. Someone from PPC should provide final approval after looking at the test diffs.

llvm/include/llvm/CodeGen/TargetLowering.h
4280–4282

The description did not read clearly to me. How about:
"Return a target-dependent comparison result if the input operand is suitable for use with a square root estimate calculation. For example, the comparison may check if the operand is NAN, INF, zero, normal, etc. The result should be used as the condition operand for a select or branch."

If the plan is to generalize this hook for 'ftdiv' as well, then we can make the description less specific.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

Yes, I think we are missing propagation of flags on all of the created nodes in this sequence. I don't know if we can induce any test differences from that with current regression tests, but that can be another patch.

steven.zhang marked 2 inline comments as done.May 28 2020, 8:18 PM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

PowerPC backend depends on the flags to determine how to lower select_cc between select and cmp+branch. I believe we can see the test difference. And yes, it is another patch.

22078

This thread may be relevant here: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141561.html

Yeah, this seems to be a big hole of DAGCombine. We even don't have parameter to pass the flags for getSetCC now, though it could be set later.

Update the comments.

steven.zhang marked an inline comment as done.May 29 2020, 12:29 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

Can we add a verifier in DAG to verify the flag ? i.e. Checking that, there is no flags missed during the dagcombine.

spatel added inline comments.May 30 2020, 9:48 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22078

I'm not seeing how it would work because the flags are always optional and not necessarily propagated from the operands. But if you see a way to do it, that would be a nice enhancement.

Ping for PowerPC backend change ...

shchenz added a subscriber: shchenz.Jun 9 2020, 7:26 PM

I think it's better to add a denormal test case

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1451

Better to follow legacy formatting or reformat all the sentences here?

12769

Move the comments above line 12383?

12772

Any specific reason for i32 here? I guess here i8 would be enough

12781

Target independent code checks denormal input, ftsqrt denormal input checking result is in fg_flag, your comments seems like related to fe_flag, does this matters?

llvm/lib/Target/PowerPC/PPCInstrVSX.td
633

Same as above, i8 should be enough for crD?

steven.zhang marked 8 inline comments as done.Jun 10 2020, 1:50 AM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1451

It is format by clang-format. Maybe, we can commit a NFC patch to format all the statements here.

12769

Ok, I will do it with later change with this patch.

12772

It is because the type of the CRRC must be i32.

def CRRC : RegisterClass<"PPC", [i32], 32,
  (add CR0, CR1, CR5, CR6,
       CR7, CR2, CR3, CR4)> {
  let AltOrders = [(sub CRRC, CR2, CR3, CR4)];
  let AltOrderSelect = [{
    return MF.getSubtarget<PPCSubtarget>().isELFv2ABI() &&
           MF.getInfo<PPCFunctionInfo>()->isNonVolatileCRDisabled();
  }];
}
12781

Target independent code didn't assume the content of the check and the target is free to do the kind of check. The ISA contains a program note:

ftdiv and ftsqrt are provided to accelerate software
emulation of divide and square root operations, by
performing the requisite special case checking.
Software needs only a single branch, on FE=1 (in
CR[BF]), to a special case handler. FG and FL may
provide further acceleration opportunities.

So, I select the FE for the special case handle.

shchenz added inline comments.Jun 10 2020, 3:41 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12781

I think there is functionality issue here if we use fe_flag, not fg_flag. From the comments in target independent code:

// The estimate is now completely wrong if the input was exactly 0.0 or
// possibly a denormal. Force the answer to 0.0 for those cases.

The iteration method to calculate the sqrt would be wrong if the input if denormal.

But in PowerPC's hook implementation, fe_flag will not be set even if the input is denormal. So now for denormal input, we may also use the newton iterated est after testing fe_flag.

steven.zhang planned changes to this revision.Sep 7 2020, 6:56 PM
steven.zhang marked 3 inline comments as done.
steven.zhang added inline comments.Nov 3 2020, 12:08 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12781

According to http://web.mit.edu/hyperbook/Patrikalakis-Maekawa-Cho/node47.html, the double floating point is denormal if exp < -1022. So, the ftsqrt must return 1 as it is set if e_b <= -970. That means we won't have functionality issue but with precision issue for the value between exp >= -1022 ~ exp <= -970, which is handled by D80974

Address comments.

shchenz accepted this revision.Nov 4 2020, 8:12 PM

LGTM. Thanks for improving this.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12781

Thanks for your explanation. So if flag fg_flag is 1, fe_flag must be also 1. For the normal input cases where fe_flag is 1, but fg_flag is 0, you handle them in D80974. This makes sense to me.

This revision is now accepted and ready to land.Nov 4 2020, 8:12 PM

Fix a bug of the type we used for SELECT. For now, we are using the operand type, which is not right as we need to use the cond type.

qiucf added a subscriber: qiucf.Nov 10 2020, 10:14 PM
qiucf added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
4284

testSqrtEstimate?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22062

Will it be better if put logic below into base getSqrtInputTest implementation?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12773–12780
llvm/lib/Target/PowerPC/PPCInstrFormats.td
643

Typo: extra space

steven.zhang added inline comments.Nov 12 2020, 5:42 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4284

Hmm, I still prefer the getXXX as we have getSqrtEstimate and getRecipEstimate likewise routine.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
22062

I think both are ok. The good things for this is to have the target specific test inside getSqrtInputTest() and return SDVaue() if didn't have. We are making the default implementation non-override which makes sense in fact till now.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12773–12780

Thank you for this. I will update it.

llvm/lib/Target/PowerPC/PPCInstrFormats.td
643

Good catch. Update it when I commit it.

This revision was landed with ongoing or failed builds.Nov 24 2020, 9:39 PM
This revision was automatically updated to reflect the committed changes.
jgorbe added a subscriber: jgorbe.Dec 3 2020, 5:56 PM

Found a crash that seems to be caused by this patch. This is a reduced test case:

extern "C"
#define a(b, c, args) d(double, b, , args)
#define d(g, b, c, args) g b args
    a(sqrt, , (double));
int e;
double f = sqrt(e);

Building with clang -target powerpc64le-grtev4-linux-gnu -ffast-math repro.cpp crashes a top-of-tree clang.

Do not know how to custom type legalize this operation!                                                                                              
UNREACHABLE executed at /usr/local/google/home/jgorbe/code/llvm/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11088!                                   
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:                                                                                                                                          
[... stack dump omitted for brevity ...]

Quick fixed by commit c25b039e211441033069c7046324d2f76de37bed . Thank you for reporting this. We miss the test coverage for the case that crbits is disabled before.