[DAGCombine] Add hook to allow target specific test for sqrt inputClosedPublicActions

Authored by steven.zhang on May 28 2020, 3:12 AM.

Details

Reviewers
 spatel RKSimon jsji shchenz
Group Reviewers
 Restricted Project
Commits
rG9c588f53fc42: [DAGCombine] Add hook to allow target specific test for sqrt input
Summary

PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions.

Unit TestsFailedView All

TimeTest
100 msLLVM.DebugInfo::Unknown Unit Message ("")
Script: -- : 'RUN: at line 3'; c:\ws\beta\llvm-project\build\bin\llc.exe -mtriple=x86_64-pc-windows-gnu C:\ws\beta\llvm-project\llvm\test\DebugInfo\fortranSubrangeVar.ll -filetype=obj -o - | c:\ws\beta\llvm-project\build\bin\llvm-dwarfdump.exe - | c:\ws\beta\llvm-project\build\bin\filecheck.exe C:\ws\beta\llvm-project\llvm\test\DebugInfo\fortranSubrangeVar.ll

Event Timeline

steven.zhang created this revision.May 28 2020, 3:12 AM
Herald added a project: Restricted Project. May 28 2020, 3:12 AM
Herald added subscribers: ecnelises, wuzish, kbarton and 2 others.
steven.zhang marked an inline comment as done.May 28 2020, 3:13 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Question here: do we miss to propagate the SDFlags for SETCC which might affect how we lower the select_cc ?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

See inline for a minor improvement, otherwise LGTM. Someone from PPC should provide final approval after looking at the test diffs.

llvm/include/llvm/CodeGen/TargetLowering.h
4197–4199

"Return a target-dependent comparison result if the input operand is suitable for use with a square root estimate calculation. For example, the comparison may check if the operand is NAN, INF, zero, normal, etc. The result should be used as the condition operand for a select or branch."

If the plan is to generalize this hook for 'ftdiv' as well, then we can make the description less specific.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Yes, I think we are missing propagation of flags on all of the created nodes in this sequence. I don't know if we can induce any test differences from that with current regression tests, but that can be another patch.

steven.zhang marked 2 inline comments as done.May 28 2020, 8:18 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

PowerPC backend depends on the flags to determine how to lower select_cc between select and cmp+branch. I believe we can see the test difference. And yes, it is another patch.

21382

This thread may be relevant here: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141561.html

Yeah, this seems to be a big hole of DAGCombine. We even don't have parameter to pass the flags for getSetCC now, though it could be set later.

steven.zhang marked an inline comment as done.May 29 2020, 12:29 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Can we add a verifier in DAG to verify the flag ? i.e. Checking that, there is no flags missed during the dagcombine.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

I'm not seeing how it would work because the flags are always optional and not necessarily propagated from the operands. But if you see a way to do it, that would be a nice enhancement.

Ping for PowerPC backend change ...

shchenz added a subscriber: shchenz.Jun 9 2020, 7:26 PM

I think it's better to add a denormal test case

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1437

Better to follow legacy formatting or reformat all the sentences here?

12386

Move the comments above line 12383?

12389

Any specific reason for i32 here? I guess here i8 would be enough

12398

Target independent code checks denormal input, ftsqrt denormal input checking result is in fg_flag, your comments seems like related to fe_flag, does this matters?

llvm/lib/Target/PowerPC/PPCInstrVSX.td
624

Same as above, i8 should be enough for crD?

steven.zhang marked 8 inline comments as done.Jun 10 2020, 1:50 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1437

It is format by clang-format. Maybe, we can commit a NFC patch to format all the statements here.

12386

Ok, I will do it with later change with this patch.

12389

It is because the type of the CRRC must be i32.

def CRRC : RegisterClass<"PPC", [i32], 32,
CR7, CR2, CR3, CR4)> {
let AltOrders = [(sub CRRC, CR2, CR3, CR4)];
let AltOrderSelect = [{
return MF.getSubtarget<PPCSubtarget>().isELFv2ABI() &&
MF.getInfo<PPCFunctionInfo>()->isNonVolatileCRDisabled();
}];
}
12398

Target independent code didn't assume the content of the check and the target is free to do the kind of check. The ISA contains a program note:

ftdiv and ftsqrt are provided to accelerate software
emulation of divide and square root operations, by
performing the requisite special case checking.
Software needs only a single branch, on FE=1 (in
CR[BF]), to a special case handler. FG and FL may
provide further acceleration opportunities.

So, I select the FE for the special case handle.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12398

I think there is functionality issue here if we use fe_flag, not fg_flag. From the comments in target independent code:

// The estimate is now completely wrong if the input was exactly 0.0 or
// possibly a denormal. Force the answer to 0.0 for those cases.

The iteration method to calculate the sqrt would be wrong if the input if denormal.

But in PowerPC's hook implementation, fe_flag will not be set even if the input is denormal. So now for denormal input, we may also use the newton iterated est after testing fe_flag.

steven.zhang planned changes to this revision.Sep 7 2020, 6:56 PM
steven.zhang marked 3 inline comments as done.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12398

According to http://web.mit.edu/hyperbook/Patrikalakis-Maekawa-Cho/node47.html, the double floating point is denormal if exp < -1022. So, the ftsqrt must return 1 as it is set if e_b <= -970. That means we won't have functionality issue but with precision issue for the value between exp >= -1022 ~ exp <= -970, which is handled by D80974

shchenz accepted this revision.Nov 4 2020, 8:12 PM

LGTM. Thanks for improving this.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12398

Thanks for your explanation. So if flag fg_flag is 1, fe_flag must be also 1. For the normal input cases where fe_flag is 1, but fg_flag is 0, you handle them in D80974. This makes sense to me.

This revision is now accepted and ready to land.Nov 4 2020, 8:12 PM

Fix a bug of the type we used for SELECT. For now, we are using the operand type, which is not right as we need to use the cond type.

qiucf added a subscriber: qiucf.Nov 10 2020, 10:14 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4201

testSqrtEstimate?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21366

Will it be better if put logic below into base getSqrtInputTest implementation?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12390–12397
llvm/lib/Target/PowerPC/PPCInstrFormats.td
643

Typo: extra space

llvm/include/llvm/CodeGen/TargetLowering.h
4201

Hmm, I still prefer the getXXX as we have getSqrtEstimate and getRecipEstimate likewise routine.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21366

I think both are ok. The good things for this is to have the target specific test inside getSqrtInputTest() and return SDVaue() if didn't have. We are making the default implementation non-override which makes sense in fact till now.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12390–12397

Thank you for this. I will update it.

llvm/lib/Target/PowerPC/PPCInstrFormats.td
643

Good catch. Update it when I commit it.

This revision was landed with ongoing or failed builds.Nov 24 2020, 9:39 PM
This revision was automatically updated to reflect the committed changes.
jgorbe added a subscriber: jgorbe.Dec 3 2020, 5:56 PM

Found a crash that seems to be caused by this patch. This is a reduced test case:

extern "C"
#define a(b, c, args) d(double, b, , args)
#define d(g, b, c, args) g b args
a(sqrt, , (double));
int e;
double f = sqrt(e);

Building with clang -target powerpc64le-grtev4-linux-gnu -ffast-math repro.cpp crashes a top-of-tree clang.

Do not know how to custom type legalize this operation!
[... stack dump omitted for brevity ...]