PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions.
Details
- Reviewers
spatel RKSimon jsji shchenz - Group Reviewers
Restricted Project - Commits
- rG9c588f53fc42: [DAGCombine] Add hook to allow target specific test for sqrt input
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22078 | Question here: do we miss to propagate the SDFlags for SETCC which might affect how we lower the select_cc ? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22078 | This thread may be relevant here: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141561.html |
See inline for a minor improvement, otherwise LGTM. Someone from PPC should provide final approval after looking at the test diffs.
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4280–4282 | The description did not read clearly to me. How about: If the plan is to generalize this hook for 'ftdiv' as well, then we can make the description less specific. | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
22078 | Yes, I think we are missing propagation of flags on all of the created nodes in this sequence. I don't know if we can induce any test differences from that with current regression tests, but that can be another patch. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22078 | PowerPC backend depends on the flags to determine how to lower select_cc between select and cmp+branch. I believe we can see the test difference. And yes, it is another patch. | |
22078 |
Yeah, this seems to be a big hole of DAGCombine. We even don't have parameter to pass the flags for getSetCC now, though it could be set later. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22078 | Can we add a verifier in DAG to verify the flag ? i.e. Checking that, there is no flags missed during the dagcombine. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
22078 | I'm not seeing how it would work because the flags are always optional and not necessarily propagated from the operands. But if you see a way to do it, that would be a nice enhancement. |
I think it's better to add a denormal test case
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1451 | Better to follow legacy formatting or reformat all the sentences here? | |
12769 | Move the comments above line 12383? | |
12772 | Any specific reason for i32 here? I guess here i8 would be enough | |
12781 | Target independent code checks denormal input, ftsqrt denormal input checking result is in fg_flag, your comments seems like related to fe_flag, does this matters? | |
llvm/lib/Target/PowerPC/PPCInstrVSX.td | ||
633 | Same as above, i8 should be enough for crD? |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1451 | It is format by clang-format. Maybe, we can commit a NFC patch to format all the statements here. | |
12769 | Ok, I will do it with later change with this patch. | |
12772 | It is because the type of the CRRC must be i32. def CRRC : RegisterClass<"PPC", [i32], 32, (add CR0, CR1, CR5, CR6, CR7, CR2, CR3, CR4)> { let AltOrders = [(sub CRRC, CR2, CR3, CR4)]; let AltOrderSelect = [{ return MF.getSubtarget<PPCSubtarget>().isELFv2ABI() && MF.getInfo<PPCFunctionInfo>()->isNonVolatileCRDisabled(); }]; } | |
12781 | Target independent code didn't assume the content of the check and the target is free to do the kind of check. The ISA contains a program note: ftdiv and ftsqrt are provided to accelerate software emulation of divide and square root operations, by performing the requisite special case checking. Software needs only a single branch, on FE=1 (in CR[BF]), to a special case handler. FG and FL may provide further acceleration opportunities. So, I select the FE for the special case handle. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12781 | I think there is functionality issue here if we use fe_flag, not fg_flag. From the comments in target independent code: // The estimate is now completely wrong if the input was exactly 0.0 or // possibly a denormal. Force the answer to 0.0 for those cases. The iteration method to calculate the sqrt would be wrong if the input if denormal. But in PowerPC's hook implementation, fe_flag will not be set even if the input is denormal. So now for denormal input, we may also use the newton iterated est after testing fe_flag. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
12781 | According to http://web.mit.edu/hyperbook/Patrikalakis-Maekawa-Cho/node47.html, the double floating point is denormal if exp < -1022. So, the ftsqrt must return 1 as it is set if e_b <= -970. That means we won't have functionality issue but with precision issue for the value between exp >= -1022 ~ exp <= -970, which is handled by D80974 |
Fix a bug of the type we used for SELECT. For now, we are using the operand type, which is not right as we need to use the cond type.
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4284 | testSqrtEstimate? | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
22062 | Will it be better if put logic below into base getSqrtInputTest implementation? | |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
12773–12780 | ||
llvm/lib/Target/PowerPC/PPCInstrFormats.td | ||
643 | Typo: extra space |
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4284 | Hmm, I still prefer the getXXX as we have getSqrtEstimate and getRecipEstimate likewise routine. | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
22062 | I think both are ok. The good things for this is to have the target specific test inside getSqrtInputTest() and return SDVaue() if didn't have. We are making the default implementation non-override which makes sense in fact till now. | |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
12773–12780 | Thank you for this. I will update it. | |
llvm/lib/Target/PowerPC/PPCInstrFormats.td | ||
643 | Good catch. Update it when I commit it. |
Found a crash that seems to be caused by this patch. This is a reduced test case:
extern "C" #define a(b, c, args) d(double, b, , args) #define d(g, b, c, args) g b args a(sqrt, , (double)); int e; double f = sqrt(e);
Building with clang -target powerpc64le-grtev4-linux-gnu -ffast-math repro.cpp crashes a top-of-tree clang.
Do not know how to custom type legalize this operation! UNREACHABLE executed at /usr/local/google/home/jgorbe/code/llvm/llvm/lib/Target/PowerPC/PPCISelLowering.cpp:11088! PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: [... stack dump omitted for brevity ...]
Quick fixed by commit c25b039e211441033069c7046324d2f76de37bed . Thank you for reporting this. We miss the test coverage for the case that crbits is disabled before.
The description did not read clearly to me. How about:
"Return a target-dependent comparison result if the input operand is suitable for use with a square root estimate calculation. For example, the comparison may check if the operand is NAN, INF, zero, normal, etc. The result should be used as the condition operand for a select or branch."
If the plan is to generalize this hook for 'ftdiv' as well, then we can make the description less specific.