Page MenuHomePhabricator

[DAGCombine] Add hook to allow target specific test for sqrt input
Needs ReviewPublic

Authored by steven.zhang on May 28 2020, 3:12 AM.

Details

Reviewers
spatel
RKSimon
jsji
Group Reviewers
Restricted Project
Summary

PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root. LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to target that has test instructions.

Diff Detail

Event Timeline

steven.zhang created this revision.May 28 2020, 3:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 3:12 AM
steven.zhang marked an inline comment as done.May 28 2020, 3:13 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Question here: do we miss to propagate the SDFlags for SETCC which might affect how we lower the select_cc ?

nemanjai added inline comments.May 28 2020, 4:49 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

See inline for a minor improvement, otherwise LGTM. Someone from PPC should provide final approval after looking at the test diffs.

llvm/include/llvm/CodeGen/TargetLowering.h
4197–4199

The description did not read clearly to me. How about:
"Return a target-dependent comparison result if the input operand is suitable for use with a square root estimate calculation. For example, the comparison may check if the operand is NAN, INF, zero, normal, etc. The result should be used as the condition operand for a select or branch."

If the plan is to generalize this hook for 'ftdiv' as well, then we can make the description less specific.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Yes, I think we are missing propagation of flags on all of the created nodes in this sequence. I don't know if we can induce any test differences from that with current regression tests, but that can be another patch.

steven.zhang marked 2 inline comments as done.May 28 2020, 8:18 PM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

PowerPC backend depends on the flags to determine how to lower select_cc between select and cmp+branch. I believe we can see the test difference. And yes, it is another patch.

21382

This thread may be relevant here: http://lists.llvm.org/pipermail/llvm-dev/2020-May/141561.html

Yeah, this seems to be a big hole of DAGCombine. We even don't have parameter to pass the flags for getSetCC now, though it could be set later.

Update the comments.

steven.zhang marked an inline comment as done.May 29 2020, 12:29 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

Can we add a verifier in DAG to verify the flag ? i.e. Checking that, there is no flags missed during the dagcombine.

spatel added inline comments.May 30 2020, 9:48 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21382

I'm not seeing how it would work because the flags are always optional and not necessarily propagated from the operands. But if you see a way to do it, that would be a nice enhancement.

Ping for PowerPC backend change ...

shchenz added a subscriber: shchenz.Tue, Jun 9, 7:26 PM

I think it's better to add a denormal test case

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1437

Better to follow legacy formatting or reformat all the sentences here?

12386

Move the comments above line 12383?

12389

Any specific reason for i32 here? I guess here i8 would be enough

12398

Target independent code checks denormal input, ftsqrt denormal input checking result is in fg_flag, your comments seems like related to fe_flag, does this matters?

llvm/lib/Target/PowerPC/PPCInstrVSX.td
624

Same as above, i8 should be enough for crD?

steven.zhang marked 8 inline comments as done.Wed, Jun 10, 1:50 AM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1437

It is format by clang-format. Maybe, we can commit a NFC patch to format all the statements here.

12386

Ok, I will do it with later change with this patch.

12389

It is because the type of the CRRC must be i32.

def CRRC : RegisterClass<"PPC", [i32], 32,
  (add CR0, CR1, CR5, CR6,
       CR7, CR2, CR3, CR4)> {
  let AltOrders = [(sub CRRC, CR2, CR3, CR4)];
  let AltOrderSelect = [{
    return MF.getSubtarget<PPCSubtarget>().isELFv2ABI() &&
           MF.getInfo<PPCFunctionInfo>()->isNonVolatileCRDisabled();
  }];
}
12398

Target independent code didn't assume the content of the check and the target is free to do the kind of check. The ISA contains a program note:

ftdiv and ftsqrt are provided to accelerate software
emulation of divide and square root operations, by
performing the requisite special case checking.
Software needs only a single branch, on FE=1 (in
CR[BF]), to a special case handler. FG and FL may
provide further acceleration opportunities.

So, I select the FE for the special case handle.

shchenz added inline comments.Wed, Jun 10, 3:41 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
12398

I think there is functionality issue here if we use fe_flag, not fg_flag. From the comments in target independent code:

// The estimate is now completely wrong if the input was exactly 0.0 or
// possibly a denormal. Force the answer to 0.0 for those cases.

The iteration method to calculate the sqrt would be wrong if the input if denormal.

But in PowerPC's hook implementation, fe_flag will not be set even if the input is denormal. So now for denormal input, we may also use the newton iterated est after testing fe_flag.