Page MenuHomePhabricator

[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
ClosedPublic

Authored by RKSimon on Aug 23 2016, 10:24 AM.

Details

Summary

As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 69012.Aug 23 2016, 10:24 AM
RKSimon retitled this revision from to [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32.
RKSimon updated this object.
RKSimon added reviewers: mkuper, delena, spatel, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
delena added inline comments.Sep 3 2016, 2:25 AM
lib/Target/X86/X86ISelLowering.cpp
22223 ↗(On Diff #69012)

Why you do not call Lower_FP_TO_SINT ?

22225 ↗(On Diff #69012)

It is source, not Res. Please change variable name.

22228 ↗(On Diff #69012)

Can you use FP_TO_SINT here?

RKSimon updated this revision to Diff 70289.Sep 4 2016, 6:45 AM

Updated based on Elena'a comments.

RKSimon marked an inline comment as done.Sep 4 2016, 6:51 AM
RKSimon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
22229 ↗(On Diff #70289)

As the result type is the illegal v2i32 type we need to use ReplaceNodeResults - we can't just go direct to Lower_FP_TO_SINT as type legalization will get in the way.

22234 ↗(On Diff #70289)

See above.

delena added inline comments.Sep 4 2016, 12:48 PM
lib/Target/X86/X86ISelLowering.cpp
22229 ↗(On Diff #70289)

If you specify
setOperationAction(ISD::FP_TO_SINT, MVT::v2f32, Custom) - specify source type, because type legalizer looks at it first.
it takes you directly from type legalizer to LowerFP_TO_SINT().

Please look at LowerSINT_TO_FP(), we do the same there.

Additionally, you do
setOperationAction(ISD::FP_TO_SINT, MVT::v2i32, Custom) - it will call LowerFP_TO_SINT() for v2f64 -> v2i32

22234 ↗(On Diff #70289)

I meant using ISD::FP_TO_SINT instead of X86ISD::CVTTPD2DQ

RKSimon added inline comments.Sep 6 2016, 6:21 AM
lib/Target/X86/X86ISelLowering.cpp
22229 ↗(On Diff #70289)

I think the difference is that we are returning a illegal type (v2i32) while your SINT_TO_FP examples are taking it as an input.

22234 ↗(On Diff #70289)

This is similar to the need for X86ISD::CVTDQ2PD - can't use (v2f32 SINT_TO_FP (v2i32)) would return an illegal type.

delena edited edge metadata.Sep 8 2016, 5:28 AM

I propose to do the following (I wrote some code to be sure that it works)
In ReplaceNodeResults():

case ISD::FP_TO_SINT: {
    SDValue V = LowerFP_TO_SINT(SDValue(N, 0), DAG)) {
    Results.push_back(V);
    return;
  }

SDValue X86TargetLowering::LowerFP_TO_SINT(SDValue Op,

                                         SelectionDAG &DAG) const {
MVT VT = Op.getSimpleValueType();
SDValue Src = Op.getOperand(0);
MVT SrcVT = Src.getSimpleValueType();
SDLoc DL(Op);

if (SrcVT == MVT::v2f32 && VT == MVT::v2i32) {
  SDValue Idx = DAG.getIntPtrConstant(0, DL);
  SDValue ExtSrc = DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v4f32, Src,
                               DAG.getUNDEF(MVT::v2f32));
  SDValue FpToInt = DAG.getNode(ISD::FP_TO_SINT, DL, MVT::v4i32, ExtSrc);
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i32, FpToInt, Idx);
}
if (SrcVT == MVT::v2f64 && VT == MVT::v2i32) {
  SDValue FpToInt = DAG.getNode(ISD::FP_TO_SINT, DL, MVT::v4i32, Src);
  return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i32, FpToInt,
                     DAG.getIntPtrConstant(0, DL));
}
  setOperationAction(ISD::FP_TO_SINT,         MVT::v2i32, Custom);
  setOperationAction(ISD::FP_TO_SINT,         MVT::v2f32, Custom);

And you need to add some patterns to the .td files for "fp_to_sint" instead of adding a new node type.

I propose to do the following (I wrote some code to be sure that it works)
In ReplaceNodeResults():

case ISD::FP_TO_SINT: {
    SDValue V = LowerFP_TO_SINT(SDValue(N, 0), DAG)) {
    Results.push_back(V);
    return;
  }

I've investigated this approach and don't agree with it. Not only would we be introducing illegal return types in lowering code (far more serious than illegal argument types) but the x87 fallback code in ReplaceNodeResults is subtly different to LowerFP_TO_SINT as it replaces the node entries in FP_TO_INTHelper.

And you need to add some patterns to the .td files for "fp_to_sint" instead of adding a new node type.

Doing so would require us to attempt to introduce illegal types into the td file which we've avoided so far and is unlikely to work.

We already do something similar from CVTPD2PS (2f64 -> 2f32) where we introduced the X86ISD::VFPROUND node. I don't consider X86ISD::CVTTPD2DQ a major intrusion.

Actually, I wanted to put this pattern:
def : Pat<(v4i32 (fp_to_sint (v2f64 VR128:$src))),

(CVTTPD2DQrr VR128:$src)>;

But if you say that it's impossible, you can use your variant.
I also tried to simplify, whenever it's possible, the huge switch of ReplaceNodeResults() and offload the lowering into a separate function.
LowerFP_TO_SINT() may return illegal result if the node VT is illegal, but you can do the same in any other function
LowerFP_TO_SINT_v2i32(), for example.
You also may handle UINT in the same way:
if (VT == MVT::v2i32) {

if (IsSigned)
  LowerFP_TO_SINT_v2i32()
else
 LowerFP_TO_UINT_v2i32()

} else

// scalar types
craig.topper edited edge metadata.Oct 1 2016, 8:00 PM

Elena, are you just wanting the code to be outside of the switch body in replaceNodeResults and in a separate function? I don't think it makes sense to put it into the existing LowerFP_To_SINT. That would just make that function have two different behaviors based on where its being called from.

lib/Target/X86/X86ISelLowering.cpp
22248 ↗(On Diff #70289)

What types does the FIST code below handle? Should there be a return or llvm_unreachable after the MVT::v2f32 if. I don't think the FIST code was intended for a v2i32 result type so we shouldn't fallthrough into it.

Elena, are you just wanting the code to be outside of the switch body in replaceNodeResults and in a separate function? I don't think it makes sense to put it into the existing LowerFP_To_SINT. That would just make that function have two different behaviors based on where its being called from.

That is the role of these Lower* functions to provide custom solution for legal and and illegal types. We do the same in LowerSINT_TO_FP(). But if you think that it does not make sense, please feel free to proceed.

RKSimon updated this revision to Diff 73837.Oct 6 2016, 12:48 PM
RKSimon edited edge metadata.

Rebased

RKSimon added inline comments.Oct 6 2016, 12:50 PM
lib/Target/X86/X86ISelLowering.cpp
22248 ↗(On Diff #70289)

FP_TO_INTHelper only handles f32/f64/f80 scalar inputs, returning null for other inputs. rL283485 added tests to check this.

If you like I can add an early return at the end of the MVT::v2i32 block to make it clearer?

RKSimon updated this revision to Diff 74130.Oct 10 2016, 8:24 AM

Added sitofp 2f64 -> 2i32 cost model

craig.topper added inline comments.Oct 15 2016, 11:50 AM
lib/Target/X86/X86ISelLowering.cpp
22248 ↗(On Diff #70289)

Yeah that's probably a little clearer.

lib/Target/X86/X86InstrAVX512.td
6273 ↗(On Diff #74130)

Why is this block under HasAVX512, but the patterns AVX pattersn in X86InstrSSE.td are only disabled if VLX is enabled?

RKSimon updated this revision to Diff 74798.Oct 16 2016, 7:46 AM

Nice catch Craig! Fixed AVX512/HasVLX requirement

RKSimon updated this revision to Diff 74799.Oct 16 2016, 7:51 AM

Forgot to add the early out.

craig.topper added inline comments.Oct 16 2016, 4:52 PM
test/CodeGen/X86/vec_fp_to_int.ll
102 ↗(On Diff #74799)

Why does this test case end up with 2 cvttpd2dq instructions that need to be unpacked?

RKSimon added inline comments.Oct 17 2016, 2:34 AM
test/CodeGen/X86/vec_fp_to_int.ll
102 ↗(On Diff #74799)

Its demonstrating that we're not propagating the undef nature of the upper <2 x double> from the shuffle in the test. On SSE it results in an unnecessary extra cvttpd2dq and on AVX it results in a vcvttpd2dqy instead of just vcvttpd2dq. Makes a difference on 128-bit ALU Jaguar but is a separate fix from this patch.

craig.topper accepted this revision.Oct 17 2016, 8:46 PM
craig.topper edited edge metadata.
This revision is now accepted and ready to land.Oct 17 2016, 8:46 PM
This revision was automatically updated to reflect the committed changes.