Index: docs/AddingConstrainedIntrinsics.rst =================================================================== --- docs/AddingConstrainedIntrinsics.rst +++ docs/AddingConstrainedIntrinsics.rst @@ -0,0 +1,90 @@ +================================================== +How To Add A Constrained Floating-Point Intrinsic +================================================== + +.. contents:: + :local: + +.. warning:: + This is a work in progress. + +Add the intrinsic +================= + +Multiple files need to be updated when adding a new constrained intrinsic. + +Add the new intrinsic to the table of intrinsics.:: + + include/llvm/IR/Intrinsics.td + +Update class ConstrainedFPIntrinsic to know about the new intrinsic.:: + + include/llvm/IR/IntrinsicInst.h + +Functions like ConstrainedFPIntrinsic::isUnaryOp() or +ConstrainedFPIntrinsic::isTernaryOp() may need to know about the new +intrinsic.:: + + lib/IR/IntrinsicInst.cpp + +Update the IR verifier:: + + lib/IR/Verifier.cpp + +Add SelectionDAG node types +=========================== + +Add the new STRICT version of the node type to the ISD::NodeType enum.:: + + include/llvm/CodeGen/ISDOpcodes.h + +In class SDNode update isStrictFPOpcode():: + + include/llvm/CodeGen/SelectionDAGNodes.h + +A mapping from the STRICT SDNode type to the non-STRICT one is done in +TargetLoweringBase::getStrictFPOperationAction(). This allows STRICT +nodes to be legalized similarly to the non-STRICT node type.:: + + include/llvm/CodeGen/TargetLowering.h + +Building the SelectionDAG +------------------------- + +The switch statement in SelectionDAGBuilder::visitIntrinsicCall() needs +to be updated to call SelectionDAGBuilder::visitConstrainedFPIntrinsic(). +That function, in turn, needs to be updated to know how to create the +SDNode for the intrinsic. The new STRICT node will eventually be converted +to the matching non-STRICT node. 
For this reason it *must* have the same +operands and values as the non-STRICT version in case the non-STRICT +version's default lowering is used. This means that if the non-STRICT +version of the node does not use the chain then the STRICT node cannot +either.:: + + lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp + +Most STRICT nodes are legalized the same way as their matching non-STRICT +counterparts. A new STRICT node with this property must be added to the +switch in SelectionDAGLegalize::LegalizeOp().:: + + lib/CodeGen/SelectionDAG/LegalizeDAG.cpp + +The conversion, or mutation, of a STRICT node to the non-STRICT +version of the node happens in SelectionDAG::mutateStrictFPToFP(). Be +careful when updating this function, since some nodes are always chained and +some are not. Some nodes have the same return type as their input operand, +but some are different. Both of these points must be properly handled.:: + + lib/CodeGen/SelectionDAG/SelectionDAG.cpp + +To make debug logs readable it is helpful to update the SelectionDAG's +debug logger:: + + lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp + +Add documentation and tests +=========================== + +:: + + docs/LangRef.rst Index: docs/LangRef.rst =================================================================== --- docs/LangRef.rst +++ docs/LangRef.rst @@ -14051,6 +14051,141 @@ operand computed with infinite precision, and then rounded to the target precision. +'``llvm.experimental.constrained.fptoui``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare + @llvm.experimental.constrained.fptoui( , + metadata ) + +Overview: +""""""""" + +The '``llvm.experimental.constrained.fptoui``' intrinsic converts a +floating-point ``value`` to its unsigned integer equivalent of type ``ty2``. 
+ +Arguments: +"""""""""" + +The first argument to the '``llvm.experimental.constrained.fptoui``' +intrinsic must be :ref:`floating point ` or :ref:`vector +` of floating point values. + +The second argument specifies the exception behavior as described above. + +Semantics: +"""""""""" + +The result produced is an unsigned integer converted from the floating +point operand. The value is truncated, so it is rounded towards zero. + +'``llvm.experimental.constrained.fptosi``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare + @llvm.experimental.constrained.fptosi( , + metadata ) + +Overview: +""""""""" + +The '``llvm.experimental.constrained.fptosi``' intrinsic converts +:ref:`floating-point ` ``value`` to type ``ty2``. + +Arguments: +"""""""""" + +The first argument to the '``llvm.experimental.constrained.fptosi``' +intrinsic must be :ref:`floating point ` or :ref:`vector +` of floating point values. + +The second argument specifies the exception behavior as described above. + +Semantics: +"""""""""" + +The result produced is a signed integer converted from the floating +point operand. The value is truncated, so it is rounded towards zero. + +'``llvm.experimental.constrained.fptrunc``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare + @llvm.experimental.constrained.fptrunc( , + metadata ) + +Overview: +""""""""" + +The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value`` +to type ``ty2``. + +Arguments: +"""""""""" + +The first argument to the '``llvm.experimental.constrained.fptrunc``' +intrinsic must be :ref:`floating point ` or :ref:`vector +` of floating point values. This argument must be larger in size +than the result. + +The second argument specifies the exception behavior as described above. + +Semantics: +"""""""""" + +The result produced is a floating point value truncated to be smaller in size +than the operand. 
+ +'``llvm.experimental.constrained.fpext``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare + @llvm.experimental.constrained.fpext( , + metadata ) + +Overview: +""""""""" + +The '``llvm.experimental.constrained.fpext``' intrinsic extends a +floating-point ``value`` to a larger floating-point value. + +Arguments: +"""""""""" + +The first argument to the '``llvm.experimental.constrained.fpext``' +intrinsic must be :ref:`floating point ` or :ref:`vector +` of floating point values. This argument must be smaller in size +than the result. + +The second argument specifies the exception behavior as described above. + +Semantics: +"""""""""" + +The result produced is a floating point value extended to be larger in size +than the operand. All restrictions that apply to the fpext instruction also +apply to this intrinsic. + Constrained libm-equivalent Intrinsics -------------------------------------- Index: docs/index.rst =================================================================== --- docs/index.rst +++ docs/index.rst @@ -191,6 +191,7 @@ CommandLine CompilerWriterInfo ExtendingLLVM + AddingConstrainedIntrinsics HowToSetUpLLVMStyleRTTI ProgrammersManual Extensions @@ -227,6 +228,10 @@ :doc:`ExtendingLLVM` Look here to see how to add instructions and intrinsics to LLVM. +:doc:`AddingConstrainedIntrinsics` + Gives the steps necessary when adding a new constrained math intrinsic + to LLVM. + `Doxygen generated documentation `_ (`classes `_) Index: include/llvm/CodeGen/ISDOpcodes.h =================================================================== --- include/llvm/CodeGen/ISDOpcodes.h +++ include/llvm/CodeGen/ISDOpcodes.h @@ -508,8 +508,11 @@ /// FP_TO_[US]INT - Convert a floating point value to a signed or unsigned /// integer. These have the same semantics as fptosi and fptoui in IR. If /// the FP value cannot fit in the integer type, the results are undefined. 
+ /// The STRICT_ versions are identical except they limit DAG optimizations. FP_TO_SINT, FP_TO_UINT, + STRICT_FP_TO_SINT, + STRICT_FP_TO_UINT, /// X = FP_ROUND(Y, TRUNC) - Rounding 'Y' from a larger floating point type /// down to the precision of the destination VT. TRUNC is a flag, which is @@ -522,7 +525,9 @@ /// precision of source type. This allows certain transformations like /// FP_EXTEND(FP_ROUND(X,1)) -> X which are not safe for /// FP_EXTEND(FP_ROUND(X,0)) because the extra bits aren't removed. + /// The STRICT_ version is identical except it limits DAG optimizations. FP_ROUND, + STRICT_FP_ROUND, /// FLT_ROUNDS_ - Returns current rounding mode: /// -1 Undefined @@ -540,7 +545,9 @@ FP_ROUND_INREG, /// X = FP_EXTEND(Y) - Extend a smaller FP type into a larger FP type. + /// The STRICT_ version is identical except it limits DAG optimizations. FP_EXTEND, + STRICT_FP_EXTEND, /// BITCAST - This operator converts between integer, vector and FP /// values, as if the value was stored to memory with one type and loaded Index: include/llvm/CodeGen/SelectionDAGNodes.h =================================================================== --- include/llvm/CodeGen/SelectionDAGNodes.h +++ include/llvm/CodeGen/SelectionDAGNodes.h @@ -678,6 +678,10 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: return true; } } Index: include/llvm/CodeGen/TargetLowering.h =================================================================== --- include/llvm/CodeGen/TargetLowering.h +++ include/llvm/CodeGen/TargetLowering.h @@ -825,6 +825,10 @@ case ISD::STRICT_FFLOOR: EqOpc = ISD::FFLOOR; break; case ISD::STRICT_FROUND: EqOpc = ISD::FROUND; break; case ISD::STRICT_FTRUNC: EqOpc = ISD::FTRUNC; break; + case ISD::STRICT_FP_TO_SINT: EqOpc = ISD::FP_TO_SINT; break; + case ISD::STRICT_FP_TO_UINT: EqOpc = ISD::FP_TO_UINT; break; + case 
ISD::STRICT_FP_ROUND: EqOpc = ISD::FP_ROUND; break; + case ISD::STRICT_FP_EXTEND: EqOpc = ISD::FP_EXTEND; break; } auto Action = getOperationAction(EqOpc, VT); Index: include/llvm/IR/IntrinsicInst.h =================================================================== --- include/llvm/IR/IntrinsicInst.h +++ include/llvm/IR/IntrinsicInst.h @@ -239,6 +239,10 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptosi: + case Intrinsic::experimental_constrained_fptoui: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: Index: include/llvm/IR/Intrinsics.td =================================================================== --- include/llvm/IR/Intrinsics.td +++ include/llvm/IR/Intrinsics.td @@ -512,6 +512,22 @@ llvm_metadata_ty, llvm_metadata_ty ]>; + def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + + def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + + def int_experimental_constrained_fptrunc : Intrinsic<[ llvm_anyfloat_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + + def int_experimental_constrained_fpext : Intrinsic<[ llvm_anyfloat_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + // These intrinsics are sensitive to the rounding mode so we need constrained // versions of each of them. When strict rounding and exception control are // not required the non-constrained versions of these intrinsics should be @@ -593,7 +609,7 @@ llvm_metadata_ty, llvm_metadata_ty ]>; } -// FIXME: Add intrinsics for fcmp, fptrunc, fpext, fptoui and fptosi. 
+// FIXME: Add intrinsic for fcmp // FIXME: Add intrinsics for fabs and copysign? Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -1114,6 +1114,10 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: // These pseudo-ops get legalized as if they were their non-strict // equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT // is also legal, but if ISD::FSQRT requires expansion then so does @@ -2788,12 +2792,14 @@ break; } case ISD::FP_ROUND: + case ISD::STRICT_FP_ROUND: case ISD::BITCAST: Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getValueType(0), Node->getValueType(0), dl); Results.push_back(Tmp1); break; case ISD::FP_EXTEND: + case ISD::STRICT_FP_EXTEND: Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getOperand(0).getValueType(), Node->getValueType(0), dl); @@ -2856,9 +2862,13 @@ Results.push_back(Tmp1); break; case ISD::FP_TO_SINT: + case ISD::STRICT_FP_TO_SINT: if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG)) Results.push_back(Tmp1); break; + case ISD::STRICT_FP_TO_UINT: + llvm_unreachable("No default lowering for STRICT_FP_TO_UINT!"); + break; case ISD::FP_TO_UINT: if (TLI.expandFP_TO_UINT(Node, Tmp1, DAG)) Results.push_back(Tmp1); Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp +++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp @@ -113,6 +113,8 @@ case ISD::ZERO_EXTEND: case ISD::ANY_EXTEND: Res = PromoteIntRes_INT_EXTEND(N); break; + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: case ISD::FP_TO_SINT: case ISD::FP_TO_UINT: Res = PromoteIntRes_FP_TO_XINT(N); break; @@ -454,8 +456,21 @@ 
TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, NVT)) NewOpc = ISD::FP_TO_SINT; - SDValue Res = DAG.getNode(NewOpc, dl, NVT, N->getOperand(0)); + if (N->getOpcode() == ISD::STRICT_FP_TO_UINT && + !TLI.isOperationLegal(ISD::STRICT_FP_TO_UINT, NVT) && + TLI.isOperationLegalOrCustom(ISD::STRICT_FP_TO_SINT, NVT)) + NewOpc = ISD::STRICT_FP_TO_SINT; + SDValue Res; + if (N->isStrictFPOpcode()) { + Res = DAG.getNode(NewOpc, dl, { NVT, MVT::Other }, + { N->getOperand(0), N->getOperand(1) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + } else + Res = DAG.getNode(NewOpc, dl, NVT, N->getOperand(0)); + // Assert that the converted value fits in the original type. If it doesn't // (eg: because the value being converted is too big), then the result of the // original operation was undefined anyway, so the assert is still correct. @@ -463,7 +478,8 @@ // NOTE: fp-to-uint to fp-to-sint promotion guarantees zero extend. For example: // before legalization: fp-to-uint16, 65534. -> 0xfffe // after legalization: fp-to-sint32, 65534. -> 0x0000fffe - return DAG.getNode(N->getOpcode() == ISD::FP_TO_UINT ? + return DAG.getNode((N->getOpcode() == ISD::FP_TO_UINT || + N->getOpcode() == ISD::STRICT_FP_TO_UINT) ? 
ISD::AssertZext : ISD::AssertSext, dl, NVT, Res, DAG.getValueType(N->getValueType(0).getScalarType())); } Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -660,6 +660,7 @@ SDValue ScalarizeVecRes_BUILD_VECTOR(SDNode *N); SDValue ScalarizeVecRes_EXTRACT_SUBVECTOR(SDNode *N); SDValue ScalarizeVecRes_FP_ROUND(SDNode *N); + SDValue ScalarizeVecRes_STRICT_FP_ROUND(SDNode *N); SDValue ScalarizeVecRes_FPOWI(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); @@ -681,6 +682,7 @@ SDValue ScalarizeVecOp_VSETCC(SDNode *N); SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo); SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo); + SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo); //===--------------------------------------------------------------------===// // Vector Splitting Support: LegalizeVectorTypes.cpp @@ -785,6 +787,7 @@ SDValue WidenVecRes_BinaryCanTrap(SDNode *N); SDValue WidenVecRes_StrictFP(SDNode *N); SDValue WidenVecRes_Convert(SDNode *N); + SDValue WidenVecRes_Convert_StrictFP(SDNode *N); SDValue WidenVecRes_FCOPYSIGN(SDNode *N); SDValue WidenVecRes_POWI(SDNode *N); SDValue WidenVecRes_Shift(SDNode *N); Index: lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp +++ lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp @@ -312,6 +312,10 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: // These pseudo-ops get legalized as if they were their non-strict // equivalent. 
For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT // is also legal, but if ISD::FSQRT requires expansion then so does @@ -766,6 +770,8 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: return ExpandStrictFPOp(Op); default: return DAG.UnrollVectorOp(Op.getNode()); @@ -1178,7 +1184,7 @@ if (OperVT.isVector()) Oper = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, - EltVT, Oper, Idx); + OperVT.getVectorElementType(), Oper, Idx); Opers.push_back(Oper); } Index: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -51,6 +51,7 @@ case ISD::BITCAST: R = ScalarizeVecRes_BITCAST(N); break; case ISD::BUILD_VECTOR: R = ScalarizeVecRes_BUILD_VECTOR(N); break; case ISD::EXTRACT_SUBVECTOR: R = ScalarizeVecRes_EXTRACT_SUBVECTOR(N); break; + case ISD::STRICT_FP_ROUND: R = ScalarizeVecRes_STRICT_FP_ROUND(N); break; case ISD::FP_ROUND: R = ScalarizeVecRes_FP_ROUND(N); break; case ISD::FP_ROUND_INREG: R = ScalarizeVecRes_InregOp(N); break; case ISD::FPOWI: R = ScalarizeVecRes_FPOWI(N); break; @@ -170,6 +171,9 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + case ISD::STRICT_FP_EXTEND: R = ScalarizeVecRes_StrictFPOp(N); break; } @@ -264,6 +268,18 @@ NewVT, Op, N->getOperand(1)); } +SDValue DAGTypeLegalizer::ScalarizeVecRes_STRICT_FP_ROUND(SDNode *N) { + EVT NewVT = N->getValueType(0).getVectorElementType(); + SDValue Op = GetScalarizedVector(N->getOperand(1)); + SDValue Res = DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N), + { NewVT, MVT::Other }, + { N->getOperand(0), Op, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + return Res; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_FPOWI(SDNode *N) { SDValue Op = GetScalarizedVector(N->getOperand(0)); return DAG.getNode(ISD::FPOWI, SDLoc(N), @@ -547,6 +563,9 @@ case ISD::STORE: Res = ScalarizeVecOp_STORE(cast(N), OpNo); break; + case ISD::STRICT_FP_ROUND: + Res = ScalarizeVecOp_STRICT_FP_ROUND(N, OpNo); + break; case ISD::FP_ROUND: Res = ScalarizeVecOp_FP_ROUND(N, OpNo); break; @@ -680,6 +699,18 @@ return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res); } +SDValue DAGTypeLegalizer::ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, + unsigned OpNo) { + SDValue Elt = GetScalarizedVector(N->getOperand(1)); + SDValue Res = DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N), + { N->getValueType(0).getVectorElementType(), MVT::Other }, + { N->getOperand(0), Elt, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res); +} + //===----------------------------------------------------------------------===// // Result Vector Splitting //===----------------------------------------------------------------------===// @@ -763,9 +794,13 @@ case ISD::FNEARBYINT: case ISD::FNEG: case ISD::FP_EXTEND: + case ISD::STRICT_FP_EXTEND: case ISD::FP_ROUND: + case ISD::STRICT_FP_ROUND: case ISD::FP_TO_SINT: + case ISD::STRICT_FP_TO_SINT: case ISD::FP_TO_UINT: + case ISD::STRICT_FP_TO_UINT: case ISD::FRINT: case ISD::FROUND: case ISD::FSIN: @@ -1468,15 +1503,33 @@ // If the input also splits, handle it directly for a compile time speedup. // Otherwise split it by hand. 
- EVT InVT = N->getOperand(0).getValueType(); + EVT InVT = N->getOperand(N->isStrictFPOpcode()).getValueType(); if (getTypeAction(InVT) == TargetLowering::TypeSplitVector) - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); else - std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0); + std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, N->isStrictFPOpcode()); if (N->getOpcode() == ISD::FP_ROUND) { Lo = DAG.getNode(N->getOpcode(), dl, LoVT, Lo, N->getOperand(1)); Hi = DAG.getNode(N->getOpcode(), dl, HiVT, Hi, N->getOperand(1)); + } else if (N->getOpcode() == ISD::STRICT_FP_ROUND) { + Lo = DAG.getNode(N->getOpcode(), dl, { LoVT, MVT::Other }, + { N->getOperand(0), Lo, N->getOperand(2) }); + Hi = DAG.getNode(N->getOpcode(), dl, { HiVT, MVT::Other }, + { N->getOperand(0), Hi, N->getOperand(2) }); + SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), dl, { LoVT, MVT::Other }, + { N->getOperand(0), Lo }); + Hi = DAG.getNode(N->getOpcode(), dl, { HiVT, MVT::Other }, + { N->getOperand(0), Hi }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); } else { Lo = DAG.getNode(N->getOpcode(), dl, LoVT, Lo); Hi = DAG.getNode(N->getOpcode(), dl, HiVT, Hi); @@ -1677,6 +1730,7 @@ case ISD::TRUNCATE: Res = SplitVecOp_TruncateHelper(N); break; + case ISD::STRICT_FP_ROUND: case ISD::FP_ROUND: Res = SplitVecOp_FP_ROUND(N); break; case ISD::FCOPYSIGN: Res = SplitVecOp_FCOPYSIGN(N); break; case ISD::STORE: @@ -1695,8 +1749,10 @@ Res = SplitVecOp_VSELECT(N, OpNo); break; case ISD::FP_TO_SINT: + case ISD::STRICT_FP_TO_SINT: case ISD::FP_TO_UINT: - if (N->getValueType(0).bitsLT(N->getOperand(0).getValueType())) + case ISD::STRICT_FP_TO_UINT: + if (N->getValueType(0).bitsLT(N->getOperand(N->isStrictFPOpcode()).getValueType())) Res = SplitVecOp_TruncateHelper(N); else Res = SplitVecOp_UnaryOp(N); @@ -1711,6 +1767,7 @@ case ISD::CTTZ: case ISD::CTLZ: case ISD::CTPOP: + case ISD::STRICT_FP_EXTEND: case ISD::FP_EXTEND: case ISD::SIGN_EXTEND: case ISD::ZERO_EXTEND: @@ -1752,7 +1809,11 @@ if (Res.getNode() == N) return true; - assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && + if (N->isStrictFPOpcode()) + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 2 && + "Invalid operand expansion"); + else + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && "Invalid operand expansion"); ReplaceValueWith(SDValue(N, 0), Res); @@ -1840,15 +1901,31 @@ EVT ResVT = N->getValueType(0); SDValue Lo, Hi; SDLoc dl(N); - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); EVT InVT = Lo.getValueType(); EVT OutVT = EVT::getVectorVT(*DAG.getContext(), ResVT.getVectorElementType(), InVT.getVectorNumElements()); - Lo = DAG.getNode(N->getOpcode(), dl, OutVT, Lo); - Hi = DAG.getNode(N->getOpcode(), dl, OutVT, Hi); + if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), dl, { 
OutVT, MVT::Other }, + { N->getOperand(0), Lo }); + Hi = DAG.getNode(N->getOpcode(), dl, { OutVT, MVT::Other }, + { N->getOperand(0), Hi }); + // Build a factor node to remember that this operation is independent of + // the other one. + SDValue Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1), + Hi.getValue(1)); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Ch); + } else { + Lo = DAG.getNode(N->getOpcode(), dl, OutVT, Lo); + Hi = DAG.getNode(N->getOpcode(), dl, OutVT, Hi); + } + return DAG.getNode(ISD::CONCAT_VECTORS, dl, ResVT, Lo, Hi); } @@ -2225,7 +2302,7 @@ // // Without this transform, the original truncate would end up being // scalarized, which is pretty much always a last resort. - SDValue InVec = N->getOperand(0); + SDValue InVec = N->getOperand(N->isStrictFPOpcode()); EVT InVT = InVec->getValueType(0); EVT OutVT = N->getValueType(0); unsigned NumElements = OutVT.getVectorNumElements(); @@ -2254,8 +2331,22 @@ EVT::getIntegerVT(*DAG.getContext(), InElementSize/2); EVT HalfVT = EVT::getVectorVT(*DAG.getContext(), HalfElementVT, NumElements/2); - SDValue HalfLo = DAG.getNode(N->getOpcode(), DL, HalfVT, InLoVec); - SDValue HalfHi = DAG.getNode(N->getOpcode(), DL, HalfVT, InHiVec); + SDValue HalfLo; + SDValue HalfHi; + SDValue Chain; + if (N->isStrictFPOpcode()) { + HalfLo = DAG.getNode(N->getOpcode(), DL, { HalfVT, MVT::Other }, + { N->getOperand(0), InLoVec }); + HalfHi = DAG.getNode(N->getOpcode(), DL, { HalfVT, MVT::Other }, + { N->getOperand(0), InHiVec }); + // Build a factor node to remember that this Op is independent of the + // other one. + Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, + HalfLo.getValue(1), HalfHi.getValue(1)); + } else { + HalfLo = DAG.getNode(N->getOpcode(), DL, HalfVT, InLoVec); + HalfHi = DAG.getNode(N->getOpcode(), DL, HalfVT, InHiVec); + } // Concatenate them to get the full intermediate truncation result. 
EVT InterVT = EVT::getVectorVT(*DAG.getContext(), HalfElementVT, NumElements); SDValue InterVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InterVT, HalfLo, @@ -2264,11 +2355,23 @@ // type. This should normally be something that ends up being legal directly, // but in theory if a target has very wide vectors and an annoyingly // restricted set of legal types, this split can chain to build things up. + if (N->isStrictFPOpcode()) { + SDValue Round = DAG.getNode(ISD::STRICT_FP_ROUND, DL, { OutVT, MVT::Other }, + { Chain, InterVec, DAG.getTargetConstant( + 0, DL, TLI.getPointerTy(DAG.getDataLayout())) }); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Round.getValue(1)); + + return Round; + } else { return IsFloat ? DAG.getNode(ISD::FP_ROUND, DL, OutVT, InterVec, DAG.getTargetConstant( 0, DL, TLI.getPointerTy(DAG.getDataLayout()))) : DAG.getNode(ISD::TRUNCATE, DL, OutVT, InterVec); + } } SDValue DAGTypeLegalizer::SplitVecOp_VSETCC(SDNode *N) { @@ -2296,14 +2399,26 @@ EVT ResVT = N->getValueType(0); SDValue Lo, Hi; SDLoc DL(N); - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); EVT InVT = Lo.getValueType(); EVT OutVT = EVT::getVectorVT(*DAG.getContext(), ResVT.getVectorElementType(), InVT.getVectorNumElements()); - Lo = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Lo, N->getOperand(1)); - Hi = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Hi, N->getOperand(1)); + if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other }, + { N->getOperand(0), Lo, N->getOperand(2) }); + Hi = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other }, + { N->getOperand(0), Hi, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else { + Lo = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Lo, N->getOperand(1)); + Hi = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Hi, N->getOperand(1)); + } return DAG.getNode(ISD::CONCAT_VECTORS, DL, ResVT, Lo, Hi); } @@ -2454,6 +2569,13 @@ Res = WidenVecRes_Convert(N); break; + case ISD::STRICT_FP_EXTEND: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + Res = WidenVecRes_Convert_StrictFP(N); + break; + case ISD::FABS: case ISD::FCEIL: case ISD::FCOS: @@ -2876,6 +2998,85 @@ return DAG.getBuildVector(WidenVT, DL, Ops); } +SDValue DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) { + SDValue InOp = N->getOperand(1); + SDLoc DL(N); + SmallVector NewOps(N->op_begin(), N->op_end()); + + EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0)); + unsigned WidenNumElts = WidenVT.getVectorNumElements(); + SmallVector WidenVTs = { WidenVT, MVT::Other }; + + EVT InVT = InOp.getValueType(); + EVT InEltVT = InVT.getVectorElementType(); + EVT InWidenVT = EVT::getVectorVT(*DAG.getContext(), InEltVT, WidenNumElts); + SmallVector InWidenVTs = { InWidenVT, MVT::Other }; + + unsigned Opcode = N->getOpcode(); + unsigned InVTNumElts = InVT.getVectorNumElements(); + if (getTypeAction(InVT) == TargetLowering::TypeWidenVector) { + InOp = GetWidenedVector(N->getOperand(1)); + InVT = InOp.getValueType(); + InVTNumElts = InVT.getVectorNumElements(); + if (InVTNumElts == WidenNumElts) { + NewOps[1] = InOp; + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + } + + if (TLI.isTypeLegal(InWidenVT)) { + // Because the result and the input are different vector types, widening + // the result could create a legal type but widening the input might make + // it an illegal type that might lead 
to repeatedly splitting the input + // and then widening it. To avoid this, we widen the input only if + // it results in a legal type. + if (WidenNumElts % InVTNumElts == 0) { + // Widen the input and call convert on the widened input vector. + unsigned NumConcat = WidenNumElts/InVTNumElts; + SmallVector Ops(NumConcat, DAG.getUNDEF(InVT)); + Ops[0] = InOp; + SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops); + + NewOps[1] = InVec; + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + + if (InVTNumElts % WidenNumElts == 0) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_SUBVECTOR, DL, InWidenVT, InOp, + DAG.getConstant(0, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))); + // Extract the input and convert the shortened input vector. + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + } + + // Otherwise unroll into some nasty scalar code and rebuild the vector. + EVT EltVT = WidenVT.getVectorElementType(); + SmallVector EltVTs = { EltVT, MVT::Other }; + SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT)); + SmallVector OpChains; + // Use the original element count so we don't do more scalar ops than + // necessary. 
+ unsigned MinElts = N->getValueType(0).getVectorNumElements(); + for (unsigned i=0; i < MinElts; ++i) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_VECTOR_ELT, DL, InEltVT, InOp, + DAG.getConstant(i, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))); + Ops[i] = DAG.getNode(Opcode, DL, EltVTs, NewOps); + OpChains.push_back(Ops[i].getValue(1)); + } + SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, OpChains); + ReplaceValueWith(SDValue(N, 1), NewChain); + + return DAG.getBuildVector(WidenVT, DL, Ops); +} + SDValue DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) { unsigned Opcode = N->getOpcode(); SDValue InOp = N->getOperand(0); @@ -3654,8 +3855,11 @@ break; case ISD::FP_EXTEND: + case ISD::STRICT_FP_EXTEND: case ISD::FP_TO_SINT: + case ISD::STRICT_FP_TO_SINT: case ISD::FP_TO_UINT: + case ISD::STRICT_FP_TO_UINT: case ISD::SINT_TO_FP: case ISD::UINT_TO_FP: case ISD::TRUNCATE: @@ -3672,8 +3876,12 @@ return true; - assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && - "Invalid operand expansion"); + if (N->isStrictFPOpcode()) + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 2 && + "Invalid operand expansion"); + else + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && + "Invalid operand expansion"); ReplaceValueWith(SDValue(N, 0), Res); return false; @@ -3753,7 +3961,7 @@ EVT EltVT = VT.getVectorElementType(); SDLoc dl(N); unsigned NumElts = VT.getVectorNumElements(); - SDValue InOp = N->getOperand(0); + SDValue InOp = N->getOperand(N->isStrictFPOpcode()); assert(getTypeAction(InOp.getValueType()) == TargetLowering::TypeWidenVector && "Unexpected type action"); @@ -3765,7 +3973,15 @@ EVT WideVT = EVT::getVectorVT(*DAG.getContext(), EltVT, InVT.getVectorNumElements()); if (TLI.isTypeLegal(WideVT)) { - SDValue Res = DAG.getNode(Opcode, dl, WideVT, InOp); + SDValue Res; + if (N->isStrictFPOpcode()) { + Res = DAG.getNode(Opcode, dl, { WideVT, MVT::Other }, + { 
N->getOperand(0), InOp }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + } else + Res = DAG.getNode(Opcode, dl, WideVT, InOp); return DAG.getNode( ISD::EXTRACT_SUBVECTOR, dl, VT, Res, DAG.getConstant(0, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); @@ -3775,6 +3991,19 @@ // Unroll the convert into some scalar code and create a nasty build vector. SmallVector Ops(NumElts); + if (N->isStrictFPOpcode()) { + SmallVector NewOps(N->op_begin(), N->op_end()); + SmallVector OpChains; + for (unsigned i=0; i < NumElts; ++i) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_VECTOR_ELT, dl, InEltVT, InOp, + DAG.getConstant(i, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); + Ops[i] = DAG.getNode(Opcode, dl, { EltVT, MVT::Other }, NewOps); + OpChains.push_back(Ops[i].getValue(1)); + } + SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, OpChains); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else for (unsigned i=0; i < NumElts; ++i) Ops[i] = DAG.getNode( Opcode, dl, EltVT, Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -7388,6 +7388,10 @@ case ISD::STRICT_FFLOOR: NewOpc = ISD::FFLOOR; IsUnary = true; break; case ISD::STRICT_FROUND: NewOpc = ISD::FROUND; IsUnary = true; break; case ISD::STRICT_FTRUNC: NewOpc = ISD::FTRUNC; IsUnary = true; break; + case ISD::STRICT_FP_TO_SINT: NewOpc = ISD::FP_TO_SINT; IsUnary = true; break; + case ISD::STRICT_FP_TO_UINT: NewOpc = ISD::FP_TO_UINT; IsUnary = true; break; + case ISD::STRICT_FP_ROUND: NewOpc = ISD::FP_ROUND; break; + case ISD::STRICT_FP_EXTEND: NewOpc = ISD::FP_EXTEND; IsUnary = true; break; } // We're taking this node out of the chain, so we need to re-link things. 
@@ -7395,8 +7399,21 @@ SDValue OutputChain = SDValue(Node, 1); ReplaceAllUsesOfValueWith(OutputChain, InputChain); - SDVTList VTs = getVTList(Node->getOperand(1).getValueType()); + SDVTList VTs; SDNode *Res = nullptr; + + switch (OrigOpc) { + default: + VTs = getVTList(Node->getOperand(1).getValueType()); + break; + case ISD::STRICT_FP_TO_SINT: + case ISD::STRICT_FP_TO_UINT: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: + VTs = getVTList(Node->ValueList[0]); + break; + } + if (IsUnary) Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) }); else if (IsTernary) Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -5619,6 +5619,10 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptosi: + case Intrinsic::experimental_constrained_fptoui: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: @@ -6348,6 +6352,18 @@ case Intrinsic::experimental_constrained_fma: Opcode = ISD::STRICT_FMA; break; + case Intrinsic::experimental_constrained_fptosi: + Opcode = ISD::STRICT_FP_TO_SINT; + break; + case Intrinsic::experimental_constrained_fptoui: + Opcode = ISD::STRICT_FP_TO_UINT; + break; + case Intrinsic::experimental_constrained_fptrunc: + Opcode = ISD::STRICT_FP_ROUND; + break; + case Intrinsic::experimental_constrained_fpext: + Opcode = ISD::STRICT_FP_EXTEND; + break; case Intrinsic::experimental_constrained_sqrt: Opcode = ISD::STRICT_FSQRT; break; @@ -6411,7 +6427,12 @@ SDVTList VTs = DAG.getVTList(ValueVTs); SDValue Result; - if (FPI.isUnaryOp()) + if (Opcode 
== ISD::STRICT_FP_ROUND) + Result = DAG.getNode(Opcode, sdl, VTs, + { Chain, getValue(FPI.getArgOperand(0)), + DAG.getTargetConstant(0, sdl, + TLI.getPointerTy(DAG.getDataLayout())) }); + else if (FPI.isUnaryOp()) Result = DAG.getNode(Opcode, sdl, VTs, { Chain, getValue(FPI.getArgOperand(0)) }); else if (FPI.isTernaryOp()) @@ -6427,6 +6448,7 @@ assert(Result.getNode()->getNumValues() == 2); SDValue OutChain = Result.getValue(1); DAG.setRoot(OutChain); + SDValue FPResult = Result.getValue(0); setValue(&FPI, FPResult); } Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp @@ -306,14 +306,18 @@ case ISD::ZERO_EXTEND_VECTOR_INREG: return "zero_extend_vector_inreg"; case ISD::TRUNCATE: return "truncate"; case ISD::FP_ROUND: return "fp_round"; + case ISD::STRICT_FP_ROUND: return "strict_fp_round"; case ISD::FLT_ROUNDS_: return "flt_rounds"; case ISD::FP_ROUND_INREG: return "fp_round_inreg"; case ISD::FP_EXTEND: return "fp_extend"; + case ISD::STRICT_FP_EXTEND: return "strict_fp_extend"; case ISD::SINT_TO_FP: return "sint_to_fp"; case ISD::UINT_TO_FP: return "uint_to_fp"; case ISD::FP_TO_SINT: return "fp_to_sint"; + case ISD::STRICT_FP_TO_SINT: return "strict_fp_to_sint"; case ISD::FP_TO_UINT: return "fp_to_uint"; + case ISD::STRICT_FP_TO_UINT: return "strict_fp_to_uint"; case ISD::BITCAST: return "bitcast"; case ISD::ADDRSPACECAST: return "addrspacecast"; case ISD::FP16_TO_FP: return "fp16_to_fp"; Index: lib/IR/IntrinsicInst.cpp =================================================================== --- lib/IR/IntrinsicInst.cpp +++ lib/IR/IntrinsicInst.cpp @@ -142,6 +142,10 @@ switch (getIntrinsicID()) { default: return false; + case Intrinsic::experimental_constrained_fptosi: + case Intrinsic::experimental_constrained_fptoui: + case Intrinsic::experimental_constrained_fptrunc: + case 
Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_sin: case Intrinsic::experimental_constrained_cos: Index: lib/IR/Verifier.cpp =================================================================== --- lib/IR/Verifier.cpp +++ lib/IR/Verifier.cpp @@ -4092,6 +4092,10 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptosi: + case Intrinsic::experimental_constrained_fptoui: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: @@ -4518,17 +4522,149 @@ void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) { unsigned NumOperands = FPI.getNumArgOperands(); - Assert(((NumOperands == 5 && FPI.isTernaryOp()) || - (NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)), + bool HasExceptionMD = false; + bool HasRoundingMD = false; + switch (FPI.getIntrinsicID()) + { + case Intrinsic::experimental_constrained_sqrt: + case Intrinsic::experimental_constrained_sin: + case Intrinsic::experimental_constrained_cos: + case Intrinsic::experimental_constrained_exp: + case Intrinsic::experimental_constrained_exp2: + case Intrinsic::experimental_constrained_log: + case Intrinsic::experimental_constrained_log10: + case Intrinsic::experimental_constrained_log2: + case Intrinsic::experimental_constrained_rint: + case Intrinsic::experimental_constrained_nearbyint: + case Intrinsic::experimental_constrained_ceil: + case Intrinsic::experimental_constrained_floor: + case Intrinsic::experimental_constrained_round: + case Intrinsic::experimental_constrained_trunc: + Assert((NumOperands == 3), "invalid arguments for constrained FP intrinsic", &FPI); - 
Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)), - "invalid exception behavior argument", &FPI); - Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-2)), - "invalid rounding mode argument", &FPI); - Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid, - "invalid rounding mode argument", &FPI); - Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid, - "invalid exception behavior argument", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fma: + Assert((NumOperands == 5), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fadd: + case Intrinsic::experimental_constrained_fsub: + case Intrinsic::experimental_constrained_fmul: + case Intrinsic::experimental_constrained_fdiv: + case Intrinsic::experimental_constrained_frem: + case Intrinsic::experimental_constrained_pow: + case Intrinsic::experimental_constrained_powi: + case Intrinsic::experimental_constrained_maxnum: + case Intrinsic::experimental_constrained_minnum: + Assert((NumOperands == 4), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fptosi: + case Intrinsic::experimental_constrained_fptoui: { + Assert((NumOperands == 2), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + + Value *Operand = FPI.getArgOperand(0); + uint64_t NumSrcElem = 0; + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast<VectorType>(Operand->getType()); + NumSrcElem = OperandT->getNumElements(); + Assert(OperandT->getVectorElementType()->isFloatingPointTy(), + "Intrinsic first argument vector must be floating point", + &FPI); + } + else + Assert(Operand->getType()->isFloatingPointTy(), + "Intrinsic first argument must be floating point", + &FPI); + + Operand = &FPI; + Assert((NumSrcElem > 0) == 
Operand->getType()->isVectorTy(), + "Intrinsic first argument and result disagree on vector use", + &FPI); + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast<VectorType>(Operand->getType()); + Assert(NumSrcElem == OperandT->getNumElements(), + "Intrinsic first argument and result vector lengths must be equal", + &FPI); + Assert(OperandT->getVectorElementType()->isIntegerTy(), + "Intrinsic result vector must be integer", + &FPI); + } + else + Assert(Operand->getType()->isIntegerTy(), + "Intrinsic result must be an integer", + &FPI); + } + break; + + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: { + Assert((NumOperands == 2), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + + Value *Operand = FPI.getArgOperand(0); + uint64_t NumSrcElem = 0; + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast<VectorType>(Operand->getType()); + NumSrcElem = OperandT->getNumElements(); + Assert(OperandT->getVectorElementType()->isFloatingPointTy(), + "Intrinsic first argument vector must be floating point", + &FPI); + } + else + Assert(Operand->getType()->isFloatingPointTy(), + "Intrinsic first argument must be floating point", + &FPI); + + Operand = &FPI; + Assert((NumSrcElem > 0) == Operand->getType()->isVectorTy(), + "Intrinsic first argument and result disagree on vector use", + &FPI); + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast<VectorType>(Operand->getType()); + Assert(NumSrcElem == OperandT->getNumElements(), + "Intrinsic first argument and result vector lengths must be equal", + &FPI); + Assert(OperandT->getVectorElementType()->isFloatingPointTy(), + "Intrinsic result vector must be floating point", + &FPI); + } + else + Assert(Operand->getType()->isFloatingPointTy(), + "Intrinsic result must be a floating point", + &FPI); + } + break; + + default: + llvm_unreachable("Invalid constrained FP intrinsic!"); + } + + if (HasExceptionMD) { + 
Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)), + "invalid exception behavior argument", &FPI); + Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid, + "invalid exception behavior argument", &FPI); + } + if (HasRoundingMD) { + int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1); + Assert(isa<MetadataAsValue>(FPI.getArgOperand(RoundingIdx)), + "invalid rounding mode argument", &FPI); + Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid, + "invalid rounding mode argument", &FPI); + } } void Verifier::visitDbgIntrinsic(StringRef Kind, DbgVariableIntrinsic &DII) { Index: test/CodeGen/SystemZ/fp-con-conv-14.ll =================================================================== --- test/CodeGen/SystemZ/fp-con-conv-14.ll +++ test/CodeGen/SystemZ/fp-con-conv-14.ll @@ -0,0 +1,82 @@ +; Test conversion of floating-point values to unsigned integers. +; +; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z196 | FileCheck %s + +; Test f32->i32. +define i32 @f1(float %f) { +; CHECK-LABEL: f1: +; CHECK: clfebr %r2, 5, %f0, 0 +; CHECK: br %r14 + %conv = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f32( + float %f, + metadata !"fpexcept.strict") + ret i32 %conv +} + +; Test f64->i32. +define i32 @f2(double %f) { +; CHECK-LABEL: f2: +; CHECK: clfdbr %r2, 5, %f0, 0 +; CHECK: br %r14 + %conv = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64( + double %f, + metadata !"fpexcept.strict") + ret i32 %conv +} + +; Test f128->i32. +define i32 @f3(fp128 *%src) { +; CHECK-LABEL: f3: +; CHECK-DAG: ld %f0, 0(%r2) +; CHECK-DAG: ld %f2, 8(%r2) +; CHECK: clfxbr %r2, 5, %f0, 0 +; CHECK: br %r14 + %f = load fp128, fp128 *%src + %conv = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f128( + fp128 %f, + metadata !"fpexcept.strict") + ret i32 %conv +} + +; Test f32->i64. 
+define i64 @f4(float %f) { +; CHECK-LABEL: f4: +; CHECK: clgebr %r2, 5, %f0, 0 +; CHECK: br %r14 + %conv = call zeroext i64 @llvm.experimental.constrained.fptoui.i64.f32( + float %f, + metadata !"fpexcept.strict") + ret i64 %conv +} + +; Test f64->i64. +define i64 @f5(double %f) { +; CHECK-LABEL: f5: +; CHECK: clgdbr %r2, 5, %f0, 0 +; CHECK: br %r14 + %conv = call zeroext i64 @llvm.experimental.constrained.fptoui.i64.f64( + double %f, + metadata !"fpexcept.strict") + ret i64 %conv +} + +; Test f128->i64. +define i64 @f6(fp128 *%src) { +; CHECK-LABEL: f6: +; CHECK-DAG: ld %f0, 0(%r2) +; CHECK-DAG: ld %f2, 8(%r2) +; CHECK: clgxbr %r2, 5, %f0, 0 +; CHECK: br %r14 + %f = load fp128, fp128 *%src + %conv = call zeroext i64 @llvm.experimental.constrained.fptoui.i64.f128( + fp128 %f, + metadata !"fpexcept.strict") + ret i64 %conv +} + +declare zeroext i32 @llvm.experimental.constrained.fptoui.i32.f32(float, metadata); +declare zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata); +declare zeroext i32 @llvm.experimental.constrained.fptoui.i32.f128(fp128, metadata); +declare zeroext i64 @llvm.experimental.constrained.fptoui.i64.f32(float, metadata); +declare zeroext i64 @llvm.experimental.constrained.fptoui.i64.f64(double, metadata); +declare zeroext i64 @llvm.experimental.constrained.fptoui.i64.f128(fp128, metadata); Index: test/CodeGen/X86/fp-intrinsics.ll =================================================================== --- test/CodeGen/X86/fp-intrinsics.ll +++ test/CodeGen/X86/fp-intrinsics.ll @@ -286,6 +286,43 @@ ret double %rem } +; Verify that fptosi(42.1) isn't simplified when the rounding mode is +; unknown. +; Verify that no gross errors happen. +; CHECK-LABEL: @f20 +; COMMON: cvttsd2si +define i32 @f20() { +entry: + %result = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double 42.1, + metadata !"fpexcept.strict") + ret i32 %result +} + +; Verify that fptrunc(42.1) isn't simplified when the rounding mode is +; unknown. 
+; Verify that no gross errors happen. +; CHECK-LABEL: @f21 +; COMMON: cvtsd2ss +define float @f21() { +entry: + %result = call float @llvm.experimental.constrained.fptrunc.f32.f64( + double 42.1, + metadata !"fpexcept.strict") + ret float %result +} + +; Verify that fpext(42.1) isn't simplified when the rounding mode is +; unknown. +; Verify that no gross errors happen. +; CHECK-LABEL: @f22 +; COMMON: cvtss2sd +define double @f22(float %x) { +entry: + %result = call double @llvm.experimental.constrained.fpext.f64.f32(float %x, + metadata !"fpexcept.strict") + ret double %result +} + @llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata" declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata) declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata) @@ -306,3 +343,7 @@ declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata) declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata) declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata) +declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata) +declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata) +declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata) + Index: test/CodeGen/X86/vector-constrained-fp-intrinsics.ll =================================================================== --- test/CodeGen/X86/vector-constrained-fp-intrinsics.ll +++ test/CodeGen/X86/vector-constrained-fp-intrinsics.ll @@ -2423,6 +2423,418 @@ ret <4 x double> %min } +define <1 x i32> @constrained_vector_fptosi_v1i32_v1f32() { +; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax +; CHECK-NEXT: retq +entry: + %result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f32( + <1 x float>, + metadata 
!"fpexcept.strict") + ret <1 x i32> %result +} + +define <2 x i32> @constrained_vector_fptosi_v2i32_v2f32() { +; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32( + <2 x float>, + metadata !"fpexcept.strict") + ret <2 x i32> %result +} + +define <3 x i32> @constrained_vector_fptosi_v3i32_v3f32() { +; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttps2dq {{.*}}(%rip), %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32( + <3 x float>, + metadata !"fpexcept.strict") + ret <3 x i32> %result +} + +define <4 x i32> @constrained_vector_fptosi_v4i32_v4f32() { +; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttps2dq {{.*}}(%rip), %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32( + <4 x float>, + metadata !"fpexcept.strict") + ret <4 x i32> %result +} + +define <1 x i64> @constrained_vector_fptosi_v1i64_v1f32() { +; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: retq +entry: + %result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f32( + <1 x float>, + metadata !"fpexcept.strict") + ret <1 x i64> %result +} + +define <2 x i64> @constrained_vector_fptosi_v2i64_v2f32() { +; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; 
CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32( + <2 x float>, + metadata !"fpexcept.strict") + ret <2 x i64> %result +} + +define <3 x i64> @constrained_vector_fptosi_v3i64_v3f32() { +; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0] +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rcx +; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm1[2,3,0,1] +; CHECK-NEXT: movq %xmm0, %rdx +; CHECK-NEXT: retq +entry: + %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32( + <3 x float>, + metadata !"fpexcept.strict") + ret <3 x i64> %result +} + +define <4 x i64> @constrained_vector_fptosi_v4i64_v4f32() { +; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm2 +; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32( + <4 x float>, + metadata !"fpexcept.strict") + ret <4 x i64> %result +} + +define <1 x i32> @constrained_vector_fptosi_v1i32_v1f64() { +; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax +; CHECK-NEXT: retq +entry: + %result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64( + <1 x double>, + metadata !"fpexcept.strict") + ret <1 x 
i32> %result +} + + +define <2 x i32> @constrained_vector_fptosi_v2i32_v2f64() { +; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f64( + <2 x double>, + metadata !"fpexcept.strict") + ret <2 x i32> %result +} + +define <3 x i32> @constrained_vector_fptosi_v3i32_v3f64() { +; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttsd2si %xmm0, %rax +; CHECK-NEXT: movq %rax, %xmm2 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,3] +; CHECK-NEXT: retq +entry: + %result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f64( + <3 x double>, + metadata !"fpexcept.strict") + ret <3 x i32> %result +} + +define <4 x i32> @constrained_vector_fptosi_v4i32_v4f64() { +; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0] +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm2 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0] +; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2] +; CHECK-NEXT: retq 
+entry: + %result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64( + <4 x double>, + metadata !"fpexcept.strict") + ret <4 x i32> %result +} + +define <1 x i64> @constrained_vector_fptosi_v1i64_v1f64() { +; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: retq +entry: + %result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f64( + <1 x double>, + metadata !"fpexcept.strict") + ret <1 x i64> %result +} + +define <2 x i64> @constrained_vector_fptosi_v2i64_v2f64() { +; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64( + <2 x double>, + metadata !"fpexcept.strict") + ret <2 x i64> %result +} + +define <3 x i64> @constrained_vector_fptosi_v3i64_v3f64() { +; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0] +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rcx +; CHECK-NEXT: pshufd {{.*#+}} xmm0 = xmm1[2,3,0,1] +; CHECK-NEXT: movq %xmm0, %rdx +; CHECK-NEXT: retq +entry: + %result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64( + <3 x double>, + metadata !"fpexcept.strict") + ret <3 x i64> %result +} + +define <4 x i64> @constrained_vector_fptosi_v4i64_v4f64() { +; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; 
CHECK-NEXT: movq %rax, %xmm0 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm2 +; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax +; CHECK-NEXT: movq %rax, %xmm1 +; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64( + <4 x double>, + metadata !"fpexcept.strict") + ret <4 x i64> %result +} + +define <1 x float> @constrained_vector_fptrunc_v1f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v1f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64( + <1 x double>, + metadata !"fpexcept.strict") + ret <1 x float> %result +} + +define <2 x float> @constrained_vector_fptrunc_v2f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v2f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT: retq +entry: + %result = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64( + <2 x double>, + metadata !"fpexcept.strict") + ret <2 x float> %result +} + +define <3 x float> @constrained_vector_fptrunc_v3f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v3f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm1, %xmm1 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq 
+entry: + %result = call <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64( + <3 x double>, + metadata !"fpexcept.strict") + ret <3 x float> %result +} + +define <4 x float> @constrained_vector_fptrunc_v4f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v4f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm1, %xmm1 +; CHECK-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm2 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64( + <4 x double>, + metadata !"fpexcept.strict") + ret <4 x float> %result +} + +define <1 x double> @constrained_vector_fpext_v1f32() { +; CHECK-LABEL: constrained_vector_fpext_v1f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32( + <1 x float>, + metadata !"fpexcept.strict") + ret <1 x double> %result +} + +define <2 x double> @constrained_vector_fpext_v2f32() { +; CHECK-LABEL: constrained_vector_fpext_v2f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32( + <2 x float>, + metadata !"fpexcept.strict") + ret <2 x 
double> %result +} + +define <3 x double> @constrained_vector_fpext_v3f32() { +; CHECK-LABEL: constrained_vector_fpext_v3f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm2, %xmm2 +; CHECK-NEXT: movsd %xmm2, -{{[0-9]+}}(%rsp) +; CHECK-NEXT: fldl -{{[0-9]+}}(%rsp) +; CHECK-NEXT: retq +entry: + %result = call <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32( + <3 x float>, + metadata !"fpexcept.strict") + ret <3 x double> %result +} + +define <4 x double> @constrained_vector_fpext_v4f32() { +; CHECK-LABEL: constrained_vector_fpext_v4f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm2 +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm1 +; CHECK-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32( + <4 x float>, + metadata !"fpexcept.strict") + ret <4 x double> %result +} + define <1 x float> @constrained_vector_ceil_v1f32() { ; CHECK-LABEL: constrained_vector_ceil_v1f32: ; CHECK: # %bb.0: # %entry @@ -2846,6 +3258,12 @@ declare <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.maxnum.v2f64(<2 x double>, <2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.minnum.v2f64(<2 x double>, <2 x double>, metadata, 
metadata) +declare <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(<2 x float>, metadata) +declare <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32(<2 x float>, metadata) +declare <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f64(<2 x double>, metadata) +declare <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double>, metadata) +declare <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata) +declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata) declare <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double>, metadata, metadata) @@ -2871,6 +3289,12 @@ declare <1 x float> @llvm.experimental.constrained.nearbyint.v1f32(<1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.maxnum.v1f32(<1 x float>, <1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.minnum.v1f32(<1 x float>, <1 x float>, metadata, metadata) +declare <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f32(<1 x float>, metadata) +declare <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f32(<1 x float>, metadata) +declare <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64(<1 x double>, metadata) +declare <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f64(<1 x double>, metadata) +declare <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(<1 x double>, metadata) +declare <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(<1 x float>, metadata) declare <1 x float> @llvm.experimental.constrained.ceil.v1f32(<1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.floor.v1f32(<1 x float>, metadata, metadata) declare <1 x float> 
@llvm.experimental.constrained.round.v1f32(<1 x float>, metadata, metadata)
@@ -2915,6 +3339,12 @@
 declare <3 x double> @llvm.experimental.constrained.maxnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)
 declare <3 x float> @llvm.experimental.constrained.minnum.v3f32(<3 x float>, <3 x float>, metadata, metadata)
 declare <3 x double> @llvm.experimental.constrained.minnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)
+declare <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32(<3 x float>, metadata)
+declare <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32(<3 x float>, metadata)
+declare <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f64(<3 x double>, metadata)
+declare <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64(<3 x double>, metadata)
+declare <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(<3 x double>, metadata)
+declare <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(<3 x float>, metadata)
 declare <3 x float> @llvm.experimental.constrained.ceil.v3f32(<3 x float>, metadata, metadata)
 declare <3 x double> @llvm.experimental.constrained.ceil.v3f64(<3 x double>, metadata, metadata)
 declare <3 x float> @llvm.experimental.constrained.floor.v3f32(<3 x float>, metadata, metadata)
@@ -2944,8 +3374,13 @@
 declare <4 x double> @llvm.experimental.constrained.nearbyint.v4f64(<4 x double>, metadata, metadata)
 declare <4 x double> @llvm.experimental.constrained.maxnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)
 declare <4 x double> @llvm.experimental.constrained.minnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)
+declare <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(<4 x float>, metadata)
+declare <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(<4 x float>, metadata)
+declare <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64(<4 x double>, metadata)
+declare <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64(<4 x double>,
metadata)
+declare <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(<4 x double>, metadata)
+declare <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float>, metadata)
 declare <4 x double> @llvm.experimental.constrained.ceil.v4f64(<4 x double>, metadata, metadata)
 declare <4 x double> @llvm.experimental.constrained.floor.v4f64(<4 x double>, metadata, metadata)
 declare <4 x double> @llvm.experimental.constrained.round.v4f64(<4 x double>, metadata, metadata)
 declare <4 x double> @llvm.experimental.constrained.trunc.v4f64(<4 x double>, metadata, metadata)
-
Index: test/Feature/fp-intrinsics.ll
===================================================================
--- test/Feature/fp-intrinsics.ll
+++ test/Feature/fp-intrinsics.ll
@@ -242,6 +242,52 @@
   ret double %result
 }

+; Verify that fptoui(42.1) isn't simplified when the rounding mode is
+; unknown.
+; CHECK-LABEL: f18
+; CHECK: call zeroext i32 @llvm.experimental.constrained.fptoui
+define zeroext i32 @f18() {
+entry:
+  %result = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64(
+      double 42.1,
+      metadata !"fpexcept.strict")
+  ret i32 %result
+}
+
+; Verify that fptosi(42.1) isn't simplified when the rounding mode is
+; unknown.
+; CHECK-LABEL: f19
+; CHECK: call i32 @llvm.experimental.constrained.fptosi
+define i32 @f19() {
+entry:
+  %result = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double 42.1,
+      metadata !"fpexcept.strict")
+  ret i32 %result
+}
+
+; Verify that fptrunc(42.1) isn't simplified when the rounding mode is
+; unknown.
+; CHECK-LABEL: f20
+; CHECK: call float @llvm.experimental.constrained.fptrunc
+define float @f20() {
+entry:
+  %result = call float @llvm.experimental.constrained.fptrunc.f32.f64(
+      double 42.1,
+      metadata !"fpexcept.strict")
+  ret float %result
+}
+
+; Verify that fpext(42.1) isn't simplified when the rounding mode is
+; unknown.
+; CHECK-LABEL: f21
+; CHECK: call double @llvm.experimental.constrained.fpext
+define double @f21() {
+entry:
+  %result = call double @llvm.experimental.constrained.fpext.f64.f32(float 42.0,
+      metadata !"fpexcept.strict")
+  ret double %result
+}
+
 @llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"
 declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
 declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
@@ -260,3 +306,7 @@
 declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
 declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
 declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
+declare zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
+declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
+declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata)
+declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)