Index: docs/AddingConstrainedIntrinsics.rst
===================================================================
--- docs/AddingConstrainedIntrinsics.rst
+++ docs/AddingConstrainedIntrinsics.rst
@@ -0,0 +1,95 @@
+==================================================
+How To Add A Constrained Floating-Point Intrinsic
+==================================================
+
+.. contents::
+  :local:
+
+.. warning::
+  This is a work in progress.
+
+Add the intrinsic
+=================
+
+Multiple files need to be updated when adding a new constrained intrinsic.
+
+Add the new intrinsic to the table of intrinsics::
+
+  include/llvm/IR/Intrinsics.td
+
+Update class ConstrainedFPIntrinsic to know about the new intrinsic::
+
+  include/llvm/IR/IntrinsicInst.h
+
+Functions like ConstrainedFPIntrinsic::isUnaryOp() or
+ConstrainedFPIntrinsic::isTernaryOp() may need to know about the new
+intrinsic::
+
+  lib/IR/IntrinsicInst.cpp
+
+Update the IR verifier::
+
+  lib/IR/Verifier.cpp
+
+Add SelectionDAG node types
+===========================
+
+Add the new STRICT version of the node type to the ISD::NodeType enum::
+
+  include/llvm/CodeGen/ISDOpcodes.h
+
+In class SDNode update isStrictFPOpcode()::
+
+  include/llvm/CodeGen/SelectionDAGNodes.h
+
+The mapping from the STRICT SDNode type to its non-STRICT counterpart is
+done in TargetLoweringBase::getStrictFPOperationAction(). This allows the
+STRICT node to be legalized in the same way as the non-STRICT node type::
+
+  include/llvm/CodeGen/TargetLowering.h
+
+Building the SelectionDAG
+-------------------------
+
+The switch statement in SelectionDAGBuilder::visitIntrinsicCall() needs
+to be updated to call SelectionDAGBuilder::visitConstrainedFPIntrinsic().
+That function, in turn, needs to be updated to know how to create the
+SDNode for the intrinsic. The new STRICT node is eventually converted to
+the matching non-STRICT node. For this reason it *must* have the same
+operands and values as the non-STRICT version in case the non-STRICT
+version's default lowering is used. This means that if the non-STRICT
+version of the node does not use the chain then the STRICT node cannot
+use it either::
+
+  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+
+Most STRICT nodes are legalized in the same way as their non-STRICT
+counterparts. A new STRICT node with this property must be added to the
+switch in SelectionDAGLegalize::LegalizeOp()::
+
+  lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+
+Other parts of the legalizer may need to be updated as well. Look for
+places where the non-STRICT counterpart is legalized and update them as
+needed. Be careful with the chain, since STRICT nodes use it but their
+counterparts often do not.
+
+The conversion or mutation of a STRICT node to its non-STRICT version
+happens in SelectionDAG::mutateStrictFPToFP(). Be careful when updating
+this function: some nodes are always chained and some are not, and some
+nodes have the same return type as their input operand while others do
+not. Both of these points must be handled properly::
+
+  lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+
+To keep debug logs readable it is helpful to update the SelectionDAG
+debug logger::
+
+  lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+
+Add documentation and tests
+===========================
+
+::
+
+  docs/LangRef.rst
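For illustration, here is a minimal sketch of what one of the new intrinsics
looks like at the IR level once all of the above is in place. The declaration
matches the ones used by the tests added later in this patch; the function
name ``@narrow`` is purely illustrative::

  declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata)

  define float @narrow(double %d) {
  entry:
    ; The !"fpexcept.strict" argument requires that any floating-point
    ; exception raised by the truncation (for example, inexact) be preserved.
    %t = call float @llvm.experimental.constrained.fptrunc.f32.f64(
                        double %d,
                        metadata !"fpexcept.strict")
    ret float %t
  }
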
Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -14516,6 +14516,75 @@
 operand computed with infinite precision, and then rounded to the target
 precision.
 
+'``llvm.experimental.constrained.fptrunc``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <ty2>
+      @llvm.experimental.constrained.fptrunc(<type> <value>,
+                                             metadata <exception behavior>)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``
+to type ``ty2``.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fptrunc``'
+intrinsic must be a :ref:`floating-point <t_floating>` value or a
+:ref:`vector <t_vector>` of floating-point values. This argument must be
+larger in size than the result.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is a floating-point value truncated to be smaller in size
+than the operand.
+
+'``llvm.experimental.constrained.fpext``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <ty2>
+      @llvm.experimental.constrained.fpext(<type> <value>,
+                                           metadata <exception behavior>)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fpext``' intrinsic extends a
+floating-point ``value`` to a larger floating-point value.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fpext``'
+intrinsic must be a :ref:`floating-point <t_floating>` value or a
+:ref:`vector <t_vector>` of floating-point values. This argument must be
+smaller in size than the result.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is a floating-point value extended to be larger in size
+than the operand. All restrictions that apply to the fpext instruction also
+apply to this intrinsic.
+
 Constrained libm-equivalent Intrinsics
 --------------------------------------
 
Index: docs/index.rst
===================================================================
--- docs/index.rst
+++ docs/index.rst
@@ -191,6 +191,7 @@
    CommandLine
    CompilerWriterInfo
    ExtendingLLVM
+   AddingConstrainedIntrinsics
    HowToSetUpLLVMStyleRTTI
    ProgrammersManual
    Extensions
@@ -227,6 +228,10 @@
 :doc:`ExtendingLLVM`
   Look here to see how to add instructions and intrinsics to LLVM.
 
+:doc:`AddingConstrainedIntrinsics`
+  Gives the steps necessary when adding a new constrained math intrinsic
+  to LLVM.
+
 `Doxygen generated documentation `_
   (`classes `_)
 
Index: include/llvm/CodeGen/ISDOpcodes.h
===================================================================
--- include/llvm/CodeGen/ISDOpcodes.h
+++ include/llvm/CodeGen/ISDOpcodes.h
@@ -298,6 +298,26 @@
     STRICT_FRINT, STRICT_FNEARBYINT, STRICT_FMAXNUM, STRICT_FMINNUM,
     STRICT_FCEIL, STRICT_FFLOOR, STRICT_FROUND, STRICT_FTRUNC,
 
+    /// X = STRICT_FP_ROUND(Y, TRUNC) - Rounding 'Y' from a larger floating
+    /// point type down to the precision of the destination VT. TRUNC is a
+    /// flag, which is always an integer that is zero or one. If TRUNC is 0,
+    /// this is a normal rounding, if it is 1, this FP_ROUND is known to not
+    /// change the value of Y.
+ /// + /// The TRUNC = 1 case is used in cases where we know that the value will + /// not be modified by the node, because Y is not using any of the extra + /// precision of source type. This allows certain transformations like + /// STRICT_FP_EXTEND(STRICT_FP_ROUND(X,1)) -> X which are not safe for + /// STRICT_FP_EXTEND(STRICT_FP_ROUND(X,0)) because the extra bits aren't + /// removed. + /// It is used to limit optimizations while the DAG is being optimized. + STRICT_FP_ROUND, + + /// X = STRICT_FP_EXTEND(Y) - Extend a smaller FP type into a larger FP + /// type. + /// It is used to limit optimizations while the DAG is being optimized. + STRICT_FP_EXTEND, + /// FMA - Perform a * b + c with no intermediate rounding step. FMA, Index: include/llvm/CodeGen/SelectionDAGNodes.h =================================================================== --- include/llvm/CodeGen/SelectionDAGNodes.h +++ include/llvm/CodeGen/SelectionDAGNodes.h @@ -678,6 +678,8 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: return true; } } Index: include/llvm/CodeGen/TargetLowering.h =================================================================== --- include/llvm/CodeGen/TargetLowering.h +++ include/llvm/CodeGen/TargetLowering.h @@ -865,6 +865,8 @@ case ISD::STRICT_FFLOOR: EqOpc = ISD::FFLOOR; break; case ISD::STRICT_FROUND: EqOpc = ISD::FROUND; break; case ISD::STRICT_FTRUNC: EqOpc = ISD::FTRUNC; break; + case ISD::STRICT_FP_ROUND: EqOpc = ISD::FP_ROUND; break; + case ISD::STRICT_FP_EXTEND: EqOpc = ISD::FP_EXTEND; break; } auto Action = getOperationAction(EqOpc, VT); Index: include/llvm/IR/IntrinsicInst.h =================================================================== --- include/llvm/IR/IntrinsicInst.h +++ include/llvm/IR/IntrinsicInst.h @@ -239,6 +239,8 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: Index: include/llvm/IR/Intrinsics.td =================================================================== --- include/llvm/IR/Intrinsics.td +++ include/llvm/IR/Intrinsics.td @@ -588,6 +588,14 @@ llvm_metadata_ty, llvm_metadata_ty ]>; + def int_experimental_constrained_fptrunc : Intrinsic<[ llvm_anyfloat_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + + def int_experimental_constrained_fpext : Intrinsic<[ llvm_anyfloat_ty ], + [ llvm_anyfloat_ty, + llvm_metadata_ty ]>; + // These intrinsics are sensitive to the rounding mode so we need constrained // versions of each of them. When strict rounding and exception control are // not required the non-constrained versions of these intrinsics should be @@ -669,8 +677,6 @@ llvm_metadata_ty, llvm_metadata_ty ]>; } -// FIXME: Add intrinsics for fcmp, fptrunc, fpext, fptoui and fptosi. -// FIXME: Add intrinsics for fabs and copysign? 
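Since both the result and the operand of these two definitions are
llvm_anyfloat_ty, each intrinsic is overloaded on both types and a concrete
use names both in its suffix. As a sketch, the instantiations exercised by
the tests added later in this patch include::

  declare float        @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata)
  declare <2 x float>  @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata)
  declare double       @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
  declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata)
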
//===------------------------- Expect Intrinsics --------------------------===// Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeDAG.cpp +++ lib/CodeGen/SelectionDAG/LegalizeDAG.cpp @@ -156,7 +156,7 @@ void ExpandSinCosLibCall(SDNode *Node, SmallVectorImpl &Results); SDValue EmitStackConvert(SDValue SrcOp, EVT SlotVT, EVT DestVT, - const SDLoc &dl); + const SDLoc &dl, SDNode *OldChain = nullptr); SDValue ExpandBUILD_VECTOR(SDNode *Node); SDValue ExpandSCALAR_TO_VECTOR(SDNode *Node); void ExpandDYNAMIC_STACKALLOC(SDNode *Node, @@ -233,6 +233,16 @@ } ReplacedNode(Old); } + + void ReplaceNodeIgnoreChain(SDNode *Old, SDNode *New) { + LLVM_DEBUG(dbgs() << " ... replacing: "; Old->dump(&DAG); + dbgs() << " with: "; New->dump(&DAG)); + + DAG.ReplaceAllUsesOfValueWith(SDValue(Old, 0), SDValue(New, 0)); + if (UpdatedNodes) + UpdatedNodes->insert(New); + ReplacedNode(Old); + } }; } // end anonymous namespace @@ -1114,6 +1124,8 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: // These pseudo-ops get legalized as if they were their non-strict // equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT // is also legal, but if ISD::FSQRT requires expansion then so does @@ -1722,7 +1734,8 @@ /// a load from the stack slot to DestVT, extending it if needed. /// The resultant code need not be legal. SDValue SelectionDAGLegalize::EmitStackConvert(SDValue SrcOp, EVT SlotVT, - EVT DestVT, const SDLoc &dl) { + EVT DestVT, const SDLoc &dl, + SDNode *OldChain) { // Create the stack frame object. unsigned SrcAlign = DAG.getDataLayout().getPrefTypeAlignment( SrcOp.getValueType().getTypeForEVT(*DAG.getContext())); @@ -1742,23 +1755,36 @@ // Emit a store to the stack slot. Use a truncstore if the input value is // later than DestVT. SDValue Store; + SDValue Chain = SDValue(OldChain, 1); + if (!OldChain) + Chain = DAG.getEntryNode(); if (SrcSize > SlotSize) - Store = DAG.getTruncStore(DAG.getEntryNode(), dl, SrcOp, FIPtr, PtrInfo, + Store = DAG.getTruncStore(Chain, dl, SrcOp, FIPtr, PtrInfo, SlotVT, SrcAlign); else { assert(SrcSize == SlotSize && "Invalid store"); Store = - DAG.getStore(DAG.getEntryNode(), dl, SrcOp, FIPtr, PtrInfo, SrcAlign); + DAG.getStore(Chain, dl, SrcOp, FIPtr, PtrInfo, SrcAlign); } + SDValue Load; + // Result is a load from the stack slot. 
- if (SlotSize == DestSize) - return DAG.getLoad(DestVT, dl, Store, FIPtr, PtrInfo, DestAlign); + if (SlotSize == DestSize) { + Load = DAG.getLoad(DestVT, dl, Store, FIPtr, PtrInfo, DestAlign); + } + else { + assert(SlotSize < DestSize && "Unknown extension!"); + Load = DAG.getExtLoad(ISD::EXTLOAD, dl, DestVT, Store, FIPtr, + PtrInfo, SlotVT, DestAlign); + } - assert(SlotSize < DestSize && "Unknown extension!"); - return DAG.getExtLoad(ISD::EXTLOAD, dl, DestVT, Store, FIPtr, PtrInfo, SlotVT, - DestAlign); + if (OldChain) { + DAG.ReplaceAllUsesOfValueWith(SDValue(Load.getNode(), 1), Chain); + } + + return Load; } SDValue SelectionDAGLegalize::ExpandSCALAR_TO_VECTOR(SDNode *Node) { @@ -2799,12 +2825,29 @@ } break; } + case ISD::STRICT_FP_ROUND: + Tmp1 = EmitStackConvert(Node->getOperand(1), + Node->getValueType(0), + Node->getValueType(0), dl, Node); + ReplaceNodeIgnoreChain(Node, Tmp1.getNode()); + LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_ROUND node\n"); + return true; + break; case ISD::FP_ROUND: case ISD::BITCAST: - Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getValueType(0), + Tmp1 = EmitStackConvert(Node->getOperand(0), + Node->getValueType(0), Node->getValueType(0), dl); Results.push_back(Tmp1); break; + case ISD::STRICT_FP_EXTEND: + Tmp1 = EmitStackConvert(Node->getOperand(1), + Node->getOperand(1).getValueType(), + Node->getValueType(0), dl, Node); + ReplaceNodeIgnoreChain(Node, Tmp1.getNode()); + LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_EXTEND node\n"); + return true; + break; case ISD::FP_EXTEND: Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getOperand(0).getValueType(), Index: lib/CodeGen/SelectionDAG/LegalizeTypes.h =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeTypes.h +++ lib/CodeGen/SelectionDAG/LegalizeTypes.h @@ -680,6 +680,7 @@ SDValue ScalarizeVecRes_BUILD_VECTOR(SDNode *N); SDValue ScalarizeVecRes_EXTRACT_SUBVECTOR(SDNode *N); SDValue ScalarizeVecRes_FP_ROUND(SDNode *N); + SDValue ScalarizeVecRes_STRICT_FP_ROUND(SDNode *N); SDValue ScalarizeVecRes_FPOWI(SDNode *N); SDValue ScalarizeVecRes_INSERT_VECTOR_ELT(SDNode *N); SDValue ScalarizeVecRes_LOAD(LoadSDNode *N); @@ -703,6 +704,7 @@ SDValue ScalarizeVecOp_VSETCC(SDNode *N); SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo); SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo); + SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo); //===--------------------------------------------------------------------===// // Vector Splitting Support: LegalizeVectorTypes.cpp @@ -809,6 +811,7 @@ SDValue WidenVecRes_BinaryCanTrap(SDNode *N); SDValue WidenVecRes_StrictFP(SDNode *N); SDValue WidenVecRes_Convert(SDNode *N); + SDValue WidenVecRes_Convert_StrictFP(SDNode *N); SDValue WidenVecRes_FCOPYSIGN(SDNode *N); SDValue WidenVecRes_POWI(SDNode *N); SDValue WidenVecRes_Shift(SDNode *N); Index: lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp +++ lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp @@ -328,6 +328,8 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: // These pseudo-ops get legalized as if they were their non-strict // equivalent. 
For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT // is also legal, but if ISD::FSQRT requires expansion then so does @@ -1233,7 +1235,7 @@ if (OperVT.isVector()) Oper = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, - EltVT, Oper, Idx); + OperVT.getVectorElementType(), Oper, Idx); Opers.push_back(Oper); } Index: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp =================================================================== --- lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp +++ lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp @@ -51,6 +51,7 @@ case ISD::BITCAST: R = ScalarizeVecRes_BITCAST(N); break; case ISD::BUILD_VECTOR: R = ScalarizeVecRes_BUILD_VECTOR(N); break; case ISD::EXTRACT_SUBVECTOR: R = ScalarizeVecRes_EXTRACT_SUBVECTOR(N); break; + case ISD::STRICT_FP_ROUND: R = ScalarizeVecRes_STRICT_FP_ROUND(N); break; case ISD::FP_ROUND: R = ScalarizeVecRes_FP_ROUND(N); break; case ISD::FP_ROUND_INREG: R = ScalarizeVecRes_InregOp(N); break; case ISD::FPOWI: R = ScalarizeVecRes_FPOWI(N); break; @@ -170,6 +171,7 @@ case ISD::STRICT_FFLOOR: case ISD::STRICT_FROUND: case ISD::STRICT_FTRUNC: + case ISD::STRICT_FP_EXTEND: R = ScalarizeVecRes_StrictFPOp(N); break; case ISD::SMULFIX: @@ -275,6 +277,18 @@ NewVT, Op, N->getOperand(1)); } +SDValue DAGTypeLegalizer::ScalarizeVecRes_STRICT_FP_ROUND(SDNode *N) { + EVT NewVT = N->getValueType(0).getVectorElementType(); + SDValue Op = GetScalarizedVector(N->getOperand(1)); + SDValue Res = DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N), + { NewVT, MVT::Other }, + { N->getOperand(0), Op, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + return Res; +} + SDValue DAGTypeLegalizer::ScalarizeVecRes_FPOWI(SDNode *N) { SDValue Op = GetScalarizedVector(N->getOperand(0)); return DAG.getNode(ISD::FPOWI, SDLoc(N), @@ -558,6 +572,9 @@ case ISD::STORE: Res = ScalarizeVecOp_STORE(cast(N), OpNo); break; + case ISD::STRICT_FP_ROUND: + Res = ScalarizeVecOp_STRICT_FP_ROUND(N, OpNo); + break; case ISD::FP_ROUND: Res = ScalarizeVecOp_FP_ROUND(N, OpNo); break; @@ -691,6 +708,18 @@ return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res); } +SDValue DAGTypeLegalizer::ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, + unsigned OpNo) { + SDValue Elt = GetScalarizedVector(N->getOperand(1)); + SDValue Res = DAG.getNode(ISD::STRICT_FP_ROUND, SDLoc(N), + { N->getValueType(0).getVectorElementType(), MVT::Other }, + { N->getOperand(0), Elt, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res); +} + //===----------------------------------------------------------------------===// // Result Vector Splitting //===----------------------------------------------------------------------===// @@ -774,7 +803,9 @@ case ISD::FNEARBYINT: case ISD::FNEG: case ISD::FP_EXTEND: + case ISD::STRICT_FP_EXTEND: case ISD::FP_ROUND: + case ISD::STRICT_FP_ROUND: case ISD::FP_TO_SINT: case ISD::FP_TO_UINT: case ISD::FRINT: @@ -1496,15 +1527,33 @@ // If the input also splits, handle it directly for a compile time speedup. // Otherwise split it by hand. 
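+  // For strict FP opcodes operand 0 is the chain, so the vector operand is
+  // at index N->isStrictFPOpcode(). After splitting, the chains of the two
+  // half nodes are combined with a TokenFactor that replaces the original
+  // node's chain result.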
- EVT InVT = N->getOperand(0).getValueType(); + EVT InVT = N->getOperand(N->isStrictFPOpcode()).getValueType(); if (getTypeAction(InVT) == TargetLowering::TypeSplitVector) - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); else - std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0); + std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, N->isStrictFPOpcode()); if (N->getOpcode() == ISD::FP_ROUND) { Lo = DAG.getNode(N->getOpcode(), dl, LoVT, Lo, N->getOperand(1)); Hi = DAG.getNode(N->getOpcode(), dl, HiVT, Hi, N->getOperand(1)); + } else if (N->getOpcode() == ISD::STRICT_FP_ROUND) { + Lo = DAG.getNode(N->getOpcode(), dl, { LoVT, MVT::Other }, + { N->getOperand(0), Lo, N->getOperand(2) }); + Hi = DAG.getNode(N->getOpcode(), dl, { HiVT, MVT::Other }, + { N->getOperand(0), Hi, N->getOperand(2) }); + SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), dl, { LoVT, MVT::Other }, + { N->getOperand(0), Lo }); + Hi = DAG.getNode(N->getOpcode(), dl, { HiVT, MVT::Other }, + { N->getOperand(0), Hi }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); } else { Lo = DAG.getNode(N->getOpcode(), dl, LoVT, Lo); Hi = DAG.getNode(N->getOpcode(), dl, HiVT, Hi); @@ -1705,6 +1754,7 @@ case ISD::TRUNCATE: Res = SplitVecOp_TruncateHelper(N); break; + case ISD::STRICT_FP_ROUND: case ISD::FP_ROUND: Res = SplitVecOp_FP_ROUND(N); break; case ISD::FCOPYSIGN: Res = SplitVecOp_FCOPYSIGN(N); break; case ISD::STORE: @@ -1734,6 +1784,7 @@ case ISD::CTTZ: case ISD::CTLZ: case ISD::CTPOP: + case ISD::STRICT_FP_EXTEND: case ISD::FP_EXTEND: case ISD::SIGN_EXTEND: case ISD::ZERO_EXTEND: @@ -1775,7 +1826,11 @@ if (Res.getNode() == N) return true; - assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && + if (N->isStrictFPOpcode()) + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 2 && + "Invalid operand expansion"); + else + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && "Invalid operand expansion"); ReplaceValueWith(SDValue(N, 0), Res); @@ -1863,15 +1918,31 @@ EVT ResVT = N->getValueType(0); SDValue Lo, Hi; SDLoc dl(N); - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); EVT InVT = Lo.getValueType(); EVT OutVT = EVT::getVectorVT(*DAG.getContext(), ResVT.getVectorElementType(), InVT.getVectorNumElements()); - Lo = DAG.getNode(N->getOpcode(), dl, OutVT, Lo); - Hi = DAG.getNode(N->getOpcode(), dl, OutVT, Hi); + if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), dl, { OutVT, MVT::Other }, + { N->getOperand(0), Lo }); + Hi = DAG.getNode(N->getOpcode(), dl, { OutVT, MVT::Other }, + { N->getOperand(0), Hi }); + // Build a factor node to remember that this load is independent of the + // other one. + SDValue Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1), + Hi.getValue(1)); + + // Legalize the chain result - switch anything that used the old chain to + // use the new one. 
+ ReplaceValueWith(SDValue(N, 1), Ch); + } else { + Lo = DAG.getNode(N->getOpcode(), dl, OutVT, Lo); + Hi = DAG.getNode(N->getOpcode(), dl, OutVT, Hi); + } + return DAG.getNode(ISD::CONCAT_VECTORS, dl, ResVT, Lo, Hi); } @@ -2343,14 +2414,26 @@ EVT ResVT = N->getValueType(0); SDValue Lo, Hi; SDLoc DL(N); - GetSplitVector(N->getOperand(0), Lo, Hi); + GetSplitVector(N->getOperand(N->isStrictFPOpcode()), Lo, Hi); EVT InVT = Lo.getValueType(); EVT OutVT = EVT::getVectorVT(*DAG.getContext(), ResVT.getVectorElementType(), InVT.getVectorNumElements()); - Lo = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Lo, N->getOperand(1)); - Hi = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Hi, N->getOperand(1)); + if (N->isStrictFPOpcode()) { + Lo = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other }, + { N->getOperand(0), Lo, N->getOperand(2) }); + Hi = DAG.getNode(N->getOpcode(), DL, { OutVT, MVT::Other }, + { N->getOperand(0), Hi, N->getOperand(2) }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, + Lo.getValue(1), Hi.getValue(1)); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else { + Lo = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Lo, N->getOperand(1)); + Hi = DAG.getNode(ISD::FP_ROUND, DL, OutVT, Hi, N->getOperand(1)); + } return DAG.getNode(ISD::CONCAT_VECTORS, DL, ResVT, Lo, Hi); } @@ -2505,6 +2588,11 @@ Res = WidenVecRes_Convert(N); break; + case ISD::STRICT_FP_EXTEND: + case ISD::STRICT_FP_ROUND: + Res = WidenVecRes_Convert_StrictFP(N); + break; + case ISD::FABS: case ISD::FCEIL: case ISD::FCOS: @@ -2929,6 +3017,85 @@ return DAG.getBuildVector(WidenVT, DL, Ops); } +SDValue DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) { + SDValue InOp = N->getOperand(1); + SDLoc DL(N); + SmallVector NewOps(N->op_begin(), N->op_end()); + + EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0)); + unsigned WidenNumElts = WidenVT.getVectorNumElements(); + SmallVector WidenVTs = { WidenVT, MVT::Other }; + + EVT InVT = InOp.getValueType(); + EVT InEltVT = InVT.getVectorElementType(); + EVT InWidenVT = EVT::getVectorVT(*DAG.getContext(), InEltVT, WidenNumElts); + SmallVector InWidenVTs = { InWidenVT, MVT::Other }; + + unsigned Opcode = N->getOpcode(); + unsigned InVTNumElts = InVT.getVectorNumElements(); + if (getTypeAction(InVT) == TargetLowering::TypeWidenVector) { + InOp = GetWidenedVector(N->getOperand(1)); + InVT = InOp.getValueType(); + InVTNumElts = InVT.getVectorNumElements(); + if (InVTNumElts == WidenNumElts) { + NewOps[1] = InOp; + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + } + + if (TLI.isTypeLegal(InWidenVT)) { + // Because the result and the input are different vector types, widening + // the result could create a legal type but widening the input might make + // it an illegal type that might lead to repeatedly splitting the input + // and then widening it. To avoid this, we widen the input only if + // it results in a legal type. + if (WidenNumElts % InVTNumElts == 0) { + // Widen the input and call convert on the widened input vector. 
+ unsigned NumConcat = WidenNumElts/InVTNumElts; + SmallVector Ops(NumConcat, DAG.getUNDEF(InVT)); + Ops[0] = InOp; + SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops); + + NewOps[1] = InVec; + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + + if (InVTNumElts % WidenNumElts == 0) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_SUBVECTOR, DL, InWidenVT, InOp, + DAG.getConstant(0, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))); + // Extract the input and convert the shorten input vector. + SDValue Result = DAG.getNode(Opcode, DL, WidenVTs, NewOps); + ReplaceValueWith(SDValue(N, 1), Result.getValue(1)); // Chain + return Result; + } + } + + // Otherwise unroll into some nasty scalar code and rebuild the vector. + EVT EltVT = WidenVT.getVectorElementType(); + SmallVector EltVTs = { EltVT, MVT::Other }; + SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT)); + SmallVector OpChains; + // Use the original element count so we don't do more scalar opts than + // necessary. + unsigned MinElts = N->getValueType(0).getVectorNumElements(); + for (unsigned i=0; i < MinElts; ++i) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_VECTOR_ELT, DL, InEltVT, InOp, + DAG.getConstant(i, DL, TLI.getVectorIdxTy(DAG.getDataLayout()))); + Ops[i] = DAG.getNode(Opcode, DL, EltVTs, NewOps); + OpChains.push_back(Ops[i].getValue(1)); + } + SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, OpChains); + ReplaceValueWith(SDValue(N, 1), NewChain); + + return DAG.getBuildVector(WidenVT, DL, Ops); +} + SDValue DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) { unsigned Opcode = N->getOpcode(); SDValue InOp = N->getOperand(0); @@ -3707,6 +3874,7 @@ break; case ISD::FP_EXTEND: + case ISD::STRICT_FP_EXTEND: case ISD::FP_TO_SINT: case ISD::FP_TO_UINT: case ISD::SINT_TO_FP: @@ -3725,8 +3893,12 @@ return true; - assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && - "Invalid operand expansion"); + if (N->isStrictFPOpcode()) + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 2 && + "Invalid operand expansion"); + else + assert(Res.getValueType() == N->getValueType(0) && N->getNumValues() == 1 && + "Invalid operand expansion"); ReplaceValueWith(SDValue(N, 0), Res); return false; @@ -3806,7 +3978,7 @@ EVT EltVT = VT.getVectorElementType(); SDLoc dl(N); unsigned NumElts = VT.getVectorNumElements(); - SDValue InOp = N->getOperand(0); + SDValue InOp = N->getOperand(N->isStrictFPOpcode()); assert(getTypeAction(InOp.getValueType()) == TargetLowering::TypeWidenVector && "Unexpected type action"); @@ -3818,7 +3990,15 @@ EVT WideVT = EVT::getVectorVT(*DAG.getContext(), EltVT, InVT.getVectorNumElements()); if (TLI.isTypeLegal(WideVT)) { - SDValue Res = DAG.getNode(Opcode, dl, WideVT, InOp); + SDValue Res; + if (N->isStrictFPOpcode()) { + Res = DAG.getNode(Opcode, dl, { WideVT, MVT::Other }, + { N->getOperand(0), InOp }); + // Legalize the chain result - switch anything that used the old chain to + // use the new one. + ReplaceValueWith(SDValue(N, 1), Res.getValue(1)); + } else + Res = DAG.getNode(Opcode, dl, WideVT, InOp); return DAG.getNode( ISD::EXTRACT_SUBVECTOR, dl, VT, Res, DAG.getConstant(0, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); @@ -3828,6 +4008,19 @@ // Unroll the convert into some scalar code and create a nasty build vector. 
SmallVector Ops(NumElts); + if (N->isStrictFPOpcode()) { + SmallVector NewOps(N->op_begin(), N->op_end()); + SmallVector OpChains; + for (unsigned i=0; i < NumElts; ++i) { + NewOps[1] = DAG.getNode( + ISD::EXTRACT_VECTOR_ELT, dl, InEltVT, InOp, + DAG.getConstant(i, dl, TLI.getVectorIdxTy(DAG.getDataLayout()))); + Ops[i] = DAG.getNode(Opcode, dl, { EltVT, MVT::Other }, NewOps); + OpChains.push_back(Ops[i].getValue(1)); + } + SDValue NewChain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, OpChains); + ReplaceValueWith(SDValue(N, 1), NewChain); + } else for (unsigned i=0; i < NumElts; ++i) Ops[i] = DAG.getNode( Opcode, dl, EltVT, Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp @@ -7661,6 +7661,8 @@ case ISD::STRICT_FFLOOR: NewOpc = ISD::FFLOOR; IsUnary = true; break; case ISD::STRICT_FROUND: NewOpc = ISD::FROUND; IsUnary = true; break; case ISD::STRICT_FTRUNC: NewOpc = ISD::FTRUNC; IsUnary = true; break; + case ISD::STRICT_FP_ROUND: NewOpc = ISD::FP_ROUND; break; + case ISD::STRICT_FP_EXTEND: NewOpc = ISD::FP_EXTEND; IsUnary = true; break; } // We're taking this node out of the chain, so we need to re-link things. @@ -7668,8 +7670,19 @@ SDValue OutputChain = SDValue(Node, 1); ReplaceAllUsesOfValueWith(OutputChain, InputChain); - SDVTList VTs = getVTList(Node->getOperand(1).getValueType()); + SDVTList VTs; SDNode *Res = nullptr; + + switch (OrigOpc) { + default: + VTs = getVTList(Node->getOperand(1).getValueType()); + break; + case ISD::STRICT_FP_ROUND: + case ISD::STRICT_FP_EXTEND: + VTs = getVTList(Node->ValueList[0]); + break; + } + if (IsUnary) Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) }); else if (IsTernary) Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp @@ -5652,6 +5652,8 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: @@ -6400,6 +6402,12 @@ case Intrinsic::experimental_constrained_fma: Opcode = ISD::STRICT_FMA; break; + case Intrinsic::experimental_constrained_fptrunc: + Opcode = ISD::STRICT_FP_ROUND; + break; + case Intrinsic::experimental_constrained_fpext: + Opcode = ISD::STRICT_FP_EXTEND; + break; case Intrinsic::experimental_constrained_sqrt: Opcode = ISD::STRICT_FSQRT; break; @@ -6463,7 +6471,12 @@ SDVTList VTs = DAG.getVTList(ValueVTs); SDValue Result; - if (FPI.isUnaryOp()) + if (Opcode == ISD::STRICT_FP_ROUND) + Result = DAG.getNode(Opcode, sdl, VTs, + { Chain, getValue(FPI.getArgOperand(0)), + DAG.getTargetConstant(0, sdl, + TLI.getPointerTy(DAG.getDataLayout())) }); + else if (FPI.isUnaryOp()) Result = DAG.getNode(Opcode, sdl, VTs, { Chain, getValue(FPI.getArgOperand(0)) }); else if (FPI.isTernaryOp()) Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp =================================================================== --- lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp +++ lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp @@ -309,9 +309,11 @@ case ISD::ZERO_EXTEND_VECTOR_INREG: return 
"zero_extend_vector_inreg"; case ISD::TRUNCATE: return "truncate"; case ISD::FP_ROUND: return "fp_round"; + case ISD::STRICT_FP_ROUND: return "strict_fp_round"; case ISD::FLT_ROUNDS_: return "flt_rounds"; case ISD::FP_ROUND_INREG: return "fp_round_inreg"; case ISD::FP_EXTEND: return "fp_extend"; + case ISD::STRICT_FP_EXTEND: return "strict_fp_extend"; case ISD::SINT_TO_FP: return "sint_to_fp"; case ISD::UINT_TO_FP: return "uint_to_fp"; Index: lib/IR/IntrinsicInst.cpp =================================================================== --- lib/IR/IntrinsicInst.cpp +++ lib/IR/IntrinsicInst.cpp @@ -142,6 +142,8 @@ switch (getIntrinsicID()) { default: return false; + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_sin: case Intrinsic::experimental_constrained_cos: Index: lib/IR/Verifier.cpp =================================================================== --- lib/IR/Verifier.cpp +++ lib/IR/Verifier.cpp @@ -4162,6 +4162,8 @@ case Intrinsic::experimental_constrained_fdiv: case Intrinsic::experimental_constrained_frem: case Intrinsic::experimental_constrained_fma: + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: case Intrinsic::experimental_constrained_sqrt: case Intrinsic::experimental_constrained_pow: case Intrinsic::experimental_constrained_powi: @@ -4605,17 +4607,109 @@ void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) { unsigned NumOperands = FPI.getNumArgOperands(); - Assert(((NumOperands == 5 && FPI.isTernaryOp()) || - (NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)), + bool HasExceptionMD = false; + bool HasRoundingMD = false; + switch (FPI.getIntrinsicID()) + { + case Intrinsic::experimental_constrained_sqrt: + case Intrinsic::experimental_constrained_sin: + case Intrinsic::experimental_constrained_cos: + case Intrinsic::experimental_constrained_exp: + case Intrinsic::experimental_constrained_exp2: + case Intrinsic::experimental_constrained_log: + case Intrinsic::experimental_constrained_log10: + case Intrinsic::experimental_constrained_log2: + case Intrinsic::experimental_constrained_rint: + case Intrinsic::experimental_constrained_nearbyint: + case Intrinsic::experimental_constrained_ceil: + case Intrinsic::experimental_constrained_floor: + case Intrinsic::experimental_constrained_round: + case Intrinsic::experimental_constrained_trunc: + Assert((NumOperands == 3), "invalid arguments for constrained FP intrinsic", &FPI); - Assert(isa(FPI.getArgOperand(NumOperands-1)), - "invalid exception behavior argument", &FPI); - Assert(isa(FPI.getArgOperand(NumOperands-2)), - "invalid rounding mode argument", &FPI); - Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid, - "invalid rounding mode argument", &FPI); - Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid, - "invalid exception behavior argument", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fma: + Assert((NumOperands == 5), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fadd: + case Intrinsic::experimental_constrained_fsub: + case Intrinsic::experimental_constrained_fmul: + case Intrinsic::experimental_constrained_fdiv: + case Intrinsic::experimental_constrained_frem: + case 
Intrinsic::experimental_constrained_pow: + case Intrinsic::experimental_constrained_powi: + case Intrinsic::experimental_constrained_maxnum: + case Intrinsic::experimental_constrained_minnum: + Assert((NumOperands == 4), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + HasRoundingMD = true; + break; + + case Intrinsic::experimental_constrained_fptrunc: + case Intrinsic::experimental_constrained_fpext: { + Assert((NumOperands == 2), + "invalid arguments for constrained FP intrinsic", &FPI); + HasExceptionMD = true; + + Value *Operand = FPI.getArgOperand(0); + uint64_t NumSrcElem = 0; + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast(Operand->getType()); + NumSrcElem = OperandT->getNumElements(); + Assert(OperandT->getVectorElementType()->isFloatingPointTy(), + "Intrinsic first argument vector must be floating point", + &FPI); + } + else + Assert(Operand->getType()->isFloatingPointTy(), + "Intrinsic first argument must be floating point", + &FPI); + + Operand = &FPI; + Assert((NumSrcElem > 0) == Operand->getType()->isVectorTy(), + "Intrinsic first argument and result disagree on vector use", + &FPI); + if (Operand->getType()->isVectorTy()) { + auto *OperandT = cast(Operand->getType()); + Assert(NumSrcElem == OperandT->getNumElements(), + "Intrinsic first argument and result vector lengths must be equal", + &FPI); + Assert(OperandT->getVectorElementType()->isFloatingPointTy(), + "Intrinsic result vector must be floating point", + &FPI); + } + else + Assert(Operand->getType()->isFloatingPointTy(), + "Intrinsic result must be an floating point", + &FPI); + } + break; + + default: + llvm_unreachable("Invalid constrained FP intrinsic!"); + } + + if (HasExceptionMD) { + Assert(isa(FPI.getArgOperand(NumOperands-1)), + "invalid exception behavior argument", &FPI); + Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid, + "invalid exception behavior argument", &FPI); + } + if (HasRoundingMD) { + int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1); + Assert(isa(FPI.getArgOperand(RoundingIdx)), + "invalid rounding mode argument", &FPI); + Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid, + "invalid rounding mode argument", &FPI); + } } void Verifier::visitDbgIntrinsic(StringRef Kind, DbgVariableIntrinsic &DII) { Index: test/CodeGen/X86/fp-intrinsics.ll =================================================================== --- test/CodeGen/X86/fp-intrinsics.ll +++ test/CodeGen/X86/fp-intrinsics.ll @@ -286,6 +286,31 @@ ret double %rem } +; Verify that round(42.1) isn't simplified when the rounding mode is +; unknown. +; Verify that no gross errors happen. +; CHECK-LABEL: @f21 +; COMMON: cvtsd2ss +define float @f21() { +entry: + %result = call float @llvm.experimental.constrained.fptrunc.f32.f64( + double 42.1, + metadata !"fpexcept.strict") + ret float %result +} + +; Verify that fpext(42.1) isn't simplified when the rounding mode is +; unknown. +; Verify that no gross errors happen. 
+; CHECK-LABEL: @f22 +; COMMON: cvtss2sd +define double @f22(float %x) { +entry: + %result = call double @llvm.experimental.constrained.fpext.f64.f32(float %x, + metadata !"fpexcept.strict") + ret double %result +} + @llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata" declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata) declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata) @@ -306,3 +331,6 @@ declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata) declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata) declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata) +declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata) +declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata) + Index: test/CodeGen/X86/vector-constrained-fp-intrinsics.ll =================================================================== --- test/CodeGen/X86/vector-constrained-fp-intrinsics.ll +++ test/CodeGen/X86/vector-constrained-fp-intrinsics.ll @@ -2423,6 +2423,149 @@ ret <4 x double> %min } +define <1 x float> @constrained_vector_fptrunc_v1f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v1f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64( + <1 x double>, + metadata !"fpexcept.strict") + ret <1 x float> %result +} + +define <2 x float> @constrained_vector_fptrunc_v2f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v2f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT: retq +entry: + %result = call <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64( + <2 x double>, + metadata !"fpexcept.strict") + ret <2 x float> %result +} + +define <3 x float> @constrained_vector_fptrunc_v3f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v3f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm1 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] +; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm1, %xmm1 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64( + <3 x double>, + metadata !"fpexcept.strict") + ret <3 x float> %result +} + +define <4 x float> @constrained_vector_fptrunc_v4f64() { +; CHECK-LABEL: constrained_vector_fptrunc_v4f64: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm1, %xmm1 +; CHECK-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1] +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm2 +; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero +; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0 +; CHECK-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1] +; CHECK-NEXT: 
movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64( + <4 x double>, + metadata !"fpexcept.strict") + ret <4 x float> %result +} + +define <1 x double> @constrained_vector_fpext_v1f32() { +; CHECK-LABEL: constrained_vector_fpext_v1f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: retq +entry: + %result = call <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32( + <1 x float>, + metadata !"fpexcept.strict") + ret <1 x double> %result +} + +define <2 x double> @constrained_vector_fpext_v2f32() { +; CHECK-LABEL: constrained_vector_fpext_v2f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: retq +entry: + %result = call <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32( + <2 x float>, + metadata !"fpexcept.strict") + ret <2 x double> %result +} + +define <3 x double> @constrained_vector_fpext_v3f32() { +; CHECK-LABEL: constrained_vector_fpext_v3f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm2, %xmm2 +; CHECK-NEXT: movsd %xmm2, -{{[0-9]+}}(%rsp) +; CHECK-NEXT: fldl -{{[0-9]+}}(%rsp) +; CHECK-NEXT: retq +entry: + %result = call <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32( + <3 x float>, + metadata !"fpexcept.strict") + ret <3 x double> %result +} + +define <4 x double> @constrained_vector_fpext_v4f32() { +; CHECK-LABEL: constrained_vector_fpext_v4f32: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm1 +; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm0, %xmm0 +; CHECK-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm1[0] +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm2 +; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero +; CHECK-NEXT: cvtss2sd %xmm1, %xmm1 +; CHECK-NEXT: movlhps {{.*#+}} xmm1 = xmm1[0],xmm2[0] +; CHECK-NEXT: retq +entry: + %result = call <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32( + <4 x float>, + metadata !"fpexcept.strict") + ret <4 x double> %result +} + define <1 x float> @constrained_vector_ceil_v1f32() { ; CHECK-LABEL: constrained_vector_ceil_v1f32: ; CHECK: # %bb.0: # %entry @@ -2846,6 +2989,8 @@ declare <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.maxnum.v2f64(<2 x double>, <2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.minnum.v2f64(<2 x double>, <2 x double>, metadata, metadata) +declare <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata) +declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata) declare <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double>, metadata, metadata) declare <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double>, 
metadata, metadata) declare <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double>, metadata, metadata) @@ -2871,6 +3016,8 @@ declare <1 x float> @llvm.experimental.constrained.nearbyint.v1f32(<1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.maxnum.v1f32(<1 x float>, <1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.minnum.v1f32(<1 x float>, <1 x float>, metadata, metadata) +declare <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(<1 x double>, metadata) +declare <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(<1 x float>, metadata) declare <1 x float> @llvm.experimental.constrained.ceil.v1f32(<1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.floor.v1f32(<1 x float>, metadata, metadata) declare <1 x float> @llvm.experimental.constrained.round.v1f32(<1 x float>, metadata, metadata) @@ -2915,6 +3062,8 @@ declare <3 x double> @llvm.experimental.constrained.maxnum.v3f64(<3 x double>, <3 x double>, metadata, metadata) declare <3 x float> @llvm.experimental.constrained.minnum.v3f32(<3 x float>, <3 x float>, metadata, metadata) declare <3 x double> @llvm.experimental.constrained.minnum.v3f64(<3 x double>, <3 x double>, metadata, metadata) +declare <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(<3 x double>, metadata) +declare <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(<3 x float>, metadata) declare <3 x float> @llvm.experimental.constrained.ceil.v3f32(<3 x float>, metadata, metadata) declare <3 x double> @llvm.experimental.constrained.ceil.v3f64(<3 x double>, metadata, metadata) declare <3 x float> @llvm.experimental.constrained.floor.v3f32(<3 x float>, metadata, metadata) @@ -2944,8 +3093,9 @@ declare <4 x double> @llvm.experimental.constrained.nearbyint.v4f64(<4 x double>, metadata, metadata) declare <4 x double> @llvm.experimental.constrained.maxnum.v4f64(<4 x double>, <4 x double>, metadata, metadata) declare <4 x double> @llvm.experimental.constrained.minnum.v4f64(<4 x double>, <4 x double>, metadata, metadata) +declare <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(<4 x double>, metadata) +declare <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float>, metadata) declare <4 x double> @llvm.experimental.constrained.ceil.v4f64(<4 x double>, metadata, metadata) declare <4 x double> @llvm.experimental.constrained.floor.v4f64(<4 x double>, metadata, metadata) declare <4 x double> @llvm.experimental.constrained.round.v4f64(<4 x double>, metadata, metadata) declare <4 x double> @llvm.experimental.constrained.trunc.v4f64(<4 x double>, metadata, metadata) - Index: test/Feature/fp-intrinsics.ll =================================================================== --- test/Feature/fp-intrinsics.ll +++ test/Feature/fp-intrinsics.ll @@ -242,6 +242,29 @@ ret double %result } +; Verify that fptrunc(42.1) isn't simplified when the rounding mode is +; unknown. +; CHECK-LABEL: f20 +; CHECK: call float @llvm.experimental.constrained.fptrunc +define float @f20() { +entry: + %result = call float @llvm.experimental.constrained.fptrunc.f32.f64( + double 42.1, + metadata !"fpexcept.strict") + ret float %result +} + +; Verify that fpext(42.1) isn't simplified when the rounding mode is +; unknown. 
+; CHECK-LABEL: f21 +; CHECK: call double @llvm.experimental.constrained.fpext +define double @f21() { +entry: + %result = call double @llvm.experimental.constrained.fpext.f64.f32(float 42.0, + metadata !"fpexcept.strict") + ret double %result +} + @llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata" declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata) declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata) @@ -260,3 +283,5 @@ declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata) declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata) declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata) +declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata) +declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
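
As a final sketch (the function name @roundtrip is hypothetical, and the
declarations repeat the ones directly above), the two intrinsics compose in
the obvious way. Because the truncation may raise an inexact exception, with
!"fpexcept.strict" it must not be optimized away even though its result is
immediately widened back::

  declare float  @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata)
  declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)

  define double @roundtrip(double %x) {
  entry:
    %narrow = call float @llvm.experimental.constrained.fptrunc.f32.f64(
                             double %x,
                             metadata !"fpexcept.strict")
    %wide = call double @llvm.experimental.constrained.fpext.f64.f32(
                            float %narrow,
                            metadata !"fpexcept.strict")
    ret double %wide
  }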