Index: docs/AddingConstrainedIntrinsics.rst
===================================================================
--- docs/AddingConstrainedIntrinsics.rst
+++ docs/AddingConstrainedIntrinsics.rst
@@ -0,0 +1,96 @@
+==================================================
+How to add a constrained floating-point intrinsic
+==================================================
+
+.. contents::
+   :local:
+
+.. warning::
+  This is a work in progress.
+
+Add the intrinsic
+=================
+
+Multiple files need to be updated when adding a new constrained intrinsic.
+
+Add the new intrinsic to the table of intrinsics::
+
+  include/llvm/IR/Intrinsics.td
+
+Update class ConstrainedFPIntrinsic to know about the new intrinsic::
+
+  include/llvm/IR/IntrinsicInst.h
+
+Functions like ConstrainedFPIntrinsic::isUnaryOp() or
+ConstrainedFPIntrinsic::isTernaryOp() may need to know about the new
+intrinsic::
+
+  lib/IR/IntrinsicInst.cpp
+
+Update the IR verifier::
+
+  lib/IR/Verifier.cpp
+
+Add SelectionDAG node types
+===========================
+
+Add the new STRICT version of the node type to the ISD::NodeType enum::
+
+  include/llvm/CodeGen/ISDOpcodes.h
+
+In class SDNode, update isStrictFPOpcode()::
+
+  include/llvm/CodeGen/SelectionDAGNodes.h
+
+The mapping from a STRICT SDNode type to its non-STRICT counterpart is done in
+TargetLoweringBase::getStrictFPOperationAction(). This allows STRICT
+nodes to be legalized similarly to the non-STRICT node type::
+
+  include/llvm/CodeGen/TargetLowering.h
+
+Building the SelectionDAG
+-------------------------
+
+The switch statement in SelectionDAGBuilder::visitIntrinsicCall() needs
+to be updated to call SelectionDAGBuilder::visitConstrainedFPIntrinsic().
+That function, in turn, needs to be updated to know how to create the
+SDNode for the intrinsic. The new STRICT node will eventually be converted
+to the matching non-STRICT node. For this reason it *must* have the same
+operands and values as the non-STRICT version in case the non-STRICT
+version's default lowering is used. This means that if the non-STRICT
+version of the node does not use the chain then the STRICT node cannot
+either::
+
+  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+
+Most of the STRICT nodes get legalized the same as their matching non-STRICT
+counterparts. A new STRICT node with this property must get added to the
+switch in SelectionDAGLegalize::LegalizeOp()::
+
+  lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+
+The code to do the conversion or mutation of the STRICT node to a non-STRICT
+version of the node happens in SelectionDAG::mutateStrictFPToFP(). Be
+careful updating this function since some nodes are always chained and
+some are not. Some nodes have the same return type as their input operand,
+but some are different. Both of these points must be properly handled::
+
+  lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+
+To make debug logs readable it is helpful to update the SelectionDAG's
+debug logger::
+
+  lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+
+Add any required transforms to lib/CodeGen/StrictFP.cpp
+=======================================================
+
+If there are any transforms that cannot or should not be done in the
+SelectionDAG then the StrictFP.cpp pass is the place to put them.
+
+Add documentation and tests
+===========================
+
+::
+
+  docs/LangRef.rst
Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -13824,7 +13824,6 @@
 
       declare
       @llvm.experimental.constrained.frem( , ,
-                                          metadata ,
                                           metadata )
 
 Overview:
@@ -13841,10 +13840,7 @@
 intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector <t_vector>`
 of floating-point values. Both arguments must have identical types.
 
-The third and fourth arguments specify the rounding mode and exception
-behavior as described above. The rounding mode argument has no effect, since
-the result of frem is never rounded, but the argument is included for
-consistency with the other constrained floating-point intrinsics.
+The third argument specifies the exception behavior as described above.
 
 Semantics:
 """"""""""
@@ -13889,6 +13885,141 @@
 operand computed with infinite precision, and then rounded to the target
 precision.
 
+'``llvm.experimental.constrained.fptoui``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare
+      @llvm.experimental.constrained.fptoui( ,
+                                          metadata )
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fptoui``' intrinsic returns the result of
+converting a floating-point operand to an unsigned integer.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fptoui``'
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
+<t_vector>` of floating-point values.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is an unsigned integer converted from the floating-point
+operand. The value is truncated, so it is rounded towards zero.
+
+'``llvm.experimental.constrained.fptosi``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare
+      @llvm.experimental.constrained.fptosi( ,
+                                          metadata )
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fptosi``' intrinsic returns the result of
+converting a floating-point operand to a signed integer.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fptosi``'
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
+<t_vector>` of floating-point values.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is a signed integer converted from the floating-point
+operand. The value is truncated, so it is rounded towards zero.
+
+'``llvm.experimental.constrained.fptrunc``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare
+      @llvm.experimental.constrained.fptrunc( ,
+                                          metadata )
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fptrunc``' intrinsic returns the result of
+truncating a floating-point operand to a smaller floating-point type.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fptrunc``'
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
+<t_vector>` of floating-point values. This argument must be larger in size
+than the result.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is a floating-point value truncated to be smaller in size
+than the operand.
+
+'``llvm.experimental.constrained.fpext``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare
+      @llvm.experimental.constrained.fpext( ,
+                                          metadata )
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fpext``' intrinsic returns the result of
+extending a floating-point operand to a larger floating-point type.
+
+Arguments:
+""""""""""
+
+The first argument to the '``llvm.experimental.constrained.fpext``'
+intrinsic must be :ref:`floating-point <t_floating>` or :ref:`vector
+<t_vector>` of floating-point values. This argument must be smaller in size
+than the result.
+
+The second argument specifies the exception behavior as described above.
+
+Semantics:
+""""""""""
+
+The result produced is a floating-point value extended to be larger in size
+than the operand. All restrictions that apply to the fpext instruction also
+apply to this intrinsic.
+
 Constrained libm-equivalent Intrinsics
 --------------------------------------
 
Index: include/llvm/CodeGen/ISDOpcodes.h
===================================================================
--- include/llvm/CodeGen/ISDOpcodes.h
+++ include/llvm/CodeGen/ISDOpcodes.h
@@ -525,6 +525,11 @@
     /// X = FP_EXTEND(Y) - Extend a smaller FP type into a larger FP type.
     FP_EXTEND,
 
+    STRICT_FP_TO_SINT,
+    STRICT_FP_TO_UINT,
+    STRICT_FP_ROUND,
+    STRICT_FP_EXTEND,
+
     /// BITCAST - This operator converts between integer, vector and FP
     /// values, as if the value was stored to memory with one type and loaded
     /// from the same address with the other type (or equivalently for vector
Index: include/llvm/CodeGen/Passes.h
===================================================================
--- include/llvm/CodeGen/Passes.h
+++ include/llvm/CodeGen/Passes.h
@@ -441,6 +441,8 @@
   /// Creates CFI Instruction Inserter pass. \see CFIInstrInserter.cpp
   FunctionPass *createCFIInstrInserter();
 
+  // Experimental pass with transforms needed for strict fp
+  FunctionPass *createStrictFPPass();
 } // End llvm namespace
 
 #endif
Index: include/llvm/CodeGen/SelectionDAGNodes.h
===================================================================
--- include/llvm/CodeGen/SelectionDAGNodes.h
+++ include/llvm/CodeGen/SelectionDAGNodes.h
@@ -672,6 +672,10 @@
       case ISD::STRICT_FLOG2:
       case ISD::STRICT_FRINT:
       case ISD::STRICT_FNEARBYINT:
+      case ISD::STRICT_FP_TO_SINT:
+      case ISD::STRICT_FP_TO_UINT:
+      case ISD::STRICT_FP_ROUND:
+      case ISD::STRICT_FP_EXTEND:
         return true;
     }
   }
Index: include/llvm/CodeGen/TargetLowering.h
===================================================================
--- include/llvm/CodeGen/TargetLowering.h
+++ include/llvm/CodeGen/TargetLowering.h
@@ -811,6 +811,10 @@
       case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;
       case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
       case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
+      case ISD::STRICT_FP_TO_SINT: EqOpc = ISD::FP_TO_SINT; break;
+      case ISD::STRICT_FP_TO_UINT: EqOpc = ISD::FP_TO_UINT; break;
+      case ISD::STRICT_FP_ROUND: EqOpc = ISD::FP_ROUND; break;
+      case ISD::STRICT_FP_EXTEND: EqOpc = ISD::FP_EXTEND; break;
     }
 
     auto Action = getOperationAction(EqOpc, VT);
Index: include/llvm/IR/IntrinsicInst.h
===================================================================
--- include/llvm/IR/IntrinsicInst.h
+++ include/llvm/IR/IntrinsicInst.h
@@ -223,6 +223,10 @@
       case Intrinsic::experimental_constrained_fdiv:
       case Intrinsic::experimental_constrained_frem:
       case Intrinsic::experimental_constrained_fma:
+      case Intrinsic::experimental_constrained_fptosi:
+      case Intrinsic::experimental_constrained_fptoui:
+      case Intrinsic::experimental_constrained_fptrunc:
+      case Intrinsic::experimental_constrained_fpext:
       case Intrinsic::experimental_constrained_sqrt:
       case Intrinsic::experimental_constrained_pow:
       case Intrinsic::experimental_constrained_powi:
Index: include/llvm/IR/Intrinsics.td
===================================================================
--- include/llvm/IR/Intrinsics.td
+++ include/llvm/IR/Intrinsics.td
@@ -493,7 +493,6 @@
   def int_experimental_constrained_frem : Intrinsic<[ llvm_anyfloat_ty ],
                                                     [ LLVMMatchType<0>,
                                                       LLVMMatchType<0>,
-                                                      llvm_metadata_ty,
                                                       llvm_metadata_ty ]>;
 
   def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
@@ -503,6 +502,22 @@
                                                      llvm_metadata_ty,
                                                      llvm_metadata_ty ]>;
 
+  def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],
+                                                      [ llvm_anyfloat_ty,
+                                                        llvm_metadata_ty ]>;
+
+  def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],
+                                                      [ llvm_anyfloat_ty,
+                                                        llvm_metadata_ty ]>;
+
+  def int_experimental_constrained_fptrunc : Intrinsic<[ llvm_anyfloat_ty ],
+                                                       [ llvm_anyfloat_ty,
+                                                         llvm_metadata_ty ]>;
+
+  def int_experimental_constrained_fpext : Intrinsic<[ llvm_anyfloat_ty ],
+                                                     [ llvm_anyfloat_ty,
+                                                       llvm_metadata_ty ]>;
+
   // These intrinsics are sensitive to the rounding mode so we need constrained
   // versions of each of them. When strict rounding and exception control are
   // not required the non-constrained versions of these intrinsics should be
@@ -558,7 +573,7 @@
                                                      llvm_metadata_ty,
                                                      llvm_metadata_ty ]>;
 }
-// FIXME: Add intrinsics for fcmp, fptrunc, fpext, fptoui and fptosi.
+// FIXME: Add intrinsic for fcmp
 // FIXME: Add intrinsics for fabs, copysign, floor, ceil, trunc and round?
 
 
Index: include/llvm/InitializePasses.h
===================================================================
--- include/llvm/InitializePasses.h
+++ include/llvm/InitializePasses.h
@@ -371,6 +371,7 @@
 void initializeStackProtectorPass(PassRegistry&);
 void initializeStackSlotColoringPass(PassRegistry&);
 void initializeStraightLineStrengthReducePass(PassRegistry&);
+void initializeStrictFPPassPass(PassRegistry&);
 void initializeStripDeadDebugInfoPass(PassRegistry&);
 void initializeStripDeadPrototypesLegacyPassPass(PassRegistry&);
 void initializeStripDebugDeclarePass(PassRegistry&);
Index: lib/CodeGen/CMakeLists.txt
===================================================================
--- lib/CodeGen/CMakeLists.txt
+++ lib/CodeGen/CMakeLists.txt
@@ -143,6 +143,7 @@
   StackMaps.cpp
   StackProtector.cpp
   StackSlotColoring.cpp
+  StrictFP.cpp
   TailDuplication.cpp
   TailDuplicator.cpp
   TargetFrameLoweringImpl.cpp
Index: lib/CodeGen/CodeGen.cpp
===================================================================
--- lib/CodeGen/CodeGen.cpp
+++ lib/CodeGen/CodeGen.cpp
@@ -97,6 +97,7 @@
   initializeStackMapLivenessPass(Registry);
   initializeStackProtectorPass(Registry);
   initializeStackSlotColoringPass(Registry);
+  initializeStrictFPPassPass(Registry);
   initializeTailDuplicatePass(Registry);
   initializeTargetPassConfigPass(Registry);
   initializeTwoAddressInstructionPassPass(Registry);
Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
@@ -1107,6 +1107,9 @@
   case ISD::STRICT_FLOG2:
   case ISD::STRICT_FRINT:
   case ISD::STRICT_FNEARBYINT:
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FP_EXTEND:
     // These pseudo-ops get legalized as if they were their non-strict
     // equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT
     // is also legal, but if ISD::FSQRT requires expansion then so does
@@ -1114,6 +1117,9 @@
     Action = TLI.getStrictFPOperationAction(Node->getOpcode(),
                                             Node->getValueType(0));
     break;
+  case ISD::STRICT_FP_TO_UINT:
+    llvm_unreachable("Expansion of STRICT_FP_TO_UINT missed in earlier pass!");
+    break;
   default:
     if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
       Action = TargetLowering::Legal;
@@ -2981,12 +2987,14 @@
     break;
   }
   case ISD::FP_ROUND:
+  case ISD::STRICT_FP_ROUND:
   case ISD::BITCAST:
     Tmp1 = EmitStackConvert(Node->getOperand(0), Node->getValueType(0),
                             Node->getValueType(0), dl);
     Results.push_back(Tmp1);
     break;
   case ISD::FP_EXTEND:
+  case ISD::STRICT_FP_EXTEND:
     Tmp1 = EmitStackConvert(Node->getOperand(0),
                             Node->getOperand(0).getValueType(),
                             Node->getValueType(0), dl);
@@ -3044,6 +3052,7 @@
     Results.push_back(Tmp1);
     break;
   case ISD::FP_TO_SINT:
+  case ISD::STRICT_FP_TO_SINT:
     if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG))
       Results.push_back(Tmp1);
     break;
@@ -3070,6 +3079,9 @@
     Results.push_back(Tmp1);
     break;
   }
+  case ISD::STRICT_FP_TO_UINT:
+    llvm_unreachable("Expansion of STRICT_FP_TO_UINT missed in earlier pass!");
+    break;
   case ISD::VAARG:
     Results.push_back(DAG.expandVAArg(Node));
     Results.push_back(Results[0].getValue(1));
Index: lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
@@ -113,6 +113,8 @@
   case ISD::ZERO_EXTEND:
   case ISD::ANY_EXTEND:  Res = PromoteIntRes_INT_EXTEND(N); break;
 
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
   case ISD::FP_TO_SINT:
   case ISD::FP_TO_UINT:  Res = PromoteIntRes_FP_TO_XINT(N); break;
 
@@ -417,6 +419,11 @@
       TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, NVT))
     NewOpc = ISD::FP_TO_SINT;
 
+  if (N->getOpcode() == ISD::STRICT_FP_TO_UINT &&
+      !TLI.isOperationLegal(ISD::STRICT_FP_TO_UINT, NVT) &&
+      TLI.isOperationLegalOrCustom(ISD::STRICT_FP_TO_SINT, NVT))
+    NewOpc = ISD::STRICT_FP_TO_SINT;
+
   SDValue Res = DAG.getNode(NewOpc, dl, NVT, N->getOperand(0));
 
   // Assert that the converted value fits in the original type. If it doesn't
@@ -426,7 +433,8 @@
   // NOTE: fp-to-uint to fp-to-sint promotion guarantees zero extend. For example:
   // before legalization: fp-to-uint16, 65534. -> 0xfffe
   // after legalization: fp-to-sint32, 65534. -> 0x0000fffe
-  return DAG.getNode(N->getOpcode() == ISD::FP_TO_UINT ?
+  return DAG.getNode((N->getOpcode() == ISD::FP_TO_UINT ||
+                      N->getOpcode() == ISD::STRICT_FP_TO_UINT) ?
                      ISD::AssertZext : ISD::AssertSext, dl, NVT, Res,
                      DAG.getValueType(N->getValueType(0).getScalarType()));
 }
Index: lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
@@ -311,6 +311,9 @@
   case ISD::STRICT_FLOG2:
   case ISD::STRICT_FRINT:
   case ISD::STRICT_FNEARBYINT:
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FP_EXTEND:
     // These pseudo-ops get legalized as if they were their non-strict
     // equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT
     // is also legal, but if ISD::FSQRT requires expansion then so does
Index: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -51,6 +51,7 @@
   case ISD::BITCAST:           R = ScalarizeVecRes_BITCAST(N); break;
   case ISD::BUILD_VECTOR:      R = ScalarizeVecRes_BUILD_VECTOR(N); break;
   case ISD::EXTRACT_SUBVECTOR: R = ScalarizeVecRes_EXTRACT_SUBVECTOR(N); break;
+  case ISD::STRICT_FP_ROUND:
   case ISD::FP_ROUND:          R = ScalarizeVecRes_FP_ROUND(N); break;
   case ISD::FP_ROUND_INREG:    R = ScalarizeVecRes_InregOp(N); break;
   case ISD::FPOWI:             R = ScalarizeVecRes_FPOWI(N); break;
@@ -88,6 +89,7 @@
   case ISD::FLOG2:
   case ISD::FNEARBYINT:
   case ISD::FNEG:
+  case ISD::STRICT_FP_EXTEND:
   case ISD::FP_EXTEND:
   case ISD::FP_TO_SINT:
   case ISD::FP_TO_UINT:
@@ -484,6 +486,7 @@
     case ISD::STORE:
       Res = ScalarizeVecOp_STORE(cast<StoreSDNode>(N), OpNo);
       break;
+    case ISD::STRICT_FP_ROUND:
     case ISD::FP_ROUND:
       Res = ScalarizeVecOp_FP_ROUND(N, OpNo);
       break;
@@ -1603,6 +1606,7 @@
     case ISD::TRUNCATE:
       Res = SplitVecOp_TruncateHelper(N);
       break;
+    case ISD::STRICT_FP_ROUND:
     case ISD::FP_ROUND:          Res = SplitVecOp_FP_ROUND(N); break;
     case ISD::FCOPYSIGN:         Res = SplitVecOp_FCOPYSIGN(N); break;
     case ISD::STORE:
@@ -1637,6 +1641,7 @@
     case ISD::CTTZ:
     case ISD::CTLZ:
     case ISD::CTPOP:
+    case ISD::STRICT_FP_EXTEND:
     case ISD::FP_EXTEND:
     case ISD::SIGN_EXTEND:
     case ISD::ZERO_EXTEND:
@@ -2342,8 +2347,11 @@
 
   case ISD::ANY_EXTEND:
   case ISD::FP_EXTEND:
+  case ISD::STRICT_FP_EXTEND:
   case ISD::FP_ROUND:
+  case ISD::STRICT_FP_ROUND:
   case ISD::FP_TO_SINT:
+  case ISD::STRICT_FP_TO_SINT:
   case ISD::FP_TO_UINT:
   case ISD::SIGN_EXTEND:
   case ISD::SINT_TO_FP:
@@ -2353,6 +2361,10 @@
     Res = WidenVecRes_Convert(N);
     break;
 
+  case ISD::STRICT_FP_TO_UINT:
+    llvm_unreachable("Expansion of STRICT_FP_TO_UINT missed in earlier pass!");
+    break;
+
   case ISD::BITREVERSE:
   case ISD::BSWAP:
   case ISD::CTLZ:
@@ -3424,6 +3436,7 @@
     Res = WidenVecOp_EXTEND(N);
     break;
 
+  case ISD::STRICT_FP_EXTEND:
   case ISD::FP_EXTEND:
   case ISD::FP_TO_SINT:
   case ISD::FP_TO_UINT:
Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7174,16 +7174,49 @@
     NewOpc = ISD::FNEARBYINT;
     IsUnary = true;
     break;
+  case ISD::STRICT_FP_TO_SINT: NewOpc = ISD::FP_TO_SINT; break;
+  case ISD::STRICT_FP_TO_UINT: NewOpc = ISD::FP_TO_UINT; break;
+  case ISD::STRICT_FP_ROUND: NewOpc = ISD::FP_ROUND; IsUnary = true; break;
+  case ISD::STRICT_FP_EXTEND: NewOpc = ISD::FP_EXTEND; IsUnary = true; break;
   }
 
+  bool IsChained = true;
+  switch (OrigOpc) {
+  default:
+    break;
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FP_EXTEND:
+    IsChained = false;
+    break;
+  }
+
   // We're taking this node out of the chain, so we need to re-link things.
-  SDValue InputChain = Node->getOperand(0);
-  SDValue OutputChain = SDValue(Node, 1);
-  ReplaceAllUsesOfValueWith(OutputChain, InputChain);
+  if (IsChained) {
+    SDValue InputChain = Node->getOperand(0);
+    SDValue OutputChain = SDValue(Node, 1);
+    ReplaceAllUsesOfValueWith(OutputChain, InputChain);
+  }
 
-  SDVTList VTs = getVTList(Node->getOperand(1).getValueType());
+  SDVTList VTs;
   SDNode *Res = nullptr;
-  if (IsUnary)
+
+  switch (OrigOpc) {
+  default:
+    VTs = getVTList(Node->getOperand(1).getValueType());
+    break;
+  case ISD::STRICT_FP_TO_SINT:
+  case ISD::STRICT_FP_TO_UINT:
+  case ISD::STRICT_FP_ROUND:
+  case ISD::STRICT_FP_EXTEND:
+    VTs = getVTList(Node->ValueList[0]);
+    break;
+  }
+
+  if (!IsChained)
+    Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(0) });
+  else if (IsUnary)
     Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });
   else if (IsTernary)
     Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -5569,6 +5569,10 @@
   case Intrinsic::experimental_constrained_fdiv:
   case Intrinsic::experimental_constrained_frem:
   case Intrinsic::experimental_constrained_fma:
+  case Intrinsic::experimental_constrained_fptosi:
+  case Intrinsic::experimental_constrained_fptoui:
+  case Intrinsic::experimental_constrained_fptrunc:
+  case Intrinsic::experimental_constrained_fpext:
   case Intrinsic::experimental_constrained_sqrt:
   case Intrinsic::experimental_constrained_pow:
   case Intrinsic::experimental_constrained_powi:
@@ -6233,6 +6237,7 @@
     const ConstrainedFPIntrinsic &FPI) {
   SDLoc sdl = getCurSDLoc();
   unsigned Opcode;
+  bool IsChained = true;
   switch (FPI.getIntrinsicID()) {
   default: llvm_unreachable("Impossible intrinsic"); // Can't reach here.
   case Intrinsic::experimental_constrained_fadd:
@@ -6253,6 +6258,18 @@
   case Intrinsic::experimental_constrained_fma:
     Opcode = ISD::STRICT_FMA;
     break;
+  case Intrinsic::experimental_constrained_fptosi:
+    Opcode = ISD::STRICT_FP_TO_SINT;
+    IsChained = false;
+    break;
+  case Intrinsic::experimental_constrained_fptrunc:
+    Opcode = ISD::STRICT_FP_ROUND;
+    IsChained = false;
+    break;
+  case Intrinsic::experimental_constrained_fpext:
+    Opcode = ISD::STRICT_FP_EXTEND;
+    IsChained = false;
+    break;
   case Intrinsic::experimental_constrained_sqrt:
     Opcode = ISD::STRICT_FSQRT;
     break;
@@ -6294,12 +6311,21 @@
   SDValue Chain = getRoot();
   SmallVector<EVT, 4> ValueVTs;
   ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);
-  ValueVTs.push_back(MVT::Other); // Out chain
+  if (IsChained)
+    ValueVTs.push_back(MVT::Other); // Out chain
 
   SDVTList VTs = DAG.getVTList(ValueVTs);
   SDValue Result;
-  if (FPI.isUnaryOp())
+  if (Opcode == ISD::STRICT_FP_ROUND || Opcode == ISD::STRICT_FP_EXTEND)
+    Result = DAG.getNode(Opcode, sdl, VTs,
+                         { getValue(FPI.getArgOperand(0)),
+                           DAG.getTargetConstant(0, sdl,
+                               TLI.getPointerTy(DAG.getDataLayout())) });
+  else if (Opcode == ISD::STRICT_FP_TO_SINT)
     Result = DAG.getNode(Opcode, sdl, VTs,
+                         { getValue(FPI.getArgOperand(0)) });
+  else if (FPI.isUnaryOp())
+    Result = DAG.getNode(Opcode, sdl, VTs,
                          { Chain, getValue(FPI.getArgOperand(0)) });
   else if (FPI.isTernaryOp())
     Result = DAG.getNode(Opcode, sdl, VTs,
@@ -6311,9 +6337,11 @@
                          { Chain, getValue(FPI.getArgOperand(0)),
                            getValue(FPI.getArgOperand(1)) });
 
-  assert(Result.getNode()->getNumValues() == 2);
-  SDValue OutChain = Result.getValue(1);
-  DAG.setRoot(OutChain);
+  if (IsChained) {
+    assert(Result.getNode()->getNumValues() == 2);
+    SDValue OutChain = Result.getValue(1);
+    DAG.setRoot(OutChain);
+  }
   SDValue FPResult = Result.getValue(0);
   setValue(&FPI, FPResult);
 }
Index: lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
@@ -290,14 +290,18 @@
   case ISD::ZERO_EXTEND_VECTOR_INREG:   return "zero_extend_vector_inreg";
   case ISD::TRUNCATE:                   return "truncate";
   case ISD::FP_ROUND:                   return "fp_round";
+  case ISD::STRICT_FP_ROUND:            return "strict_fp_round";
   case ISD::FLT_ROUNDS_:                return "flt_rounds";
   case ISD::FP_ROUND_INREG:             return "fp_round_inreg";
   case ISD::FP_EXTEND:                  return "fp_extend";
+  case ISD::STRICT_FP_EXTEND:           return "strict_fp_extend";
 
   case ISD::SINT_TO_FP:                 return "sint_to_fp";
   case ISD::UINT_TO_FP:                 return "uint_to_fp";
   case ISD::FP_TO_SINT:                 return "fp_to_sint";
+  case ISD::STRICT_FP_TO_SINT:          return "strict_fp_to_sint";
   case ISD::FP_TO_UINT:                 return "fp_to_uint";
+  case ISD::STRICT_FP_TO_UINT:          return "strict_fp_to_uint";
   case ISD::BITCAST:                    return "bitcast";
   case ISD::ADDRSPACECAST:              return "addrspacecast";
   case ISD::FP16_TO_FP:                 return "fp16_to_fp";
Index: lib/CodeGen/StrictFP.cpp
===================================================================
--- lib/CodeGen/StrictFP.cpp
+++ lib/CodeGen/StrictFP.cpp
@@ -0,0 +1,295 @@
+//===----- StrictFP.cpp - Required transforms for strict FP ---------------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains transforms necessary for strict floating point
+/// operations. The transforms done vary depending on the backend.
+/// Currently the full set of transforms is:
+/// - Conversions of floating point types to unsigned integral types
+///   are transformed to avoid the speculative execution present in
+///   the default lowering.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/TargetLowering.h"
+#include "llvm/CodeGen/TargetPassConfig.h"
+#include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/InstrTypes.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/IPO.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+using namespace llvm;
+
+#define DEBUG_TYPE "constrained-fp-transforms"
+
+STATISTIC(NumStrictFPOps, "Number of strict floating point ops transformed");
+
+namespace {
+
+class StrictFPPass : public FunctionPass {
+public:
+  static char ID;
+
+  const DataLayout *DL;
+  const TargetLowering *TLI;
+
+  StrictFPPass() : FunctionPass(ID) {
+    initializeStrictFPPassPass(*PassRegistry::getPassRegistry());
+  }
+
+  bool runOnFunction(Function &) override;
+
+private:
+  void inspectIntrinsicCall(IntrinsicInst *);
+
+  bool processIntrinsicCall(IntrinsicInst *);
+
+  bool processVectorIntrinsicCall(IntrinsicInst *);
+
+  void replaceConstrainedFPToUI(IntrinsicInst *);
+
+  void replaceVectorConstrainedFPToUI(IntrinsicInst *);
+
+  std::vector<IntrinsicInst *> IntrinsicWorkList;
+  std::vector<IntrinsicInst *> VectorWorkList;
+};
+
+bool StrictFPPass::runOnFunction(Function &F) {
+  bool Changed = false;
+  DL = &F.getParent()->getDataLayout();
+
+  auto *TPC = getAnalysisIfAvailable<TargetPassConfig>();
+  if (!TPC)
+    return false;
+
+  auto &TM = TPC->getTM();
+
+  TLI = TM.getSubtargetImpl(F)->getTargetLowering();
+
+  for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; I++) {
+    if (auto *Call = dyn_cast<IntrinsicInst>(&*I))
+      inspectIntrinsicCall(Call);
+  }
+
+  for (auto *I : VectorWorkList) {
+    Changed |= processVectorIntrinsicCall(I);
+  }
+
+  for (auto *I : IntrinsicWorkList) {
+    Changed |= processIntrinsicCall(I);
+  }
+
+  VectorWorkList.clear();
+  IntrinsicWorkList.clear();
+
+  return Changed;
+}
+
+void StrictFPPass::inspectIntrinsicCall(IntrinsicInst *I) {
+
+  switch (Intrinsic::ID IID = I->getIntrinsicID()) {
+  default:
+    return;
+  case Intrinsic::experimental_constrained_fptoui:
+    Value *IntDst = I;
+    Type *IntDstType = IntDst->getType();
+    EVT VT = TLI->getValueType(*DL, IntDstType);
+
+    auto Action = TLI->getOperationAction(ISD::FP_TO_UINT, VT);
+
+    // We don't currently handle Custom or Promote for strict FP pseudo-ops.
+    // For now, we just expand for those cases.
+    if (Action != TargetLowering::Legal)
+      Action = TargetLowering::Expand;
+
+    if (Action == TargetLowering::Expand) {
+      if (IntDstType->isVectorTy())
+        VectorWorkList.push_back(I);
+      else
+        IntrinsicWorkList.push_back(I);
+    }
+
+    break;
+  }
+  return;
+}
+
+bool StrictFPPass::processIntrinsicCall(IntrinsicInst *Call) {
+  switch (Intrinsic::ID IID = Call->getIntrinsicID()) {
+  default:
+    return false;
+  case Intrinsic::experimental_constrained_fptoui:
+    replaceConstrainedFPToUI(Call);
+    break;
+  }
+  return true;
+}
+
+bool StrictFPPass::processVectorIntrinsicCall(IntrinsicInst *Call) {
+  switch (Intrinsic::ID IID = Call->getIntrinsicID()) {
+  default:
+    return false;
+  case Intrinsic::experimental_constrained_fptoui:
+    replaceVectorConstrainedFPToUI(Call);
+    break;
+  }
+  return true;
+}
+
+void StrictFPPass::replaceConstrainedFPToUI(IntrinsicInst *I) {
+
+  // Four blocks:
+  // #1 Gets the compare instruction, is the original block
+  // #2 Gets conversion instructions when in signed range
+  // #3 Conversion instructions when out of signed range
+  // #4 Gets the PHI plus the remainder of the original block
+  //
+  // The original call gets replaced with the PHI
+  //
+  // An example of a transform of a double into an unsigned i32:
+  //
+  // entry:
+  //   %within.sint.range = fcmp ult double 4.210000e+01, 0x41E0000000000000
+  //   br i1 %within.sint.range, label %0, label %2
+
+  // ;