Index: llvm/docs/LangRef.rst =================================================================== --- llvm/docs/LangRef.rst +++ llvm/docs/LangRef.rst @@ -22550,6 +22550,1180 @@ ``invariant.group`` metadata. It does not read any memory and can be speculated. +.. _fpbuiltin: + +Floating-Point Builtin Intrinsics +------------------------------------- + +These intrinsics are used to represent common floating-point operations with +the explicit expectation that the semantics of the operation may be modified +by call-site attributes that are specific to these intrinsics. Although many +of these operations correspond directly to functions defined by the standard +C math library, these intrinsics are intended to allow replacement of the +intrinsic with implementations outside the standard library, such as vector +implementations of the operation or alternate implementations to satisfy +different accuracy requirements. + +The following call-site attributes are currently recognized as being associated +with the floating-point builtin intrinsics: + +``"fp-max-error"=""`` + This attribute specifies the required accuracy for the operation in ULPs. + The accuracy value must be a non-negative floating-point number. A value + of 0.5 or less indicates that the result is required to be correctly + rounded according to IEEE-754 rules. The default rounding mode + (round-to-nearest) may be assumed. + + If this attribute is absent, basic operations (fadd, fsub, fmul, fdiv, + frem, and sqrt) are assumed to provide correctly rounded results. The + accuracy of other operations is target-dependent, corresponding to the + accuracy of the target-default implementation of the operation (usually + the implementation provided by the standard math library). If this + attribute is present, the intrinsic may only be replaced with + implementations which are known to provide at least the accuracy described. + An implementation which is more accurate than required by this attribute + may be used. 
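As a reading aid for the ``"fp-max-error"`` rules above, here is a minimal C++ sketch of how a consumer might validate the attribute value and compare it against an implementation's documented error bound. The helper names (`parseMaxError`, `requiresCorrectRounding`, `satisfiesMaxError`) are hypothetical and are not part of this patch or of LLVM; the sketch only restates the documented rules (non-negative value in ULPs, 0.5 or less means correctly rounded, a more accurate implementation is always acceptable).

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: parse the string value of "fp-max-error" (in ULPs).
// Returns std::nullopt when the text is not a non-negative number.
std::optional<double> parseMaxError(const std::string &Value) {
  if (Value.empty())
    return std::nullopt;
  char *End = nullptr;
  double Ulps = std::strtod(Value.c_str(), &End);
  // Reject trailing garbage such as "2.5x" and unparsable text such as "abc".
  if (End != Value.c_str() + Value.size())
    return std::nullopt;
  // The documented rule: the value must be a non-negative number.
  if (std::isnan(Ulps) || Ulps < 0.0)
    return std::nullopt;
  return Ulps;
}

// A requirement of 0.5 ULP or less means "correctly rounded".
bool requiresCorrectRounding(double RequiredUlps) {
  return RequiredUlps <= 0.5;
}

// An implementation qualifies when its error bound does not exceed the
// requirement; a more accurate implementation is always acceptable.
bool satisfiesMaxError(double ImplUlps, double RequiredUlps) {
  assert(ImplUlps >= 0.0 && RequiredUlps >= 0.0 && "error bounds are non-negative");
  return ImplUlps <= RequiredUlps;
}
```

Under these assumptions, a call site carrying `"fp-max-error"="2.5"` could be served by an implementation documented at 1.0 ULP but not by one documented at 4.0 ULP.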
+ +The semantics of the fpbuiltin intrinsics may be further constrained by defining +new callsite attributes beginning with "fp-". All such string attribute +identifiers are considered reserved for use with fpbuiltin intrinsics. + +No transformation should be performed on any fpbuiltin intrinsic if the +intrinsic has any callsite attributes beginning with "fp-" that the code +performing the transformation does not recognize. + +Unless otherwise specified using callsite attributes, the fpbuiltin intrinsics +do not set ``errno`` and may be assumed not to trap or raise floating-point +exceptions. + +All fpbuiltin intrinsics are overloaded intrinsics which may operate on any +scalar or vector floating-point type. Not all targets support all types. + +'``llvm.fpbuiltin.fadd``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.fadd( , ) + +Overview: +""""""""" + +The '``llvm.fpbuiltin.fadd``' intrinsic returns the sum of its two operands. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.fadd``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point sum of the two value operands and has +the same type as the operands. Unless modified by the "fp-max-error" callsite +attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.fsub``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.fsub( , ) + +Overview: +""""""""" + +The '``llvm.fpbuiltin.fsub``' intrinsic returns the difference of its two +operands. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.fsub``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. 
+ +Semantics: +"""""""""" + +The value produced is the floating-point difference of the two value operands +and has the same type as the operands. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.fmul``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.fmul( , ) + +Overview: +""""""""" + +The '``llvm.fpbuiltin.fmul``' intrinsic returns the product of its two operands. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.fmul``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point product of the two value operands and +has the same type as the operands. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.fdiv``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.fdiv( , ) + +Overview: +""""""""" + +The '``llvm.fpbuiltin.fdiv``' intrinsic returns the quotient of its two +operands. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.fdiv``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point quotient of the two value operands and +has the same type as the operands. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.frem``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.frem( , ) + +Overview: +""""""""" + +The '``llvm.fpbuiltin.frem``' intrinsic returns the remainder from the division +of its two operands. 
+ + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.frem``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point remainder from the division of the two +value operands and has the same type as the operands. The remainder has the +same sign as the dividend. Unless modified by the "fp-max-error" callsite +attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.sin``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.sin( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.sin``' intrinsics return the sine of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.sin``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point sine of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the sine operation for the input type. + + +'``llvm.fpbuiltin.cos``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.cos( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.cos``' intrinsics return the cosine of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.cos``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point cosine of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the cosine operation for the input type. 
+ + +'``llvm.fpbuiltin.tan``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.tan( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.tan``' intrinsics return the tangent of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.tan``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point tangent of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the tangent operation for the input type. + + +'``llvm.fpbuiltin.sinh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.sinh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.sinh``' intrinsics return the hyperbolic sine of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.sinh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point hyperbolic sine of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the hyperbolic sine operation for the input +type. + + +'``llvm.fpbuiltin.cosh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.cosh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.cosh``' intrinsics return the hyperbolic cosine of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.cosh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. 
+ +Semantics: +"""""""""" + +The value produced is the floating-point hyperbolic cosine of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the hyperbolic cosine operation for the input +type. + + +'``llvm.fpbuiltin.tanh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.tanh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.tanh``' intrinsics return the hyperbolic tangent of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.tanh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point hyperbolic tangent of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the hyperbolic tangent operation for the +input type. + + +'``llvm.fpbuiltin.asin``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.asin( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.asin``' intrinsics return the principal value of the +arc sine of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.asin``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the principal value of the floating-point arc sine of +the operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the arc sine operation for the input +type. 
+ + +'``llvm.fpbuiltin.acos``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.acos( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.acos``' intrinsics return the principal value of the +arc cosine of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.acos``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the principal value of the floating-point arc cosine +of the operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the arc cosine operation for the input +type. + + +'``llvm.fpbuiltin.atan``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.atan( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.atan``' intrinsics return the principal value of the +arc tangent of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.atan``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the principal value of the floating-point arc tangent +of the operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the arc tangent operation for the +input type. + + +'``llvm.fpbuiltin.atan2``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.atan2( , ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.atan2``' intrinsics return the principal value of the +arc tangent of op1/op2, expressed in radians. 
+ +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.atan2``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the principal value of the floating-point arc tangent +of the quotient of the operands, expressed in radians, and has the same type +as the operands. Unless modified by the "fp-max-error" callsite attribute, +the result is assumed to have the accuracy of the target-default +implementation of the atan2 operation for the input type. + + +'``llvm.fpbuiltin.asinh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.asinh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.asinh``' intrinsics return the area hyperbolic sine of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.asinh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point area hyperbolic sine of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the area hyperbolic sine operation for the +input type. + + +'``llvm.fpbuiltin.acosh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.acosh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.acosh``' intrinsics return the area hyperbolic cosine of +the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.acosh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point area hyperbolic cosine of the operand +and has the same type as the operand. 
Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the area hyperbolic cosine operation for the +input type. + + +'``llvm.fpbuiltin.atanh``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.atanh( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.atanh``' intrinsics return the area hyperbolic tangent of +the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.atanh``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point area hyperbolic tangent of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the area hyperbolic tangent operation for the +input type. + + +'``llvm.fpbuiltin.exp``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.exp( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.exp``' intrinsics return the base-e exponential function +of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.exp``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point base-e exponential function of the +operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the exp operation for the input type. + + +'``llvm.fpbuiltin.exp2``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.exp2( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.exp2``' intrinsics return the base-2 exponential function +of the operand. 
+ +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.exp2``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point base-2 exponential function of the +operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the exp2 operation for the input type. + + +'``llvm.fpbuiltin.exp10``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.exp10( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.exp10``' intrinsics return the base-10 exponential function +of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.exp10``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point base-10 exponential function of the +operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the exp10 operation for the input type. + + +'``llvm.fpbuiltin.expm1``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.expm1( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.expm1``' intrinsics return e raised to the power of the +operand minus one. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.expm1``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point value of e raised to the power of the +operand minus one and has the same type as the operand. 
Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the expm1 operation for the input type. + + +'``llvm.fpbuiltin.log``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.log( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.log``' intrinsics return the natural logarithm of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.log``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point natural logarithm of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the log operation for the input type. + + +'``llvm.fpbuiltin.log2``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.log2( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.log2``' intrinsics return the base-2 logarithm of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.log2``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point base-2 logarithm of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the log2 operation for the input type. + + +'``llvm.fpbuiltin.log10``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.log10( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.log10``' intrinsics return the base-10 logarithm of the +operand. 
+ +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.log10``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point base-10 logarithm of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the log10 operation for the input type. + + +'``llvm.fpbuiltin.log1p``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.log1p( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.log1p``' intrinsics return the natural logarithm of +one plus the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.log1p``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point natural logarithm of one plus +the operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the log1p operation for the input +type. + + +'``llvm.fpbuiltin.hypot``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.hypot( , ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.hypot``' intrinsics return the hypotenuse of a +right triangle whose legs are op1 and op2. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.hypot``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point hypotenuse of a right triangle +whose legs are the operands and has the same type as the operands. 
Unless +modified by the "fp-max-error" callsite attribute, the result is assumed +to have the accuracy of the target-default implementation of the hypot +operation for the input type. + + +'``llvm.fpbuiltin.pow``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.pow( , ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.pow``' intrinsics return the value of op1 raised +to the power of op2. + +Arguments: +"""""""""" + +The arguments to the '``llvm.fpbuiltin.pow``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. Both arguments must have identical types. + +Semantics: +"""""""""" + +The value produced is the floating-point value of the first operand raised +to the power of the second operand and has the same type as the operands. +Unless modified by the "fp-max-error" callsite attribute, the result is +assumed to have the accuracy of the target-default implementation of the pow +operation for the input type. + + +'``llvm.fpbuiltin.ldexp``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.ldexp( , ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.ldexp``' intrinsics return the value of op1 multiplied +by two raised to the power of op2. + +Arguments: +"""""""""" + +The first argument to the '``llvm.fpbuiltin.ldexp``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. The second argument must be a 32-bit integer value +or a :ref:`vector ` of 32-bit integers with the same number of +elements as the first operand. + +Semantics: +"""""""""" + +The value produced is the floating-point value of the first operand multiplied +by two raised to the power of the second operand and has the same type as the +first operand. Unless modified by the "fp-max-error" callsite attribute, the result +is assumed to have the accuracy of the target-default implementation of the +ldexp operation for the input type. 
+ + +'``llvm.fpbuiltin.sqrt``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.sqrt( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.sqrt``' intrinsics return the square root of the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.sqrt``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point square root of the operand and +has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to be correctly rounded. + + +'``llvm.fpbuiltin.rsqrt``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.rsqrt( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.rsqrt``' intrinsics return the inverse square root of the +operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.rsqrt``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point inverse square root of the operand +and has the same type as the operand. Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the rsqrt operation for the input type. + + +'``llvm.fpbuiltin.erf``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.erf( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.erf``' intrinsics return the error function value for +the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.erf``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point error function value for the operand +and has the same type as the operand. 
Unless modified by the "fp-max-error" +callsite attribute, the result is assumed to have the accuracy of the +target-default implementation of the erf operation for the input type. + + +'``llvm.fpbuiltin.erfc``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.erfc( ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.erfc``' intrinsics return the complementary error function +value for the operand. + +Arguments: +"""""""""" + +The argument to the '``llvm.fpbuiltin.erfc``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. + +Semantics: +"""""""""" + +The value produced is the floating-point complementary error function value +for the operand and has the same type as the operand. Unless modified by the +"fp-max-error" callsite attribute, the result is assumed to have the accuracy +of the target-default implementation of the erfc operation for the input type. + + +'``llvm.fpbuiltin.sincos``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare @llvm.fpbuiltin.sincos( , ptr , ptr ) + + +Overview: +""""""""" + +The '``llvm.fpbuiltin.sincos``' intrinsics compute the sine and cosine of the +first operand and return the results via the pointers passed as the second +and third operands. + +Arguments: +"""""""""" + +The first argument to the '``llvm.fpbuiltin.sincos``' intrinsic must be +:ref:`floating-point ` or :ref:`vector ` of +floating-point values. The second and third arguments must be dereferenceable +pointers to memory which can hold a value of the first operand's type. + +Semantics: +"""""""""" + +The values produced are the floating-point sine and cosine of the first +operand and are stored using the same type as the first operand. Unless +modified by the "fp-max-error" callsite attribute, the result is assumed to +have the accuracy of the target-default implementation of the sincos operation +for the input type. + .. 
_constrainedfp: Index: llvm/include/llvm/Analysis/AltMathLibFuncs.def =================================================================== --- /dev/null +++ llvm/include/llvm/Analysis/AltMathLibFuncs.def @@ -0,0 +1,82 @@ +//===-- AltMathLibFuncs.def - Library information ---------*- C++ -*-------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +// This .def file will create descriptions of available fpbuiltin math library +// function implementations and their constraining attributes. The current +// support is limited to a fake test library for verifying the infrastructure. +// The fake implementation can be removed when a real implementation is +// available. + +// An accuracy of 0.5 indicates that the result is exact or correctly rounded. 
+ +#define FIXED(NL) ElementCount::getFixed(NL) +#define SCALABLE(NL) ElementCount::getScalable(NL) + +#if !(defined(TLI_DEFINE_ALTMATHFUNC)) +#define TLI_DEFINE_ALTMATHFUNC(IID, TYPE, VECSIZE, NAME, ACCURACY) \ + {IID, TYPE, VECSIZE, NAME, ACCURACY}, +#endif + + +#if defined(TLI_DEFINE_TEST_ALTMATHFUNCS) + +// Just define a few examples to test the infrastructure + +// TEST_ALTMATH_LIB Half precision implementations +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_fdiv, Type::HalfTyID, FIXED(1), "__test_altmath_fdivh_med", 2.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::HalfTyID, FIXED(1), "__test_altmath_sinh_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::HalfTyID, FIXED(1), "__test_altmath_cosh_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::HalfTyID, FIXED(1), "__test_altmath_cosh_med", 4.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sqrt, Type::HalfTyID, FIXED(1), "__test_altmath_sqrth_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::HalfTyID, FIXED(1), "__test_altmath_rsqrth_cr", 0.5) + +// TEST_ALTMATH_LIB Single precision implementations +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_fdiv, Type::FloatTyID, FIXED(1), "__test_altmath_fdivf_med", 2.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::FloatTyID, FIXED(1), "__test_altmath_sinf_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::FloatTyID, FIXED(1), "__test_altmath_sinf_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::FloatTyID, FIXED(1), "__test_altmath_cosf_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::FloatTyID, FIXED(1), "__test_altmath_cosf_med", 4.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_tan, Type::FloatTyID, FIXED(1), "__test_altmath_tanf_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sqrt, Type::FloatTyID, FIXED(1), "__test_altmath_sqrtf_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sqrt, Type::FloatTyID, FIXED(1), 
"__test_altmath_sqrtf_med", 2.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::FloatTyID, FIXED(1), "__test_altmath_rsqrtf_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::FloatTyID, FIXED(1), "__test_altmath_rsqrtf_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::FloatTyID, FIXED(1), "__test_altmath_rsqrtf_low", 4096.0) + +// TEST_ALTMATH_LIB Double precision implementations +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_fdiv, Type::DoubleTyID, FIXED(1), "__test_altmath_fdiv_med", 2.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::DoubleTyID, FIXED(1), "__test_altmath_sin_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::DoubleTyID, FIXED(1), "__test_altmath_sin_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::DoubleTyID, FIXED(1), "__test_altmath_cos_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::DoubleTyID, FIXED(1), "__test_altmath_cos_med", 4.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_tan, Type::DoubleTyID, FIXED(1), "__test_altmath_tan_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sqrt, Type::DoubleTyID, FIXED(1), "__test_altmath_sqrt_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sqrt, Type::DoubleTyID, FIXED(1), "__test_altmath_sqrt_med", 2.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::DoubleTyID, FIXED(1), "__test_altmath_rsqrt_cr", 0.5) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::DoubleTyID, FIXED(1), "__test_altmath_rsqrt_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_rsqrt, Type::DoubleTyID, FIXED(1), "__test_altmath_rsqrt_low", 4096.0) + +// TEST_ALTMATH_LIB 4 x float implementations +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::FloatTyID, FIXED(4), "__test_altmath_sinf4_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::FloatTyID, FIXED(4), "__test_altmath_cosf4_high", 1.0) + +// TEST_ALTMATH_LIB 8 x float implementations 
+TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::FloatTyID, FIXED(8), "__test_altmath_sinf8_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::FloatTyID, FIXED(8), "__test_altmath_cosf8_high", 1.0) + +// TEST_ALTMATH_LIB 2 x double implementations +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_sin, Type::DoubleTyID, FIXED(2), "__test_altmath_sin2_high", 1.0) +TLI_DEFINE_ALTMATHFUNC(Intrinsic::fpbuiltin_cos, Type::DoubleTyID, FIXED(2), "__test_altmath_cos2_high", 1.0) + + +#endif + + + +#undef TLI_DEFINE_ALTMATHFUNC +#undef TLI_DEFINE_TEST_ALTMATHFUNCS Index: llvm/include/llvm/Analysis/TargetLibraryInfo.h =================================================================== --- llvm/include/llvm/Analysis/TargetLibraryInfo.h +++ llvm/include/llvm/Analysis/TargetLibraryInfo.h @@ -13,6 +13,7 @@ #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/Optional.h" #include "llvm/IR/InstrTypes.h" +#include "llvm/IR/IntrinsicInst.h" #include "llvm/IR/PassManager.h" #include "llvm/Pass.h" @@ -23,6 +24,15 @@ class Module; class Triple; +/// Describes a possible implementation of a floating point builtin operation +struct AltMathDesc { + Intrinsic::ID IntrinID; + Type::TypeID BaseFPType; + ElementCount VectorizationFactor; + StringRef FnImplName; + float Accuracy; +}; + /// Describes a possible vectorization of a function. /// Function 'VectorFnName' is equivalent to 'ScalarFnName' vectorized /// by a factor 'VectorizationFactor'. @@ -68,6 +78,10 @@ return static_cast<AvailabilityState>((AvailableArray[F/4] >> 2*(F&3)) & 3); } + /// Alternate math library functions - sorted by intrinsic ID, then type, + /// then vector size, then accuracy + std::vector<AltMathDesc> AltMathFuncDescs; + /// Vectorization descriptors - sorted by ScalarFnName. std::vector<VecDesc> VectorDescs; /// Scalarization descriptors - same content as VectorDescs but sorted based @@ -96,6 +110,19 @@ SVML // Intel short vector math library. }; + /// List of known alternate math libraries. 
+ /// + /// The alternate math library provides a set of functions that can be used + /// to replace llvm.fpbuiltin intrinsic calls when one or more constraining + /// attributes are specified. + /// The library can be specified by either the frontend or a command-line + /// option, and then used by addAltMathFunctionsFromLib for populating the + /// tables of math function implementations. + enum AltMathLibrary { + NoAltMathLibrary, // Don't use any alternate math library + TestAltMathLibrary // Use a fake alternate math library for testing + }; + TargetLibraryInfoImpl(); explicit TargetLibraryInfoImpl(const Triple &T); @@ -147,6 +174,19 @@ /// This can be used for options like -fno-builtin. void disableAllFunctions(); + /// Add a set of alternate math library function implementations with + /// attributes that can be used to select an implementation for an + /// llvm.fpbuiltin intrinsic. + void addAltMathFunctions(ArrayRef<AltMathDesc> Fns); + + /// Calls addAltMathFunctions with a known preset of functions for the + /// given alternate math library. + void addAltMathFunctionsFromLib(enum AltMathLibrary AltLib); + + /// Select an alternate math library implementation that meets the criteria + /// described by an FPBuiltinIntrinsic call. + StringRef selectFPBuiltinImplementation(FPBuiltinIntrinsic *Builtin) const; + /// Add a set of scalar -> vector mappings, queryable via /// getVectorizedFunction and getScalarizedFunction. 
 void addVectorizableFunctions(ArrayRef<VecDesc> Fns); @@ -337,6 +377,9 @@ bool isFunctionVectorizable(StringRef F) const { return Impl->isFunctionVectorizable(F); } + StringRef selectFPBuiltinImplementation(FPBuiltinIntrinsic *Builtin) const { + return Impl->selectFPBuiltinImplementation(Builtin); + } StringRef getVectorizedFunction(StringRef F, const ElementCount &VF) const { return Impl->getVectorizedFunction(F, VF); } Index: llvm/include/llvm/CodeGen/CodeGenPassBuilder.h =================================================================== --- llvm/include/llvm/CodeGen/CodeGenPassBuilder.h +++ llvm/include/llvm/CodeGen/CodeGenPassBuilder.h @@ -25,6 +25,7 @@ #include "llvm/Analysis/TargetTransformInfo.h" #include "llvm/Analysis/TypeBasedAliasAnalysis.h" #include "llvm/CodeGen/ExpandReductions.h" +#include "llvm/CodeGen/FPBuiltinFnSelection.h" #include "llvm/CodeGen/MachinePassManager.h" #include "llvm/CodeGen/PreISelIntrinsicLowering.h" #include "llvm/CodeGen/ReplaceWithVeclib.h" @@ -599,6 +600,7 @@ addPass(PreISelIntrinsicLoweringPass()); derived().addIRPasses(addPass); + addPass(FPBuiltinFnSelectionPass()); derived().addCodeGenPrepare(addPass); addPassesToHandleExceptions(addPass); derived().addISelPrepare(addPass); Index: llvm/include/llvm/CodeGen/FPBuiltinFnSelection.h =================================================================== --- /dev/null +++ llvm/include/llvm/CodeGen/FPBuiltinFnSelection.h @@ -0,0 +1,29 @@ +//===- FPBuiltinFnSelection.h - Pre-ISel intrinsic lowering pass ----------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This pass implements alternate math library implementation selection for +// llvm.fpbuiltin.* intrinsics. 
+// +//===----------------------------------------------------------------------===// +#ifndef LLVM_CODEGEN_FPBUILTINFNSELECTION_H +#define LLVM_CODEGEN_FPBUILTINFNSELECTION_H + +#include "llvm/IR/PassManager.h" + +namespace llvm { + +class Module; + +struct FPBuiltinFnSelectionPass + : PassInfoMixin<FPBuiltinFnSelectionPass> { + PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM); +}; + +} // end namespace llvm + +#endif // LLVM_CODEGEN_FPBUILTINFNSELECTION_H Index: llvm/include/llvm/CodeGen/MachinePassRegistry.def =================================================================== --- llvm/include/llvm/CodeGen/MachinePassRegistry.def +++ llvm/include/llvm/CodeGen/MachinePassRegistry.def @@ -39,6 +39,7 @@ FUNCTION_PASS("lower-constant-intrinsics", LowerConstantIntrinsicsPass, ()) FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass, ()) FUNCTION_PASS("consthoist", ConstantHoistingPass, ()) +FUNCTION_PASS("fpbuiltin-fn-selection", FPBuiltinFnSelectionPass, ()) FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ()) FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ()) FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false)) Index: llvm/include/llvm/CodeGen/Passes.h =================================================================== --- llvm/include/llvm/CodeGen/Passes.h +++ llvm/include/llvm/CodeGen/Passes.h @@ -442,6 +442,10 @@ /// evaluation. ModulePass *createPreISelIntrinsicLoweringPass(); + /// This pass lowers the \@llvm.fpbuiltin.{operation} intrinsics to + /// matching library function calls based on call site attributes. + FunctionPass *createFPBuiltinFnSelectionPass(); + /// GlobalMerge - This pass merges internal (by default) globals into structs /// to enable reuse of a base pointer by indexed addressing modes. /// It can also be configured to focus on size optimizations only. 
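The pass declared above chooses a library function from call-site attributes. As a rough illustration of that kind of accuracy-driven lookup, the following standalone sketch sorts a table of candidate implementations and binary-searches it for the least accurate entry that still satisfies a requested error bound. It is an assumption-laden sketch of the general technique, not the code in this patch: `ImplDesc`, `lessDesc`, `selectImpl`, `demoTable`, and the integer operation/type codes are all hypothetical names invented for the example, and it uses only the C++ standard library rather than LLVM types.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an implementation descriptor; none of these
// names are LLVM API.
struct ImplDesc {
  int Op;            // which operation (stands in for an intrinsic ID)
  int Ty;            // scalar FP type (stands in for a type ID)
  unsigned Lanes;    // 1 for scalar, N for a fixed-width vector
  std::string Name;  // symbol of the library function
  float MaxErrorULP; // documented worst-case error of the function
};

// Order by (Op, Ty, Lanes), then by descending error bound, so that within a
// group the least accurate implementation comes first.
static bool lessDesc(const ImplDesc &L, const ImplDesc &R) {
  auto KL = std::tie(L.Op, L.Ty, L.Lanes);
  auto KR = std::tie(R.Op, R.Ty, R.Lanes);
  if (KL != KR)
    return KL < KR;
  return L.MaxErrorULP > R.MaxErrorULP;
}

// Returns the least accurate entry whose error bound does not exceed
// MaxError, or an empty string if the group has no such entry.
std::string selectImpl(std::vector<ImplDesc> Table, int Op, int Ty,
                       unsigned Lanes, float MaxError) {
  assert(MaxError >= 0.0f && "error bound must be non-negative");
  std::sort(Table.begin(), Table.end(), lessDesc);
  ImplDesc Key{Op, Ty, Lanes, "", MaxError};
  auto I = std::lower_bound(Table.begin(), Table.end(), Key, lessDesc);
  // lower_bound may step into the next group, so re-check the whole key.
  if (I == Table.end() || I->Op != Op || I->Ty != Ty || I->Lanes != Lanes)
    return "";
  return I->Name;
}

// Small illustrative table: Op 0 = sin, Op 1 = cos; Ty 0 = half, Ty 1 = float.
std::vector<ImplDesc> demoTable() {
  return {{1, 1, 1, "cosf_med", 4.0f},
          {1, 1, 1, "cosf_high", 1.0f},
          {0, 1, 1, "sinf_cr", 0.5f},
          {0, 0, 1, "sinh_high", 1.0f}};
}
```

With this table, a float cosine request that tolerates 4.0 ULP resolves to `cosf_med`, while one that tolerates only 2.5 ULP resolves to `cosf_high`. A half-precision sine request at 0.5 ULP finds nothing: the search lands on the float entry in the next group, and the full-key re-check rejects it instead of returning a function of the wrong type.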
Index: llvm/include/llvm/IR/FPBuiltinOps.def =================================================================== --- /dev/null +++ llvm/include/llvm/IR/FPBuiltinOps.def @@ -0,0 +1,59 @@ +//===--- llvm/IR/FPBuiltinOps.def - FP builtin intrinsics -------*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// Defines properties of floating point builtin intrinsics. +// +//===----------------------------------------------------------------------===// + +#ifndef OPERATION +#define OPERATION(N,I) +#endif + +// Arguments of the entries are: +// - operation name. +// - name of the fpbuiltin intrinsic to represent this operation. + +// These are definitions for operations that are represented by fpbuiltin +// intrinsics. +// +OPERATION(FAdd, fpbuiltin_fadd) +OPERATION(FSub, fpbuiltin_fsub) +OPERATION(FMul, fpbuiltin_fmul) +OPERATION(FDiv, fpbuiltin_fdiv) +OPERATION(FRem, fpbuiltin_frem) +OPERATION(Sin, fpbuiltin_sin) +OPERATION(Cos, fpbuiltin_cos) +OPERATION(Tan, fpbuiltin_tan) +OPERATION(Sinh, fpbuiltin_sinh) +OPERATION(Cosh, fpbuiltin_cosh) +OPERATION(Tanh, fpbuiltin_tanh) +OPERATION(Asin, fpbuiltin_asin) +OPERATION(Acos, fpbuiltin_acos) +OPERATION(Atan, fpbuiltin_atan) +OPERATION(Atan2, fpbuiltin_atan2) +OPERATION(Asinh, fpbuiltin_asinh) +OPERATION(Acosh, fpbuiltin_acosh) +OPERATION(Atanh, fpbuiltin_atanh) +OPERATION(Exp, fpbuiltin_exp) +OPERATION(Exp2, fpbuiltin_exp2) +OPERATION(Exp10, fpbuiltin_exp10) +OPERATION(Expm1, fpbuiltin_expm1) +OPERATION(Log, fpbuiltin_log) +OPERATION(Log2, fpbuiltin_log2) +OPERATION(Log10, fpbuiltin_log10) +OPERATION(Log1p, fpbuiltin_log1p) +OPERATION(Hypot, fpbuiltin_hypot) +OPERATION(Pow, fpbuiltin_pow) +OPERATION(Ldexp, fpbuiltin_ldexp) +OPERATION(Sqrt, fpbuiltin_sqrt) 
+OPERATION(Rsqrt, fpbuiltin_rsqrt) +OPERATION(Erf, fpbuiltin_erf) +OPERATION(Erfc, fpbuiltin_erfc) +OPERATION(Sincos, fpbuiltin_sincos) + +#undef OPERATION Index: llvm/include/llvm/IR/IntrinsicInst.h =================================================================== --- llvm/include/llvm/IR/IntrinsicInst.h +++ llvm/include/llvm/IR/IntrinsicInst.h @@ -564,6 +564,21 @@ /// @} }; +/// This is the common base class for floating point builtin intrinsics. +class FPBuiltinIntrinsic : public IntrinsicInst { +public: + Optional<float> getRequiredAccuracy() const; + + Type::TypeID getBaseTypeID() const; + ElementCount getElementCount() const; + + // Methods for support type inquiry through isa, cast, and dyn_cast: + static bool classof(const IntrinsicInst *I); + static bool classof(const Value *V) { + return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V)); + } +}; + /// This is the common base class for constrained floating point intrinsics. class ConstrainedFPIntrinsic : public IntrinsicInst { public: Index: llvm/include/llvm/IR/Intrinsics.td =================================================================== --- llvm/include/llvm/IR/Intrinsics.td +++ llvm/include/llvm/IR/Intrinsics.td @@ -759,6 +759,108 @@ [llvm_anyfloat_ty, llvm_i32_ty], [IntrNoMem, IntrWillReturn, ImmArg<ArgIndex<1>>]>; +//===----------------- Floating Point Builtin Intrinsics ------------------===// +// +// These intrinsics are intended as explicitly replaceable versions of common +// floating point math operations. Passes must check for call site attributes +// that constrain the behavior of these intrinsics before transforming them in +// any way. +// +// While many of these operations correspond to functions in the standard C +// math library, these intrinsics are explicitly intended to be replaceable by +// alternate implementations. 
+// + +let IntrProperties = [IntrNoMem, IntrWillReturn] in { + def int_fpbuiltin_fadd : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_fsub : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_fmul : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_fdiv : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_frem : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + + def int_fpbuiltin_sin : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_cos : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_tan : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_sinh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_cosh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_tanh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_asin : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_acos : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_atan : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_atan2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_asinh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_acosh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_atanh : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + + def int_fpbuiltin_exp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ 
LLVMMatchType<0> ]>; + def int_fpbuiltin_exp2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_exp10 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_expm1 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_log : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_log2 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_log10 : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_log1p : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + + def int_fpbuiltin_hypot : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_pow : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMMatchType<0> ]>; + def int_fpbuiltin_ldexp : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0>, + LLVMScalarOrSameVectorWidth<0, llvm_i32_ty> ]>; + + def int_fpbuiltin_sqrt : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_rsqrt : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + + def int_fpbuiltin_erf : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; + def int_fpbuiltin_erfc : DefaultAttrsIntrinsic<[ llvm_anyfloat_ty ], + [ LLVMMatchType<0> ]>; +} + +let IntrProperties = [IntrArgMemOnly, IntrWillReturn] in { + def int_fpbuiltin_sincos : DefaultAttrsIntrinsic<[], + [ llvm_anyfloat_ty, + llvm_ptr_ty, + llvm_ptr_ty ]>; +} + //===--------------- Constrained Floating Point Intrinsics ----------------===// // Index: llvm/include/llvm/InitializePasses.h =================================================================== --- llvm/include/llvm/InitializePasses.h +++ llvm/include/llvm/InitializePasses.h @@ -142,6 +142,7 @@ void 
initializeMakeGuardsExplicitLegacyPassPass(PassRegistry&); void initializeExternalAAWrapperPassPass(PassRegistry&); void initializeFEntryInserterPass(PassRegistry&); +void initializeFPBuiltinFnSelectionLegacyPassPass(PassRegistry&); void initializeFinalizeISelPass(PassRegistry&); void initializeFinalizeMachineBundlesPass(PassRegistry&); void initializeFixIrreduciblePass(PassRegistry &); Index: llvm/lib/Analysis/TargetLibraryInfo.cpp =================================================================== --- llvm/lib/Analysis/TargetLibraryInfo.cpp +++ llvm/lib/Analysis/TargetLibraryInfo.cpp @@ -17,6 +17,15 @@ #include "llvm/Support/CommandLine.h" using namespace llvm; +static cl::opt<TargetLibraryInfoImpl::AltMathLibrary> ClAltMathLibrary( + "alt-math-library", cl::Hidden, + cl::desc("Alternate floating point math library"), + cl::init(TargetLibraryInfoImpl::NoAltMathLibrary), + cl::values(clEnumValN(TargetLibraryInfoImpl::NoAltMathLibrary, "none", + "No alternate math library"), + clEnumValN(TargetLibraryInfoImpl::TestAltMathLibrary, "test", + "Fake library used for testing"))); + static cl::opt<TargetLibraryInfoImpl::VectorLibrary> ClVectorLibrary( "vector-library", cl::Hidden, cl::desc("Vector functions library"), cl::init(TargetLibraryInfoImpl::NoLibrary), @@ -862,6 +871,7 @@ } TLI.addVectorizableFunctionsFromVecLib(ClVectorLibrary); + TLI.addAltMathFunctionsFromLib(ClAltMathLibrary); } TargetLibraryInfoImpl::TargetLibraryInfoImpl() { @@ -886,6 +896,7 @@ memcpy(AvailableArray, TLI.AvailableArray, sizeof(AvailableArray)); VectorDescs = TLI.VectorDescs; ScalarDescs = TLI.ScalarDescs; + AltMathFuncDescs = TLI.AltMathFuncDescs; } TargetLibraryInfoImpl::TargetLibraryInfoImpl(TargetLibraryInfoImpl &&TLI) @@ -898,6 +909,7 @@ AvailableArray); VectorDescs = TLI.VectorDescs; ScalarDescs = TLI.ScalarDescs; + AltMathFuncDescs = TLI.AltMathFuncDescs; } TargetLibraryInfoImpl &TargetLibraryInfoImpl::operator=(const TargetLibraryInfoImpl &TLI) { @@ -907,6 +919,9 @@ ShouldSignExtI32Param = TLI.ShouldSignExtI32Param; SizeOfInt = TLI.SizeOfInt; 
 memcpy(AvailableArray, TLI.AvailableArray, sizeof(AvailableArray)); + VectorDescs = TLI.VectorDescs; + ScalarDescs = TLI.ScalarDescs; + AltMathFuncDescs = TLI.AltMathFuncDescs; return *this; } @@ -918,6 +933,9 @@ SizeOfInt = TLI.SizeOfInt; std::move(std::begin(TLI.AvailableArray), std::end(TLI.AvailableArray), AvailableArray); + VectorDescs = TLI.VectorDescs; + ScalarDescs = TLI.ScalarDescs; + AltMathFuncDescs = TLI.AltMathFuncDescs; return *this; } @@ -1118,6 +1136,78 @@ memset(AvailableArray, 0, sizeof(AvailableArray)); } +static bool compareAltMathDescs(const AltMathDesc &LHS, + const AltMathDesc &RHS) { + if (LHS.IntrinID != RHS.IntrinID) + return LHS.IntrinID < RHS.IntrinID; + if (LHS.BaseFPType != RHS.BaseFPType) + return LHS.BaseFPType < RHS.BaseFPType; + if (LHS.VectorizationFactor != RHS.VectorizationFactor) { + // Sort scalar types ahead of vector types + if (LHS.VectorizationFactor.isScalar() != + RHS.VectorizationFactor.isScalar()) + return LHS.VectorizationFactor.isScalar() > + RHS.VectorizationFactor.isScalar(); + assert((LHS.VectorizationFactor.isVector() && + RHS.VectorizationFactor.isVector()) && + "Unexpected vectorization factor in alt math fn desc"); + // Sort scalable vector types ahead of fixed vector types + if (LHS.VectorizationFactor.isScalable() != + RHS.VectorizationFactor.isScalable()) + return LHS.VectorizationFactor.isScalable() > + RHS.VectorizationFactor + .isScalable(); + // For non-scalable vectors, this will be the fixed size. + // For scalable vectors, it's the size that's multiplied by vscale. + return LHS.VectorizationFactor.getKnownMinValue() < + RHS.VectorizationFactor.getKnownMinValue(); + } + // Sort in order of descending maximum error (least accurate first) + return LHS.Accuracy > RHS.Accuracy; +} + +void TargetLibraryInfoImpl::addAltMathFunctions(ArrayRef<AltMathDesc> Fns) { + llvm::append_range(AltMathFuncDescs, Fns); + llvm::sort(AltMathFuncDescs, compareAltMathDescs); +} + +void TargetLibraryInfoImpl::addAltMathFunctionsFromLib( + enum AltMathLibrary 
AltLib) { + switch (AltLib) { + case TestAltMathLibrary: { + const AltMathDesc AltMathFuncs[] = { + #define TLI_DEFINE_TEST_ALTMATHFUNCS + #include "llvm/Analysis/AltMathLibFuncs.def" + }; + addAltMathFunctions(AltMathFuncs); + break; + } + case NoAltMathLibrary: + break; + } +} + +/// Select an alternate math library implementation that meets the criteria +/// described by an FPBuiltinIntrinsic call. +StringRef TargetLibraryInfoImpl::selectFPBuiltinImplementation( + FPBuiltinIntrinsic *Builtin) const { + // TODO: Handle the case of no specified accuracy. + if (Builtin->getRequiredAccuracy() == None) + return StringRef(); + AltMathDesc RequiredDesc = {Builtin->getIntrinsicID(), + Builtin->getBaseTypeID(), + Builtin->getElementCount(), + "", Builtin->getRequiredAccuracy().value()}; + std::vector<AltMathDesc>::const_iterator I = + llvm::lower_bound(AltMathFuncDescs, RequiredDesc, compareAltMathDescs); + // No match found for this intrinsic, element type, and vector size + if (I == AltMathFuncDescs.end() || I->IntrinID != RequiredDesc.IntrinID || + I->BaseFPType != RequiredDesc.BaseFPType || + I->VectorizationFactor != RequiredDesc.VectorizationFactor) + return StringRef(); // TODO: Report fatal error? 
+ return I->FnImplName; +} + static bool compareByScalarFnName(const VecDesc &LHS, const VecDesc &RHS) { return LHS.ScalarFnName < RHS.ScalarFnName; } Index: llvm/lib/CodeGen/CMakeLists.txt =================================================================== --- llvm/lib/CodeGen/CMakeLists.txt +++ llvm/lib/CodeGen/CMakeLists.txt @@ -60,6 +60,7 @@ ExpandVectorPredication.cpp FaultMaps.cpp FEntryInserter.cpp + FPBuiltinFnSelection.cpp FinalizeISel.cpp FixupStatepointCallerSaved.cpp FuncletLayout.cpp Index: llvm/lib/CodeGen/CodeGen.cpp =================================================================== --- llvm/lib/CodeGen/CodeGen.cpp +++ llvm/lib/CodeGen/CodeGen.cpp @@ -40,6 +40,7 @@ initializeExpandMemCmpPassPass(Registry); initializeExpandPostRAPass(Registry); initializeFEntryInserterPass(Registry); + initializeFPBuiltinFnSelectionLegacyPassPass(Registry); initializeFinalizeISelPass(Registry); initializeFinalizeMachineBundlesPass(Registry); initializeFixupStatepointCallerSavedPass(Registry); Index: llvm/lib/CodeGen/FPBuiltinFnSelection.cpp =================================================================== --- /dev/null +++ llvm/lib/CodeGen/FPBuiltinFnSelection.cpp @@ -0,0 +1,162 @@ +//===- FPBuiltinFnSelection.cpp - Pre-ISel intrinsic lowering pass --------===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// +// +// This pass implements alternate math library implementation selection for +// llvm.fpbuiltin.* intrinsics. 
+// +//===----------------------------------------------------------------------===// + +#include "llvm/CodeGen/FPBuiltinFnSelection.h" +#include "llvm/Analysis/TargetLibraryInfo.h" +#include "llvm/CodeGen/Passes.h" +#include "llvm/IR/IRBuilder.h" +#include "llvm/IR/InstIterator.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/InitializePasses.h" + +using namespace llvm; + +#define DEBUG_TYPE "fpbuiltin-fn-selection" + +static bool replaceWithAltMathFunction(FPBuiltinIntrinsic &BuiltinCall, + const StringRef ImplName) { + Module *M = BuiltinCall.getModule(); + + Function *OldFunc = BuiltinCall.getCalledFunction(); + + // Check if the alt math library function is already declared in this module, + // otherwise insert it. + Function *ImplFunc = M->getFunction(ImplName); + if (!ImplFunc) { + ImplFunc = Function::Create(OldFunc->getFunctionType(), + Function::ExternalLinkage, ImplName, *M); + // TODO: Copy non-builtin attributes ImplFunc->copyAttributesFrom(OldFunc); + } + + // Replace the call to the fpbuiltin intrinsic with a call + // to the corresponding function from the alternate math library. + IRBuilder<> IRBuilder(&BuiltinCall); + SmallVector<Value *> Args(BuiltinCall.args()); + // Preserve the operand bundles. + SmallVector<OperandBundleDef, 1> OpBundles; + BuiltinCall.getOperandBundlesAsDefs(OpBundles); + CallInst *Replacement = IRBuilder.CreateCall(ImplFunc, Args, OpBundles); + assert(OldFunc->getFunctionType() == ImplFunc->getFunctionType() && + "Expecting function types to be identical"); + BuiltinCall.replaceAllUsesWith(Replacement); + // TODO: fpbuiltin.sincos won't be reported as an FPMathOperator + // Do we need to do anything about that? + if (isa<FPMathOperator>(Replacement)) { + // Preserve fast math flags for FP math. 
+ Replacement->copyFastMathFlags(&BuiltinCall); + } + + LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Replaced call to `" + << OldFunc->getName() << "` with call to `" << ImplName + << "`.\n"); + return true; +} + +static bool selectFnForFPBuiltinCalls(const TargetLibraryInfo &TLI, + FPBuiltinIntrinsic &BuiltinCall) { + LLVM_DEBUG({ + dbgs() << "Selecting an implementation for " + << BuiltinCall.getCalledFunction()->getName() + << " with accuracy = "; + if (BuiltinCall.getRequiredAccuracy() == None) + dbgs() << "(none)\n"; + else + dbgs() << BuiltinCall.getRequiredAccuracy().value() << "\n"; + }); + + // Call TLI to select a function implementation to call + StringRef ImplName = TLI.selectFPBuiltinImplementation(&BuiltinCall); + if (ImplName.empty()) { + // TODO: Report an error + LLVM_DEBUG(dbgs() << "No matching implementation found!\n"); + return false; + } + + LLVM_DEBUG(dbgs() << "Selected " << ImplName << "\n"); + + return replaceWithAltMathFunction(BuiltinCall, ImplName); +} + +static bool runImpl(const TargetLibraryInfo &TLI, Function &F) { + bool Changed = false; + SmallVector<FPBuiltinIntrinsic *> ReplacedCalls; + for (auto &I : instructions(F)) { + if (auto *CI = dyn_cast<FPBuiltinIntrinsic>(&I)) { + if (selectFnForFPBuiltinCalls(TLI, *CI)) { + ReplacedCalls.push_back(CI); + Changed = true; + } + } + } + // Erase the calls to the intrinsics that have been replaced + // with calls to the alternate math library. 
+ for (auto *CI : ReplacedCalls) { + CI->eraseFromParent(); + } + return Changed; +} + + +namespace { + +class FPBuiltinFnSelectionLegacyPass : public FunctionPass { +public: + static char ID; + + FPBuiltinFnSelectionLegacyPass() : FunctionPass(ID) {} + + bool runOnFunction(Function &F) override { + const TargetLibraryInfo *TLI = + &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F); + + return runImpl(*TLI, F); + } + + void getAnalysisUsage(AnalysisUsage &AU) const override { + AU.setPreservesCFG(); + AU.addRequired<TargetLibraryInfoWrapperPass>(); + AU.addPreserved<TargetLibraryInfoWrapperPass>(); + } +}; + +} // end anonymous namespace + +char FPBuiltinFnSelectionLegacyPass::ID; + +INITIALIZE_PASS_BEGIN(FPBuiltinFnSelectionLegacyPass, + DEBUG_TYPE, "FPBuiltin Function Selection", + false, false) +INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass) +INITIALIZE_PASS_END(FPBuiltinFnSelectionLegacyPass, + DEBUG_TYPE, "FPBuiltin Function Selection", + false, false) + +FunctionPass *llvm::createFPBuiltinFnSelectionPass() { + return new FPBuiltinFnSelectionLegacyPass; +} + +PreservedAnalyses FPBuiltinFnSelectionPass::run(Function &F, + FunctionAnalysisManager &AM) { + const TargetLibraryInfo &TLI = AM.getResult<TargetLibraryAnalysis>(F); + bool Changed = runImpl(TLI, F); + if (Changed) { + PreservedAnalyses PA; + PA.preserveSet<CFGAnalyses>(); + PA.preserve<TargetLibraryAnalysis>(); + return PA; + } else { + // The pass did not replace any calls, hence it preserves all analyses. 
+ return PreservedAnalyses::all(); + } + +} Index: llvm/lib/CodeGen/TargetPassConfig.cpp =================================================================== --- llvm/lib/CodeGen/TargetPassConfig.cpp +++ llvm/lib/CodeGen/TargetPassConfig.cpp @@ -1115,6 +1115,7 @@ PM->add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis())); addPass(createExpandLargeDivRemPass()); addIRPasses(); + addPass(createFPBuiltinFnSelectionPass()); addCodeGenPrepare(); addPassesToHandleExceptions(); addISelPrepare(); Index: llvm/lib/IR/IntrinsicInst.cpp =================================================================== --- llvm/lib/IR/IntrinsicInst.cpp +++ llvm/lib/IR/IntrinsicInst.cpp @@ -269,6 +269,55 @@ return ConstantInt::get(Type::getInt64Ty(Context), 1); } +Type::TypeID FPBuiltinIntrinsic::getBaseTypeID() const { + // All currently supported FP builtins are characterized by the type of their + // first argument. Since llvm.fpbuiltin.sincos doesn't return a value, using + // the type of the first argument is the most consistent technique. + Type *OperandTy = getArgOperand(0)->getType(); + assert((OperandTy->isFloatingPointTy() || + (OperandTy->isVectorTy() && + OperandTy->getScalarType()->isFloatingPointTy())) && + "Unexpected type for floating point builtin intrinsic!"); + return OperandTy->getScalarType()->getTypeID(); +} + +ElementCount FPBuiltinIntrinsic::getElementCount() const { + Type *OperandTy = getArgOperand(0)->getType(); + assert((OperandTy->isFloatingPointTy() || + (OperandTy->isVectorTy() && + OperandTy->getScalarType()->isFloatingPointTy())) && + "Unexpected type for floating point builtin intrinsic!"); + if (auto *VecTy = dyn_cast<VectorType>(OperandTy)) + return VecTy->getElementCount(); + return ElementCount::getFixed(1); +} + +Optional<float> FPBuiltinIntrinsic::getRequiredAccuracy() const { + if (!hasFnAttr("fp-max-error")) + return None; + // This should be a string attribute with a floating-point value. + // If it isn't, the IR verifier should report the problem. 
Here + // we handle that as if the attribute were absent. + // TODO: Create Attribute::getValueAsDouble()? + double Accuracy; + // getAsDouble returns false if it succeeds + if (getFnAttr("fp-max-error").getValueAsString().getAsDouble(Accuracy)) + return None; + return (float)Accuracy; +} + +bool FPBuiltinIntrinsic::classof(const IntrinsicInst *I) { + switch (I->getIntrinsicID()) { +#define OPERATION(NAME, INTRINSIC) \ + case Intrinsic::INTRINSIC: +#include "llvm/IR/FPBuiltinOps.def" + return true; + default: + return false; + } +} + + Optional<RoundingMode> ConstrainedFPIntrinsic::getRoundingMode() const { unsigned NumOperands = arg_size(); Metadata *MD = nullptr; Index: llvm/test/CodeGen/AArch64/O0-pipeline.ll =================================================================== --- llvm/test/CodeGen/AArch64/O0-pipeline.ll +++ llvm/test/CodeGen/AArch64/O0-pipeline.ll @@ -27,6 +27,7 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: AArch64 Stack Tagging ; CHECK-NEXT: SME ABI Pass +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Exception handling preparation ; CHECK-NEXT: Safe Stack instrumentation pass ; CHECK-NEXT: Insert stack protectors Index: llvm/test/CodeGen/AArch64/O3-pipeline.ll =================================================================== --- llvm/test/CodeGen/AArch64/O3-pipeline.ll +++ llvm/test/CodeGen/AArch64/O3-pipeline.ll @@ -93,6 +93,7 @@ ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: SME ABI Pass +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: Type Promotion Index: llvm/test/CodeGen/AMDGPU/llc-pipeline.ll =================================================================== --- llvm/test/CodeGen/AMDGPU/llc-pipeline.ll +++ llvm/test/CodeGen/AMDGPU/llc-pipeline.ll @@ -49,6 +49,7 @@ ; GCN-O0-NEXT: Expand vector predication intrinsics ; GCN-O0-NEXT: Scalarize Masked Memory Intrinsics ; GCN-O0-NEXT: Expand 
reduction intrinsics +; GCN-O0-NEXT: FPBuiltin Function Selection ; GCN-O0-NEXT: AMDGPU Attributor ; GCN-O0-NEXT: CallGraph Construction ; GCN-O0-NEXT: Call Graph SCC Pass Manager @@ -221,6 +222,7 @@ ; GCN-O1-NEXT: Expand reduction intrinsics ; GCN-O1-NEXT: Natural Loop Information ; GCN-O1-NEXT: TLS Variable Hoist +; GCN-O1-NEXT: FPBuiltin Function Selection ; GCN-O1-NEXT: AMDGPU Attributor ; GCN-O1-NEXT: CallGraph Construction ; GCN-O1-NEXT: Call Graph SCC Pass Manager @@ -502,6 +504,7 @@ ; GCN-O1-OPTS-NEXT: Natural Loop Information ; GCN-O1-OPTS-NEXT: TLS Variable Hoist ; GCN-O1-OPTS-NEXT: Early CSE +; GCN-O1-OPTS-NEXT: FPBuiltin Function Selection ; GCN-O1-OPTS-NEXT: AMDGPU Attributor ; GCN-O1-OPTS-NEXT: CallGraph Construction ; GCN-O1-OPTS-NEXT: Call Graph SCC Pass Manager @@ -797,6 +800,7 @@ ; GCN-O2-NEXT: Natural Loop Information ; GCN-O2-NEXT: TLS Variable Hoist ; GCN-O2-NEXT: Early CSE +; GCN-O2-NEXT: FPBuiltin Function Selection ; GCN-O2-NEXT: AMDGPU Attributor ; GCN-O2-NEXT: CallGraph Construction ; GCN-O2-NEXT: Call Graph SCC Pass Manager @@ -1107,6 +1111,7 @@ ; GCN-O3-NEXT: Lazy Block Frequency Analysis ; GCN-O3-NEXT: Optimization Remark Emitter ; GCN-O3-NEXT: Global Value Numbering +; GCN-O3-NEXT: FPBuiltin Function Selection ; GCN-O3-NEXT: AMDGPU Attributor ; GCN-O3-NEXT: CallGraph Construction ; GCN-O3-NEXT: Call Graph SCC Pass Manager Index: llvm/test/CodeGen/ARM/O3-pipeline.ll =================================================================== --- llvm/test/CodeGen/ARM/O3-pipeline.ll +++ llvm/test/CodeGen/ARM/O3-pipeline.ll @@ -49,6 +49,7 @@ ; CHECK-NEXT: Transform functions to use DSP intrinsics ; CHECK-NEXT: Complex Deinterleaving Pass ; CHECK-NEXT: Interleaved Access Pass +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Type Promotion ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction Index: llvm/test/CodeGen/Generic/fp-builtin-intrinsics.ll =================================================================== --- 
/dev/null +++ llvm/test/CodeGen/Generic/fp-builtin-intrinsics.ll @@ -0,0 +1,186 @@ +; RUN: opt -alt-math-library=test -fpbuiltin-fn-selection -S < %s | FileCheck %s + +; Basic argument tests for fp-builtin intrinsics. +; Only a few representative functions are tested. + +; CHECK-LABEL: @test_scalar_cr +; CHECK: call half @__test_altmath_sqrth_cr +; CHECK: call half @__test_altmath_rsqrth_cr +; CHECK: call float @__test_altmath_sinf_cr +; CHECK: call float @__test_altmath_sqrtf_cr +; CHECK: call float @__test_altmath_rsqrtf_cr +; CHECK: call double @__test_altmath_sin_cr +; CHECK: call double @__test_altmath_sqrt_cr +; CHECK: call double @__test_altmath_rsqrt_cr +define void @test_scalar_cr(half %h, float %f, double %d) { +entry: + %t1 = call half @llvm.fpbuiltin.sqrt.f16(half %h) #0 + %t2 = call half @llvm.fpbuiltin.rsqrt.f16(half %h) #0 + %t3 = call float @llvm.fpbuiltin.sin.f32(float %f) #0 + %t4 = call float @llvm.fpbuiltin.sqrt.f32(float %f) #0 + %t5 = call float @llvm.fpbuiltin.rsqrt.f32(float %f) #0 + %t6 = call double @llvm.fpbuiltin.sin.f64(double %d) #0 + %t7 = call double @llvm.fpbuiltin.sqrt.f64(double %d) #0 + %t8 = call double @llvm.fpbuiltin.rsqrt.f64(double %d) #0 + ret void +} + +; CHECK-LABEL: @test_scalar_1_0 +; CHECK: call half @__test_altmath_sinh_high +; CHECK: call half @__test_altmath_cosh_high +; CHECK: call float @__test_altmath_sinf_high +; CHECK: call float @__test_altmath_cosf_high +; CHECK: call float @__test_altmath_tanf_high +; CHECK: call float @__test_altmath_rsqrtf_high +; CHECK: call double @__test_altmath_sin_high +; CHECK: call double @__test_altmath_cos_high +; CHECK: call double @__test_altmath_tan_high +; CHECK: call double @__test_altmath_rsqrt_high +define void @test_scalar_1_0(half %h, float %f, double %d) { +entry: + %t1 = call half @llvm.fpbuiltin.sin.f16(half %h) #1 + %t2 = call half @llvm.fpbuiltin.cos.f16(half %h) #1 + %t3 = call float @llvm.fpbuiltin.sin.f32(float %f) #1 + %t4 = call float 
@llvm.fpbuiltin.cos.f32(float %f) #1 + %t5 = call float @llvm.fpbuiltin.tan.f32(float %f) #1 + %t6 = call float @llvm.fpbuiltin.rsqrt.f32(float %f) #1 + %t7 = call double @llvm.fpbuiltin.sin.f64(double %d) #1 + %t8 = call double @llvm.fpbuiltin.cos.f64(double %d) #1 + %t9 = call double @llvm.fpbuiltin.tan.f64(double %d) #1 + %t10 = call double @llvm.fpbuiltin.rsqrt.f64(double %d) #1 + ret void +} + +; CHECK-LABEL: @test_scalar_2_5 +; CHECK: call half @__test_altmath_fdivh_med +; CHECK: call float @__test_altmath_fdivf_med +; CHECK: call float @__test_altmath_sqrtf_med +; CHECK: call double @__test_altmath_fdiv_med +; CHECK: call double @__test_altmath_sqrt_med +define void @test_scalar_2_5(half %h1, half %h2, float %f1, float %f2, + double %d1, double %d2) { +entry: + %t1 = call half @llvm.fpbuiltin.fdiv.f16(half %h1, half %h2) #2 + %t2 = call float @llvm.fpbuiltin.fdiv.f32(float %f1, float %f2) #2 + %t3 = call float @llvm.fpbuiltin.sqrt.f32(float %f1) #2 + %t4 = call double @llvm.fpbuiltin.fdiv.f64(double %d1, double %d2) #2 + %t5 = call double @llvm.fpbuiltin.sqrt.f64(double %d1) #2 + ret void +} + +; CHECK-LABEL: @test_scalar_4_0 +; CHECK: call half @__test_altmath_cosh_med +; CHECK: call float @__test_altmath_cosf_med +; CHECK: call double @__test_altmath_cos_med +define void @test_scalar_4_0(half %h, float %f, double %d) { +entry: + %t1 = call half @llvm.fpbuiltin.cos.f16(half %h) #3 + %t2 = call float @llvm.fpbuiltin.cos.f32(float %f) #3 + %t3 = call double @llvm.fpbuiltin.cos.f64(double %d) #3 + ret void +} + +; CHECK-LABEL: @test_scalar_4096 +; CHECK: call float @__test_altmath_rsqrtf_low +; CHECK: call double @__test_altmath_rsqrt_low +define void @test_scalar_4096(float %f, double %d) { +entry: + %t6 = call float @llvm.fpbuiltin.rsqrt.f32(float %f) #4 + %t10 = call double @llvm.fpbuiltin.rsqrt.f64(double %d) #4 + ret void +} + +; CHECK-LABEL: @test_vector_1_0 +; CHECK: call <4 x float> @__test_altmath_sinf4_high +; CHECK: call <4 x float> 
@__test_altmath_cosf4_high +; CHECK: call <8 x float> @__test_altmath_sinf8_high +; CHECK: call <8 x float> @__test_altmath_cosf8_high +; CHECK: call <2 x double> @__test_altmath_sin2_high +; CHECK: call <2 x double> @__test_altmath_cos2_high +define void @test_vector_1_0(<4 x float> %v4f, <8 x float> %v8f, <2 x double> %vd) { +entry: + %t1 = call <4 x float> @llvm.fpbuiltin.sin.v4f32(<4 x float> %v4f) #1 + %t2 = call <4 x float> @llvm.fpbuiltin.cos.v4f32(<4 x float> %v4f) #1 + %t3 = call <8 x float> @llvm.fpbuiltin.sin.v8f32(<8 x float> %v8f) #1 + %t4 = call <8 x float> @llvm.fpbuiltin.cos.v8f32(<8 x float> %v8f) #1 + %t5 = call <2 x double> @llvm.fpbuiltin.sin.v2f64(<2 x double> %vd) #1 + %t6 = call <2 x double> @llvm.fpbuiltin.cos.v2f64(<2 x double> %vd) #1 + ret void +} + +; TODO: Add a test with different vector sizes of the same base type + + +; Test cases where the only available implementations are more accurate than +; the required accuracy (3.0) +; CHECK-LABEL: @test_scalar_inexact +; CHECK: call half @__test_altmath_fdivh_med +; CHECK: call half @__test_altmath_sinh_high +; CHECK: call half @__test_altmath_cosh_high +; CHECK: call half @__test_altmath_sqrth_cr +; CHECK: call half @__test_altmath_rsqrth_cr +; CHECK: call float @__test_altmath_fdivf_med +; CHECK: call float @__test_altmath_sinf_high +; CHECK: call float @__test_altmath_cosf_high +; CHECK: call float @__test_altmath_tanf_high +; CHECK: call float @__test_altmath_sqrtf_med +; CHECK: call float @__test_altmath_rsqrtf_high +; CHECK: call double @__test_altmath_fdiv_med +; CHECK: call double @__test_altmath_sin_high +; CHECK: call double @__test_altmath_cos_high +; CHECK: call double @__test_altmath_tan_high +; CHECK: call double @__test_altmath_sqrt_med +; CHECK: call double @__test_altmath_rsqrt_high +define void @test_scalar_inexact(half %h1, half %h2, float %f1, float %f2, + double %d1, double %d2) { +entry: + %t1 = call half @llvm.fpbuiltin.fdiv.f16(half %h1, half %h2) #5 + %t2 = call half 
@llvm.fpbuiltin.sin.f16(half %h1) #5 + %t3 = call half @llvm.fpbuiltin.cos.f16(half %h1) #5 + %t4 = call half @llvm.fpbuiltin.sqrt.f16(half %h1) #5 + %t5 = call half @llvm.fpbuiltin.rsqrt.f16(half %h1) #5 + %t6 = call float @llvm.fpbuiltin.fdiv.f32(float %f1, float %f2) #5 + %t7 = call float @llvm.fpbuiltin.sin.f32(float %f1) #5 + %t8 = call float @llvm.fpbuiltin.cos.f32(float %f1) #5 + %t9 = call float @llvm.fpbuiltin.tan.f32(float %f1) #5 + %t10 = call float @llvm.fpbuiltin.sqrt.f32(float %f1) #5 + %t11 = call float @llvm.fpbuiltin.rsqrt.f32(float %f1) #5 + %t12 = call double @llvm.fpbuiltin.fdiv.f64(double %d1, double %d2) #5 + %t13 = call double @llvm.fpbuiltin.sin.f64(double %d1) #5 + %t14 = call double @llvm.fpbuiltin.cos.f64(double %d1) #5 + %t15 = call double @llvm.fpbuiltin.tan.f64(double %d1) #5 + %t16 = call double @llvm.fpbuiltin.sqrt.f64(double %d1) #5 + %t17 = call double @llvm.fpbuiltin.rsqrt.f64(double %d1) #5 + ret void +} + +declare half @llvm.fpbuiltin.fdiv.f16(half, half) +declare half @llvm.fpbuiltin.sin.f16(half) +declare half @llvm.fpbuiltin.cos.f16(half) +declare half @llvm.fpbuiltin.sqrt.f16(half) +declare half @llvm.fpbuiltin.rsqrt.f16(half) +declare float @llvm.fpbuiltin.fdiv.f32(float, float) +declare float @llvm.fpbuiltin.sin.f32(float) +declare float @llvm.fpbuiltin.cos.f32(float) +declare float @llvm.fpbuiltin.tan.f32(float) +declare float @llvm.fpbuiltin.sqrt.f32(float) +declare float @llvm.fpbuiltin.rsqrt.f32(float) +declare double @llvm.fpbuiltin.fdiv.f64(double, double) +declare double @llvm.fpbuiltin.sin.f64(double) +declare double @llvm.fpbuiltin.cos.f64(double) +declare double @llvm.fpbuiltin.tan.f64(double) +declare double @llvm.fpbuiltin.sqrt.f64(double) +declare double @llvm.fpbuiltin.rsqrt.f64(double) +declare <4 x float> @llvm.fpbuiltin.sin.v4f32(<4 x float>) +declare <4 x float> @llvm.fpbuiltin.cos.v4f32(<4 x float>) +declare <8 x float> @llvm.fpbuiltin.sin.v8f32(<8 x float>) +declare <8 x float> 
@llvm.fpbuiltin.cos.v8f32(<8 x float>) +declare <2 x double> @llvm.fpbuiltin.sin.v2f64(<2 x double>) +declare <2 x double> @llvm.fpbuiltin.cos.v2f64(<2 x double>) + +attributes #0 = { "fp-max-error"="0.5" } +attributes #1 = { "fp-max-error"="1.0" } +attributes #2 = { "fp-max-error"="2.5" } +attributes #3 = { "fp-max-error"="4.0" } +attributes #4 = { "fp-max-error"="4096.0" } +attributes #5 = { "fp-max-error"="3.0" } Index: llvm/test/CodeGen/PowerPC/O3-pipeline.ll =================================================================== --- llvm/test/CodeGen/PowerPC/O3-pipeline.ll +++ llvm/test/CodeGen/PowerPC/O3-pipeline.ll @@ -68,6 +68,7 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation Index: llvm/test/CodeGen/RISCV/O0-pipeline.ll =================================================================== --- llvm/test/CodeGen/RISCV/O0-pipeline.ll +++ llvm/test/CodeGen/RISCV/O0-pipeline.ll @@ -29,6 +29,7 @@ ; CHECK-NEXT: Expand vector predication intrinsics ; CHECK-NEXT: Scalarize Masked Memory Intrinsics ; CHECK-NEXT: Expand reduction intrinsics +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Exception handling preparation ; CHECK-NEXT: Safe Stack instrumentation pass ; CHECK-NEXT: Insert stack protectors Index: llvm/test/CodeGen/RISCV/O3-pipeline.ll =================================================================== --- llvm/test/CodeGen/RISCV/O3-pipeline.ll +++ llvm/test/CodeGen/RISCV/O3-pipeline.ll @@ -60,6 +60,7 @@ ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: TLS Variable Hoist +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction ; CHECK-NEXT: Exception handling preparation Index: llvm/test/CodeGen/X86/O0-pipeline.ll 
=================================================================== --- llvm/test/CodeGen/X86/O0-pipeline.ll +++ llvm/test/CodeGen/X86/O0-pipeline.ll @@ -30,6 +30,7 @@ ; CHECK-NEXT: Scalarize Masked Memory Intrinsics ; CHECK-NEXT: Expand reduction intrinsics ; CHECK-NEXT: Expand indirectbr instructions +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Exception handling preparation ; CHECK-NEXT: Safe Stack instrumentation pass ; CHECK-NEXT: Insert stack protectors Index: llvm/test/CodeGen/X86/opt-pipeline.ll =================================================================== --- llvm/test/CodeGen/X86/opt-pipeline.ll +++ llvm/test/CodeGen/X86/opt-pipeline.ll @@ -67,6 +67,7 @@ ; CHECK-NEXT: Interleaved Access Pass ; CHECK-NEXT: X86 Partial Reduction ; CHECK-NEXT: Expand indirectbr instructions +; CHECK-NEXT: FPBuiltin Function Selection ; CHECK-NEXT: Natural Loop Information ; CHECK-NEXT: CodeGen Prepare ; CHECK-NEXT: Dominator Tree Construction Index: llvm/tools/opt/opt.cpp =================================================================== --- llvm/tools/opt/opt.cpp +++ llvm/tools/opt/opt.cpp @@ -422,7 +422,8 @@ "dot-regions", "dot-regions-only", "view-regions", "view-regions-only", "select-optimize", "expand-large-div-rem", - "structurizecfg", "fix-irreducible"}; + "structurizecfg", "fix-irreducible", + "fpbuiltin-fn-selection"}; for (const auto &P : PassNamePrefix) if (Pass.startswith(P)) return true; @@ -493,6 +494,7 @@ initializeTypePromotionPass(Registry); initializeReplaceWithVeclibLegacyPass(Registry); initializeJMCInstrumenterPass(Registry); + initializeFPBuiltinFnSelectionLegacyPassPass(Registry); #ifdef BUILD_EXAMPLES initializeExampleIRTransforms(Registry);