clang/docs/LanguageExtensions.rst
See also :ref:`langext-__builtin_shufflevector`, :ref:`langext-__builtin_convertvector`.

.. [#] ternary operator(?:) has different behaviors depending on condition
  operand's vector type. If the condition is a GNU vector (i.e. __vector_size__),
  it's only available in C++ and uses normal bool conversions (that is, != 0).
  If it's an extension (OpenCL) vector, it's only available in C and OpenCL C.
  And it selects based on the signedness of the condition operands (OpenCL v1.1 s6.3.9).
Vector Builtins
---------------

In addition to the operators mentioned above, Clang provides a set of builtins
to perform additional operations on certain scalar and vector types.

Let ``T`` be one of the following types:

* an integer type (as in C2x 6.2.5p19), but excluding enumerated types and _Bool
* the standard floating types float or double
* a half-precision floating point type, if one is supported on the target
* a vector type.

For scalar types, consider the operation applied to a vector with a single element.

*Elementwise Builtins*

Each builtin returns a vector equivalent to applying the specified operation
elementwise to the input.

Unless specified otherwise, operation(±0) = ±0 and operation(±infinity) = ±infinity.
======================================= ================================================================ ==================================
Name                                    Operation                                                        Supported element types
======================================= ================================================================ ==================================
T __builtin_elementwise_abs(T x)        return the absolute value of a number x                          integer and floating point types
T __builtin_elementwise_ceil(T x)       return the smallest integral value greater than or equal to x    floating point types
T __builtin_elementwise_floor(T x)      return the largest integral value less than or equal to x        floating point types
T __builtin_elementwise_rint(T x)       return the integral value nearest to x (according to the         floating point types
                                        prevailing rounding mode) in floating-point format
T __builtin_elementwise_round(T x)      return the integral value nearest to x, rounding halfway cases   floating point types
                                        away from zero, regardless of the current rounding direction
T __builtin_elementwise_trunc(T x)      return the integral value nearest to but no larger in            floating point types
                                        magnitude than x
T __builtin_elementwise_max(T x, T y)   return x or y, whichever is larger                               integer and floating point types
T __builtin_elementwise_min(T x, T y)   return x or y, whichever is smaller                              integer and floating point types
======================================= ================================================================ ==================================

scanon (on the ``__builtin_elementwise_rint`` row): "Prevailing rounding mode" is not super-useful, other than as a spelling for round-to-nearest…
fhahn (author): I removed `rint` and `round` for now and added `_roundeven` with the wording from TS 18661-1.
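To make "applying the specified operation elementwise" concrete, here is a minimal portable sketch that models an elementwise maximum over fixed-size integer arrays. The helper name `elementwise_max` is hypothetical and is not a Clang API; it only illustrates the intended per-lane semantics, assuming integer elements.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical model (not a Clang builtin): lane i of the result is
// max(x[i], y[i]). Integer elements are assumed, so NaN handling is
// out of scope for this sketch.
template <typename T, std::size_t N>
std::array<T, N> elementwise_max(const std::array<T, N>& x,
                                 const std::array<T, N>& y) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    result[i] = std::max(x[i], y[i]);
  }
  return result;
}
```

Under this model a scalar argument corresponds to `N == 1`, which matches the rule that scalars are treated as single-element vectors.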
*Reduction Builtins*

Each builtin returns a scalar equivalent to applying the specified
operation(x, y) as a pairwise tree reduction to the input. The pairs are formed
by concatenating both inputs and pairing adjacent elements.
craig.topper: I'm not sure I understand what is being concatenated here.
fhahn (author): I tried to spell it out more clearly. I'm still not sure if that spells it out as clearly as possible, and I'd appreciate any suggestions on how to improve the wording.
scanon: It's unclear because there's no apparent "first" or "second" vector; there's just a single argument, and the result isn't a vector, it's a scalar. I think you want to say something like: "the operation is repeatedly applied to adjacent pairs of elements until the result is a scalar" and then provide a worked example.
craig.topper: Should it somehow mention that the pair is the even element `i` and the odd element `i+1`? There are n-1 adjacent pairs in an n-element vector, but we want non-overlapping pairs. Should probably spell out the non-power-of-2 behavior. Presumably we pad identity elements after the last element to widen the vector out to a power of 2 and then proceed normally?
fhahn (author):
  Thanks, I tried to update the wording to make it clear that it operates on even-odd non-overlapping pairs.
  Good point, done!
  Used and added an example.
craig.topper: The input is a single vector. I'm not understanding where we get a second vector to concatenate.
fhahn (author): Oh yes, now I see where the confusion was coming from. I was thinking about the reduction tree and how the input is broken up. Sorry for the confusing wording. I gave it another try; it should be much simpler again now.
kparzysz: It's really not clear what "horizontal recursive pairwise" means unless one has read the mailing list discussions. Maybe you could spell it out, e.g. "recursive even-odd pairwise reduction" or something like that.
fhahn: Thanks, I used that wording!
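The thread converges on a "recursive even-odd pairwise reduction". A minimal portable sketch of that idea follows. The function `pairwise_reduce` is hypothetical and is not Clang's implementation, and padding with identity elements up to a power of two is the behavior craig.topper presumes above, treated here as an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a recursive even-odd pairwise reduction.
// Assumption (from the review thread): inputs whose length is not a power
// of two are padded with `identity` after the last element.
template <typename T, typename Op>
T pairwise_reduce(std::vector<T> v, Op op, T identity) {
  if (v.empty()) return identity;
  std::size_t n = 1;
  while (n < v.size()) n *= 2;
  v.resize(n, identity);
  // Each round combines the non-overlapping pairs (v[0], v[1]),
  // (v[2], v[3]), ... and halves the length until one element remains.
  while (v.size() > 1) {
    std::vector<T> next;
    next.reserve(v.size() / 2);
    for (std::size_t i = 0; i + 1 < v.size(); i += 2) {
      next.push_back(op(v[i], v[i + 1]));
    }
    v = std::move(next);
  }
  return v[0];
}
```

For the input `{1, 2, 3, 4}` with `+`, the first round yields `{3, 7}` and the second yields `10`. The tree shape only becomes visible with a non-associative operation: with `-`, the same input gives `(1 - 2) - (3 - 4) = 0`, whereas a left-to-right fold would give `-8`.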
Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.

======================================= ================================================================ ==================================
Name                                    Operation                                                        Supported element types
======================================= ================================================================ ==================================
ET __builtin_reduce_max(VT a)           return x or y, whichever is larger; if exactly one argument is   integer and floating point types
                                        a NaN, return the other argument. If both arguments are NaNs,
                                        return a NaN, as fmax() does.
ET __builtin_reduce_min(VT a)           return x or y, whichever is smaller; if exactly one argument     integer and floating point types
                                        is a NaN, return the other argument. If both arguments are
                                        NaNs, return a NaN, as fmin() does.
ET __builtin_reduce_add(VT a)           \+                                                               integer and floating point types
ET __builtin_reduce_and(VT a)           &                                                                integer types
ET __builtin_reduce_or(VT a)            \|                                                               integer types
ET __builtin_reduce_xor(VT a)           ^                                                                integer types
======================================= ================================================================ ==================================

craig.topper: widening -> widened
fhahn: thanks, should be fixed!
scanon (on the ``__builtin_reduce_add`` row): Should be restricted to integer types.
scanon: (Never mind, somehow read this as `&` instead of `\+`.)
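The NaN rule stated for the max reduction matches the C library function `fmax`. The sketch below models it by folding `std::fmax` over the elements. The name `reduce_max_model` is hypothetical, a non-empty input is assumed, and a simple left-to-right fold is used only for illustration, so it says nothing about how Clang orders the operations.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model (not a Clang builtin): std::fmax returns the non-NaN
// operand when exactly one operand is a NaN, and a NaN when both are.
// Assumes `v` is non-empty.
double reduce_max_model(const std::vector<double>& v) {
  double acc = v[0];
  for (std::size_t i = 1; i < v.size(); ++i) {
    acc = std::fmax(acc, v[i]);
  }
  return acc;
}
```

With this model a single NaN element is ignored, and the result is a NaN only when every element is a NaN.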
Matrix Types
============

Clang provides an extension for matrix types, which is currently being
implemented. See :ref:`the draft specification <matrixtypes>` for more details.

For example, the code below uses the matrix types extension to multiply two 4x4
float matrices and add the result to a third 4x4 matrix.
kito-cheng: The example above uses `__builtin_reduce_fadd`, but it is not listed here. Or should we just use `__builtin_reduce_add` for floating point and fix the example?
fhahn: Thanks, it should be `_add` instead of `_fadd`. Fixed.
.. code-block:: c++

  typedef float m4x4_t __attribute__((matrix_type(4, 4)));

  m4x4_t f(m4x4_t a, m4x4_t b, m4x4_t c) {
    return a + b * c;
  }
"Prevailing rounding mode" is not super-useful, other than as a spelling for round-to-nearest-ties-to-even (IEEE 754 default rounding). Outside of a FENV_ACCESS ON context, there's not even really a notion of "prevailing rounding mode" to appeal to. I assume the intent is for this to lower to e.g. x86 ROUND* with the dynamic rounding-mode immediate.
I would recommend adding __builtin_elementwise_roundeven(T x) instead, which would statically bind IEEE default rounding (following TS 18661-1 naming) without having to appeal to prevailing rounding mode, and can still lower to ROUND* on x86 outside of FENV_ACCESS ON contexts, which is the norm for vector code (and FRINTN unconditionally on armv8). I think we can punt on rint/nearbyint for now, and add them in the future if there's a need.
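The difference between the two tie-breaking rules discussed here can be shown with scalar standard library calls. This is only a sketch: the function names are hypothetical, and `std::nearbyint` stands in for round-to-nearest-ties-to-even under the assumption that the default floating-point rounding mode is in effect, which is exactly the dependence a statically bound `roundeven` would avoid.

```cpp
#include <cmath>

// Hypothetical scalar stand-in for ties-to-even rounding. Assumes the
// default rounding mode (round-to-nearest, ties-to-even) is active.
double round_ties_even(double x) { return std::nearbyint(x); }

// std::round always rounds halfway cases away from zero, regardless of
// the current rounding mode.
double round_ties_away(double x) { return std::round(x); }
```

Under that assumption, `round_ties_even(2.5)` is `2.0` while `round_ties_away(2.5)` is `3.0`. The two functions agree on every input that is not exactly halfway between two integers.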