This patch specifies a set of vector builtins for Clang, as discussed on
cfe-dev:
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068999.html
https://lists.llvm.org/pipermail/cfe-dev/2021-October/069070.html
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 | I'm not sure I understand what is being concatenated here. |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
538 | "Prevailing rounding mode" is not super-useful, other than as a spelling for round-to-nearest-ties-to-even (IEEE 754 default rounding). Outside of a FENV_ACCESS ON context, there's not even really a notion of "prevailing rounding mode" to appeal to. I assume the intent is for this to lower to e.g. x86 ROUND* with the dynamic rounding-mode immediate. I would recommend adding __builtin_elementwise_roundeven(T x) instead, which would statically bind IEEE default rounding (following TS 18661-1 naming) without having to appeal to prevailing rounding mode, and can still lower to ROUND* on x86 outside of FENV_ACCESS ON contexts, which is the norm for vector code (and FRINTN unconditionally on armv8). I think we can punt on rint/nearbyint for now, and add them in the future if there's a need. |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
565 | Should be restricted to integer types. |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
565 | (Never mind, somehow read this as & instead of \+.) |
Try to be more precise about how the reduction steps are performed and replace _round and _rint by roundeven.
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 | The input is a single vector. I'm not understanding where we get a second vector to concatenate. |
Another stab at phrasing the reduction step. Also added a note that the implementation is work-in-progress.
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 | Oh yes, now I see where the confusion was coming from. I was thinking about the reduction tree and how the input is broken up. Sorry for the confusing wording. I gave it another try, should be much simpler again now. |
I'm happy with this now.
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 | It's unclear because there's no apparent "first" or "second" vector; there's just a single argument, and the result isn't a vector, it's a scalar. I think you want to say something like: "the operation is repeatedly applied to adjacent pairs of elements until the result is a scalar" and then provide a worked example. |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 | Should it somehow mention the pair is the even element i and the odd element i+1. There are n-1 adjacent pairs in an n element vector, but we want non-overlapping pairs. Should probably spell out the non-power2 behavior. Presumably we pad identity elements after the last element to widen the vector out to a power 2 and then proceed normally? |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
553 | It's really not clear what "horizontal recursive pairwise" means unless one has read the mailing list discussions. Maybe you could spell it out, e.g. "recursive even-odd pairwise reduction" or something like that. |
Thanks for the latest set of comments!
I tried to incorporate the suggestions about improving the reduction wording. I also added an example.
I also put up 2 patches to start with the implementation of min/max and the abs builtins in D111985 and D111986. I adjusted the supported types for __builtin_elementwise_abs to *signed* integer and floating point types.
clang/docs/LanguageExtensions.rst | ||
---|---|---|
552 |
Thanks, I tried to update the wording to make it clear that it operates on even-odd non-overlapping pairs.
Good point, done!
Used and added an example. | |
553 | Thanks, I used that wording! |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
557 | widening -> widened |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
557 | thanks, should be fixed! |
clang/docs/LanguageExtensions.rst | ||
---|---|---|
579 | The example above use __builtin_reduce_fadd, but not listed here? or should we just use __builtin_reduce_add for floating point and fix the example? |
Thanks @kito-cheng, the example should use __builtin_reduce_add instead of _fadd! Fixed
clang/docs/LanguageExtensions.rst | ||
---|---|---|
579 | Thanks it should be _add instead of _fadd. fixed. |
Following feedback from D111986, explicitly spell out abs behavior of most negative integer as undefined.
As @scanon pointed out in D111986, most simd implementations should handle abs(INT_MIN) consistently by returngin INT_MIN. It's therefore better to avoid defining abs(INT_MIN) as UB, which would make the intrinsic more difficult to use. I updated the abs definition to spell out the behavior of abs(INT_MIN) as returning INT_MIN.
"Prevailing rounding mode" is not super-useful, other than as a spelling for round-to-nearest-ties-to-even (IEEE 754 default rounding). Outside of a FENV_ACCESS ON context, there's not even really a notion of "prevailing rounding mode" to appeal to. I assume the intent is for this to lower to e.g. x86 ROUND* with the dynamic rounding-mode immediate.
I would recommend adding __builtin_elementwise_roundeven(T x) instead, which would statically bind IEEE default rounding (following TS 18661-1 naming) without having to appeal to prevailing rounding mode, and can still lower to ROUND* on x86 outside of FENV_ACCESS ON contexts, which is the norm for vector code (and FRINTN unconditionally on armv8). I think we can punt on rint/nearbyint for now, and add them in the future if there's a need.