
Details

Reviewers

scanon
erichkeane
rjmccall
aaron.ballman
dexonsmith
rsmith
craig.topper

Commits

rG025988ded6b2: Specify Clang vector builtins.

Summary

This patch specifies a set of vector builtins for Clang, as discussed on
cfe-dev:
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068999.html
https://lists.llvm.org/pipermail/cfe-dev/2021-October/069070.html

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn requested review of this revision. Oct 11 2021, 3:12 AM

fhahn created this revision.

Herald added a project: Restricted Project. Oct 11 2021, 3:12 AM

Harbormaster completed remote builds in B128061: Diff 378601. Oct 11 2021, 3:44 AM

craig.topper added inline comments. Oct 11 2021, 5:31 AM

clang/docs/LanguageExtensions.rst
552	I'm not sure I understand what is being concatenated here.

tschuett added a subscriber: tschuett. Oct 11 2021, 6:31 AM

scanon added inline comments. Oct 11 2021, 7:39 AM

clang/docs/LanguageExtensions.rst
538	"Prevailing rounding mode" is not super-useful, other than as a spelling for round-to-nearest-ties-to-even (IEEE 754 default rounding). Outside of a `FENV_ACCESS ON` context, there's not even really a notion of "prevailing rounding mode" to appeal to. I assume the intent is for this to lower to e.g. x86 ROUND* with the dynamic rounding-mode immediate. I would recommend adding `__builtin_elementwise_roundeven(T x)` instead, which would statically bind IEEE default rounding (following TS 18661-1 naming) without having to appeal to prevailing rounding mode, and can still lower to ROUND* on x86 outside of FENV_ACCESS ON contexts, which is the norm for vector code (and FRINTN unconditionally on armv8). I think we can punt on rint/nearbyint for now, and add them in the future if there's a need.

scanon added inline comments. Oct 11 2021, 7:40 AM

clang/docs/LanguageExtensions.rst
565	Should be restricted to integer types.

scanon added inline comments. Oct 11 2021, 7:41 AM

clang/docs/LanguageExtensions.rst
565	(Never mind, somehow read this as `&` instead of `\+`.)

Try to be more precise about how the reduction steps are performed and replace _round and _rint by roundeven.

fhahn marked 4 inline comments as done. Oct 12 2021, 6:10 AM

fhahn added inline comments.

clang/docs/LanguageExtensions.rst
538	I removed `rint` and `round` for now and added `_roundeven` with the wording from TS 18661-1.
552	I tried to spell it out more clearly. I'm still not sure whether that spells it out as clearly as possible, and I'd appreciate any suggestions on how to improve the wording.

Harbormaster completed remote builds in B128345: Diff 378992. Oct 12 2021, 6:22 AM

craig.topper added inline comments. Oct 12 2021, 8:15 AM

clang/docs/LanguageExtensions.rst
552	The input is a single vector. I'm not understanding where we get a second vector to concatenate.

Another stab at phrasing the reduction step. Also added a note that the implementation is work-in-progress.

fhahn marked an inline comment as done. Oct 13 2021, 1:40 PM

fhahn added inline comments.

clang/docs/LanguageExtensions.rst
552	Oh yes, now I see where the confusion was coming from. I was thinking about the reduction tree and how the input is broken up. Sorry for the confusing wording. I gave it another try, should be much simpler again now.

I'm happy with this now.

clang/docs/LanguageExtensions.rst
552	It's unclear because there's no apparent "first" or "second" vector; there's just a single argument, and the result isn't a vector, it's a scalar. I think you want to say something like: "the operation is repeatedly applied to adjacent pairs of elements until the result is a scalar" and then provide a worked example.

This revision is now accepted and ready to land. Oct 13 2021, 1:50 PM

craig.topper added inline comments. Oct 13 2021, 2:15 PM

clang/docs/LanguageExtensions.rst
552	Should it somehow mention the pair is the even element `i` and the odd element `i+1`. There are n-1 adjacent pairs in an n element vector, but we want non-overlapping pairs. Should probably spell out the non-power2 behavior. Presumably we pad identity elements after the last element to widen the vector out to a power 2 and then proceed normally?

kparzysz added a subscriber: kparzysz. Oct 13 2021, 2:28 PM

kparzysz added inline comments.

clang/docs/LanguageExtensions.rst
553	It's really not clear what "horizontal recursive pairwise" means unless one has read the mailing list discussions. Maybe you could spell it out, e.g. "recursive even-odd pairwise reduction" or something like that.

Harbormaster completed remote builds in B128712: Diff 379517. Oct 13 2021, 2:37 PM

fhahn mentioned this in D111985: [Clang] Add elementwise min/max builtins. Oct 18 2021, 4:47 AM

fhahn mentioned this in D111986: [Clang] Add elementwise abs builtin.

fhahn added a child revision: D111985: [Clang] Add elementwise min/max builtins. Oct 18 2021, 4:51 AM

Thanks for the latest set of comments!

I tried to incorporate the suggestions about improving the reduction wording. I also added an example.

I also put up 2 patches to start with the implementation of min/max and the abs builtins in D111985 and D111986. I adjusted the supported types for __builtin_elementwise_abs to *signed* integer and floating point types.

Adjust padding wording.

fhahn marked 2 inline comments as done. Oct 18 2021, 5:18 AM

fhahn added inline comments.

clang/docs/LanguageExtensions.rst
552	> Should it somehow mention the pair is the even element i and the odd element i+1. There are n-1 adjacent pairs in an n element vector, but we want non-overlapping pairs.
	Thanks, I tried to update the wording to make it clear that it operates on even-odd non-overlapping pairs.
	> Should probably spell out the non-power2 behavior. Presumably we pad identity elements after the last element to widen the vector out to a power 2 and then proceed normally?
	Good point, done!
	> I think you want to say something like: "the operation is repeatedly applied to adjacent pairs of elements until the result is a scalar" and then provide a worked example.
	Used and added an example.
553	Thanks, I used that wording!

Harbormaster completed remote builds in B129312: Diff 380352. Oct 18 2021, 5:43 AM

fhahn mentioned this in D112001: [Clang] Add min/max reduction builtins. Oct 18 2021, 8:28 AM

craig.topper added inline comments. Oct 18 2021, 11:07 AM

clang/docs/LanguageExtensions.rst
557	widening -> widened

Fix wording: widening -> widened, thanks!

fhahn marked an inline comment as done. Oct 18 2021, 12:54 PM

fhahn added inline comments.

clang/docs/LanguageExtensions.rst
557	Thanks, should be fixed!

Harbormaster completed remote builds in B129417: Diff 380500. Oct 18 2021, 1:27 PM

kito-cheng added a subscriber: kito-cheng. Oct 18 2021, 8:28 PM

kito-cheng added inline comments.

clang/docs/LanguageExtensions.rst
579	The example above uses `__builtin_reduce_fadd`, but it is not listed here. Or should we just use `__builtin_reduce_add` for floating point and fix the example?

Thanks @kito-cheng, the example should use `__builtin_reduce_add` instead of `_fadd`! Fixed.

fhahn marked an inline comment as done. Oct 19 2021, 1:16 AM

fhahn added inline comments.

clang/docs/LanguageExtensions.rst
579	Thanks, it should be `_add` instead of `_fadd`. Fixed.

Harbormaster completed remote builds in B129483: Diff 380598. Oct 19 2021, 1:24 AM

Following feedback from D111986, explicitly spell out abs behavior of most negative integer as undefined.

Harbormaster completed remote builds in B129495: Diff 380612. Oct 19 2021, 2:32 AM

As @scanon pointed out in D111986, most SIMD implementations should handle abs(INT_MIN) consistently by returning INT_MIN. It's therefore better to avoid defining abs(INT_MIN) as UB, which would make the intrinsic more difficult to use. I updated the abs definition to spell out the behavior of abs(INT_MIN) as returning INT_MIN.
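
The wrap-around behavior described here can be illustrated with a scalar sketch. It only models a single 32-bit lane and is not Clang's implementation; `model_abs_lane` and `check_abs_examples` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical scalar model of one 32-bit lane (not a Clang API).
// The most negative value maps to itself instead of invoking UB.
inline std::int32_t model_abs_lane(std::int32_t x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // copy the two's-complement bit pattern
    if (x < 0)
        bits = 0u - bits;                     // unsigned negation wraps and is well defined
    std::int32_t result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Small self-check of the model, including the INT_MIN case.
inline void check_abs_examples() {
    const std::int32_t lowest = std::numeric_limits<std::int32_t>::min();
    assert(model_abs_lane(-5) == 5);
    assert(model_abs_lane(7) == 7);
    assert(model_abs_lane(lowest) == lowest); // abs(INT_MIN) stays INT_MIN
}
```

Negating in unsigned arithmetic is one way to express "INT_MIN remains INT_MIN" without relying on signed overflow; a real vector lowering would presumably use the target's packed abs instruction instead.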

Harbormaster completed remote builds in B129592: Diff 380755. Oct 19 2021, 12:57 PM

Closed by commit rG025988ded6b2: Specify Clang vector builtins. (authored by fhahn). Oct 26 2021, 7:37 AM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG025988ded6b2: Specify Clang vector builtins.

fhahn mentioned this in rG1ef25d28c19e: [Clang] Add elementwise min/max builtins. Oct 26 2021, 8:54 AM

fhahn mentioned this in rG01870d51b848: [Clang] Add elementwise abs builtin. Oct 27 2021, 1:04 PM

fhahn mentioned this in rG7999355106fb: [Clang] Add min/max reduction builtins. Nov 2 2021, 7:02 AM

fhahn mentioned this in D108832: [Builtins] Support ext_vector_type args for __builtin_fminf. Nov 3 2021, 6:19 AM

junaire mentioned this in D114688: [Clang] Add __builtin_elementwise_ceil. Nov 28 2021, 10:25 PM

gandhi21299 added a subscriber: gandhi21299. Nov 28 2021, 10:54 PM

junaire mentioned this in D115231: [Clang] Add __builtin_reduce_xor. Dec 7 2021, 2:56 AM

aaron.ballman mentioned this in rG8680f951c21e: Add __builtin_elementwise_ceil. Dec 8 2021, 5:30 AM

junaire mentioned this in D115429: [Clang] Implement the rest of __builtin_elementwise_* functions. Dec 9 2021, 12:05 AM

fhahn mentioned this in rGb55ea2fbc0a0: [Clang] Add __builtin_reduce_xor. Dec 22 2021, 2:01 AM

junaire mentioned this in D116161: [Clang] Extend emitUnaryBuiltin to avoid duplicate logic. Dec 28 2021, 11:18 PM

fhahn mentioned this in rG5c57e6aa5777: [Clang] Extend emitUnaryBuiltin to avoid duplicate logic. Jan 4 2022, 3:48 AM

junaire mentioned this in D116736: [Clang] Add __builtin_reduce_or and __builtin_reduce_and. Jan 6 2022, 3:57 AM

fhahn mentioned this in rGb2ed9f3f44d0: [Clang] Implement the rest of __builtin_elementwise_* functions. Jan 7 2022, 7:12 AM

junaire mentioned this in rG8de0c1feca28: [Clang] Add __builtin_reduce_or and __builtin_reduce_and. Jan 14 2022, 6:06 AM

Diff 382311

clang/docs/LanguageExtensions.rst

				See also :ref:`langext-__builtin_shufflevector`, :ref:`langext-__builtin_convertvector`.

				.. [#] ternary operator(?:) has different behaviors depending on condition
				operand's vector type. If the condition is a GNU vector (i.e. __vector_size__),
				it's only available in C++ and uses normal bool conversions (that is, != 0).
				If it's an extension (OpenCL) vector, it's only available in C and OpenCL C.
				And it selects base on signedness of the condition operands (OpenCL v1.1 s6.3.9).

				Vector Builtins
				---------------

				Note: The implementation of vector builtins is work-in-progress and incomplete.

				In addition to the operators mentioned above, Clang provides a set of builtins
				to perform additional operations on certain scalar and vector types.

				Let ``T`` be one of the following types:

				* an integer type (as in C2x 6.2.5p19), but excluding enumerated types and _Bool
				* the standard floating types float or double
				* a half-precision floating point type, if one is supported on the target
				* a vector type.

				For scalar types, consider the operation applied to a vector with a single element.

				Elementwise Builtins

				Each builtin returns a vector equivalent to applying the specified operation
				elementwise to the input.

				Unless specified otherwise, operation(±0) = ±0 and operation(±infinity) = ±infinity.

				========================================= ================================================================ =========================================
				Name                                      Operation                                                        Supported element types
				========================================= ================================================================ =========================================
				T __builtin_elementwise_abs(T x)          return the absolute value of a number x; the absolute value of   signed integer and floating point types
				                                          the most negative integer remains the most negative integer
				T __builtin_elementwise_ceil(T x)         return the smallest integral value greater than or equal to x    floating point types
				T __builtin_elementwise_floor(T x)        return the largest integral value less than or equal to x        floating point types
				T __builtin_elementwise_roundeven(T x)    round x to the nearest integer value in floating point format,   floating point types
				                                          rounding halfway cases to even (that is, to the nearest value
				                                          that is an even integer), regardless of the current rounding
				                                          direction.
				T __builtin_elementwise_trunc(T x)        return the integral value nearest to but no larger in            floating point types
				                                          magnitude than x
				T __builtin_elementwise_max(T x, T y)     return x or y, whichever is larger                               integer and floating point types
				T __builtin_elementwise_min(T x, T y)     return x or y, whichever is smaller                              integer and floating point types
				========================================= ================================================================ =========================================
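
To make the `roundeven` row concrete, here is a scalar sketch for a single `double` lane. It is a model under stated assumptions rather than Clang's lowering: `model_roundeven_lane` and `check_roundeven_examples` are hypothetical names, and the tie handling is written out explicitly so that it does not consult the floating-point environment's rounding direction.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar model of roundeven for one lane (not a Clang API).
inline double model_roundeven_lane(double x) {
    if (!std::isfinite(x))
        return x;                          // NaN and +/-infinity pass through unchanged
    const double lo = std::floor(x);       // largest integer value <= x
    const double frac = x - lo;            // distance of x above lo
    double r;
    if (frac < 0.5)
        r = lo;                            // closer to the lower neighbor
    else if (frac > 0.5)
        r = lo + 1.0;                      // closer to the upper neighbor
    else
        r = (std::fmod(lo, 2.0) == 0.0) ? lo : lo + 1.0;  // tie: pick the even neighbor
    return std::copysign(r, x);            // keep the sign, so -0.3 yields -0.0
}

// Small self-check of the model on halfway and non-halfway inputs.
inline void check_roundeven_examples() {
    assert(model_roundeven_lane(0.5) == 0.0);
    assert(model_roundeven_lane(1.5) == 2.0);
    assert(model_roundeven_lane(2.5) == 2.0);
    assert(model_roundeven_lane(-1.5) == -2.0);
    assert(model_roundeven_lane(2.6) == 3.0);
    assert(std::signbit(model_roundeven_lane(-0.3)));
}
```

The halfway inputs 0.5, 1.5, and 2.5 show the difference from round-half-away-from-zero, which would give 1, 2, and 3.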


				Reduction Builtins

				Each builtin returns a scalar equivalent to applying the specified
				``operation(x, y)`` as a recursive even-odd pairwise reduction to all vector
				elements. ``operation(x, y)`` is repeatedly applied to each non-overlapping
				even-odd element pair with indices ``i * 2`` and ``i * 2 + 1`` with
				``i in [0, Number of elements / 2)``. If the number of elements is not a
				power of 2, the vector is widened with neutral elements for the reduction
				at the end to the next power of 2.

				Example:

				.. code-block:: c++

				    __builtin_reduce_add([e3, e2, e1, e0]) = __builtin_reduce_add([e3 + e2, e1 + e0])
				                                           = (e3 + e2) + (e1 + e0)
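
The reduction order described above can be sketched as a small reference model. This is an illustration of the wording, not the builtin's implementation: `reduce_pairwise`, `op`, `neutral`, and `check_reduce_examples` are hypothetical names, and the handling of an empty input is an assumption the text does not cover.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reference model of a recursive even-odd pairwise reduction.
template <typename T, typename Op>
T reduce_pairwise(std::vector<T> v, Op op, T neutral) {
    if (v.empty())
        return neutral;                    // assumption: not specified by the text
    // Widen to the next power of two by appending neutral elements at the end.
    std::size_t n = 1;
    while (n < v.size())
        n *= 2;
    v.resize(n, neutral);
    // Combine the non-overlapping pairs (2*i, 2*i + 1) until one element is left.
    while (v.size() > 1) {
        std::vector<T> next(v.size() / 2);
        for (std::size_t i = 0; i < next.size(); ++i)
            next[i] = op(v[2 * i], v[2 * i + 1]);
        v = std::move(next);
    }
    return v[0];
}

// Small self-check; the string case makes the shape of the reduction tree visible.
inline void check_reduce_examples() {
    auto add = [](int a, int b) { return a + b; };
    assert(reduce_pairwise(std::vector<int>{1, 2, 3, 4}, add, 0) == 10);
    assert(reduce_pairwise(std::vector<int>{1, 2, 3}, add, 0) == 6);  // padded to {1, 2, 3, 0}
    auto show = [](const std::string& a, const std::string& b) {
        return "(" + a + "+" + b + ")";
    };
    assert(reduce_pairwise(std::vector<std::string>{"a", "b", "c", "d"},
                           show, std::string()) == "((a+b)+(c+d))");
}
```

For integer addition the grouping makes no observable difference, but for floating point addition the tree shape determines the rounding, which is presumably why the specification pins it down.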


				Let ``VT`` be a vector type and ``ET`` the element type of ``VT``.

				======================================= ================================================================ ==================================
				Name                                    Operation                                                        Supported element types
				======================================= ================================================================ ==================================
				ET __builtin_reduce_max(VT a)           return x or y, whichever is larger; if exactly one argument is   integer and floating point types
				                                        a NaN, return the other argument. If both arguments are NaNs,
				                                        return a NaN.
				ET __builtin_reduce_min(VT a)           return x or y, whichever is smaller; if exactly one argument     integer and floating point types
				                                        is a NaN, return the other argument. If both arguments are
				                                        NaNs, return a NaN.
				ET __builtin_reduce_add(VT a)           \+                                                               integer and floating point types
				ET __builtin_reduce_and(VT a)           &                                                                integer types
				ET __builtin_reduce_or(VT a)            \\|                                                              integer types
				ET __builtin_reduce_xor(VT a)           ^                                                                integer types
				======================================= ================================================================ ==================================
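
The NaN rule for the max reduction matches the behavior of the C library's `fmax`, so a scalar model can be sketched with `std::fmax`. The names `model_reduce_max` and `check_reduce_max_examples` are hypothetical, and the sketch folds left to right instead of following the even-odd tree, which is a simplification.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical scalar model of a float max reduction (not a Clang API).
inline float model_reduce_max(const std::vector<float>& v) {
    assert(!v.empty());                    // assumption: the input has at least one element
    float acc = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
        acc = std::fmax(acc, v[i]);        // a single NaN operand is ignored by fmax
    return acc;
}

// Small self-check covering the "one NaN" and "all NaN" cases.
inline void check_reduce_max_examples() {
    const float qnan = std::numeric_limits<float>::quiet_NaN();
    assert(model_reduce_max({1.0f, qnan, 3.0f, 2.0f}) == 3.0f);
    assert(std::isnan(model_reduce_max({qnan, qnan})));
}
```

A min reduction would follow the same pattern with `std::fmin`.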

				Matrix Types
				============

				Clang provides an extension for matrix types, which is currently being
				implemented. See :ref:`the draft specification <matrixtypes>` for more details.

				For example, the code below uses the matrix types extension to multiply two 4x4
				float matrices and add the result to a third 4x4 matrix.

This is an archive of the discontinued LLVM Phabricator instance.

Specify Clang vector builtins.
ClosedPublic
