# llvm/docs/LangRef.rst

.. code-block:: llvm

      %r = call <4 x i32> @llvm.vp.xor.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

      %t = xor <4 x i32> %a, %b
      %also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef

.. _int_vp_fadd:

'``llvm.vp.fadd.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

      declare <16 x float> @llvm.vp.fadd.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x float> @llvm.vp.fadd.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x double> @llvm.vp.fadd.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:
"""""""""

Predicated floating-point addition of two vectors of floating-point values.

Arguments:
""""""""""

The first two operands and the result have the same floating-point vector type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.

Semantics:
""""""""""

The '``llvm.vp.fadd``' intrinsic performs floating-point addition (:ref:`fadd <i_fadd>`)
of the first and second vector operand on each enabled lane. The result on
disabled lanes is undefined. The operation is performed in the default
floating-point environment.

Examples:
"""""""""

.. code-block:: llvm

      %r = call <4 x float> @llvm.vp.fadd.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

frasercrmck: I realise this inherits from the documentation of the integer intrinsics, but I was wondering…

simoll: I see what you are getting at with this. I am not sure that introducing novel syntax only for explaining things is really helpful. My take here is that most aren't used to the…

      %t = fadd <4 x float> %a, %b
      %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef

majnemer: Would it be more general/useful to have the intrinsics take an alternative value just like llvm.masked.load and its passthru operand? You would get identical behavior to your proposal by setting this passthru to undef.

simoll: My take is this: if there was a passthru parameter we'd still have to optimize/match the… We had this discussion a while back: https://reviews.llvm.org/D57504#1851456

majnemer: Ah, I see. Fair enough :)

simoll: Thanks for following up on your drive-by comment ;-)
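The lane semantics discussed in this thread can be sketched as a small reference model. This is an illustration in Python rather than LLVM IR, not code from the patch. The names `UNDEF`, `enabled`, `vp_fadd` and `with_passthru` are hypothetical. The model assumes a lane is enabled only when its mask bit is set and its index is below the explicit vector length, which is how the example comment ("for all lanes below %evl") reads. It also ignores `float` versus `double` precision.

```python
from typing import List, Optional

# Hypothetical marker for a lane whose value is undefined.
UNDEF = None


def enabled(mask: List[bool], evl: int) -> List[bool]:
    """Assumed rule: lane i is enabled iff mask[i] is set and i < evl."""
    return [bool(m) and i < evl for i, m in enumerate(mask)]


def vp_fadd(a: List[float], b: List[float],
            mask: List[bool], evl: int) -> List[Optional[float]]:
    """Model of a predicated add: disabled lanes carry no defined value."""
    assert len(a) == len(b) == len(mask)
    return [x + y if e else UNDEF
            for x, y, e in zip(a, b, enabled(mask, evl))]


def with_passthru(result: List[Optional[float]], passthru: List[float],
                  mask: List[bool], evl: int) -> List[Optional[float]]:
    """A passthru modelled as a separate select applied after the operation."""
    return [r if e else p
            for r, p, e in zip(result, passthru, enabled(mask, evl))]


a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
mask = [True, False, True, True]

r = vp_fadd(a, b, mask, evl=3)                # lanes 1 and 3 are disabled
filled = with_passthru(r, [9.0] * 4, mask, evl=3)
```

In this model a passthru operand adds nothing that a following select cannot express, and an undefined passthru leaves the result unchanged, which matches the observation that a passthru of undef reproduces the proposed behavior.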

.. _int_vp_fsub:

'``llvm.vp.fsub.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

      declare <16 x float> @llvm.vp.fsub.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x float> @llvm.vp.fsub.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x double> @llvm.vp.fsub.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:
"""""""""

Predicated floating-point subtraction of two vectors of floating-point values.

frasercrmck: `addition` -> `subtraction`

Arguments:
""""""""""

The first two operands and the result have the same floating-point vector type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.

Semantics:
""""""""""

The '``llvm.vp.fsub``' intrinsic performs floating-point subtraction (:ref:`fsub <i_fsub>`)

frasercrmck: same here: `addition` and maybe `add <i_fsub>`?

of the first and second vector operand on each enabled lane. The result on
disabled lanes is undefined. The operation is performed in the default
floating-point environment.

Examples:
"""""""""

.. code-block:: llvm

      %r = call <4 x float> @llvm.vp.fsub.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

      %t = fsub <4 x float> %a, %b
      %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef

.. _int_vp_fmul:

'``llvm.vp.fmul.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

      declare <16 x float> @llvm.vp.fmul.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x float> @llvm.vp.fmul.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x double> @llvm.vp.fmul.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:
"""""""""

Predicated floating-point multiplication of two vectors of floating-point values.

frasercrmck: `addition` -> `multiplication`

Arguments:
""""""""""

The first two operands and the result have the same floating-point vector type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.

Semantics:
""""""""""

The '``llvm.vp.fmul``' intrinsic performs floating-point multiplication (:ref:`fmul <i_fmul>`)

frasercrmck: `addition` -> `multiplication`

of the first and second vector operand on each enabled lane. The result on
disabled lanes is undefined. The operation is performed in the default
floating-point environment.

Examples:
"""""""""

.. code-block:: llvm

      %r = call <4 x float> @llvm.vp.fmul.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

      %t = fmul <4 x float> %a, %b
      %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef

.. _int_vp_fdiv:

'``llvm.vp.fdiv.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

      declare <16 x float> @llvm.vp.fdiv.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x float> @llvm.vp.fdiv.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x double> @llvm.vp.fdiv.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:
"""""""""

Predicated floating-point division of two vectors of floating-point values.

frasercrmck: `addition` -> `division`

Arguments:
""""""""""

The first two operands and the result have the same floating-point vector type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.

Semantics:
""""""""""

The '``llvm.vp.fdiv``' intrinsic performs floating-point division (:ref:`fdiv <i_fdiv>`)

frasercrmck: same here

of the first and second vector operand on each enabled lane. The result on
disabled lanes is undefined. The operation is performed in the default
floating-point environment.

Examples:
"""""""""

.. code-block:: llvm

      %r = call <4 x float> @llvm.vp.fdiv.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

      %t = fdiv <4 x float> %a, %b
      %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef

.. _int_vp_frem:

'``llvm.vp.frem.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

::

      declare <16 x float> @llvm.vp.frem.v16f32 (<16 x float> <left_op>, <16 x float> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
      declare <vscale x 4 x float> @llvm.vp.frem.nxv4f32 (<vscale x 4 x float> <left_op>, <vscale x 4 x float> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
      declare <256 x double> @llvm.vp.frem.v256f64 (<256 x double> <left_op>, <256 x double> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:
"""""""""

Predicated floating-point remainder of two vectors of floating-point values.

frasercrmck: `addition` -> `remainder`

Arguments:
""""""""""

The first two operands and the result have the same floating-point vector type. The
third operand is the vector mask and has the same number of elements as the
result vector type. The fourth operand is the explicit vector length of the
operation.

Semantics:
""""""""""

The '``llvm.vp.frem``' intrinsic performs floating-point remainder (:ref:`frem <i_frem>`)

frasercrmck: `addition` -> `remainder`

of the first and second vector operand on each enabled lane. The result on
disabled lanes is undefined. The operation is performed in the default
floating-point environment.

Examples:
"""""""""

.. code-block:: llvm

      %r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
      ;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

      %t = frem <4 x float> %a, %b
      %also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef
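The remainder case is the one most easily misread, so a small model may help. The sketch below is in Python rather than LLVM IR and is not part of the patch. The names `UNDEF` and `vp_frem` are hypothetical. It assumes the per-lane operation behaves like the C library `fmod`, as the base `frem` instruction is documented to do, and that a lane is enabled only when its mask bit is set and its index is below the explicit vector length. Python's `math.fmod` raises `ValueError` for a zero divisor, so the model only covers non-zero divisors.

```python
import math
from typing import List, Optional

# Hypothetical marker for a lane whose value is undefined.
UNDEF = None


def vp_frem(a: List[float], b: List[float],
            mask: List[bool], evl: int) -> List[Optional[float]]:
    """Model of a predicated remainder using fmod semantics per enabled lane."""
    assert len(a) == len(b) == len(mask)
    return [math.fmod(x, y) if (m and i < evl) else UNDEF
            for i, (x, y, m) in enumerate(zip(a, b, mask))]


# The result takes the sign of the dividend (first operand).
r = vp_frem([-7.0, 7.0, 5.5, 1.0], [3.0, -3.0, 2.0, 0.5], [True] * 4, evl=3)

# Python's % operator takes the sign of the divisor instead, so it is
# not a drop-in model for this operation.
python_mod = -7.0 % 3.0
```

Lane 3 is disabled here only because `evl` is 3; its mask bit is set.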

.. _int_get_active_lane_mask:

'``llvm.get.active.lane.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

frasercrmck: I realise this inherits from the documentation of the integer intrinsics, but I was wondering if this can be expressed as being equivalent to `<%evl x float>` in the following example. Would that be any clearer that the intrinsic is conceptually working on vectors of length `%evl` and isn't actually executing the lanes above `%evl` (as in `%t` below)?