This is an archive of the discontinued LLVM Phabricator instance.

Differential D104308

[VP] Add vector-predicated reduction intrinsics
Closed, Public

Authored by frasercrmck on Jun 15 2021, 9:41 AM.

Details

Reviewers

simoll
craig.topper
rogfer01
vkmr
andrew.w.kaylor
HsiangKai

Commits

rGf3e9047249d0: [VP] Add vector-predicated reduction intrinsics

Summary

This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated llvm.vector.reduce.* versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector element is active.
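The start-value semantics summarized above can be modelled in a few lines of C++. This is a hypothetical scalar reference, not code from the patch or from LLVM: `vpReduce` and its parameters are illustrative names, a `std::vector<bool>` stands in for the mask and a `std::size_t` for the explicit vector length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical scalar model of a VP reduction (not LLVM code).
// Lanes i < evl whose mask bit is set are folded into `start` with `op`,
// in lane order. Masked-off lanes and lanes at or beyond evl contribute
// nothing, so with evl == 0 or an all-false mask the result is `start`.
template <typename T, typename BinOp>
T vpReduce(T start, const std::vector<T> &val, const std::vector<bool> &mask,
           std::size_t evl, BinOp op) {
  T acc = start;
  for (std::size_t i = 0; i < evl && i < val.size(); ++i)
    if (mask[i])
      acc = op(acc, val[i]);
  return acc;
}
```

Passing `std::plus<int32_t>()` models an integer `ADD` reduction; with an all-false mask or `evl == 0` the loop never updates `acc`, so the start value is returned unchanged.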

Support for expansion on targets without native vector-predication support is
included.

This patch is based on the "reduction slice" of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

frasercrmck created this revision. Jun 15 2021, 9:41 AM

Herald added subscribers: dexonsmith, jdoerfert, hiraditya. Jun 15 2021, 9:41 AM

frasercrmck requested review of this revision. Jun 15 2021, 9:41 AM

Herald added a project: Restricted Project. Jun 15 2021, 9:41 AM

Herald added a subscriber: llvm-commits.

frasercrmck added inline comments. Jun 15 2021, 9:43 AM

llvm/test/CodeGen/Generic/expand-vp.ll
29	Note that I've still got some reductions to add here but I feel the patch itself is good enough to start reviewing.

frasercrmck added inline comments. Jun 15 2021, 9:46 AM

llvm/include/llvm/IR/IntrinsicInst.h
450	This can probably go in `VPReductionIntrinsic`

Harbormaster completed remote builds in B109317: Diff 352164. Jun 15 2021, 10:16 AM

I wonder whether the semantics sections in the documentation should just refer back to the semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reductions are expanded anyway: if standard reductions switch semantics at some point, we will unwittingly switch too.

llvm/unittests/IR/VPIntrinsicTest.cpp
67	Good ol' printf debugging

rebase
remove debug print
move isVPReduction into VPReductionIntrinsic
flesh out expand-vp.ll test

In reply to D104308#2826308 (@simoll):

I think that's a good idea. I found it difficult to strike a balance between providing enough "interesting" information that doesn't involve jumping about the page and plain copy/paste. I do think it's important to clarify how disabled lanes behave (I hope you agree) but almost everything else is just the base version.

llvm/unittests/IR/VPIntrinsicTest.cpp
67	:)

frasercrmck marked an inline comment as done. Jun 18 2021, 4:08 AM

Harbormaster completed remote builds in B109901: Diff 352970. Jun 18 2021, 6:34 PM

In reply to D104308#2826690 (@frasercrmck):

Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and I am unsure what the return value should be then. Any thoughts on that?

In reply to D104308#2830045 (@simoll):

Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be surreptitiously led by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?
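A hypothetical C++ sketch of the `reduce(select %mask, %v, %neutral)` expansion for `ADD`, written without a start value as the intrinsics stood at this point in the thread. `expandReduceAdd` is an illustrative name, not LLVM code; it shows why a vector with every lane disabled reduces to the neutral element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical model of the expansion (not LLVM code):
//   1. "select %mask, %v, %neutral": lanes that are masked off, or at or
//      beyond evl, are replaced by the neutral element (0 for ADD);
//   2. the plain unpredicated reduction then runs over every lane.
int32_t expandReduceAdd(const std::vector<int32_t> &v,
                        const std::vector<bool> &mask, std::size_t evl) {
  const int32_t neutral = 0;
  std::vector<int32_t> selected(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    selected[i] = (i < evl && mask[i]) ? v[i] : neutral;
  return std::accumulate(selected.begin(), selected.end(), int32_t{0});
}
```

In this sketch, lanes at or beyond `evl` are treated like masked-off lanes, so both an all-false mask and `evl == 0` give `0`.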

In reply to D104308#2830379 (@frasercrmck):

Could we add a start value operand to all of the intrinsics? fadd and fmul already have it. If the vectorizer doesn't have a value it can put the neutral value. On targets like RISCV we can use that neutral value directly since we need the scalar input anyway. For other targets they can detect that the argument is neutral and not use it if it doesn't mesh with their ISA.
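The proposal above can be illustrated with a hypothetical C++ model of an `AND` reduction that always takes a start value (`vpReduceAnd`, `kAndNeutral` and `isNeutralAndStart` are illustrative names, not LLVM APIs). A producer without a meaningful start passes the neutral element, all-ones for `AND`, and a lowering that has no use for the scalar could detect that case and ignore it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model (not LLVM code) of a VP AND reduction that always
// takes a scalar start value.
uint32_t vpReduceAnd(uint32_t start, const std::vector<uint32_t> &v,
                     const std::vector<bool> &mask, std::size_t evl) {
  uint32_t acc = start;
  for (std::size_t i = 0; i < evl && i < v.size(); ++i)
    if (mask[i])
      acc &= v[i];
  return acc;
}

// A producer with no meaningful start value passes the neutral element.
// For AND that is all-ones, so (start & x) == x and the start has no effect.
constexpr uint32_t kAndNeutral = UINT32_MAX;

// Hypothetical check a lowering could use to drop a start operand that
// its ISA has no use for.
bool isNeutralAndStart(uint32_t start) { return start == kAndNeutral; }
```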

In reply to D104308#2831225 (@craig.topper):

Feeding back the discussions we had off Phabricator into the review thread:

We agreed to have scalar start value parameters in all vp reduction intrinsics (unlike llvm.vector.reduce.* which only have them where needed for non-reassociatable reductions). This makes them regular and there is no need to match the start value.
There appears to be an issue with the %evl == 0 corner case in that the ISAs (RISC-V and VE) do not update the result register in that case. There would need to be a register constraint between the start value register and the result register to guarantee that the result equals the start value. This is certainly an artifact of the intrinsic reaching into isel. However, I am not sure how big of a problem that really is.
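A hypothetical C++ model of the `%evl == 0` concern. `hwRedSum` stands in for a reduction instruction that writes `start + v[0] + ... + v[evl-1]` to its destination but leaves the destination untouched when the vector length is zero; masking is omitted, and the instruction model is an assumption based on the description above rather than a transcription of either ISA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction model (not a real ISA definition): when evl > 0
// it writes start + v[0] + ... + v[evl-1] to `dst`; when evl == 0 it leaves
// `dst` untouched. Masking is omitted for brevity.
void hwRedSum(int64_t &dst, int64_t start, const std::vector<int64_t> &v,
              std::size_t evl) {
  if (evl == 0)
    return; // destination register is not written
  int64_t acc = start;
  for (std::size_t i = 0; i < evl && i < v.size(); ++i)
    acc += v[i];
  dst = acc;
}

// Lowering with an unconstrained destination that holds some stale value.
int64_t lowerUntied(int64_t start, const std::vector<int64_t> &v,
                    std::size_t evl, int64_t stale) {
  int64_t dst = stale;
  hwRedSum(dst, start, v, evl);
  return dst;
}

// Lowering with the destination constrained to the start-value register.
int64_t lowerTied(int64_t start, const std::vector<int64_t> &v,
                  std::size_t evl) {
  int64_t dst = start;
  hwRedSum(dst, start, v, evl);
  return dst;
}
```

Only the tied lowering returns the start value when `evl == 0`; the untied one returns whatever stale value the destination held, which is the behaviour the register constraint is meant to rule out.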

add start value to all intrinsics
clarify %evl==0 semantics in docs

Wasn't sure if it's best to explicitly say in the docs that vp reductions have a start value unlike their unpredicated counterparts. I've left it out for now.

craig.topper added inline comments. Jul 14 2021, 10:06 AM

llvm/docs/LangRef.rst
18624	With the start value added, I think it's the second operand now?
18849	Neutral value is -1 or UINT_MAX for AND
19217	Does -QNAN mean anything? Isn't the sign of a NaN mostly ignored?
19224	Even if all vector elements are NaN, the result would depend on the start value
llvm/include/llvm/IR/IntrinsicInst.h
465	Should this be isVPReduction?
llvm/include/llvm/IR/VPIntrinsics.def
251	Why accu instead of start?
llvm/lib/CodeGen/ExpandVectorPredication.cpp
263	I was wondering if we could go through ConstantExpr::getBinOpIdentity to avoid repeating the constants in multiple places but we'd still need special handling for min/max
llvm/test/Verifier/vp-intrinsics.ll
34	Start value is missing
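The neutral elements under discussion can be collected in one place. This hypothetical helper (`neutralElementI32` is an illustrative name, not LLVM's API) lists the value each integer reduction treats a disabled lane as holding on `i32` lanes, including all-ones for `AND` as noted above. The min/max entries are the cases mentioned above as needing special handling beyond a generic binary-operator identity.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not LLVM's API): the neutral element of each integer
// reduction on i32 lanes, i.e. the value x for which op(x, y) == y for all y.
int32_t neutralElementI32(const std::string &opcode) {
  if (opcode == "add" || opcode == "or" || opcode == "xor" || opcode == "umax")
    return 0;
  if (opcode == "mul")
    return 1;
  if (opcode == "and" || opcode == "umin")
    return -1; // all-ones bit pattern: -1 signed, UINT32_MAX unsigned
  if (opcode == "smax")
    return std::numeric_limits<int32_t>::min();
  if (opcode == "smin")
    return std::numeric_limits<int32_t>::max();
  throw std::invalid_argument("unknown reduction opcode: " + opcode);
}
```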

Harbormaster completed remote builds in B114004: Diff 358635. Jul 14 2021, 10:12 AM

dexonsmith removed a subscriber: dexonsmith. Jul 14 2021, 12:15 PM

fix Verifier test
fix VPReductionIntrinsic::classof
add new unit test
update docs:

fix references to operand indices
fix neutral element for AND reduction
fix specified return value for FMIN/FMAX and NaNs
be more explicit about return value EVL==0
be more explicit about start value
reference operands by their names in Arguments sections

llvm/docs/LangRef.rst
18624	Yes, thanks. I didn't do a proper sweep over the docs. I've made further changes to them so it might be worth going over them again.
18849	Good spot, thanks. Updated the "equivalent to" section below, too.
19217	I have heard that the sign of NaNs is often ignored but I'm not sure it's guaranteed to be ignored? `-QNAN` does have a bit representation and `ConstantFP` and `APFloat` both take a `Negative` parameter to their `getQNAN` methods. It's also what `SelectionDAG::getNeutralElement` is doing for `FMAXNUM`.
19224	Yes true. I've replaced that sentence now.
llvm/include/llvm/IR/IntrinsicInst.h
456	Not sure whether to keep this in or not, now that all vp reductions have start parameter? Currently start is always 0 and vector is always 1.
465	Oh good spot, thanks. I've fixed that and added a unit test for these `VPReductionIntrinsic` methods.
llvm/include/llvm/IR/VPIntrinsics.def
251	I think the reference patch uses/used `accu` so I was flip-flopping between the two. I agree that "start" is better though.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
263	Coming at it from a slightly different angle, I was wondering if there should be a single source of truth for neutral reduction elements between IR and SelectionDAG.
llvm/test/Verifier/vp-intrinsics.ll
34	Cheers.
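A small standard-library C++ sketch (not `APFloat` or `ConstantFP`) showing the two facts in play: a negative quiet NaN has its own bit pattern with the sign bit set, and under maxnum-style semantics, which C's `std::fmax` follows, a NaN operand is discarded regardless of its sign. `floatBits` and `negativeQNaN` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Raw IEEE-754 bits of a float (memcpy avoids strict-aliasing problems).
uint32_t floatBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// A quiet NaN with its sign bit set, i.e. what "-QNAN" denotes.
float negativeQNaN() {
  return std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
}
```

Whether LLVM's semantics attach any meaning to that sign bit is the question raised above; the sketch does not settle it.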

Harbormaster completed remote builds in B114204: Diff 358920. Jul 15 2021, 5:28 AM

rebase

Harbormaster completed remote builds in B114221: Diff 358942. Jul 15 2021, 6:51 AM

rebase

Harbormaster completed remote builds in B117904: Diff 364100. Aug 4 2021, 7:55 AM

add logic for legalization similar to regular reductions

Harbormaster completed remote builds in B117944: Diff 364160. Aug 4 2021, 10:19 AM

return correct non-SEQ ISD opcode from reassoc reductions

Harbormaster completed remote builds in B118176: Diff 364496. Aug 5 2021, 9:22 AM

frasercrmck added a child revision: D107657: [RISCV][VP] Add support for VP_REDUCE_* operations. Aug 6 2021, 9:48 AM

craig.topper added inline comments. Aug 6 2021, 10:42 AM

llvm/docs/LangRef.rst
18752	vale -> value
llvm/include/llvm/IR/VPIntrinsics.def
241	Is this comment still accurate with start value?
297	Should this be %acc or %start?
313	Should this be lined up with vp_reduce_fadd on the line above?
llvm/lib/CodeGen/ExpandVectorPredication.cpp
375	Is the documentation for this in IRBuilder incorrect? It says the accumulator is for ordered reductions. Is it also used for unordered?

    /// Create a vector fadd reduction intrinsic of the source vector.
    /// The first parameter is a scalar accumulator value for ordered reductions.
    CallInst *CreateFAddReduce(Value *Acc, Value *Src);
    /// Create a vector fmul reduction intrinsic of the source vector.
    /// The first parameter is a scalar accumulator value for ordered reductions.
    CallInst *CreateFMulReduce(Value *Acc, Value *Src);

rebase
address wee bits of feedback

frasercrmck marked an inline comment as done. Aug 9 2021, 3:19 AM

frasercrmck added inline comments.

llvm/docs/LangRef.rst
18752	Nice spot
llvm/include/llvm/IR/VPIntrinsics.def
241	Nope, but it is now, thanks for catching that.
297	Aye it should be `%start` but I've reworked the documentation anyway; this specialized macro is now really for reductions with separate "seq" forms. Cheers.
313	Done, cheers.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
375	Yeah so the unordered reduction intrinsics do indeed have an accumulator, so this is misleading. However, since the only difference between ordered and unordered intrinsics is the presence of the `reassoc` flag, this method always creates an ordered reduction. It's up to the user to add the flag later. That's what we do in `replaceOperation` by transferring any existing fast-math flags on to the expanded reduction. From the SelectionDAG's perspective the unordered reductions don't have an accumulator, because it's split out early in the SelectionDAGBuilder. I wonder if that's where the confusion came from. I think I still support addressing this documentation though. I can do that in a separate patch as it's likely to spawn discussion.
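To show why the `reassoc` flag matters for `fadd` reductions, here is a hypothetical C++ sketch (illustrative names, plain `double` arithmetic, no LLVM code) comparing the sequential order with one order that a reassociated reduction would be allowed to use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered ("sequential") fadd reduction: (((start + v0) + v1) + v2) + ...
double orderedFAdd(double start, const std::vector<double> &v) {
  double acc = start;
  for (double x : v)
    acc += x;
  return acc;
}

// One evaluation order a reassoc reduction would permit: sum adjacent
// pairs first, then add the start value last. This is only an example
// order; the flag does not prescribe any particular one.
double pairwiseFAdd(double start, const std::vector<double> &v) {
  double total = 0.0;
  for (std::size_t i = 0; i + 1 < v.size(); i += 2)
    total += v[i] + v[i + 1];
  if (v.size() % 2 != 0)
    total += v.back();
  return start + total;
}
```

With `v = {1e20, 1.0, -1e20, 1.0}`, `orderedFAdd(-0.0, v)` yields `1.0` while `pairwiseFAdd(-0.0, v)` yields `0.0`, because `1e20 + 1.0` rounds back to `1e20` in double precision.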

Harbormaster completed remote builds in B118639: Diff 365136. Aug 9 2021, 4:10 AM

frasercrmck marked an inline comment as done. Aug 9 2021, 4:17 AM

frasercrmck added inline comments.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
375	I tried to address this in D107753.

craig.topper added inline comments. Aug 9 2021, 9:44 AM

llvm/lib/CodeGen/ExpandVectorPredication.cpp
27	What did we start using from the Operator.h? I couldn't spot it.

remove unused header

frasercrmck marked an inline comment as done. Aug 10 2021, 2:45 AM

frasercrmck added inline comments.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
27	Ah, nice; must have been an artifact from an intermediate change. That's that gone now.

Harbormaster completed remote builds in B118832: Diff 365399. Aug 10 2021, 3:31 AM

LGTM

This revision is now accepted and ready to land. Aug 10 2021, 8:30 AM

Thanks for the review, Craig. Does the patch LGTY, @simoll?

rebase
address clang-tidy warnings

Harbormaster completed remote builds in B119039: Diff 365706. Aug 11 2021, 4:01 AM

rebase

Harbormaster completed remote builds in B119917: Diff 366917. Aug 17 2021, 10:02 AM

frasercrmck edited the summary of this revision. Aug 17 2021, 10:05 AM

This revision was landed with ongoing or failed builds. Aug 17 2021, 10:06 AM

Closed by commit rGf3e9047249d0: [VP] Add vector-predicated reduction intrinsics (authored by frasercrmck).

This revision was automatically updated to reflect the committed changes.

frasercrmck added a commit: rGf3e9047249d0: [VP] Add vector-predicated reduction intrinsics.

Revision Contents

Path                                                   Size
llvm/docs/LangRef.rst                                  794 lines
llvm/include/llvm/IR/IntrinsicInst.h                   22 lines
llvm/include/llvm/IR/Intrinsics.td                     69 lines
llvm/include/llvm/IR/VPIntrinsics.def                  94 lines
llvm/lib/CodeGen/ExpandVectorPredication.cpp           138 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp  7 lines
llvm/lib/IR/IntrinsicInst.cpp                          52 lines
llvm/test/CodeGen/Generic/expand-vp.ll                 182 lines
llvm/test/Verifier/vp-intrinsics.ll                    32 lines
llvm/unittests/IR/VPIntrinsicTest.cpp                  82 lines

Diff 366932

llvm/docs/LangRef.rst

(16,627 unchanged lines not shown)
Vector Reduction Intrinsics		Vector Reduction Intrinsics
---------------------------		---------------------------

Horizontal reductions of vectors can be expressed using the following		Horizontal reductions of vectors can be expressed using the following
intrinsics. Each one takes a vector operand as an input and applies its		intrinsics. Each one takes a vector operand as an input and applies its
respective operation across all elements of the vector, returning a single		respective operation across all elements of the vector, returning a single
scalar result of the same element type.		scalar result of the same element type.

		.. _int_vector_reduce_add:

'``llvm.vector.reduce.add.*``' Intrinsic		'``llvm.vector.reduce.add.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %a)
declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %a)		declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.add.*``' intrinsics do an integer ``ADD``		The '``llvm.vector.reduce.add.*``' intrinsics do an integer ``ADD``
reduction of a vector, returning the result as a scalar. The return type matches		reduction of a vector, returning the result as a scalar. The return type matches
the element-type of the vector input.		the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_fadd:

'``llvm.vector.reduce.fadd.*``' Intrinsic		'``llvm.vector.reduce.fadd.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

(36 unchanged lines not shown)
"""""""""		"""""""""

::		::

%unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %input) ; relaxed reduction		%unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float -0.0, <4 x float> %input) ; relaxed reduction
%ord = call float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %input) ; sequential reduction		%ord = call float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %input) ; sequential reduction


		.. _int_vector_reduce_mul:

'``llvm.vector.reduce.mul.*``' Intrinsic		'``llvm.vector.reduce.mul.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %a)
declare i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> %a)		declare i64 @llvm.vector.reduce.mul.v2i64(<2 x i64> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.mul.*``' intrinsics do an integer ``MUL``		The '``llvm.vector.reduce.mul.*``' intrinsics do an integer ``MUL``
reduction of a vector, returning the result as a scalar. The return type matches		reduction of a vector, returning the result as a scalar. The return type matches
the element-type of the vector input.		the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_fmul:

'``llvm.vector.reduce.fmul.*``' Intrinsic		'``llvm.vector.reduce.fmul.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

(35 unchanged lines not shown)
Examples:		Examples:
"""""""""		"""""""""

::		::

%unord = call reassoc float @llvm.vector.reduce.fmul.v4f32(float 1.0, <4 x float> %input) ; relaxed reduction		%unord = call reassoc float @llvm.vector.reduce.fmul.v4f32(float 1.0, <4 x float> %input) ; relaxed reduction
%ord = call float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %input) ; sequential reduction		%ord = call float @llvm.vector.reduce.fmul.v4f32(float %start_value, <4 x float> %input) ; sequential reduction

		.. _int_vector_reduce_and:

'``llvm.vector.reduce.and.*``' Intrinsic		'``llvm.vector.reduce.and.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.and.*``' intrinsics do a bitwise ``AND``		The '``llvm.vector.reduce.and.*``' intrinsics do a bitwise ``AND``
reduction of a vector, returning the result as a scalar. The return type matches		reduction of a vector, returning the result as a scalar. The return type matches
the element-type of the vector input.		the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_or:

'``llvm.vector.reduce.or.*``' Intrinsic		'``llvm.vector.reduce.or.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction		The '``llvm.vector.reduce.or.*``' intrinsics do a bitwise ``OR`` reduction
of a vector, returning the result as a scalar. The return type matches the		of a vector, returning the result as a scalar. The return type matches the
element-type of the vector input.		element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_xor:

'``llvm.vector.reduce.xor.*``' Intrinsic		'``llvm.vector.reduce.xor.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR``		The '``llvm.vector.reduce.xor.*``' intrinsics do a bitwise ``XOR``
reduction of a vector, returning the result as a scalar. The return type matches		reduction of a vector, returning the result as a scalar. The return type matches
the element-type of the vector input.		the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_smax:

'``llvm.vector.reduce.smax.*``' Intrinsic		'``llvm.vector.reduce.smax.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.smax.*``' intrinsics do a signed integer		The '``llvm.vector.reduce.smax.*``' intrinsics do a signed integer
``MAX`` reduction of a vector, returning the result as a scalar. The return type		``MAX`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.		matches the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_smin:

'``llvm.vector.reduce.smin.*``' Intrinsic		'``llvm.vector.reduce.smin.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.smin.*``' intrinsics do a signed integer		The '``llvm.vector.reduce.smin.*``' intrinsics do a signed integer
``MIN`` reduction of a vector, returning the result as a scalar. The return type		``MIN`` reduction of a vector, returning the result as a scalar. The return type
matches the element-type of the vector input.		matches the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_umax:

'``llvm.vector.reduce.umax.*``' Intrinsic		'``llvm.vector.reduce.umax.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.umax.*``' intrinsics do an unsigned		The '``llvm.vector.reduce.umax.*``' intrinsics do an unsigned
integer ``MAX`` reduction of a vector, returning the result as a scalar. The		integer ``MAX`` reduction of a vector, returning the result as a scalar. The
return type matches the element-type of the vector input.		return type matches the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_umin:

'``llvm.vector.reduce.umin.*``' Intrinsic		'``llvm.vector.reduce.umin.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %a)		declare i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %a)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.vector.reduce.umin.*``' intrinsics do an unsigned		The '``llvm.vector.reduce.umin.*``' intrinsics do an unsigned
integer ``MIN`` reduction of a vector, returning the result as a scalar. The		integer ``MIN`` reduction of a vector, returning the result as a scalar. The
return type matches the element-type of the vector input.		return type matches the element-type of the vector input.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of integer values.		The argument to this intrinsic must be a vector of integer values.

		.. _int_vector_reduce_fmax:

'``llvm.vector.reduce.fmax.*``' Intrinsic		'``llvm.vector.reduce.fmax.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

(14 unchanged lines not shown)

If the intrinsic call has the ``nnan`` fast-math flag, then the operation can		If the intrinsic call has the ``nnan`` fast-math flag, then the operation can
assume that NaNs are not present in the input vector.		assume that NaNs are not present in the input vector.

Arguments:		Arguments:
""""""""""		""""""""""
The argument to this intrinsic must be a vector of floating-point values.		The argument to this intrinsic must be a vector of floating-point values.

		.. _int_vector_reduce_fmin:

'``llvm.vector.reduce.fmin.*``' Intrinsic		'``llvm.vector.reduce.fmin.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""
This is an overloaded intrinsic.		This is an overloaded intrinsic.

::		::
(1,598 unchanged lines not shown)
.. code-block:: llvm		.. code-block:: llvm
%r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)		%r = call <4 x float> @llvm.vp.frem.v4f32(<4 x float> %a, <4 x float> %b, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = frem <4 x float> %a, %b		%t = frem <4 x float> %a, %b
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef		%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> undef



		.. _int_vp_reduce_add:

		'``llvm.vp.reduce.add.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.add.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.add.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer ``ADD`` reduction of a vector and a scalar starting value,
		returning the result as a scalar.

		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.add``' intrinsic performs the integer ``ADD`` reduction
		(:ref:`llvm.vector.reduce.add <int_vector_reduce_add>`) of the vector operand
		``val`` on each enabled lane, adding it to the scalar ``start_value``. Disabled
		lanes are treated as containing the neutral value ``0`` (i.e. having no effect
		on the reduction operation). If the vector length is zero, the result is equal
		to ``start_value``.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> zeroinitializer
		%reduction = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked.a)
		%also.r = add i32 %reduction, %start
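
As a cross-check of these semantics, the following C++ sketch models ``llvm.vp.reduce.add`` on ``i32`` lanes in scalar code. It is not LLVM's implementation: the function name ``vp_reduce_add`` and the use of ``std::vector`` for the vector and mask operands are illustrative assumptions, and the sketch assumes ``evl`` never exceeds the number of lanes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of llvm.vp.reduce.add on i32 lanes.
// Lane i is enabled iff i < evl and mask[i] is true. Disabled lanes hold the
// neutral value 0, so they can simply be skipped. uint32_t arithmetic wraps
// modulo 2^32, like LLVM's integer add.
uint32_t vp_reduce_add(uint32_t start, const std::vector<uint32_t> &val,
                       const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && "mask must have one bit per lane");
  assert(evl <= val.size() && "sketch assumes evl <= number of lanes");
  uint32_t acc = start;
  for (uint32_t i = 0; i < evl; ++i)
    if (mask[i])
      acc += val[i];
  return acc;
}
```

With ``evl`` equal to ``0`` the loop body never runs, so ``start_value`` is returned unchanged, matching the rule stated above.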


		.. _int_vp_reduce_fadd:

		'``llvm.vp.reduce.fadd.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare float @llvm.vp.reduce.fadd.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare double @llvm.vp.reduce.fadd.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated floating-point ``ADD`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.

		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		floating-point type equal to the result type. The second operand is the vector
		on which the reduction is performed and must be a vector of floating-point
		values whose element type is the result/start type. The third operand is the
		vector mask and is a vector of boolean values with the same number of elements
		as the vector operand. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.fadd``' intrinsic performs the floating-point ``ADD``
		reduction (:ref:`llvm.vector.reduce.fadd <int_vector_reduce_fadd>`) of the
		vector operand ``val`` on each enabled lane, adding it to the scalar
		``start_value``. Disabled lanes are treated as containing the neutral value
		``-0.0`` (i.e. having no effect on the reduction operation). If no lanes are
		enabled, the resulting value will be equal to ``start_value``.

		To ignore the start value, the neutral value can be used.

		See the unpredicated version (:ref:`llvm.vector.reduce.fadd
		<int_vector_reduce_fadd>`) for more detail on the semantics of the reduction.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call float @llvm.vp.reduce.fadd.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float -0.0, float -0.0, float -0.0, float -0.0>
		%also.r = call float @llvm.vector.reduce.fadd.v4f32(float %start, <4 x float> %masked.a)
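
Why ``-0.0`` rather than ``+0.0``? Under the default round-to-nearest mode, ``x + -0.0`` is ``x`` for every non-NaN ``x`` (including ``-0.0``), whereas ``-0.0 + +0.0`` is ``+0.0``, so padding with ``+0.0`` could flip the sign of a zero result. The hypothetical C++ sketch below models the ordered (no ``reassoc``) form as a left-to-right fold that, like the ``select`` in the example, feeds ``-0.0`` in place of each disabled lane. The function name and container types are assumptions, and it assumes the code is not compiled with fast-math options that discard signed zeros.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical model of llvm.vp.reduce.fadd without 'reassoc': a strictly
// ordered fold that starts from start_value. Disabled lanes contribute the
// neutral value -0.0, mirroring the select-based expansion in the example.
float vp_reduce_fadd(float start, const std::vector<float> &val,
                     const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  float acc = start;
  for (unsigned i = 0; i < val.size(); ++i) {
    const bool enabled = i < evl && mask[i];
    acc = acc + (enabled ? val[i] : -0.0f);
  }
  return acc;
}
```

With every lane disabled, a start value of ``-0.0`` comes back as ``-0.0`` (checkable with ``std::signbit``), which would not hold if the padding were ``+0.0``.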


		.. _int_vp_reduce_mul:

		'``llvm.vp.reduce.mul.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.mul.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.mul.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer ``MUL`` reduction of a vector and a scalar starting value,
		returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.mul``' intrinsic performs the integer ``MUL`` reduction
		(:ref:`llvm.vector.reduce.mul <int_vector_reduce_mul>`) of the vector operand ``val``
		on each enabled lane, multiplying it by the scalar ``start_value``. Disabled
		lanes are treated as containing the neutral value ``1`` (i.e. having no effect
		on the reduction operation). If the vector length is zero, the result is the
		start value.

		craig.topper: vale -> value
		frasercrmck: Nice spot

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
		%reduction = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %masked.a)
		%also.r = mul i32 %reduction, %start
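
Because multiplication modulo 2^32 is associative and commutative, reducing the padded vector first and multiplying by ``start_value`` last gives the same answer as folding the enabled lanes into ``start_value`` one at a time. The hypothetical C++ sketch below follows the expansion from the example step by step; the name ``vp_reduce_mul`` and the container types are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model that follows the expansion in the example:
// 1) 'select' replaces disabled lanes with the neutral value 1,
// 2) an unmasked product plays the role of llvm.vector.reduce.mul,
// 3) the result is multiplied by start_value.
// uint32_t arithmetic wraps modulo 2^32, like LLVM's 'mul' on i32.
uint32_t vp_reduce_mul(uint32_t start, const std::vector<uint32_t> &val,
                       const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  std::vector<uint32_t> masked(val.size());
  for (uint32_t i = 0; i < val.size(); ++i)
    masked[i] = (i < evl && mask[i]) ? val[i] : 1u;
  uint32_t reduction = 1;
  for (uint32_t x : masked)
    reduction *= x;
  return reduction * start;
}
```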

		.. _int_vp_reduce_fmul:

		'``llvm.vp.reduce.fmul.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare float @llvm.vp.reduce.fmul.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare double @llvm.vp.reduce.fmul.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated floating-point ``MUL`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		floating-point type equal to the result type. The second operand is the vector
		on which the reduction is performed and must be a vector of floating-point
		values whose element type is the result/start type. The third operand is the
		vector mask and is a vector of boolean values with the same number of elements
		as the vector operand. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.fmul``' intrinsic performs the floating-point ``MUL``
		reduction (:ref:`llvm.vector.reduce.fmul <int_vector_reduce_fmul>`) of the
		vector operand ``val`` on each enabled lane, multiplying it by the scalar
		``start_value``. Disabled lanes are treated as containing the neutral value
		``1.0`` (i.e. having no effect on the reduction operation). If no lanes are
		enabled, the resulting value will be equal to the starting value.

		To ignore the start value, the neutral value can be used.

		See the unpredicated version (:ref:`llvm.vector.reduce.fmul
		<int_vector_reduce_fmul>`) for more detail on the semantics.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call float @llvm.vp.reduce.fmul.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float 1.0, float 1.0, float 1.0, float 1.0>
		%also.r = call float @llvm.vector.reduce.fmul.v4f32(float %start, <4 x float> %masked.a)
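
One practical consequence of replacing disabled lanes with ``1.0`` is that whatever a disabled lane holds, even ``NaN`` or infinity, never reaches the result. The minimal C++ sketch below models the ordered (no ``reassoc``) form; the name ``vp_reduce_fmul`` and the container types are assumptions for illustration, and ``evl`` is assumed not to exceed the lane count.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical model of llvm.vp.reduce.fmul without 'reassoc': an ordered
// product starting from start_value. Each disabled lane is replaced by the
// neutral value 1.0 before it is multiplied in.
float vp_reduce_fmul(float start, const std::vector<float> &val,
                     const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  float acc = start;
  for (unsigned i = 0; i < val.size(); ++i)
    acc *= (i < evl && mask[i]) ? val[i] : 1.0f;
  return acc;
}
```

For example, with ``start = 2.0f``, ``val = {3.0f, NaN}`` and ``mask = {true, false}``, the NaN lane is disabled and the result is ``6.0f``.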


		.. _int_vp_reduce_and:

		'``llvm.vp.reduce.and.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.and.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.and.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer ``AND`` reduction of a vector and a scalar starting value,
		returning the result as a scalar.
		craig.topper: Neutral value is -1 or UINT_MAX for AND
		frasercrmck: Good spot, thanks. Updated the "equivalent to" section below, too.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.and``' intrinsic performs the integer ``AND`` reduction
		(:ref:`llvm.vector.reduce.and <int_vector_reduce_and>`) of the vector operand
		``val`` on each enabled lane, performing an '``and``' of that with the
		scalar ``start_value``. Disabled lanes are treated as containing the neutral
		value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
		operation). If the vector length is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
		%reduction = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> %masked.a)
		%also.r = and i32 %reduction, %start
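
Padding with all ones is what makes the expansion in the example correct: ``x & 0xFFFFFFFF`` equals ``x`` for every ``i32`` value, while padding disabled lanes with ``0`` would force the whole result to ``0``. The C++ sketch below is a hypothetical reference model (the name ``vp_reduce_and`` and the container types are illustrative) that pads disabled lanes with ``UINT32_MAX``, the same bit pattern as ``i32 -1``.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.vp.reduce.and on i32 lanes. Disabled lanes are
// padded with all ones (UINT32_MAX, the bit pattern of 'i32 -1'), which is
// the identity for bitwise AND.
uint32_t vp_reduce_and(uint32_t start, const std::vector<uint32_t> &val,
                       const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  uint32_t acc = start;
  for (uint32_t i = 0; i < val.size(); ++i)
    acc &= (i < evl && mask[i]) ? val[i] : UINT32_MAX;
  return acc;
}
```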


		.. _int_vp_reduce_or:

		'``llvm.vp.reduce.or.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.or.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.or.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer ``OR`` reduction of a vector and a scalar starting value,
		returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.or``' intrinsic performs the integer ``OR`` reduction
		(:ref:`llvm.vector.reduce.or <int_vector_reduce_or>`) of the vector operand
		``val`` on each enabled lane, performing an '``or``' of that with the scalar
		``start_value``. Disabled lanes are treated as containing the neutral value
		``0`` (i.e. having no effect on the reduction operation). If the vector length
		is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
		%reduction = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> %masked.a)
		%also.r = or i32 %reduction, %start
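
Because ``x | 0`` equals ``x``, skipping a disabled lane has the same effect as padding it with the neutral value, as the example's ``select`` does. A short hypothetical C++ sketch on ``i32`` lanes (the name ``vp_reduce_or`` and the container types are illustrative, not part of LLVM):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.vp.reduce.or on i32 lanes. The neutral value 0
// satisfies x | 0 == x, so disabled lanes are simply skipped.
uint32_t vp_reduce_or(uint32_t start, const std::vector<uint32_t> &val,
                      const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  uint32_t acc = start;
  for (uint32_t i = 0; i < evl; ++i)
    if (mask[i])
      acc |= val[i];
  return acc;
}
```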

		.. _int_vp_reduce_xor:

		'``llvm.vp.reduce.xor.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.xor.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.xor.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer ``XOR`` reduction of a vector and a scalar starting value,
		returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.xor``' intrinsic performs the integer ``XOR`` reduction
		(:ref:`llvm.vector.reduce.xor <int_vector_reduce_xor>`) of the vector operand
		``val`` on each enabled lane, performing an '``xor``' of that with the scalar
		``start_value``. Disabled lanes are treated as containing the neutral value
		``0`` (i.e. having no effect on the reduction operation). If the vector length
		is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
		%reduction = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> %masked.a)
		%also.r = xor i32 %reduction, %start
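
Since ``x ^ 0`` equals ``x``, a disabled lane padded with ``0`` has the same effect as skipping it, and each enabled lane simply toggles its set bits in the accumulator. A hypothetical C++ sketch on ``i32`` lanes (name and container types are illustrative, and ``evl`` is assumed not to exceed the lane count):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.vp.reduce.xor on i32 lanes. Disabled lanes
// (neutral value 0) are skipped; every enabled lane toggles the bits it has
// set in the accumulator, starting from start_value.
uint32_t vp_reduce_xor(uint32_t start, const std::vector<uint32_t> &val,
                       const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  uint32_t acc = start;
  for (uint32_t i = 0; i < evl; ++i)
    if (mask[i])
      acc ^= val[i];
  return acc;
}
```

One consequence of the toggling view: an enabled value that appears an even number of times cancels out and leaves the result unchanged.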


		.. _int_vp_reduce_smax:

		'``llvm.vp.reduce.smax.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.smax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.smax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated signed-integer ``MAX`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.smax``' intrinsic performs the signed-integer ``MAX``
		reduction (:ref:`llvm.vector.reduce.smax <int_vector_reduce_smax>`) of the
		vector operand ``val`` on each enabled lane, taking the maximum of that and
		the scalar ``start_value``. Disabled lanes are treated as containing the
		neutral value ``INT_MIN`` (i.e. having no effect on the reduction operation).
		If the vector length is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i8 @llvm.vp.reduce.smax.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 -128, i8 -128, i8 -128, i8 -128>
		%reduction = call i8 @llvm.vector.reduce.smax.v4i8(<4 x i8> %masked.a)
		%also.r = call i8 @llvm.smax.i8(i8 %reduction, i8 %start)
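
The neutral value has to be the smallest ``i8`` (``-128``) rather than ``0``: if every enabled lane and ``start_value`` were negative, padding with ``0`` would make ``0`` the (wrong) maximum. The hypothetical C++ sketch below follows the example's expansion on ``i8`` lanes, with ``std::max`` standing in for both ``llvm.vector.reduce.smax`` and ``llvm.smax``; the function name and container types are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.vp.reduce.smax on i8 lanes, following the
// expansion in the example: pad disabled lanes with INT8_MIN (-128), take the
// unmasked signed maximum, then combine with start_value.
int8_t vp_reduce_smax(int8_t start, const std::vector<int8_t> &val,
                      const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  const int8_t neutral = INT8_MIN;
  int8_t reduction = neutral;  // the maximum of an all-padding vector
  for (unsigned i = 0; i < val.size(); ++i)
    reduction = std::max(reduction, (i < evl && mask[i]) ? val[i] : neutral);
  return std::max(reduction, start);  // plays the role of llvm.smax
}
```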


		.. _int_vp_reduce_smin:

		'``llvm.vp.reduce.smin.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.smin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.smin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated signed-integer ``MIN`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.smin``' intrinsic performs the signed-integer ``MIN``
		reduction (:ref:`llvm.vector.reduce.smin <int_vector_reduce_smin>`) of the
		vector operand ``val`` on each enabled lane, taking the minimum of that and
		the scalar ``start_value``. Disabled lanes are treated as containing the
		neutral value ``INT_MAX`` (i.e. having no effect on the reduction operation).
		If the vector length is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i8 @llvm.vp.reduce.smin.v4i8(i8 %start, <4 x i8> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i8> %a, <4 x i8> <i8 127, i8 127, i8 127, i8 127>
		%reduction = call i8 @llvm.vector.reduce.smin.v4i8(<4 x i8> %masked.a)
		%also.r = call i8 @llvm.smin.i8(i8 %reduction, i8 %start)
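
The example relies on padding disabled lanes with the largest signed value (``127`` for ``i8``) and reducing the whole vector being the same as looking only at enabled lanes. The hypothetical C++ sketch below writes both forms for ``i8`` lanes so they can be compared side by side; the function names and container types are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Direct form (hypothetical): fold every enabled lane into start_value.
int8_t vp_reduce_smin(int8_t start, const std::vector<int8_t> &val,
                      const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  int8_t acc = start;
  for (unsigned i = 0; i < evl; ++i)
    if (mask[i])
      acc = std::min(acc, val[i]);
  return acc;
}

// Expanded form from the example (hypothetical): pad disabled lanes with
// INT8_MAX (127), take the unmasked signed minimum, then combine with
// start_value as llvm.smin does.
int8_t vp_reduce_smin_expanded(int8_t start, const std::vector<int8_t> &val,
                               const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  const int8_t neutral = INT8_MAX;
  int8_t reduction = neutral;
  for (unsigned i = 0; i < val.size(); ++i)
    reduction = std::min(reduction, (i < evl && mask[i]) ? val[i] : neutral);
  return std::min(reduction, start);
}
```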


		.. _int_vp_reduce_umax:

		'``llvm.vp.reduce.umax.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.umax.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.umax.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated unsigned-integer ``MAX`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.umax``' intrinsic performs the unsigned-integer ``MAX``
		reduction (:ref:`llvm.vector.reduce.umax <int_vector_reduce_umax>`) of the
		vector operand ``val`` on each enabled lane, and taking the maximum of that and
		the scalar ``start_value``. Disabled lanes are treated as containing the
		neutral value ``0`` (i.e. having no effect on the reduction operation). If the
		vector length is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
		%reduction = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> %masked.a)
		%also.r = call i32 @llvm.umax.i32(i32 %reduction, i32 %start)
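
Lanes are compared as unsigned values, so ``0`` (the smallest unsigned value) is the neutral padding, and a lane holding ``0xFFFFFFFF`` is the largest possible ``i32`` here even though it reads as ``-1`` when interpreted as signed. A hypothetical C++ sketch of the expansion (name and container types are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of llvm.vp.reduce.umax on i32 lanes: pad disabled lanes
// with 0, take the unsigned maximum over the vector, then combine with
// start_value.
uint32_t vp_reduce_umax(uint32_t start, const std::vector<uint32_t> &val,
                        const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  uint32_t reduction = 0;
  for (uint32_t i = 0; i < val.size(); ++i)
    reduction =
        std::max<uint32_t>(reduction, (i < evl && mask[i]) ? val[i] : 0u);
  return std::max(reduction, start);  // plays the role of llvm.umax
}
```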


		.. _int_vp_reduce_umin:

		'``llvm.vp.reduce.umin.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare i32 @llvm.vp.reduce.umin.v4i32(i32 <start_value>, <4 x i32> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare i16 @llvm.vp.reduce.umin.nxv8i16(i16 <start_value>, <vscale x 8 x i16> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated unsigned-integer ``MIN`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		integer type equal to the result type. The second operand is the vector on
		which the reduction is performed and must be a vector of integer values whose
		element type is the result/start type. The third operand is the vector mask and
		is a vector of boolean values with the same number of elements as the vector
		operand. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.umin``' intrinsic performs the unsigned-integer ``MIN``
		reduction (:ref:`llvm.vector.reduce.umin <int_vector_reduce_umin>`) of the
		vector operand ``val`` on each enabled lane, taking the minimum of that and the
		scalar ``start_value``. Disabled lanes are treated as containing the neutral
		value ``UINT_MAX``, or ``-1`` (i.e. having no effect on the reduction
		operation). If the vector length is zero, the result is the start value.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		craig.topper: Does -QNAN mean anything? Isn't the sign of a NaN mostly ignored.
		frasercrmck: I have heard that the sign of NaNs is often ignored but I'm not sure it's guaranteed to be ignored? `-QNAN` does have a bit representation and `ConstantFP` and `APFloat` both take a `Negative` parameter to their `getQNAN` methods. It's also what `SelectionDAG::getNeutralElement` is doing for `FMAXNUM`.
		.. code-block:: llvm

		%r = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
		craig.topper: Even if all vector elements are NaN, the result would depend on the start value
		frasercrmck: Yes true. I've replaced that sentence now.
		%reduction = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %masked.a)
		%also.r = call i32 @llvm.umin.i32(i32 %reduction, i32 %start)
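
The example pads with ``i32 -1``; read as an unsigned value that is ``UINT32_MAX``, the largest ``i32``, so it never wins an unsigned minimum. The hypothetical C++ sketch below checks that bit-pattern equivalence at compile time and models the reduction by folding enabled lanes into ``start_value`` (name and container types are illustrative).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// 'i32 -1' and UINT32_MAX share the all-ones bit pattern, so
// umin(x, -1) == x for every x.
static_assert(static_cast<uint32_t>(-1) == UINT32_MAX, "-1 is all ones");

// Hypothetical model of llvm.vp.reduce.umin on i32 lanes: fold the enabled
// lanes into start_value with an unsigned minimum.
uint32_t vp_reduce_umin(uint32_t start, const std::vector<uint32_t> &val,
                        const std::vector<bool> &mask, uint32_t evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  uint32_t acc = start;
  for (uint32_t i = 0; i < evl; ++i)
    if (mask[i])
      acc = std::min(acc, val[i]);
  return acc;
}
```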


		.. _int_vp_reduce_fmax:

		'``llvm.vp.reduce.fmax.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare float @llvm.vp.reduce.fmax.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare double @llvm.vp.reduce.fmax.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated floating-point ``MAX`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		floating-point type equal to the result type. The second operand is the vector
		on which the reduction is performed and must be a vector of floating-point
		values whose element type is the result/start type. The third operand is the
		vector mask and is a vector of boolean values with the same number of elements
		as the vector operand. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.fmax``' intrinsic performs the floating-point ``MAX``
		reduction (:ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>`) of the
		vector operand ``val`` on each enabled lane, taking the maximum of that and the
		scalar ``start_value``. Disabled lanes are treated as containing the neutral
		value (i.e. having no effect on the reduction operation). If the vector length
		is zero, the result is the start value.

		The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
		flags are set, the neutral value is ``-QNAN``. If ``nnan`` and ``ninf`` are
		both set, then the neutral value is the most negative finite floating-point
		value for the result type. If only ``nnan`` is set then the neutral value is
		``-Infinity``.

		This intrinsic has the same comparison semantics as the
		:ref:`llvm.vector.reduce.fmax <int_vector_reduce_fmax>` intrinsic (and thus the
		'``llvm.maxnum.*``' intrinsic). That is, the result will always be a number
		unless the starting value and all enabled elements of the vector are ``NaN``.
		For a vector with maximum element magnitude ``0.0`` and containing both
		``+0.0`` and ``-0.0`` elements, the sign of the result is unspecified.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call float @llvm.vp.reduce.fmax.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
		%reduction = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %masked.a)
		%also.r = call float @llvm.maxnum.f32(float %reduction, float %start)
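
A quiet NaN works as the neutral value because ``llvm.maxnum`` returns the other operand when exactly one operand is NaN. C++'s ``std::fmax`` follows the same rule, so the hypothetical sketch below uses it to model the no-fast-math case (the name ``vp_reduce_fmax`` and the container types are illustrative; it assumes no fast-math compiler options, which could change NaN handling).

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical model of llvm.vp.reduce.fmax with no fast-math flags.
// std::fmax returns the other argument when one argument is NaN, like
// llvm.maxnum; that is what makes a quiet NaN neutral for disabled lanes.
float vp_reduce_fmax(float start, const std::vector<float> &val,
                     const std::vector<bool> &mask, unsigned evl) {
  assert(mask.size() == val.size() && evl <= val.size());
  const float neutral = -std::numeric_limits<float>::quiet_NaN();
  float reduction = neutral;
  for (unsigned i = 0; i < val.size(); ++i)
    reduction = std::fmax(reduction, (i < evl && mask[i]) ? val[i] : neutral);
  return std::fmax(reduction, start);  // plays the role of llvm.maxnum
}
```

The result is NaN only when ``start`` and every enabled lane are NaN; a single non-NaN operand anywhere is enough to produce a number.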


		.. _int_vp_reduce_fmin:

		'``llvm.vp.reduce.fmin.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare float @llvm.vp.reduce.fmin.v4f32(float <start_value>, <4 x float> <val>, <4 x i1> <mask>, i32 <vector_length>)
		declare double @llvm.vp.reduce.fmin.nxv8f64(double <start_value>, <vscale x 8 x double> <val>, <vscale x 8 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated floating-point ``MIN`` reduction of a vector and a scalar starting
		value, returning the result as a scalar.


		Arguments:
		""""""""""

		The first operand is the start value of the reduction, which must be a scalar
		floating-point type equal to the result type. The second operand is the vector
		on which the reduction is performed and must be a vector of floating-point
		values whose element type is the result/start type. The third operand is the
		vector mask and is a vector of boolean values with the same number of elements
		as the vector operand. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.reduce.fmin``' intrinsic performs the floating-point ``MIN``
		reduction (:ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>`) of the
		vector operand ``val`` on each enabled lane, taking the minimum of that and the
		scalar ``start_value``. Disabled lanes are treated as containing the neutral
		value (i.e. having no effect on the reduction operation). If the vector length
		is zero, the result is the start value.

		The neutral value is dependent on the :ref:`fast-math flags <fastmath>`. If no
		flags are set, the neutral value is ``+QNAN``. If ``nnan`` and ``ninf`` are
		both set, then the neutral value is the largest finite floating-point value for
		the result type. If only ``nnan`` is set then the neutral value is ``+Infinity``.

		This intrinsic has the same comparison semantics as the
		:ref:`llvm.vector.reduce.fmin <int_vector_reduce_fmin>` intrinsic (and thus the
		'``llvm.minnum.*``' intrinsic). That is, the result will always be a number
		unless the starting value and all enabled elements of the vector are ``NaN``.
		For a vector with maximum element magnitude ``0.0`` and containing both
		``+0.0`` and ``-0.0`` elements, the sign of the result is unspecified.

		To ignore the start value, the neutral value can be used.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call float @llvm.vp.reduce.fmin.v4f32(float %start, <4 x float> %a, <4 x i1> %mask, i32 %evl)
		; %r is equivalent to %also.r, where lanes greater than or equal to %evl
		; are treated as though %mask were false for those lanes.

		%masked.a = select <4 x i1> %mask, <4 x float> %a, <4 x float> <float QNAN, float QNAN, float QNAN, float QNAN>
		%reduction = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %masked.a)
		%also.r = call float @llvm.minnum.f32(float %reduction, float %start)
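
The flag-dependent choice of neutral value can be sketched as follows. The C++ below is hypothetical: ``fmin_neutral`` and ``vp_reduce_fmin`` are illustrative names, the fast-math flags are modelled as plain ``bool`` parameters, and ``std::fmin`` (which, like ``llvm.minnum``, returns the other operand when one operand is NaN) stands in for the LLVM minimum. The text does not say what ``ninf`` alone implies; the sketch assumes it behaves like the no-flags case.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper encoding the rule in the text for the neutral value of
// llvm.vp.reduce.fmin. 'ninf' without 'nnan' is treated like no flags, which
// is an assumption of this sketch.
float fmin_neutral(bool nnan, bool ninf) {
  if (!nnan)
    return std::numeric_limits<float>::quiet_NaN();  // +QNAN
  if (!ninf)
    return std::numeric_limits<float>::infinity();   // +Infinity
  return std::numeric_limits<float>::max();          // largest finite float
}

// Hypothetical model of llvm.vp.reduce.fmin: disabled lanes are padded with
// the chosen neutral value and combined with std::fmin.
float vp_reduce_fmin(float start, const std::vector<float> &val,
                     const std::vector<bool> &mask, unsigned evl,
                     float neutral) {
  assert(mask.size() == val.size() && evl <= val.size());
  float reduction = neutral;
  for (unsigned i = 0; i < val.size(); ++i)
    reduction = std::fmin(reduction, (i < evl && mask[i]) ? val[i] : neutral);
  return std::fmin(reduction, start);  // plays the role of llvm.minnum
}
```

For inputs that respect the flags (no NaNs under ``nnan``, no infinities under ``ninf``), all three neutral values give the same result.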


.. _int_get_active_lane_mask:

'``llvm.get.active.lane.mask.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""
This is an overloaded intrinsic.

llvm/include/llvm/IR/IntrinsicInst.h

public:
  // Equivalent non-predicated opcode
  Optional<unsigned> getFunctionalOpcode() const {
    return getFunctionalOpcodeForVP(getIntrinsicID());
  }

  // Equivalent non-predicated opcode
  static Optional<unsigned> getFunctionalOpcodeForVP(Intrinsic::ID ID);
};

frasercrmck: This can probably go in `VPReductionIntrinsic`
/// This represents vector predication reduction intrinsics.
class VPReductionIntrinsic : public VPIntrinsic {
public:
  static bool isVPReduction(Intrinsic::ID ID);

  unsigned getStartParamPos() const;
frasercrmck: Not sure whether to keep this in or not, now that all vp reductions have start parameter? Currently start is always 0 and vector is always 1.
  unsigned getVectorParamPos() const;

  static Optional<unsigned> getStartParamPos(Intrinsic::ID ID);
  static Optional<unsigned> getVectorParamPos(Intrinsic::ID ID);

  /// Methods for support type inquiry through isa, cast, and dyn_cast:
  /// @{
  static bool classof(const IntrinsicInst *I) {
    return VPReductionIntrinsic::isVPReduction(I->getIntrinsicID());
craig.topper: Should this be isVPReduction?
frasercrmck: Oh good spot, thanks. I've fixed that and added a unit test for these `VPReductionIntrinsic` methods.
  }
  static bool classof(const Value *V) {
    return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
  }
  /// @}
};

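`classof` is what makes `isa<>` and `dyn_cast<VPReductionIntrinsic>` work, so it has to test exactly the reduction IDs; that is the point of the isVPReduction comment above. The sketch below imitates LLVM-style RTTI in a single file using simplified, hypothetical types (`IntrinsicID`, `IntrinsicInstModel`, `VPReductionModel`, `dynCast`). It is not the real `llvm/Support/Casting.h` machinery. The fixed operand positions (start at 0, vector at 1) come from the review comment above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for Intrinsic::ID; not LLVM's real enumeration.
enum class IntrinsicID { vp_add, vp_reduce_add, vp_reduce_fmax };

// Stand-in for IntrinsicInst: every call knows which intrinsic it invokes.
class IntrinsicInstModel {
public:
  explicit IntrinsicInstModel(IntrinsicID ID) : ID(ID) {}
  virtual ~IntrinsicInstModel() = default;
  IntrinsicID getIntrinsicID() const { return ID; }

private:
  IntrinsicID ID;
};

// Stand-in for VPReductionIntrinsic. Membership is decided purely by the
// intrinsic ID, so classof() only has to consult that ID.
class VPReductionModel : public IntrinsicInstModel {
public:
  using IntrinsicInstModel::IntrinsicInstModel;

  static bool isVPReduction(IntrinsicID ID) {
    return ID == IntrinsicID::vp_reduce_add ||
           ID == IntrinsicID::vp_reduce_fmax;
  }
  // Testing a broader predicate here (e.g. "any VP intrinsic") would make
  // dynCast wrongly succeed for vp.add.
  static bool classof(const IntrinsicInstModel *I) {
    return isVPReduction(I->getIntrinsicID());
  }
  unsigned getStartParamPos() const { return 0; }
  unsigned getVectorParamPos() const { return 1; }
};

// Allocate the dynamic type matching the ID, so the static_cast in dynCast
// below always targets the object's real type.
std::unique_ptr<IntrinsicInstModel> makeIntrinsic(IntrinsicID ID) {
  if (VPReductionModel::isVPReduction(ID))
    return std::make_unique<VPReductionModel>(ID);
  return std::make_unique<IntrinsicInstModel>(ID);
}

// Minimal dyn_cast in the spirit of LLVM's: defer the check to To::classof.
template <typename To, typename From> To *dynCast(From *V) {
  return To::classof(V) ? static_cast<To *>(V) : nullptr;
}
```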
/// This is the common base class for constrained floating point intrinsics.
class ConstrainedFPIntrinsic : public IntrinsicInst {
public:
  bool isUnaryOp() const;
  bool isTernaryOp() const;
  Optional<RoundingMode> getRoundingMode() const;
  Optional<fp::ExceptionBehavior> getExceptionBehavior() const;
  bool isDefaultFPEnvironment() const;
…

llvm/include/llvm/IR/Intrinsics.td

…
  def int_vp_fdiv : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
…
       llvm_i32_ty]>;
  def int_vp_frem : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
      [ LLVMMatchType<0>,
        LLVMMatchType<0>,
        LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
        llvm_i32_ty]>;
}

// Reductions
let IntrProperties = [IntrSpeculatable, IntrNoMem, IntrNoSync, IntrWillReturn] in {
  def int_vp_reduce_fadd : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_fmul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_add : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_mul : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_and : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_or : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_xor : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_smax : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_smin : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_umax : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_umin : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_fmax : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
  def int_vp_reduce_fmin : DefaultAttrsIntrinsic<[LLVMVectorElementType<0>],
      [LLVMVectorElementType<0>,
       llvm_anyvector_ty,
       LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
       llvm_i32_ty]>;
}

def int_get_active_lane_mask:
  DefaultAttrsIntrinsic<[llvm_anyvector_ty],
    [llvm_anyint_ty, LLVMMatchType<1>],
    [IntrNoMem, IntrNoSync, IntrWillReturn]>;

//===-------------------------- Masked Intrinsics -------------------------===//
//
def int_masked_load:
…

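Each reduction definition above takes `(start, vector, mask, evl)` and returns the vector's element type. The scalar C++ model of `llvm.vp.reduce.add` below is meant as a reading aid. It assumes the usual VP rule that a lane participates only when its mask bit is set and its index is below `%evl`. It also assumes the integer addition wraps, as IR `add` does without `nsw`/`nuw`. Consistent with the patch summary, the start value comes back unchanged when no lane is active. The name `vpReduceAdd` is hypothetical; nothing here is LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference model (illustrative only) of
//   i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %x, <4 x i1> %mask, i32 %evl)
// Unsigned arithmetic gives the same wrap-around as IR 'add'.
uint32_t vpReduceAdd(uint32_t Start, const std::vector<uint32_t> &X,
                     const std::vector<bool> &Mask, uint32_t EVL) {
  uint32_t Acc = Start;
  for (std::size_t I = 0; I < X.size(); ++I)
    if (I < EVL && Mask[I]) // active lane: below %evl and mask bit set
      Acc += X[I];
  return Acc; // equals Start when no lane is active
}
```

For example, with `Start = 10`, `X = {1, 2, 3, 4}`, mask `{1, 0, 1, 1}` and `EVL = 3`, only lanes 0 and 2 are active, giving 14.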
llvm/include/llvm/IR/VPIntrinsics.def

…
#endif

// This VP Intrinsic is a memory operation
// The pointer arg is at POINTERPOS and the data arg is at DATAPOS.
#ifndef HANDLE_VP_IS_MEMOP
#define HANDLE_VP_IS_MEMOP(VPID, POINTERPOS, DATAPOS)
#endif

// Map this VP reduction intrinsic to its reduction operand positions.
#ifndef HANDLE_VP_REDUCTION
#define HANDLE_VP_REDUCTION(ID, STARTPOS, VECTORPOS)
#endif

/// } Property Macros

///// Integer Arithmetic {

// Specialized helper macro for integer binary operators (%x, %y, %mask, %evl).
#ifdef HELPER_REGISTER_BINARY_INT_VP
#error "The internal helper macro HELPER_REGISTER_BINARY_INT_VP is already defined!"
#endif
…
// llvm.vp.gather(ptr,mask,vlen)
BEGIN_REGISTER_VP(vp_gather, 1, 2, VP_GATHER, -1)
HANDLE_VP_TO_INTRIN(masked_gather)
HANDLE_VP_IS_MEMOP(vp_gather, 0, None)
END_REGISTER_VP(vp_gather, VP_GATHER)

///// } Memory Operations

///// Reductions {

// Specialized helper macro for VP reductions (%start, %x, %mask, %evl).
craig.topper (Done): Is this comment still accurate with start value?
frasercrmck (Author, Done): Nope, but it is now, thanks for catching that.
#ifdef HELPER_REGISTER_REDUCTION_VP
#error "The internal helper macro HELPER_REGISTER_REDUCTION_VP is already defined!"
#endif
#define HELPER_REGISTER_REDUCTION_VP(VPINTRIN, SDOPC, INTRIN) \
  BEGIN_REGISTER_VP(VPINTRIN, 2, 3, SDOPC, -1) \
  HANDLE_VP_TO_INTRIN(INTRIN) \
  HANDLE_VP_REDUCTION(VPINTRIN, 0, 1) \
  END_REGISTER_VP(VPINTRIN, SDOPC)

// llvm.vp.reduce.add(start,x,mask,vlen)
craig.topper (Done): Why accu instead of start?
frasercrmck (Author, Done): I think the reference patch uses/used `accu` so I was flip-flopping between the two. I agree that "start" is better though.
HELPER_REGISTER_REDUCTION_VP(vp_reduce_add, VP_REDUCE_ADD,
                             experimental_vector_reduce_add)

// llvm.vp.reduce.mul(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_mul, VP_REDUCE_MUL,
                             experimental_vector_reduce_mul)

// llvm.vp.reduce.and(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_and, VP_REDUCE_AND,
                             experimental_vector_reduce_and)

// llvm.vp.reduce.or(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_or, VP_REDUCE_OR,
                             experimental_vector_reduce_or)

// llvm.vp.reduce.xor(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_xor, VP_REDUCE_XOR,
                             experimental_vector_reduce_xor)

// llvm.vp.reduce.smax(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_smax, VP_REDUCE_SMAX,
                             experimental_vector_reduce_smax)

// llvm.vp.reduce.smin(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_smin, VP_REDUCE_SMIN,
                             experimental_vector_reduce_smin)

// llvm.vp.reduce.umax(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_umax, VP_REDUCE_UMAX,
                             experimental_vector_reduce_umax)

// llvm.vp.reduce.umin(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_umin, VP_REDUCE_UMIN,
                             experimental_vector_reduce_umin)

// llvm.vp.reduce.fmax(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_fmax, VP_REDUCE_FMAX,
                             experimental_vector_reduce_fmax)

// llvm.vp.reduce.fmin(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_VP(vp_reduce_fmin, VP_REDUCE_FMIN,
                             experimental_vector_reduce_fmin)

#undef HELPER_REGISTER_REDUCTION_VP

// Specialized helper macro for VP reductions as above but with two forms:
craig.topper (Done): Should this be %acc or %start?
frasercrmck (Author, Done): Aye it should be `%start` but I've reworked the documentation anyway; this specialized macro is now really for reductions with separate "seq" forms. Cheers.
// sequential and reassociative. These manifest as the presence of 'reassoc'
// fast-math flags in the IR and as two distinct ISD opcodes in the
// SelectionDAG.
#ifdef HELPER_REGISTER_REDUCTION_SEQ_VP
#error "The internal helper macro HELPER_REGISTER_REDUCTION_SEQ_VP is already defined!"
#endif
#define HELPER_REGISTER_REDUCTION_SEQ_VP(VPINTRIN, SDOPC, SEQ_SDOPC, INTRIN) \
  BEGIN_REGISTER_VP_INTRINSIC(VPINTRIN, 2, 3) \
  BEGIN_REGISTER_VP_SDNODE(SDOPC, -1, VPINTRIN, 2, 3) \
  END_REGISTER_VP_SDNODE(SDOPC) \
  BEGIN_REGISTER_VP_SDNODE(SEQ_SDOPC, -1, VPINTRIN, 2, 3) \
  END_REGISTER_VP_SDNODE(SEQ_SDOPC) \
  HANDLE_VP_TO_INTRIN(INTRIN) \
  HANDLE_VP_REDUCTION(VPINTRIN, 0, 1) \
  END_REGISTER_VP_INTRINSIC(VPINTRIN)

craig.topper (Done): Should this be lined up with vp_reduce_fadd on the line above?
frasercrmck (Author, Done): Done, cheers.
// llvm.vp.reduce.fadd(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_SEQ_VP(vp_reduce_fadd, VP_REDUCE_FADD,
                                 VP_REDUCE_SEQ_FADD,
                                 experimental_vector_reduce_fadd)

// llvm.vp.reduce.fmul(start,x,mask,vlen)
HELPER_REGISTER_REDUCTION_SEQ_VP(vp_reduce_fmul, VP_REDUCE_FMUL,
                                 VP_REDUCE_SEQ_FMUL,
                                 experimental_vector_reduce_fmul)

#undef HELPER_REGISTER_REDUCTION_SEQ_VP

///// } Reduction

#undef BEGIN_REGISTER_VP
#undef BEGIN_REGISTER_VP_INTRINSIC
#undef BEGIN_REGISTER_VP_SDNODE
#undef END_REGISTER_VP
#undef END_REGISTER_VP_INTRINSIC
#undef END_REGISTER_VP_SDNODE
#undef HANDLE_VP_TO_OPC
#undef HANDLE_VP_TO_CONSTRAINEDFP
#undef HANDLE_VP_TO_INTRIN
#undef HANDLE_VP_IS_MEMOP
#undef HANDLE_VP_REDUCTION

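The SEQ helper registers two SelectionDAG opcodes because floating-point addition is not associative. An ordered reduction and a reassociated one can therefore legitimately produce different results. The standalone C++ sketch below shows one such input. The function names are hypothetical, and the pairwise tree is just one order a target might choose, not what any particular backend does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered ("sequential") fadd reduction: ((Start + X[0]) + X[1]) + ...
// This strict left-to-right order is what the SEQ form must preserve.
float orderedFAdd(float Start, const std::vector<float> &X) {
  float Acc = Start;
  for (float V : X)
    Acc += V;
  return Acc;
}

// One reassociated order: reduce the vector pairwise, then add Start.
float pairwiseFAdd(float Start, const std::vector<float> &X) {
  std::vector<float> Work = X;
  while (Work.size() > 1) {
    std::vector<float> Next;
    for (std::size_t I = 0; I + 1 < Work.size(); I += 2)
      Next.push_back(Work[I] + Work[I + 1]);
    if (Work.size() % 2 != 0)
      Next.push_back(Work.back()); // odd element carried to the next level
    Work = Next;
  }
  return Work.empty() ? Start : Start + Work[0];
}
```

With `X = {1e8f, 1.0f, -1e8f, 1.0f}` and a start of `0.0f`, the ordered form returns `1.0f`, but the pairwise form returns `0.0f`. Floats near `1e8` are spaced 8 apart, so each `+ 1.0f` applied to them is rounded away. Only a `reassoc` flag permits the second answer.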
llvm/lib/CodeGen/ExpandVectorPredication.cpp

…
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
#include "llvm/InitializePasses.h"
craig.topper (Done): What did we start using from the Operator.h? I couldn't spot it.
frasercrmck (Author, Done): Ah, nice; must have been an artifact from an intermediate change. That's that gone now.
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"

using namespace llvm;

…
struct CachingVPExpander {
  /// "Remove" the %evl parameter of \p PI by setting it to the static vector
  /// length of the operation.
  void discardEVLParameter(VPIntrinsic &PI);

  /// \brief Lower this VP binary operator to an unpredicated binary operator.
  Value *expandPredicationInBinaryOperator(IRBuilder<> &Builder,
                                           VPIntrinsic &PI);

  /// \brief Lower this VP reduction to a call to an unpredicated reduction
  /// intrinsic.
  Value *expandPredicationInReduction(IRBuilder<> &Builder,
                                      VPReductionIntrinsic &PI);

  /// \brief Query TTI and expand the vector predication in \p PI accordingly.
  Value *expandPredication(VPIntrinsic &PI);

  /// \brief Determine how and whether the VPIntrinsic \p VPI shall be
  /// expanded. This overrides TTI with the cl::opts listed at the top of this
  /// file.
  VPLegalization getVPLegalizationStrategy(const VPIntrinsic &VPI) const;
  bool UsingTTIOverrides;
…
CachingVPExpander::expandPredicationInBinaryOperator(IRBuilder<> &Builder,
…
  }

  Value *NewBinOp = Builder.CreateBinOp(OC, Op0, Op1, VPI.getName());

  replaceOperation(*NewBinOp, VPI);
  return NewBinOp;
}

static Value *getNeutralReductionElement(const VPReductionIntrinsic &VPI,
                                         Type *EltTy) {
  bool Negative = false;
  unsigned EltBits = EltTy->getScalarSizeInBits();
  switch (VPI.getIntrinsicID()) {
  default:
    llvm_unreachable("Expecting a VP reduction intrinsic");
  case Intrinsic::vp_reduce_add:
craig.topper (Not Done): I was wondering if we could go through ConstantExpr::getBinOpIdentity to avoid repeating the constants in multiple places but we'd still need special handling for min/max.
frasercrmck (Author, Not Done): Coming at it from a slightly different angle, I was wondering if there should be a single source of truth for neutral reduction elements between IR and SelectionDAG.
  case Intrinsic::vp_reduce_or:
  case Intrinsic::vp_reduce_xor:
  case Intrinsic::vp_reduce_umax:
    return Constant::getNullValue(EltTy);
  case Intrinsic::vp_reduce_mul:
    return ConstantInt::get(EltTy, 1, /*IsSigned*/ false);
  case Intrinsic::vp_reduce_and:
  case Intrinsic::vp_reduce_umin:
    return ConstantInt::getAllOnesValue(EltTy);
  case Intrinsic::vp_reduce_smin:
    return ConstantInt::get(EltTy->getContext(),
                            APInt::getSignedMaxValue(EltBits));
  case Intrinsic::vp_reduce_smax:
    return ConstantInt::get(EltTy->getContext(),
                            APInt::getSignedMinValue(EltBits));
  case Intrinsic::vp_reduce_fmax:
    Negative = true;
    LLVM_FALLTHROUGH;
  case Intrinsic::vp_reduce_fmin: {
    FastMathFlags Flags = VPI.getFastMathFlags();
    const fltSemantics &Semantics = EltTy->getFltSemantics();
    return !Flags.noNaNs() ? ConstantFP::getQNaN(EltTy, Negative)
           : !Flags.noInfs()
               ? ConstantFP::getInfinity(EltTy, Negative)
               : ConstantFP::get(EltTy,
                                 APFloat::getLargest(Semantics, Negative));
  }
  case Intrinsic::vp_reduce_fadd:
    return ConstantFP::getNegativeZero(EltTy);
  case Intrinsic::vp_reduce_fmul:
    return ConstantFP::get(EltTy, 1.0);
  }
}

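Each value chosen above is the identity of its combining operation. That is why the expansion can write these values into masked-off lanes without changing the result. The sketch below restates the values for 32-bit lanes in plain C++ and checks the identity property. The constant and function names are made up. `std::fmin` stands in for `llvm.minnum`, which, like C's `fmin`, returns the other operand when one is a NaN.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical names; values follow the switch above for 32-bit lanes.
constexpr uint32_t NeutralAddOrXorUMax = 0;      // x+0, x|0, x^0, umax(x,0)
constexpr uint32_t NeutralMul = 1;               // x*1
constexpr uint32_t NeutralAndUMin = 0xFFFFFFFFu; // x&~0, umin(x,~0)
constexpr int32_t NeutralSMin = std::numeric_limits<int32_t>::max();
constexpr int32_t NeutralSMax = std::numeric_limits<int32_t>::min();

// -0.0 rather than +0.0: -0.0 + x == x for every x, including x == -0.0,
// whereas +0.0 + -0.0 yields +0.0 and would lose the sign of a -0.0 result.
constexpr float NeutralFAdd = -0.0f;
constexpr float NeutralFMul = 1.0f;

// fmin neutral element per the fast-math checks above (Negative == false):
// a NaN unless 'nnan', else +infinity unless 'ninf', else the largest finite.
float neutralFMin(bool NoNaNs, bool NoInfs) {
  if (!NoNaNs)
    return std::numeric_limits<float>::quiet_NaN();
  if (!NoInfs)
    return std::numeric_limits<float>::infinity();
  return std::numeric_limits<float>::max();
}
```

The NaN is only used when `nnan` is absent. Once `nnan` is set, the code falls back to infinity, and once `ninf` is also set, to the largest finite value. This follows the flag checks in the switch above.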
Value *
CachingVPExpander::expandPredicationInReduction(IRBuilder<> &Builder,
                                                VPReductionIntrinsic &VPI) {
  assert((isSafeToSpeculativelyExecute(&VPI) ||
          VPI.canIgnoreVectorLengthParam()) &&
         "Implicitly dropping %evl in non-speculatable operator!");

  Value *Mask = VPI.getMaskParam();
  Value *RedOp = VPI.getOperand(VPI.getVectorParamPos());

  // Insert neutral element in masked-out positions
  if (Mask && !isAllTrueMask(Mask)) {
    auto *NeutralElt = getNeutralReductionElement(VPI, VPI.getType());
    auto *NeutralVector = Builder.CreateVectorSplat(
        cast<VectorType>(RedOp->getType())->getElementCount(), NeutralElt);
    RedOp = Builder.CreateSelect(Mask, RedOp, NeutralVector);
  }

  Value *Reduction;
  Value *Start = VPI.getOperand(VPI.getStartParamPos());

  switch (VPI.getIntrinsicID()) {
  default:
    llvm_unreachable("Impossible reduction kind");
  case Intrinsic::vp_reduce_add:
    Reduction = Builder.CreateAddReduce(RedOp);
    Reduction = Builder.CreateAdd(Reduction, Start);
    break;
  case Intrinsic::vp_reduce_mul:
    Reduction = Builder.CreateMulReduce(RedOp);
    Reduction = Builder.CreateMul(Reduction, Start);
    break;
  case Intrinsic::vp_reduce_and:
    Reduction = Builder.CreateAndReduce(RedOp);
    Reduction = Builder.CreateAnd(Reduction, Start);
    break;
  case Intrinsic::vp_reduce_or:
    Reduction = Builder.CreateOrReduce(RedOp);
    Reduction = Builder.CreateOr(Reduction, Start);
    break;
  case Intrinsic::vp_reduce_xor:
    Reduction = Builder.CreateXorReduce(RedOp);
    Reduction = Builder.CreateXor(Reduction, Start);
    break;
  case Intrinsic::vp_reduce_smax:
    Reduction = Builder.CreateIntMaxReduce(RedOp, /*IsSigned*/ true);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::smax, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_smin:
    Reduction = Builder.CreateIntMinReduce(RedOp, /*IsSigned*/ true);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::smin, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_umax:
    Reduction = Builder.CreateIntMaxReduce(RedOp, /*IsSigned*/ false);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::umax, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_umin:
    Reduction = Builder.CreateIntMinReduce(RedOp, /*IsSigned*/ false);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::umin, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_fmax:
    Reduction = Builder.CreateFPMaxReduce(RedOp);
    transferDecorations(*Reduction, VPI);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::maxnum, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_fmin:
    Reduction = Builder.CreateFPMinReduce(RedOp);
    transferDecorations(*Reduction, VPI);
    Reduction =
        Builder.CreateBinaryIntrinsic(Intrinsic::minnum, Reduction, Start);
    break;
  case Intrinsic::vp_reduce_fadd:
    Reduction = Builder.CreateFAddReduce(Start, RedOp);
craig.topper (Done): Is the documentation for this in IRBuilder incorrect? It says the accumulator is for ordered reductions. Is it also used for unordered?
  /// Create a vector fadd reduction intrinsic of the source vector.
  /// The first parameter is a scalar accumulator value for ordered reductions.
  CallInst *CreateFAddReduce(Value *Acc, Value *Src);
  /// Create a vector fmul reduction intrinsic of the source vector.
  /// The first parameter is a scalar accumulator value for ordered reductions.
  CallInst *CreateFMulReduce(Value *Acc, Value *Src);
frasercrmck (Author, Done): Yeah so the unordered reduction intrinsics do indeed have an accumulator, so this is misleading. However, since the only difference between ordered and unordered intrinsics is the presence of the `reassoc` flag, this method always creates an ordered reduction. It's up to the user to add the flag later. That's what we do in `replaceOperation` by transferring any existing fast-math flags on to the expanded reduction. From the SelectionDAG's perspective the unordered reductions don't have an accumulator, because it's split out early in the SelectionDAGBuilder. I wonder if that's where the confusion came from. I think I still support addressing this documentation though. I can do that in a separate patch as it's likely to spawn discussion.
frasercrmck (Author, Done): I tried to address this in D107753.
    break;
  case Intrinsic::vp_reduce_fmul:
    Reduction = Builder.CreateFMulReduce(Start, RedOp);
    break;
  }

  replaceOperation(*Reduction, VPI);
  return Reduction;
}

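`expandPredicationInReduction` relies on a simple identity. Select the neutral element into inactive lanes, reduce the whole vector without predication, then combine the result with `%start`. The outcome is the same as reducing over the active lanes only. Below is a hedged C++ model of that argument for `smax`. It assumes `%evl` has already been merged into the mask before the reduction is expanded; the pass's own `%evl` handling is not shown in this excerpt. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Direct definition: smax over Start and every active lane.
int32_t vpReduceSMaxReference(int32_t Start, const std::vector<int32_t> &X,
                              const std::vector<bool> &Mask, uint32_t EVL) {
  int32_t Acc = Start;
  for (std::size_t I = 0; I < X.size(); ++I)
    if (I < EVL && Mask[I])
      Acc = std::max(Acc, X[I]);
  return Acc;
}

// Expansion-style model mirroring the steps in expandPredicationInReduction.
int32_t vpReduceSMaxExpanded(int32_t Start, const std::vector<int32_t> &X,
                             const std::vector<bool> &Mask, uint32_t EVL) {
  const int32_t Neutral = std::numeric_limits<int32_t>::min();
  std::vector<int32_t> Selected(X.size());
  for (std::size_t I = 0; I < X.size(); ++I) {
    bool Active = Mask[I] && I < EVL;      // %evl assumed folded into mask
    Selected[I] = Active ? X[I] : Neutral; // select(mask, x, neutral splat)
  }
  int32_t Reduced = Neutral; // unpredicated vector.reduce.smax
  for (int32_t V : Selected)
    Reduced = std::max(Reduced, V);
  return std::max(Reduced, Start); // smax(reduction, start)
}
```

Because `INT32_MIN` can never win a signed max, the selected-in lanes drop out. Combining with `Start` last then reproduces the case where every lane is inactive.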
void CachingVPExpander::discardEVLParameter(VPIntrinsic &VPI) {
  LLVM_DEBUG(dbgs() << "Discard EVL parameter in " << VPI << "\n");

  if (VPI.canIgnoreVectorLengthParam())
    return;

  Value *EVLParam = VPI.getVectorLengthParam();
  if (!EVLParam)
…
Value *CachingVPExpander::expandPredication(VPIntrinsic &VPI) {
  IRBuilder<> Builder(&VPI);

  // Try lowering to an LLVM instruction first.
  auto OC = VPI.getFunctionalOpcode();

  if (OC && Instruction::isBinaryOp(*OC))
    return expandPredicationInBinaryOperator(Builder, VPI);

  if (auto *VPRI = dyn_cast<VPReductionIntrinsic>(&VPI))
    return expandPredicationInReduction(Builder, *VPRI);

  return &VPI;
}

//// } CachingVPExpander

struct TransformJob {
  VPIntrinsic *PI;
  TargetTransformInfo::VPLegalization Strategy;
…

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

…
#define END_REGISTER_VP_INTRINSIC(...) break;
#include "llvm/IR/VPIntrinsics.def"
  }

  if (!ResOPC.hasValue())
    llvm_unreachable(
        "Inconsistency: no SDNode available for this VPIntrinsic!");

  if (*ResOPC == ISD::VP_REDUCE_SEQ_FADD ||
      *ResOPC == ISD::VP_REDUCE_SEQ_FMUL) {
    if (VPIntrin.getFastMathFlags().allowReassoc())
      return *ResOPC == ISD::VP_REDUCE_SEQ_FADD ? ISD::VP_REDUCE_FADD
                                                : ISD::VP_REDUCE_FMUL;
  }

  return ResOPC.getValue();
}

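The `reassoc` check above can be read as a small decision function. Judging from that check, the lookup yields the ordered `SEQ` opcode for fadd/fmul, and the unordered opcode is used only when the call carries the `reassoc` fast-math flag. The sketch below uses a hypothetical enum in place of the real ISD opcodes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ISD opcodes used above; not LLVM's enum.
enum class VPRedOpc {
  VP_REDUCE_FADD,
  VP_REDUCE_SEQ_FADD,
  VP_REDUCE_FMUL,
  VP_REDUCE_SEQ_FMUL,
  OTHER
};

// Ordered (SEQ) opcodes are relaxed to their unordered counterparts only
// when the call allows reassociation; every other opcode passes through.
VPRedOpc relaxReductionOpcode(VPRedOpc Opc, bool AllowReassoc) {
  if (!AllowReassoc)
    return Opc;
  switch (Opc) {
  case VPRedOpc::VP_REDUCE_SEQ_FADD:
    return VPRedOpc::VP_REDUCE_FADD;
  case VPRedOpc::VP_REDUCE_SEQ_FMUL:
    return VPRedOpc::VP_REDUCE_FMUL;
  default:
    return Opc;
  }
}
```

Defaulting to the ordered form is the conservative choice. Without `reassoc`, a target must honour the left-to-right evaluation order.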
void SelectionDAGBuilder::visitVectorPredicationIntrinsic(
    const VPIntrinsic &VPIntrin) {
  SDLoc DL = getCurSDLoc();
  unsigned Opcode = getISDForVPIntrinsic(VPIntrin);

…

llvm/lib/IR/IntrinsicInst.cpp

…
bool VPIntrinsic::canIgnoreVectorLengthParam() const {
…
  return false;
}

Function *VPIntrinsic::getDeclarationForParams(Module *M, Intrinsic::ID VPID,
                                               ArrayRef<Value *> Params) {
  assert(isVPIntrinsic(VPID) && "not a VP intrinsic");
  Function *VPFunc;
  switch (VPID) {
  default: {
    Type *OverloadTy = Params[0]->getType();
    if (VPReductionIntrinsic::isVPReduction(VPID))
      OverloadTy =
          Params[*VPReductionIntrinsic::getVectorParamPos(VPID)]->getType();

    VPFunc = Intrinsic::getDeclaration(M, VPID, OverloadTy);
    break;
  }
  case Intrinsic::vp_load:
    VPFunc = Intrinsic::getDeclaration(
        M, VPID,
        {Params[0]->getType()->getPointerElementType(), Params[0]->getType()});
    break;
  case Intrinsic::vp_gather:
    VPFunc = Intrinsic::getDeclaration(
        M, VPID,
…
  case Intrinsic::vp_scatter:
    VPFunc = Intrinsic::getDeclaration(
        M, VPID, {Params[0]->getType(), Params[1]->getType()});
    break;
  }
  assert(VPFunc && "Could not declare VP intrinsic");
  return VPFunc;
}

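Reductions are the one case where `Params[0]` is not the overloaded type. It is the scalar start value, while the intrinsic is overloaded on the vector operand. The simplified model below uses hypothetical `TypeModel`, `mangle` and `declName`. It only mimics the `v4i32`/`nxv4i32` name suffixes that appear in the test declarations and is not LLVM's mangler. It shows why overloading on `Params[0]` would yield `llvm.vp.reduce.add.i32` instead of `llvm.vp.reduce.add.v4i32`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified IR type: a scalar ("i32", "f32", "i1") or a fixed/scalable
// vector of one. Only what this sketch needs; an assumption, not LLVM's Type.
struct TypeModel {
  std::string Scalar;
  unsigned NumElts = 0; // 0 means "not a vector"
  bool Scalable = false;
};

// Overload suffix in the style of the test declarations: i32, v4i32, nxv4i32.
std::string mangle(const TypeModel &T) {
  if (T.NumElts == 0)
    return T.Scalar;
  return (T.Scalable ? "nxv" : "v") + std::to_string(T.NumElts) + T.Scalar;
}

// Mirrors the choice in getDeclarationForParams: overload on Params[0],
// except for reductions, which overload on the vector operand (position 1).
std::string declName(const std::string &Base, bool IsReduction,
                     const std::vector<TypeModel> &Params) {
  const unsigned OverloadPos = IsReduction ? 1 : 0;
  return Base + "." + mangle(Params[OverloadPos]);
}
```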
bool VPReductionIntrinsic::isVPReduction(Intrinsic::ID ID) {
  switch (ID) {
  default:
    return false;
#define HANDLE_VP_REDUCTION(VPID, STARTPOS, VECTORPOS) \
  case Intrinsic::VPID: \
    break;
#include "llvm/IR/VPIntrinsics.def"
  }
  return true;
}

unsigned VPReductionIntrinsic::getVectorParamPos() const {
  return *VPReductionIntrinsic::getVectorParamPos(getIntrinsicID());
}

unsigned VPReductionIntrinsic::getStartParamPos() const {
  return *VPReductionIntrinsic::getStartParamPos(getIntrinsicID());
}

Optional<unsigned> VPReductionIntrinsic::getVectorParamPos(Intrinsic::ID ID) {
  switch (ID) {
#define HANDLE_VP_REDUCTION(VPID, STARTPOS, VECTORPOS) \
  case Intrinsic::VPID: \
    return VECTORPOS;
#include "llvm/IR/VPIntrinsics.def"
  default:
    return None;
  }
}

Optional<unsigned> VPReductionIntrinsic::getStartParamPos(Intrinsic::ID ID) {
  switch (ID) {
#define HANDLE_VP_REDUCTION(VPID, STARTPOS, VECTORPOS) \
  case Intrinsic::VPID: \
    return STARTPOS;
#include "llvm/IR/VPIntrinsics.def"
  default:
    return None;
  }
}

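The three lookup functions above use the X-macro pattern. `VPIntrinsics.def` lists every reduction once via `HANDLE_VP_REDUCTION(VPID, STARTPOS, VECTORPOS)`, and each function defines that macro to emit just the `switch` cases it needs. The single-file C++17 sketch below reproduces the pattern with an inline list macro instead of an `#include`. The enum and macro names are hypothetical, and `std::optional` replaces `llvm::Optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for Intrinsic::ID.
enum class ID { vp_add, vp_reduce_add, vp_reduce_fmul };

// Stand-in for the HANDLE_VP_REDUCTION entries in VPIntrinsics.def:
// one X(VPID, STARTPOS, VECTORPOS) per VP reduction.
#define FOR_EACH_VP_REDUCTION(X) \
  X(vp_reduce_add, 0, 1) \
  X(vp_reduce_fmul, 0, 1)

bool isVPReduction(ID I) {
  switch (I) {
#define CASE_TRUE(VPID, STARTPOS, VECTORPOS) \
  case ID::VPID: \
    return true;
    FOR_EACH_VP_REDUCTION(CASE_TRUE)
#undef CASE_TRUE
  default:
    return false;
  }
}

std::optional<unsigned> getStartParamPos(ID I) {
  switch (I) {
#define CASE_START(VPID, STARTPOS, VECTORPOS) \
  case ID::VPID: \
    return STARTPOS;
    FOR_EACH_VP_REDUCTION(CASE_START)
#undef CASE_START
  default:
    return std::nullopt;
  }
}

std::optional<unsigned> getVectorParamPos(ID I) {
  switch (I) {
#define CASE_VECTOR(VPID, STARTPOS, VECTORPOS) \
  case ID::VPID: \
    return VECTORPOS;
    FOR_EACH_VP_REDUCTION(CASE_VECTOR)
#undef CASE_VECTOR
  default:
    return std::nullopt;
  }
}
```

Keeping the positions in one list means a new reduction is registered in a single place, and every query function picks it up automatically.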
Instruction::BinaryOps BinaryOpIntrinsic::getBinaryOp() const {
  switch (getIntrinsicID()) {
  case Intrinsic::uadd_with_overflow:
  case Intrinsic::sadd_with_overflow:
  case Intrinsic::uadd_sat:
  case Intrinsic::sadd_sat:
    return Instruction::Add;
  case Intrinsic::usub_with_overflow:
…

llvm/test/CodeGen/Generic/expand-vp.ll

…
declare <8 x i32> @llvm.vp.urem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
; Bit arith
declare <8 x i32> @llvm.vp.and.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.xor.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.or.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.shl.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
; Reductions
declare i32 @llvm.vp.reduce.add.v4i32(i32, <4 x i32>, <4 x i1>, i32)
frasercrmck (Author, Done): Note that I've still got some reductions to add here but I feel the patch itself is good enough to start reviewing.
declare i32 @llvm.vp.reduce.mul.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.and.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.or.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.xor.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.smin.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.smax.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.umin.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare i32 @llvm.vp.reduce.umax.v4i32(i32, <4 x i32>, <4 x i1>, i32)
declare float @llvm.vp.reduce.fmin.v4f32(float, <4 x float>, <4 x i1>, i32)
declare float @llvm.vp.reduce.fmax.v4f32(float, <4 x float>, <4 x i1>, i32)
declare float @llvm.vp.reduce.fadd.v4f32(float, <4 x float>, <4 x i1>, i32)
declare float @llvm.vp.reduce.fmul.v4f32(float, <4 x float>, <4 x i1>, i32)

; Fixed vector test function.
define void @test_vp_int_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x i32> %i2, <8 x i32> %f3, <8 x i1> %m, i32 %n) {
  %r0 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
  %r1 = call <8 x i32> @llvm.vp.sub.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
  %r2 = call <8 x i32> @llvm.vp.mul.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
  %r3 = call <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
  %r4 = call <8 x i32> @llvm.vp.srem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
…
define void @test_vp_int_vscale(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i32> %i2, <vscale x 4 x i32> %f3, <vscale x 4 x i1> %m, i32 %n) {
…
  %r7 = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  %r8 = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  %r9 = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  %rA = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  %rB = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  %rC = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
  ret void
}

		; Fixed vector reduce test function.
		define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
		%r0 = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r1 = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r2 = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r3 = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r4 = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r5 = call i32 @llvm.vp.reduce.smin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r6 = call i32 @llvm.vp.reduce.smax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r7 = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		%r8 = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		ret void
		}

		define void @test_vp_reduce_fp_v4(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n) {
		%r0 = call float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r1 = call nnan float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r2 = call nnan ninf float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r3 = call float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r4 = call nnan float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r5 = call nnan ninf float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r6 = call float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r7 = call reassoc float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r8 = call float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		%r9 = call reassoc float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		ret void
		}
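The test functions above call each reduction with the operand order used throughout this patch: start value, vector, mask, explicit vector length. As a hedged illustration of those semantics (a model, not LLVM code), the C++ sketch below evaluates an integer `llvm.vp.reduce.*` call over a fixed vector. A lane contributes only when its mask bit is set and its index is below `%evl`; the start value seeds the accumulator, so it is returned unchanged when no lane is active. The name `vpReduceModel` is hypothetical, and folding lanes in index order assumes an associative, commutative operation such as the integer ones tested here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of an integer llvm.vp.reduce.<op> call on N lanes.
// Lane I takes part only if Mask[I] is set and I < EVL; the accumulator starts
// at Start, so Start is returned unchanged when no lane is active.
template <std::size_t N, typename BinOp>
int32_t vpReduceModel(int32_t Start, const std::array<int32_t, N> &Vec,
                      const std::array<bool, N> &Mask, uint32_t EVL, BinOp Op) {
  // An %evl larger than the vector length is undefined behaviour in LangRef,
  // so treat it as a caller bug here.
  assert(EVL <= N && "explicit vector length exceeds the vector length");
  int32_t Acc = Start;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I] && I < EVL)
      Acc = Op(Acc, Vec[I]);
  return Acc;
}
```

For example, with `Vec = {1, 2, 3, 4}`, `Mask = {1, 0, 1, 1}` and `Start = 10`, an add reduction yields 18 for `EVL = 4`, 11 for `EVL = 2`, and 10 for `EVL = 0`.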

; All VP intrinsics have to be lowered into non-VP ops
; Convert %evl into %mask for non-speculatable VP intrinsics and emit the
; instruction+select idiom with a non-VP SIMD instruction.
;
; ALL-CONVERT-NOT: {{call.* @llvm.vp.add}}
; ALL-CONVERT-NOT: {{call.* @llvm.vp.sub}}
; ALL-CONVERT-NOT: {{call.* @llvm.vp.mul}}
; ALL-CONVERT-NOT: {{call.* @llvm.vp.sdiv}}
[27 lines not shown]
; ALL-CONVERT-NEXT: %{{.+}} = or <8 x i32> %i0, %i1
; ALL-CONVERT-NEXT: %{{.+}} = xor <8 x i32> %i0, %i1
; ALL-CONVERT-NEXT: %{{.+}} = ashr <8 x i32> %i0, %i1
; ALL-CONVERT-NEXT: %{{.+}} = lshr <8 x i32> %i0, %i1
; ALL-CONVERT-NEXT: %{{.+}} = shl <8 x i32> %i0, %i1
; ALL-CONVERT: ret void


		; Check that reductions use the correct neutral element for masked-off elements
		; ALL-CONVERT: define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
		; ALL-CONVERT-NEXT: [[ADD:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> zeroinitializer
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[ADD]])
		; ALL-CONVERT-NEXT: %{{.+}} = add i32 [[RED]], %start
		; ALL-CONVERT-NEXT: [[MUL:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> [[MUL]])
		; ALL-CONVERT-NEXT: %{{.+}} = mul i32 [[RED]], %start
		; ALL-CONVERT-NEXT: [[AND:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.and.v4i32(<4 x i32> [[AND]])
		; ALL-CONVERT-NEXT: %{{.+}} = and i32 [[RED]], %start
		; ALL-CONVERT-NEXT: [[OR:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> zeroinitializer
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[OR]])
		; ALL-CONVERT-NEXT: %{{.+}} = or i32 [[RED]], %start
		; ALL-CONVERT-NEXT: [[XOR:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> zeroinitializer
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.xor.v4i32(<4 x i32> [[XOR]])
		; ALL-CONVERT-NEXT: %{{.+}} = xor i32 [[RED]], %start
		; ALL-CONVERT-NEXT: [[SMIN:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.smin.v4i32(<4 x i32> [[SMIN]])
		; ALL-CONVERT-NEXT: %{{.+}} = call i32 @llvm.smin.i32(i32 [[RED]], i32 %start)
		; ALL-CONVERT-NEXT: [[SMAX:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.smax.v4i32(<4 x i32> [[SMAX]])
		; ALL-CONVERT-NEXT: %{{.+}} = call i32 @llvm.smax.i32(i32 [[RED]], i32 %start)
		; ALL-CONVERT-NEXT: [[UMIN:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> [[UMIN]])
		; ALL-CONVERT-NEXT: %{{.+}} = call i32 @llvm.umin.i32(i32 [[RED]], i32 %start)
		; ALL-CONVERT-NEXT: [[UMAX:%.+]] = select <4 x i1> %m, <4 x i32> %vi, <4 x i32> zeroinitializer
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call i32 @llvm.vector.reduce.umax.v4i32(<4 x i32> [[UMAX]])
		; ALL-CONVERT-NEXT: %{{.+}} = call i32 @llvm.umax.i32(i32 [[RED]], i32 %start)
		; ALL-CONVERT-NEXT: ret void
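The checks above encode the expansion for targets without native vector predication: masked-off lanes are replaced with the operation's neutral element via `select`, the whole vector is reduced with the unpredicated `llvm.vector.reduce.*` intrinsic, and the partial result is combined with `%start` by a scalar operation (`add`, `mul`, `and`, `or`, `xor`, or one of the `llvm.smin.i32`-style min/max intrinsics). Below is a minimal C++ sketch of that scheme over `std::array`, assuming a fixed lane count. `expandVPReduce` and the `kNeutral*` constants are hypothetical names, and, like the checked output, the sketch consults only the mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Neutral elements used by the `select` in the checks above (hypothetical
// names). umin and umax reuse the bit patterns of and (all ones) and add (0).
constexpr int32_t kNeutralAddOrXorUMax = 0;
constexpr int32_t kNeutralMul = 1;
constexpr int32_t kNeutralAndUMin = -1; // all ones
constexpr int32_t kNeutralSMin = std::numeric_limits<int32_t>::max();
constexpr int32_t kNeutralSMax = std::numeric_limits<int32_t>::min();

// Expansion sketch: select, unpredicated reduce, combine with Start.
template <typename T, std::size_t N, typename BinOp>
T expandVPReduce(T Start, const std::array<T, N> &Vec,
                 const std::array<bool, N> &Mask, T Neutral, BinOp Op) {
  static_assert(N > 0, "a reduction needs at least one lane");
  // select <N x i1> %m, <N x T> %v, <N x T> splat(Neutral)
  std::array<T, N> Sel{};
  for (std::size_t I = 0; I < N; ++I)
    Sel[I] = Mask[I] ? Vec[I] : Neutral;
  // llvm.vector.reduce.<op>(Sel): every lane participates.
  T Red = Sel[0];
  for (std::size_t I = 1; I < N; ++I)
    Red = Op(Red, Sel[I]);
  // Scalar combine with %start, as in `add i32 [[RED]], %start`.
  return Op(Red, Start);
}
```

Passing `kNeutralSMax` together with a signed max operation mirrors the `smax` case above; because the neutral element cannot change the result, the masked-off lanes drop out without any per-lane branching in the reduced vector.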

		; Check that reductions use the correct neutral element for masked-off elements
		; ALL-CONVERT: define void @test_vp_reduce_fp_v4(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n) {
		; ALL-CONVERT-NEXT: [[FMIN:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0x7FF8000000000000, float 0x7FF8000000000000, float 0x7FF8000000000000, float 0x7FF8000000000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[FMIN]])
		; ALL-CONVERT-NEXT: %{{.+}} = call float @llvm.minnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FMIN_NNAN:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call nnan float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[FMIN_NNAN]])
		; ALL-CONVERT-NEXT: %{{.+}} = call nnan float @llvm.minnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FMIN_NNAN_NINF:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0x47EFFFFFE0000000, float 0x47EFFFFFE0000000, float 0x47EFFFFFE0000000, float 0x47EFFFFFE0000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call nnan ninf float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[FMIN_NNAN_NINF]])
		; ALL-CONVERT-NEXT: %{{.+}} = call nnan ninf float @llvm.minnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FMAX:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0xFFF8000000000000, float 0xFFF8000000000000, float 0xFFF8000000000000, float 0xFFF8000000000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[FMAX]])
		; ALL-CONVERT-NEXT: %{{.+}} = call float @llvm.maxnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FMAX_NNAN:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0xFFF0000000000000, float 0xFFF0000000000000, float 0xFFF0000000000000, float 0xFFF0000000000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call nnan float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[FMAX_NNAN]])
		; ALL-CONVERT-NEXT: %{{.+}} = call nnan float @llvm.maxnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FMAX_NNAN_NINF:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 0xC7EFFFFFE0000000, float 0xC7EFFFFFE0000000, float 0xC7EFFFFFE0000000, float 0xC7EFFFFFE0000000>
		; ALL-CONVERT-NEXT: [[RED:%.+]] = call nnan ninf float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[FMAX_NNAN_NINF]])
		; ALL-CONVERT-NEXT: %{{.+}} = call nnan ninf float @llvm.maxnum.f32(float [[RED]], float %f)
		; ALL-CONVERT-NEXT: [[FADD:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>
		; ALL-CONVERT-NEXT: %{{.+}} = call float @llvm.vector.reduce.fadd.v4f32(float %f, <4 x float> [[FADD]])
		; ALL-CONVERT-NEXT: [[FADD:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>
		; ALL-CONVERT-NEXT: %{{.+}} = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %f, <4 x float> [[FADD]])
		; ALL-CONVERT-NEXT: [[FMUL:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
		; ALL-CONVERT-NEXT: %{{.+}} = call float @llvm.vector.reduce.fmul.v4f32(float %f, <4 x float> [[FMUL]])
		; ALL-CONVERT-NEXT: [[FMUL:%.+]] = select <4 x i1> %m, <4 x float> %vf, <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
		; ALL-CONVERT-NEXT: %{{.+}} = call reassoc float @llvm.vector.reduce.fmul.v4f32(float %f, <4 x float> [[FMUL]])
		; ALL-CONVERT-NEXT: ret void
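The floating-point neutral elements above depend on the fast-math flags. Without `nnan`, masked-off `fmin`/`fmax` lanes become a quiet NaN, because `llvm.minnum`/`llvm.maxnum` return the other operand when exactly one input is a NaN; with `nnan` an infinity suffices, and with `nnan ninf` the largest finite value does. For `fadd`, `-0.0` rather than `+0.0` is used, since `-0.0 + x == x` holds even for `x = -0.0`. The hedged C++ sketch below checks these identities with `std::fmin`/`std::fmax`, which on IEEE-754 conforming implementations treat a single NaN argument the same way; `fminNeutral` and `fmaxNeutral` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers mirroring the select constants in the checks above.
// fmin: quiet NaN (no flags), +inf (nnan), largest finite value (nnan ninf).
float fminNeutral(bool NoNaNs, bool NoInfs) {
  if (!NoNaNs)
    return std::numeric_limits<float>::quiet_NaN();
  if (!NoInfs)
    return std::numeric_limits<float>::infinity();
  return std::numeric_limits<float>::max();
}

// fmax: negative quiet NaN, -inf, or the most negative finite value.
float fmaxNeutral(bool NoNaNs, bool NoInfs) {
  if (!NoNaNs)
    return std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
  if (!NoInfs)
    return -std::numeric_limits<float>::infinity();
  return std::numeric_limits<float>::lowest();
}
```

Using `+0.0` as the `fadd` fill value would turn an all-`-0.0` input into `+0.0`, since `+0.0 + -0.0` rounds to `+0.0`; `-0.0` avoids that.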

; All legal - don't transform anything.

; LEGAL_LEGAL: define void @test_vp_int_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x i32> %i2, <8 x i32> %f3, <8 x i1> %m, i32 %n) {
; LEGAL_LEGAL-NEXT: %r0 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %r1 = call <8 x i32> @llvm.vp.sub.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %r2 = call <8 x i32> @llvm.vp.mul.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %r3 = call <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
[19 lines not shown]
; LEGAL_LEGAL-NEXT: %r7 = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %r8 = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %r9 = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %rA = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %rB = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: %rC = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
; LEGAL_LEGAL-NEXT: ret void

		; LEGAL_LEGAL: define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
		; LEGAL_LEGAL-NEXT: %r0 = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r1 = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r2 = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r3 = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r4 = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r5 = call i32 @llvm.vp.reduce.smin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r6 = call i32 @llvm.vp.reduce.smax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r7 = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r8 = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: ret void

		; LEGAL_LEGAL: define void @test_vp_reduce_fp_v4(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n) {
		; LEGAL_LEGAL-NEXT: %r0 = call float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r1 = call nnan float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r2 = call nnan ninf float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r3 = call float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r4 = call nnan float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r5 = call nnan ninf float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r6 = call float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r7 = call reassoc float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r8 = call float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: %r9 = call reassoc float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n)
		; LEGAL_LEGAL-NEXT: ret void

; Drop %evl where possible, else fold %evl into %mask (%evl Discard, %mask Legal)
;
; There is no caching yet in the ExpandVectorPredication pass and the %evl
; expansion code is emitted for every non-speculatable intrinsic again. Hence,
; only check that:
; (1) The %evl folding code and %mask are correct for the first
; non-speculatable VP intrinsic.
[32 lines not shown]
; DISCARD_LEGAL: %r1 = call <vscale x 4 x i32> @llvm.vp.sub.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %scalable_size{{.*}})
; DISCARD_LEGAL: %r2 = call <vscale x 4 x i32> @llvm.vp.mul.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %scalable_size{{.*}})
; DISCARD_LEGAL: [[EVLM:%.+]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 %n)
; DISCARD_LEGAL: [[NEWM:%.+]] = and <vscale x 4 x i1> [[EVLM]], %m
; DISCARD_LEGAL: %r3 = call <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> [[NEWM]], i32 %scalable_size{{.*}})
; DISCARD_LEGAL-NOT: %{{.+}} = call <vscale x 4 x i32> @llvm.vp.{{.*}}, i32 %n)
; DISCARD_LEGAL: ret void
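For the scalable case, the checks above show how `%evl` is folded into the mask of a non-speculatable intrinsic (`%r3`, the `sdiv`): `llvm.get.active.lane.mask(0, %n)` yields a mask that is true for lane indices below `%n`, and that mask is ANDed with `%m`. Assuming a fixed lane count (the checked IR works on `<vscale x 4 x i1>`), the computation can be sketched in C++ as below. Both function names are hypothetical; the lane-mask model follows the LangRef description in which lane `i` is set when `%base + i` is unsigned-less-than `%n`, computed without wrap-around.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of llvm.get.active.lane.mask(Base, N) on VL lanes: lane I is set iff
// Base + I < N (unsigned, evaluated in 64 bits so the addition cannot wrap).
template <std::size_t VL>
std::array<bool, VL> activeLaneMask(uint32_t Base, uint32_t N) {
  std::array<bool, VL> M{};
  for (std::size_t I = 0; I < VL; ++I)
    M[I] = static_cast<uint64_t>(Base) + I < N;
  return M;
}

// Fold %evl into %mask: NEWM = and(get.active.lane.mask(0, EVL), %m).
template <std::size_t VL>
std::array<bool, VL> foldEVLIntoMask(const std::array<bool, VL> &Mask,
                                     uint32_t EVL) {
  std::array<bool, VL> EVLM = activeLaneMask<VL>(0, EVL);
  std::array<bool, VL> NewM{};
  for (std::size_t I = 0; I < VL; ++I)
    NewM[I] = EVLM[I] && Mask[I];
  return NewM;
}
```

Once the fold is applied, `%evl` no longer restricts anything the new mask does not already cover, which is consistent with the rewritten `%r3` call passing `%scalable_size` as its vector length.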

		; DISCARD_LEGAL: define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
		; DISCARD_LEGAL-NEXT: %r0 = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r1 = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r2 = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r3 = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r4 = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r5 = call i32 @llvm.vp.reduce.smin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r6 = call i32 @llvm.vp.reduce.smax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r7 = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r8 = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: ret void

		; DISCARD_LEGAL: define void @test_vp_reduce_fp_v4(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n) {
		; DISCARD_LEGAL-NEXT: %r0 = call float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r1 = call nnan float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r2 = call nnan ninf float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r3 = call float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r4 = call nnan float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r5 = call nnan ninf float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r6 = call float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r7 = call reassoc float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r8 = call float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: %r9 = call reassoc float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; DISCARD_LEGAL-NEXT: ret void

; Convert %evl into %mask everywhere (%evl Convert, %mask Legal)
;
; For the same reasons as in the (%evl Discard, %mask Legal) case, only check that:
; (1) The %evl folding code and %mask are correct for the first VP intrinsic.
; (2) All other VP intrinsics have a modified mask argument.
; (3) All VP intrinsics have an ineffective %evl parameter.
;
[22 lines not shown]
; CONVERT_LEGAL-NEXT: [[EVLM:%.+]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i32(i32 0, i32 %n)
; CONVERT_LEGAL-NEXT: [[NEWM:%.+]] = and <vscale x 4 x i1> [[EVLM]], %m
; CONVERT_LEGAL-NEXT: %vscale = call i32 @llvm.vscale.i32()
; CONVERT_LEGAL-NEXT: %scalable_size = mul nuw i32 %vscale, 4
; CONVERT_LEGAL-NEXT: %r0 = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> [[NEWM]], i32 %scalable_size)
; CONVERT_LEGAL-NOT: %{{.+}} = call <vscale x 4 x i32> @llvm.vp.{{.*}}, i32 %n)
; CONVERT_LEGAL: ret void

		; CONVERT_LEGAL: define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
		; CONVERT_LEGAL-NEXT: [[NINS:%.+]] = insertelement <4 x i32> poison, i32 %n, i32 0
		; CONVERT_LEGAL-NEXT: [[NSPLAT:%.+]] = shufflevector <4 x i32> [[NINS]], <4 x i32> poison, <4 x i32> zeroinitializer
		; CONVERT_LEGAL-NEXT: [[EVLM:%.+]] = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, [[NSPLAT]]
		; CONVERT_LEGAL-NEXT: [[NEWM:%.+]] = and <4 x i1> [[EVLM]], %m
		; CONVERT_LEGAL-NEXT: %{{.+}} = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> [[NEWM]], i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.mul.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.and.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.or.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.xor.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.smin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.smax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.umin.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call i32 @llvm.vp.reduce.umax.v4i32(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL: ret void
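In the fixed-width `CONVERT_LEGAL` output above, the same fold is spelled out as a splat of `%n` compared (`icmp ult`) against the step vector `<0, 1, 2, 3>` and ANDed with `%m`, after which the reduction runs with `%evl` set to the full length 4. The sketch below is a hypothetical C++ model restricted to `add`; its test checks exhaustively, over every 4-lane mask and every legal `%evl`, that this rewrite leaves the result unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference semantics of llvm.vp.reduce.add on VL lanes: a lane contributes
// only if its mask bit is set and its index is below EVL.
template <std::size_t VL>
int32_t vpReduceAdd(int32_t Start, const std::array<int32_t, VL> &Vec,
                    const std::array<bool, VL> &Mask, uint32_t EVL) {
  assert(EVL <= VL && "%evl above the vector length is undefined behaviour");
  int32_t Acc = Start;
  for (std::size_t I = 0; I < VL; ++I)
    if (Mask[I] && I < EVL)
      Acc += Vec[I];
  return Acc;
}

// Hypothetical model of the CONVERT_LEGAL rewrite: build
//   NEWM = and (icmp ult <0, 1, ..., VL-1>, splat(EVL)), %m
// and call the same reduction with %evl set to the static length VL.
template <std::size_t VL>
int32_t vpReduceAddFoldedEVL(int32_t Start, const std::array<int32_t, VL> &Vec,
                             const std::array<bool, VL> &Mask, uint32_t EVL) {
  std::array<bool, VL> NewMask{};
  for (std::size_t I = 0; I < VL; ++I)
    NewMask[I] = (I < EVL) && Mask[I]; // step-vector lane I vs. splat(EVL)
  return vpReduceAdd<VL>(Start, Vec, NewMask, static_cast<uint32_t>(VL));
}
```

Keeping the call as a VP reduction with a full-length `%evl`, rather than expanding it, is what the `%mask Legal` half of this configuration allows.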

		; CONVERT_LEGAL: define void @test_vp_reduce_fp_v4(float %f, <4 x float> %vf, <4 x i1> %m, i32 %n) {
		; CONVERT_LEGAL-NEXT: [[NINS:%.+]] = insertelement <4 x i32> poison, i32 %n, i32 0
		; CONVERT_LEGAL-NEXT: [[NSPLAT:%.+]] = shufflevector <4 x i32> [[NINS]], <4 x i32> poison, <4 x i32> zeroinitializer
		; CONVERT_LEGAL-NEXT: [[EVLM:%.+]] = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, [[NSPLAT]]
		; CONVERT_LEGAL-NEXT: [[NEWM:%.+]] = and <4 x i1> [[EVLM]], %m
		; CONVERT_LEGAL-NEXT: %{{.+}} = call float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> [[NEWM]], i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call nnan float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call nnan ninf float @llvm.vp.reduce.fmin.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call nnan float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call nnan ninf float @llvm.vp.reduce.fmax.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call reassoc float @llvm.vp.reduce.fadd.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL-NOT: %{{.+}} = call reassoc float @llvm.vp.reduce.fmul.v4f32(float %f, <4 x float> %vf, <4 x i1> %m, i32 4)
		; CONVERT_LEGAL: ret void

llvm/test/Verifier/vp-intrinsics.ll

[23 lines not shown]
define void @test_vp_fp(<8 x double> %f0, <8 x double> %f1, <8 x i1> %m, i32 %n) {
%r2 = call <8 x double> @llvm.vp.fmul.v8f64(<8 x double> %f0, <8 x double> %f1, <8 x i1> %m, i32 %n)
%r3 = call <8 x double> @llvm.vp.fdiv.v8f64(<8 x double> %f0, <8 x double> %f1, <8 x i1> %m, i32 %n)
%r4 = call <8 x double> @llvm.vp.frem.v8f64(<8 x double> %f0, <8 x double> %f1, <8 x i1> %m, i32 %n)
ret void
}

; TODO: test_vp_constrained_fp


		define void @test_vp_reduction(i32 %x, <8 x i32> %vi, <8 x float> %vf, float %f, <8 x i1> %m, i32 %n) {
		%r0 = call i32 @llvm.vp.reduce.add.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
craig.topper: Start value is missing.
frasercrmck (author): Cheers.
		%r1 = call i32 @llvm.vp.reduce.mul.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r2 = call i32 @llvm.vp.reduce.and.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r3 = call i32 @llvm.vp.reduce.or.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r4 = call i32 @llvm.vp.reduce.xor.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r5 = call i32 @llvm.vp.reduce.smax.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r6 = call i32 @llvm.vp.reduce.smin.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r7 = call i32 @llvm.vp.reduce.umax.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r8 = call i32 @llvm.vp.reduce.umin.v8i32(i32 %x, <8 x i32> %vi, <8 x i1> %m, i32 %n)
		%r9 = call float @llvm.vp.reduce.fmin.v8f32(float %f, <8 x float> %vf, <8 x i1> %m, i32 %n)
		%rA = call float @llvm.vp.reduce.fmax.v8f32(float %f, <8 x float> %vf, <8 x i1> %m, i32 %n)
		%rB = call float @llvm.vp.reduce.fadd.v8f32(float %f, <8 x float> %vf, <8 x i1> %m, i32 %n)
		%rC = call float @llvm.vp.reduce.fmul.v8f32(float %f, <8 x float> %vf, <8 x i1> %m, i32 %n)
		ret void
		}

; integer arith
declare <8 x i32> @llvm.vp.add.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.sub.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.mul.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.srem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.urem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
; bit arith
declare <8 x i32> @llvm.vp.and.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.or.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.xor.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
declare <8 x i32> @llvm.vp.shl.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
; fp arith
declare <8 x double> @llvm.vp.fadd.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)
declare <8 x double> @llvm.vp.fsub.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)
declare <8 x double> @llvm.vp.fmul.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)
declare <8 x double> @llvm.vp.fdiv.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)
declare <8 x double> @llvm.vp.frem.v8f64(<8 x double>, <8 x double>, <8 x i1>, i32)
		; reductions
		declare i32 @llvm.vp.reduce.add.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.mul.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.and.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.or.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.xor.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.smax.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.smin.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.umax.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare i32 @llvm.vp.reduce.umin.v8i32(i32, <8 x i32>, <8 x i1>, i32)
		declare float @llvm.vp.reduce.fmin.v8f32(float, <8 x float>, <8 x i1>, i32)
		declare float @llvm.vp.reduce.fmax.v8f32(float, <8 x float>, <8 x i1>, i32)
		declare float @llvm.vp.reduce.fadd.v8f32(float, <8 x float>, <8 x i1>, i32)
		declare float @llvm.vp.reduce.fmul.v8f32(float, <8 x float>, <8 x i1>, i32)
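Every reduction declared above has the same operand list: a scalar start value, the vector to reduce, an <8 x i1> mask and an i32 explicit vector length (EVL). The following C++ sketch models that shape for unsigned integer opcodes only. It assumes the usual VP convention that lane I takes part only when its mask bit is set and I < EVL, and it assumes EVL never exceeds the vector width. vpReduceRef folds the active lanes straight into the start value; vpReduceExpanded shows one plausible way a target without native predication could reach the same result, by putting each operation's neutral element into the inactive lanes, reducing all lanes unpredicated and then combining with the start value. Both function names are hypothetical, and this is an illustration rather than the code in ExpandVectorPredication.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch only: an <8 x i32> VP reduction with lanes treated as unsigned.
// Assumes lane I is active iff M[I] is set and I < EVL, and that EVL <= 8.
using Vec8 = std::array<uint32_t, 8>;
using Mask8 = std::array<bool, 8>;

// Reference model: fold Op over the active lanes, seeded with Start.
// If no lane is active, Start is returned unchanged.
template <typename BinOp>
uint32_t vpReduceRef(BinOp Op, uint32_t Start, const Vec8 &Val,
                     const Mask8 &M, uint32_t EVL) {
  assert(EVL <= Val.size() && "sketch assumes EVL <= vector width");
  uint32_t Acc = Start;
  for (uint32_t I = 0; I < EVL; ++I)
    if (M[I])
      Acc = Op(Acc, Val[I]);
  return Acc;
}

// Expansion-style model: fold EVL into the mask, replace inactive lanes with
// Op's neutral element (add/or/xor/umax: 0, mul: 1, and/umin: 0xFFFFFFFF),
// reduce every lane unpredicated, then combine the result with Start.
// Only equivalent to vpReduceRef when Op is associative and commutative.
template <typename BinOp>
uint32_t vpReduceExpanded(BinOp Op, uint32_t Neutral, uint32_t Start,
                          const Vec8 &Val, const Mask8 &M, uint32_t EVL) {
  assert(EVL <= Val.size() && "sketch assumes EVL <= vector width");
  Vec8 Sel{};
  for (uint32_t I = 0; I < Val.size(); ++I)
    Sel[I] = (M[I] && I < EVL) ? Val[I] : Neutral;
  uint32_t Red = Neutral;
  for (uint32_t X : Sel)
    Red = Op(Red, X);
  return Op(Start, Red);
}
```

With mask {1,0,1,1,1,1,1,1}, EVL 4 and start 10, an add reduction over {1,...,8} sees lanes 0, 2 and 3 and yields 10 + 1 + 3 + 4 = 18; with EVL 0 no lane is active and the start value comes back unchanged, matching the summary's description of the start value. Combining the start value last is why the sketch leaves out fadd/fmul: reordering floating-point operations can change rounding, so an ordered reduction cannot be rearranged this freely.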

llvm/unittests/IR/VPIntrinsicTest.cpp

(17 unchanged lines not shown)
 #include "llvm/Support/SourceMgr.h"
 #include "gtest/gtest.h"
 #include <sstream>

 using namespace llvm;

 namespace {

+static const char *ReductionIntOpcodes[] = {
+    "add", "mul", "and", "or", "xor", "smin", "smax", "umin", "umax"};
+
+static const char *ReductionFPOpcodes[] = {"fadd", "fmul", "fmin", "fmax"};
+
 class VPIntrinsicTest : public testing::Test {
 protected:
   LLVMContext Context;

   VPIntrinsicTest() : Context() {}

   LLVMContext C;
   SMDiagnostic Err;
(16 unchanged lines not shown)
     Str << " declare void @llvm.vp.store.v8i32.p0v8i32(<8 x i32>, <8 x i32>*, "
            "<8 x i1>, i32) ";
     Str << " declare void @llvm.vp.scatter.v8i32.v8p0i32(<8 x i32>, <8 x "
            "i32*>, <8 x i1>, i32) ";
     Str << " declare <8 x i32> @llvm.vp.load.v8i32.p0v8i32(<8 x i32>*, <8 x "
            "i1>, i32) ";
     Str << " declare <8 x i32> @llvm.vp.gather.v8i32.v8p0i32(<8 x i32*>, <8 x "
            "i1>, i32) ";

+    for (const char *ReductionOpcode : ReductionIntOpcodes)
+      Str << " declare i32 @llvm.vp.reduce." << ReductionOpcode
+          << ".v8i32(i32, <8 x i32>, <8 x i1>, i32) ";
+
+    for (const char *ReductionOpcode : ReductionFPOpcodes)
simoll: Good ol' printf debugging
frasercrmck: :)
+      Str << " declare float @llvm.vp.reduce." << ReductionOpcode
+          << ".v8f32(float, <8 x float>, <8 x i1>, i32) ";
+
     return parseAssemblyString(Str.str(), Err, C);
   }
 };

 /// Check that the property scopes include/llvm/IR/VPIntrinsics.def are closed.
 TEST_F(VPIntrinsicTest, VPIntrinsicsDefScopes) {
   Optional<Intrinsic::ID> ScopeVPID;
 #define BEGIN_REGISTER_VP_INTRINSIC(VPID, ...) \
(216 unchanged lines not shown) #define HANDLE_VP_TO_CONSTRAINEDFP(HASROUND, HASEXCEPT, CFPID) \
     for (auto TD : T) \
       NumMetadataArgs += (TD.Kind == Intrinsic::IITDescriptor::Metadata); \
     ASSERT_EQ(NumMetadataArgs, (unsigned)(HASROUND + HASEXCEPT)); \
   }
 #include "llvm/IR/VPIntrinsics.def"
 }

 } // end anonymous namespace

+/// Check various properties of VPReductionIntrinsics
+TEST_F(VPIntrinsicTest, VPReductions) {
+  LLVMContext C;
+  SMDiagnostic Err;
+
+  std::stringstream Str;
+  Str << "declare <8 x i32> @llvm.vp.mul.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, "
+         "i32)";
+  for (const char *ReductionOpcode : ReductionIntOpcodes)
+    Str << " declare i32 @llvm.vp.reduce." << ReductionOpcode
+        << ".v8i32(i32, <8 x i32>, <8 x i1>, i32) ";
+
+  for (const char *ReductionOpcode : ReductionFPOpcodes)
+    Str << " declare float @llvm.vp.reduce." << ReductionOpcode
+        << ".v8f32(float, <8 x float>, <8 x i1>, i32) ";
+
+  Str << "define void @test_reductions(i32 %start, <8 x i32> %val, float "
+         "%fpstart, <8 x float> %fpval, <8 x i1> %m, i32 %vl) {";
+
+  // Mix in a regular non-reduction intrinsic to check that the
+  // VPReductionIntrinsic subclass works as intended.
+  Str << " %r0 = call <8 x i32> @llvm.vp.mul.v8i32(<8 x i32> %val, <8 x i32> "
+         "%val, <8 x i1> %m, i32 %vl)";
+
+  unsigned Idx = 1;
+  for (const char *ReductionOpcode : ReductionIntOpcodes)
+    Str << " %r" << Idx++ << " = call i32 @llvm.vp.reduce." << ReductionOpcode
+        << ".v8i32(i32 %start, <8 x i32> %val, <8 x i1> %m, i32 %vl)";
+  for (const char *ReductionOpcode : ReductionFPOpcodes)
+    Str << " %r" << Idx++ << " = call float @llvm.vp.reduce."
+        << ReductionOpcode
+        << ".v8f32(float %fpstart, <8 x float> %fpval, <8 x i1> %m, i32 %vl)";
+
+  Str << " ret void"
+         "}";
+
+  std::unique_ptr<Module> M = parseAssemblyString(Str.str(), Err, C);
+  assert(M);
+
+  auto *F = M->getFunction("test_reductions");
+  assert(F);
+
+  for (const auto &I : F->getEntryBlock()) {
+    const VPIntrinsic *VPI = dyn_cast<VPIntrinsic>(&I);
+    if (!VPI)
+      continue;
+
+    Intrinsic::ID ID = VPI->getIntrinsicID();
+    const auto *VPRedI = dyn_cast<VPReductionIntrinsic>(&I);
+
+    if (!VPReductionIntrinsic::isVPReduction(ID)) {
+      EXPECT_EQ(VPRedI, nullptr);
+      EXPECT_EQ(VPReductionIntrinsic::getStartParamPos(ID).hasValue(), false);
+      EXPECT_EQ(VPReductionIntrinsic::getVectorParamPos(ID).hasValue(), false);
+      continue;
+    }
+
+    EXPECT_EQ(VPReductionIntrinsic::getStartParamPos(ID).hasValue(), true);
+    EXPECT_EQ(VPReductionIntrinsic::getVectorParamPos(ID).hasValue(), true);
+    ASSERT_NE(VPRedI, nullptr);
+    EXPECT_EQ(VPReductionIntrinsic::getStartParamPos(ID),
+              VPRedI->getStartParamPos());
+    EXPECT_EQ(VPReductionIntrinsic::getVectorParamPos(ID),
+              VPRedI->getVectorParamPos());
+    EXPECT_EQ(VPRedI->getStartParamPos(), 0u);
+    EXPECT_EQ(VPRedI->getVectorParamPos(), 1u);
+  }
+}
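Both the fixture and the VPReductions test build their declarations by streaming each name from ReductionIntOpcodes and ReductionFPOpcodes into a string before calling parseAssemblyString. As a standalone illustration of the text this produces (minus the separating spaces the test adds), here is a small C++ sketch with no LLVM dependency. vpReduceDecls and its parameters are hypothetical names, and the vector width is fixed at 8 as in the test. Its output for add and fadd matches the llvm.vp.reduce.add.v8i32 and llvm.vp.reduce.fadd.v8f32 declarations listed earlier.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): builds one declaration per
// opcode with the (start, vector, mask, evl) signature the test uses.
// ScalarTy is the start/result element type, e.g. "i32" or "float"; Suffix
// is the overload mangling, e.g. "v8i32" or "v8f32".
std::vector<std::string> vpReduceDecls(const std::vector<std::string> &Opcodes,
                                       const std::string &ScalarTy,
                                       const std::string &Suffix) {
  std::vector<std::string> Decls;
  for (const std::string &Op : Opcodes) {
    assert(!Op.empty() && "opcode names must be non-empty");
    std::ostringstream OS;
    OS << "declare " << ScalarTy << " @llvm.vp.reduce." << Op << '.' << Suffix
       << '(' << ScalarTy << ", <8 x " << ScalarTy << ">, <8 x i1>, i32)";
    Decls.push_back(OS.str());
  }
  return Decls;
}
```

Keeping the opcode lists as data, as the patch does, means a new reduction only needs one more string in the array for both the fixture and the test to declare it.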

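The VPReductions test pins down a simple contract: isVPReduction holds only for the reduction intrinsics, getStartParamPos and getVectorParamPos return a value exactly for those, and the values are 0 and 1. The sketch below restates that contract in plain C++17 with std::optional, keyed on the intrinsic name instead of Intrinsic::ID so that it runs without LLVM. isVPReductionName, startParamPos, vectorParamPos and checkPositions are hypothetical names; the API actually exercised in the test is the VPReductionIntrinsic class.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical name-based stand-ins for VPReductionIntrinsic::isVPReduction,
// getStartParamPos and getVectorParamPos (the test queries these by
// Intrinsic::ID, not by string). Every llvm.vp.reduce.* call takes
// (start, vector, mask, evl), so start is operand 0 and the vector operand 1.
bool isVPReductionName(std::string_view Name) {
  constexpr std::string_view Prefix = "llvm.vp.reduce.";
  return Name.substr(0, Prefix.size()) == Prefix;
}

std::optional<unsigned> startParamPos(std::string_view Name) {
  if (!isVPReductionName(Name))
    return std::nullopt;
  return 0u;
}

std::optional<unsigned> vectorParamPos(std::string_view Name) {
  if (!isVPReductionName(Name))
    return std::nullopt;
  return 1u;
}

// Same invariants the VPReductions test checks, applied to intrinsic names:
// non-reductions have no positions; reductions have start 0 and vector 1.
void checkPositions(const std::vector<std::string_view> &Names) {
  for (std::string_view N : Names) {
    if (!isVPReductionName(N)) {
      assert(!startParamPos(N).has_value());
      assert(!vectorParamPos(N).has_value());
      continue;
    }
    assert(startParamPos(N) == 0u);
    assert(vectorParamPos(N) == 1u);
  }
}
```

Passing a mix such as llvm.vp.mul.v8i32 and llvm.vp.reduce.add.v8i32 to checkPositions exercises both branches, much as the test mixes the %r0 vp.mul call in with the reduction calls.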
Revision Contents

Diff 366932

llvm/docs/LangRef.rst

llvm/include/llvm/IR/IntrinsicInst.h

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/IR/VPIntrinsics.def

llvm/lib/CodeGen/ExpandVectorPredication.cpp

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/IR/IntrinsicInst.cpp

llvm/test/CodeGen/Generic/expand-vp.ll

llvm/test/Verifier/vp-intrinsics.ll

llvm/unittests/IR/VPIntrinsicTest.cpp
