This is an archive of the discontinued LLVM Phabricator instance.

[VP] Add vector-predicated reduction intrinsics
ClosedPublic

Authored by frasercrmck on Jun 15 2021, 9:41 AM.

Details

Summary

This patch adds vector-predicated ("VP") reduction intrinsics corresponding to
each of the existing unpredicated llvm.vector.reduce.* versions. Unlike the
unpredicated reductions, all VP reductions have a start value. This start value
is returned when no vector elements are active.

Support for expansion on targets without native vector-predication support is
included.
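
For illustration, a sketch of the shape these intrinsics take, using the integer add reduction (names like %start, %v, %mask and %evl are placeholders, not from the patch itself):

  declare i32 @llvm.vp.reduce.add.v4i32(i32, <4 x i32>, <4 x i1>, i32)

  ; %r combines %start with the sum of the active lanes of %v;
  ; with an all-false %mask or %evl == 0, %r is simply %start.
  %r = call i32 @llvm.vp.reduce.add.v4i32(i32 %start, <4 x i32> %v, <4 x i1> %mask, i32 %evl)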

This patch is based on the "reduction slice" of the LLVM-VP reference patch
(https://reviews.llvm.org/D57504).

Diff Detail

Event Timeline

frasercrmck created this revision.Jun 15 2021, 9:41 AM
frasercrmck requested review of this revision.Jun 15 2021, 9:41 AM
frasercrmck added inline comments.Jun 15 2021, 9:43 AM
llvm/test/CodeGen/Generic/expand-vp.ll
29

Note that I've still got some reductions to add here but I feel the patch itself is good enough to start reviewing.

frasercrmck added inline comments.Jun 15 2021, 9:46 AM
llvm/include/llvm/IR/IntrinsicInst.h
450

This can probably go in VPReductionIntrinsic

I wonder whether the semantics sections in the documentation should just refer back to the semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reductions are expanded anyway: if the standard reductions switch semantics at some point, we will unwittingly follow suit.

llvm/unittests/IR/VPIntrinsicTest.cpp
62

Good ol' printf debugging

  • rebase
  • remove debug print
  • move isVPReduction into VPReductionIntrinsic
  • flesh out expand-vp.ll test
frasercrmck marked an inline comment as done.Jun 18 2021, 4:07 AM

I wonder whether the semantics sections in the documentation should just refer back to the semantics sections of the regular reduction intrinsics instead of replicating them. In the end, we use those when vp reductions are expanded anyway: if the standard reductions switch semantics at some point, we will unwittingly follow suit.

I think that's a good idea. I found it difficult to strike a balance between providing enough "interesting" information (without making readers jump about the page) and plain copy/paste. I do think it's important to clarify how disabled lanes behave (I hope you agree), but almost everything else is just the base version.

llvm/unittests/IR/VPIntrinsicTest.cpp
62

:)

frasercrmck marked an inline comment as done.Jun 18 2021, 4:08 AM


Disabling lanes is really what makes the difference between these and the regular reduction intrinsics.
There is also the corner case that all lanes are disabled and I am unsure what the return value should be then. Any thoughts on that?


Agreed. I'll update the docs accordingly.

We have a couple of options for what happens when all lanes are disabled. For starters, it follows logically from the definition of the expansion into reduce(select %mask, %v, %neutral) that we just return the neutral element. So that's the "easiest" definition in that sense.

We could return undef too which would be closer to how the other VP intrinsics work.

As for other options, it wouldn't make sense to me to specify that we return any of the vector elements (e.g. v[0]) as they're not conceptually active. And I think poison wouldn't make any sense. Are those the only realistic options?

In terms of hardware (which is orthogonal but may help guide us), RVV always takes a start value so we'd just return the neutral element; I admit I might be unduly swayed by that. The lowering for returning the "neutral element" may be complicated on some targets, involving fetching the active length and perhaps doing some kind of scalar select. How would VE work?
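
For reference, a minimal LLVM IR sketch of the select-based expansion mentioned above, assuming the start operand that was eventually added and ignoring %evl (which the expansion pass folds into the mask first); names are placeholders:

  ; inactive lanes are replaced by the neutral element of the operation (0 for add)
  %masked = select <4 x i1> %mask, <4 x i32> %v, <4 x i32> zeroinitializer
  %sum = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked)
  ; fold in the start value; with an all-false mask the result is just %start
  %res = add i32 %start, %sum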


Could we add a start value operand to all of the intrinsics? fadd and fmul already have it. If the vectorizer doesn't have a start value it can pass the neutral value. On targets like RISCV we can use that neutral value directly since we need the scalar input anyway. Other targets can detect that the argument is the neutral value and ignore it if it doesn't mesh with their ISA.

simoll added a comment.Jul 1 2021, 2:49 AM


Feeding back the discussions we had off Phabricator into the review thread:

  1. We agreed to have scalar start value parameters in all vp reduction intrinsics (unlike llvm.vector.reduce.* which only have them where needed for non-reassociatable reductions). This makes them regular and there is no need to match the start value.
  2. There appears to be an issue with the %evl == 0 corner case in that the ISAs (RISC-V and VE) do not update the result register in that case. There would need to be a register constraint between the start value register and the result register to enforce this. This is surely an artifact of the intrinsic reaching into isel. However, I am not sure how big of a problem that really is.
  • add start value to all intrinsics
  • clarify %evl==0 semantics in docs

I wasn't sure whether it's best to say explicitly in the docs that vp reductions have a start value, unlike their unpredicated counterparts. I've left it out for now.

craig.topper added inline comments.Jul 14 2021, 10:06 AM
llvm/docs/LangRef.rst
18582

With the start value added, I think it's the second operand now?

18807

Neutral value is -1 or UINT_MAX for AND

19175

Does -QNAN mean anything? Isn't the sign of a NaN mostly ignored?

19182

Even if all vector elements are NaN, the result would depend on the start value

llvm/include/llvm/IR/IntrinsicInst.h
465

Should this be isVPReduction?

llvm/include/llvm/IR/VPIntrinsics.def
251

Why accu instead of start?

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

I was wondering if we could go through ConstantExpr::getBinOpIdentity to avoid repeating the constants in multiple places but we'd still need special handling for min/max

llvm/test/Verifier/vp-intrinsics.ll
34

Start value is missing

frasercrmck marked 7 inline comments as done.

  • fix Verifier test
  • fix VPReductionIntrinsic::classof
  • add new unit test
  • update docs:
    • fix references to operand indices
    • fix the neutral element for the AND reduction
    • fix the specified return value for FMIN/FMAX and NaNs
    • be more explicit about the return value when EVL==0
    • be more explicit about the start value
    • reference operands by their names in the Arguments sections
llvm/docs/LangRef.rst
18582

Yes, thanks. I didn't do a proper sweep over the docs. I've made further changes to them so it might be worth going over them again.

18807

Good spot, thanks. Updated the "equivalent to" section below, too.

19175

I have heard that the sign of NaNs is often ignored, but I'm not sure it's guaranteed to be ignored. -QNAN does have a bit representation, and ConstantFP and APFloat both take a Negative parameter in their getQNaN methods. It's also what SelectionDAG::getNeutralElement does for FMAXNUM.

19182

Yes true. I've replaced that sentence now.

llvm/include/llvm/IR/IntrinsicInst.h
456

I'm not sure whether to keep this in or not, now that all vp reductions have a start parameter. Currently the start value is always operand 0 and the vector is always operand 1.

465

Oh good spot, thanks. I've fixed that and added a unit test for these VPReductionIntrinsic methods.

llvm/include/llvm/IR/VPIntrinsics.def
251

I think the reference patch uses/used accu so I was flip-flopping between the two. I agree that "start" is better though.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
264

Coming at it from a slightly different angle, I was wondering if there should be a single source of truth for neutral reduction elements between IR and SelectionDAG.

llvm/test/Verifier/vp-intrinsics.ll
34

Cheers.

  • add logic for legalization similar to regular reductions
  • return correct non-SEQ ISD opcode from reassoc reductions
craig.topper added inline comments.Aug 6 2021, 10:42 AM
llvm/docs/LangRef.rst
18710

vale -> value

llvm/include/llvm/IR/VPIntrinsics.def
241

Is this comment still accurate with start value?

297

Should this be %acc or %start?

313

Should this be lined up with vp_reduce_fadd on the line above?

llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

Is the documentation for this in IRBuilder incorrect? It says the accumulator is for ordered reductions. Is it also used for unordered?

/// Create a vector fadd reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value for ordered reductions.
CallInst *CreateFAddReduce(Value *Acc, Value *Src);

/// Create a vector fmul reduction intrinsic of the source vector.
/// The first parameter is a scalar accumulator value for ordered reductions.
CallInst *CreateFMulReduce(Value *Acc, Value *Src);
frasercrmck marked 4 inline comments as done.
  • rebase
  • address wee bits of feedback
frasercrmck marked an inline comment as done.Aug 9 2021, 3:19 AM
frasercrmck added inline comments.
llvm/docs/LangRef.rst
18710

Nice spot

llvm/include/llvm/IR/VPIntrinsics.def
241

Nope, but it is now, thanks for catching that.

297

Aye it should be %start but I've reworked the documentation anyway; this specialized macro is now really for reductions with separate "seq" forms. Cheers.

313

Done, cheers.

llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

Yeah, the unordered reduction intrinsics do indeed have an accumulator, so this is misleading. However, since the only difference between an ordered and an unordered reduction is the presence of the reassoc flag, this method always creates an ordered reduction; it's up to the user to add the flag later. That's what we do in replaceOperation by transferring any existing fast-math flags onto the expanded reduction.
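
As a small sketch of that distinction (values are placeholders): both forms take the accumulator, and only the reassoc fast-math flag on the call marks the reduction as unordered:

  ; ordered (sequential): ((((%acc + v0) + v1) + v2) + v3)
  %ord = call float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)
  ; unordered: reassoc allows the reduction to be reassociated, e.g. into a tree
  %unord = call reassoc float @llvm.vector.reduce.fadd.v4f32(float %acc, <4 x float> %v)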

From the SelectionDAG's perspective the unordered reductions don't have an accumulator, because it's split out early in the SelectionDAGBuilder. I wonder if that's where the confusion came from.

I do still think it's worth addressing this documentation, though. I can do that in a separate patch as it's likely to spawn discussion.

frasercrmck marked an inline comment as done.Aug 9 2021, 4:17 AM
frasercrmck added inline comments.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
376

I tried to address this in D107753.

craig.topper added inline comments.Aug 9 2021, 9:44 AM
llvm/lib/CodeGen/ExpandVectorPredication.cpp
27

What did we start using from Operator.h? I couldn't spot it.

  • remove unused header
frasercrmck marked an inline comment as done.Aug 10 2021, 2:45 AM
frasercrmck added inline comments.
llvm/lib/CodeGen/ExpandVectorPredication.cpp
27

Ah, nice; must have been an artifact from an intermediate change. That's that gone now.

This revision is now accepted and ready to land.Aug 10 2021, 8:30 AM
frasercrmck marked an inline comment as done.Aug 11 2021, 2:28 AM

Thanks for the review, Craig. Does the patch LGTY, @simoll?

  • rebase
  • address clang-tidy warnings
frasercrmck edited the summary of this revision. (Show Details)Aug 17 2021, 10:05 AM
This revision was landed with ongoing or failed builds.Aug 17 2021, 10:06 AM
This revision was automatically updated to reflect the committed changes.