This is an archive of the discontinued LLVM Phabricator instance.


Differential D143578

[VP] Add vp.powi and a pass for expanding vp.powi before DAG.
Needs Review · Public

Authored by fakepaper56 on Feb 8 2023, 6:09 AM.


Details

Reviewers

craig.topper
reames
frasercrmck
rogfer01
simoll

Summary

This patch expands vp.powi differently from powi.
Vector powi is unrolled into multiple powi() library calls in SelectionDAG, but
that method does not work for scalable vectors.
To support scalable vectors, the patch expands vp.powi at the IR level. The
expansion of vp.powi is based on compiler-rt/__powidf2.
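
For reference, the scalar routine the summary points to can be sketched as follows. This is a minimal sketch of the square-and-multiply loop in the style of compiler-rt's `__powidf2`, not the patch's code; the name `powiRef` is an assumption for illustration.

```cpp
// Scalar reference for the expansion (hypothetical helper name).
// Computes a raised to the integer power b by repeated squaring.
double powiRef(double a, int b) {
  const bool recip = b < 0; // negative exponents take the reciprocal at the end
  double r = 1.0;
  while (true) {
    if (b & 1)
      r *= a; // fold in the current power of a when the low bit is set
    b /= 2;   // truncating division, so negative b also converges to 0
    if (b == 0)
      break;
    a *= a;   // square the base for the next bit
  }
  return recip ? 1.0 / r : r;
}
```

For example, `powiRef(2.0, 10)` yields `1024.0` and `powiRef(2.0, -2)` yields `0.25`.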

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	30 ms	x64 debian > LLVM-Unit.IR/_/IRTests/33::38

Event Timeline

fakepaper56 created this revision.Feb 8 2023, 6:09 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 6:09 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

fakepaper56 requested review of this revision.Feb 8 2023, 6:09 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 6:09 AM

Herald added subscribers: llvm-commits, alextsao1999, jdoerfert. · View Herald Transcript

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

Harbormaster completed remote builds in B212590: Diff 495819.Feb 8 2023, 7:21 AM

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM cannot currently expand powi on scalable vector types. So this pass is not only for vp.powi; it is also meant to expand scalable-vector powi in the future.

craig.topper added inline comments.Feb 9 2023, 7:32 PM

llvm/lib/CodeGen/ExpandPowi.cpp
35	expansion*
70	CreatePHI returns a PHINode*, can we use that to avoid casts?
124	support*
157	Why does this require AA?

Address Craig's comment and add missing test case.

Harbormaster completed remote builds in B212951: Diff 496323.Feb 9 2023, 9:36 PM

fakepaper56 marked 3 inline comments as done.Feb 10 2023, 1:17 AM

fakepaper56 added inline comments.

llvm/lib/CodeGen/ExpandPowi.cpp
157	Sorry, that was my misuse.

craig.topper added inline comments.Feb 10 2023, 10:09 PM

llvm/lib/CodeGen/ExpandPowi.cpp
17	GlobalsModRef is probably unneeded?

craig.topper added inline comments.Feb 10 2023, 10:10 PM

llvm/lib/CodeGen/ExpandPowi.cpp
83	What's preventing using vp.icmp?

Cleanup headers.

fakepaper56 marked 2 inline comments as done.Feb 11 2023, 4:09 AM

fakepaper56 added inline comments.

llvm/lib/CodeGen/ExpandPowi.cpp
83	It is only that I don't know how to construct vp.icmp/vp.fcmp instructions.

fakepaper56 added inline comments.Feb 11 2023, 4:18 AM

llvm/lib/CodeGen/ExpandPowi.cpp
83	I don't understand how to turn the predicate into a `Value` pointer.

Harbormaster completed remote builds in B213209: Diff 496678.Feb 11 2023, 5:09 AM

craig.topper added inline comments.Feb 11 2023, 11:58 AM

llvm/lib/CodeGen/ExpandPowi.cpp

Should be something like this code from IRBuilder with the assert removed.

Value *getConstrainedFPPredicate(CmpInst::Predicate Predicate) {
  assert(CmpInst::isFPPredicate(Predicate) &&
         Predicate != CmpInst::FCMP_FALSE &&
         Predicate != CmpInst::FCMP_TRUE &&
         "Invalid constrained FP comparison predicate!");

  StringRef PredicateStr = CmpInst::getPredicateName(Predicate);
  auto *PredicateMDS = MDString::get(Context, PredicateStr);

  return MetadataAsValue::get(Context, PredicateMDS);
}

Use vp.icmp instead of icmp.

fakepaper56 marked an inline comment as done.Feb 11 2023, 11:50 PM

fakepaper56 added inline comments.

llvm/lib/CodeGen/ExpandPowi.cpp
83	Thank you for the recommendation.

Harbormaster completed remote builds in B213265: Diff 496743.Feb 12 2023, 1:18 AM

Rebase and ping.

craig.topper added inline comments.Feb 21 2023, 12:13 AM

llvm/lib/CodeGen/ExpandPowi.cpp
115	old fixme?
133	Drop curly braces.

craig.topper added inline comments.Feb 21 2023, 12:14 AM

llvm/lib/CodeGen/ExpandPowi.cpp
59	why "forward"?

Address Craig's comment.

fakepaper56 marked 3 inline comments as done.Feb 21 2023, 12:28 AM

fakepaper56 added inline comments.

llvm/lib/CodeGen/ExpandPowi.cpp
59	Sorry, it didn't make sense. I changed it to powi-expansion-loop.

Harbormaster completed remote builds in B214939: Diff 499057.Feb 21 2023, 1:32 AM

In D143578#4113149, @fakepaper56 wrote:

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM cannot currently expand powi on scalable vector types. So this pass is not only for vp.powi; it is also meant to expand scalable-vector powi in the future.

Apologies, I was away on holiday.

Thanks - I missed that the plan was also to support llvm.powi. I guess I just find ExpandPowi and ExpandVectorPredicationPass to be doing two very similar things (in this patch) with regards to vp.powi: expanding it into an equivalent set of operations; that seems unfortunate.

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I'd basically like to know how this fits in with some longer-term strategy about what we want to support for illegal scalable-vector operations, rather than this specific powi use-case. If we start to open the door to specific intrinsics, I think it'd help to have a well-defined rationale and plan in mind.

In D143578#4140944, @frasercrmck wrote:

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I agree with you that expanding only powi is too restrictive. I think we should at least expand all the math functions in one pass. But I have no idea whether we should expand scalable operations for targets without scalable vectors.

In D143578#4140944, @frasercrmck wrote:

In D143578#4113149, @fakepaper56 wrote:

Maybe I missed the rationale, but why not use the ExpandVectorPredicationPass for this?

LLVM cannot currently expand powi on scalable vector types. So this pass is not only for vp.powi; it is also meant to expand scalable-vector powi in the future.

Apologies, I was away on holiday.

Thanks - I missed that the plan was also to support llvm.powi. I guess I just find ExpandPowi and ExpandVectorPredicationPass to be doing two very similar things (in this patch) with regards to vp.powi: expanding it into an equivalent set of operations; that seems unfortunate.

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I'd basically like to know how this fits in with some longer-term strategy about what we want to support for illegal scalable-vector operations, rather than this specific powi use-case. If we start to open the door to specific intrinsics, I think it'd help to have a well-defined rationale and plan in mind.

Note that this pass doesn't scalarize

In D143578#4141678, @fakepaper56 wrote:

In D143578#4140944, @frasercrmck wrote:

I get that scalable-vector llvm.powi is different, but so would many other scalable-vector intrinsics if the target doesn't support that operation: llvm.sin, llvm.cos, etc. So would we have passes for each intrinsic? If not, ExpandPowi seems too restrictive in its scope.

If we're supporting intrinsics, what about plain scalable-vector add on a target without scalable vectors, like x86?

I agree with you that expanding only powi is too restrictive. I think we should at least expand all the math functions in one pass. But I have no idea whether we should expand scalable operations for targets without scalable vectors.

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls. But that makes it very different than what we're doing for powi here.

How do you envision sharing this code for llvm.powi? A lot of this code creates VP intrinsics. Do you have an abstraction plan?

In D143578#4142322, @craig.topper wrote:

How do you envision sharing this code for llvm.powi? A lot of this code creates VP intrinsics. Do you have an abstraction plan?

My plan is to use the same expansion function, with an all-true mask as the mask and the element count as the EVL.
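
The plan above can be modelled outside LLVM. This is a minimal sketch, assuming a scalar exponent and using `std::nullopt` to stand in for poison lanes; `expandPowiModel` and `powiModel` are hypothetical names, and the code imitates the semantics in plain C++ rather than building IR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Lane = std::optional<double>; // std::nullopt stands in for a poison lane

// Hypothetical model of the shared expansion: the loop is controlled by the
// scalar exponent, while each multiply is applied only to enabled lanes.
std::vector<Lane> expandPowiModel(const std::vector<double> &base, int exp,
                                  const std::vector<bool> &mask,
                                  std::size_t evl) {
  assert(mask.size() == base.size() && evl <= base.size());
  const std::size_t n = base.size();
  std::vector<double> a = base;
  std::vector<double> r(n, 1.0);
  auto enabled = [&](std::size_t i) { return i < evl && mask[i]; };

  const bool recip = exp < 0;
  int b = exp;
  while (true) {
    if (b & 1) {
      for (std::size_t i = 0; i < n; ++i)
        if (enabled(i))
          r[i] *= a[i];
    }
    b /= 2;
    if (b == 0)
      break;
    for (std::size_t i = 0; i < n; ++i)
      if (enabled(i))
        a[i] *= a[i];
  }

  std::vector<Lane> result(n); // every lane starts out as poison
  for (std::size_t i = 0; i < n; ++i)
    if (enabled(i))
      result[i] = recip ? 1.0 / r[i] : r[i];
  return result;
}

// Plain powi modelled as the predicated form with an all-true mask and an
// explicit vector length equal to the element count.
std::vector<Lane> powiModel(const std::vector<double> &base, int exp) {
  return expandPowiModel(base, exp, std::vector<bool>(base.size(), true),
                         base.size());
}
```

In this model the unpredicated case needs no separate code path, which is the point of the proposal; whether the real pass can share code this cleanly depends on how the VP intrinsic calls are abstracted.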

Also expanding llvm.powi.

Harbormaster completed remote builds in B216934: Diff 501809.Mar 2 2023, 3:11 AM

No test for RISC-V?

llvm/test/CodeGen/Generic/expand-powi.ll
3	This needs a `REQUIRES: x86-registered-target` or it needs to be moved into the X86 directory.

craig.topper added inline comments.Mar 7 2023, 9:54 PM

llvm/lib/CodeGen/ExpandPowi.cpp
128	I think we should do a vp_icmp followed by a mask vp_reduce_or.
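
A plain C++ model of that suggestion, assuming the exponent is a per-lane integer vector and the loop should continue while any enabled lane is still nonzero. The helper names are hypothetical and only imitate `vp.icmp` and `vp.reduce.or`; lanes that would be poison in IR are simply left false here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of a predicated "not equal" compare against a scalar.
std::vector<bool> vpICmpNeModel(const std::vector<std::int32_t> &lhs,
                                std::int32_t rhs,
                                const std::vector<bool> &mask,
                                std::size_t evl) {
  assert(mask.size() == lhs.size() && evl <= lhs.size());
  std::vector<bool> out(lhs.size(), false);
  for (std::size_t i = 0; i < lhs.size(); ++i)
    if (i < evl && mask[i])
      out[i] = lhs[i] != rhs;
  return out;
}

// Model of a predicated OR reduction over i1 lanes with a start value.
// Disabled lanes do not contribute to the result.
bool vpReduceOrModel(bool start, const std::vector<bool> &vec,
                     const std::vector<bool> &mask, std::size_t evl) {
  assert(mask.size() == vec.size() && evl <= vec.size());
  bool acc = start;
  for (std::size_t i = 0; i < vec.size(); ++i)
    if (i < evl && mask[i])
      acc = acc || vec[i];
  return acc;
}

// Loop-continue condition: is any enabled exponent lane still nonzero?
bool anyEnabledExponentNonZero(const std::vector<std::int32_t> &exps,
                               const std::vector<bool> &mask,
                               std::size_t evl) {
  return vpReduceOrModel(false, vpICmpNeModel(exps, 0, mask, evl), mask, evl);
}
```

Because the reduction skips disabled lanes, a nonzero exponent in a masked-off lane or beyond the explicit vector length does not keep the loop running.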

All the existing tests for llvm.powi use a scalar exponent even when the result is a vector. Should vp.powi only accept scalar exponent?

I think we should follow rule of llvm.powi first.

This update:

Makes vp.powi follow llvm.powi in accepting only a scalar exponent.
Adds tests for RISC-V.
Updates test cases.

However, one IR unit test still fails. I don't know how to debug it; I cannot
even trace it with gdb.
The command below reproduces the failing test.

$ LLVM_SYMBOLIZER_PATH=./build/bin/llvm-symbolizer ./build/unittests/IR/./IRTests
...
[ RUN      ] VPIntrinsicTest.VPIntrinsicDeclarationForParams
IRTests: /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:255: const T& llvm::ArrayRef<T>::operator[](size_t) const [with T = llvm::Type*; size_t = long unsigned int]: Assertion `Index < Length && "Invalid index!"' failed.
 #0 0x00005620c5f909ee llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:567:22
 #1 0x00005620c5f90dc0 PrintStackTraceSignalHandler(void*) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:641:1
 #2 0x00005620c5f8e4fe llvm::sys::RunSignalHandlers() /home/yeting/x86-riscv-llvm/llvm/lib/Support/Signals.cpp:104:20
 #3 0x00005620c5f9033f SignalHandler(int) /home/yeting/x86-riscv-llvm/llvm/lib/Support/Unix/Signals.inc:412:1
 #4 0x00007f3c933e0980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #5 0x00007f3c91d92e87 raise /build/glibc-uZu3wS/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f3c91d947f1 abort /build/glibc-uZu3wS/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f3c91d843fa __assert_fail_base /build/glibc-uZu3wS/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f3c91d84472 (/lib/x86_64-linux-gnu/libc.so.6+0x30472)
 #9 0x00005620c5b9b1c6 llvm::ArrayRef<llvm::Type*>::operator[](unsigned long) const /home/yeting/x86-riscv-llvm/llvm/include/llvm/ADT/ArrayRef.h:256:14
#10 0x00005620c5cde47b DecodeFixedType(llvm::ArrayRef<llvm::Intrinsic::IITDescriptor>&, llvm::ArrayRef<llvm::Type*>, llvm::LLVMContext&) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1401:37
#11 0x00005620c5cdeae6 llvm::Intrinsic::getType(llvm::LLVMContext&, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1480:21
#12 0x00005620c5cf70c8 llvm::Intrinsic::getDeclaration(llvm::Module*, unsigned int, llvm::ArrayRef<llvm::Type*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/Function.cpp:1505:21
#13 0x00005620c5d49863 llvm::VPIntrinsic::getDeclarationForParams(llvm::Module*, unsigned int, llvm::Type*, llvm::ArrayRef<llvm::Value*>) /home/yeting/x86-riscv-llvm/llvm/lib/IR/IntrinsicInst.cpp:594:39
#14 0x00005620c56b2283 (anonymous namespace)::VPIntrinsicTest_VPIntrinsicDeclarationForParams_Test::TestBody() /home/yeting/x86-riscv-llvm/llvm/unittests/IR/VPIntrinsicTest.cpp:367:72

Herald added subscribers: luke, kosarev, • pcwang-thead and 24 others. · View Herald TranscriptMar 15 2023, 7:50 AM

craig.topper added inline comments.Mar 15 2023, 8:31 AM

llvm/docs/LangRef.rst
20078	`Predicated version of raising a vector of floating-point values to an integer power.`

Fixed crash by adding special case in llvm::VPIntrinsic::getDeclarationForParams

Harbormaster completed remote builds in B219650: Diff 505511.Mar 15 2023, 9:56 AM

In D143578#4142322, @craig.topper wrote:

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls.

I want to second this point. I think doing the fancy expansion here is a bad idea at this time. We can come back to that, but an initial implementation should scalarize via a loop. The lowering works for all of the lane-wise math routines. Only once we have correct lowering for the majority of the routines should we bother optimizing any of them.

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.
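
The scalarizing strategy suggested here could look roughly like the sketch below: a loop that applies a scalar routine to each lane in turn. It is plain C++ standing in for the IR loop, and `scalarizeLanewise` is a hypothetical name, not an LLVM API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Applies a scalar routine to every lane, one lane per loop iteration.
// The same loop shape works for any lane-wise math routine.
template <typename ScalarFn>
std::vector<double> scalarizeLanewise(const std::vector<double> &v,
                                      ScalarFn fn) {
  std::vector<double> out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = fn(v[i]); // stands in for a scalar libcall on lane i
  return out;
}

// Example instantiation: lane-wise square root via the scalar libm routine.
std::vector<double> sqrtLanewise(const std::vector<double> &v) {
  return scalarizeLanewise(v, [](double x) { return std::sqrt(x); });
}
```

The vector length is only known at run time for scalable vectors, which is why the sketch uses a loop over `v.size()` rather than a fixed unroll.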

llvm/lib/CodeGen/ExpandPowi.cpp
36	This appears to correspond to the recently introduced IRBuilder::CreateElementCount.

In D143578#4197721, @reames wrote:

In D143578#4142322, @craig.topper wrote:

How would we expand the other math functions? Many of them are large and probably difficult to keep in vector form. We could scalarize them with a loop and use scalar libcalls.

I want to second this point. I think doing the fancy expansion here is a bad idea at this time. We can come back to that, but an initial implementation should scalarize via a loop. The lowering works for all of the lane-wise math routines. Only once we have correct lowering for the majority of the routines should we bother optimizing any of them.

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.

I want to mention that powi is weird and does not correspond to a real math routine. It's a fast math optimization for pow with an integer argument. The scalar version of powi is provided in libgcc/compiler-rt while pow itself is in libm. This almost makes it a compiler implementation detail. Should a vector math library provide this function?

In D143578#4197800, @craig.topper wrote:

Even then, I'm not convinced that inlining this loop is profitable over generating a runtime call to a new routine.

I want to mention that powi is weird and does not correspond to a real math routine. It's a fast math optimization for pow with an integer argument. The scalar version of powi is provided in libgcc/compiler-rt while pow itself is in libm. This almost makes it a compiler implementation detail. Should a vector math library provide this function?

One of the options which was mentioned in the recent compiler-rt thread on discourse was to have a weak definition defined in each object file so that the linker could pick one (including the runtime libs if available). I'd lean towards something like that.

Use CreateElementCount and fix typos in LangRef.rst.

Harbormaster completed remote builds in B219806: Diff 505725.Mar 16 2023, 3:09 AM

In D143578#4197817, @reames wrote:

One of the options which was mentioned in the recent compiler-rt thread on discourse was to have a weak definition defined in each object file so that the linker could pick one (including the runtime libs if available). I'd lean towards something like that.

Could you provide a link to the Discourse thread you mentioned?

Revision Contents

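Path	Size
llvm/docs/LangRef.rst	53 lines
llvm/include/llvm/CodeGen/MachinePassRegistry.def	1 line
llvm/include/llvm/CodeGen/Passes.h	3 lines
llvm/include/llvm/IR/Intrinsics.td	5 lines
llvm/include/llvm/IR/VPIntrinsics.def	3 lines
llvm/include/llvm/InitializePasses.h	1 line
llvm/lib/CodeGen/CMakeLists.txt	1 line
llvm/lib/CodeGen/ExpandPowi.cpp	160 lines
llvm/lib/CodeGen/TargetPassConfig.cpp	1 line
llvm/test/CodeGen/AArch64/O0-pipeline.ll	1 line
llvm/test/CodeGen/AArch64/O3-pipeline.ll	1 line
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	5 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll	1 line
llvm/test/CodeGen/Generic/expand-powi.ll	66 lines
llvm/test/CodeGen/LoongArch/O0-pipeline.ll	1 line
llvm/test/CodeGen/LoongArch/opt-pipeline.ll	1 line
llvm/test/CodeGen/M68k/pipeline.ll	1 line
llvm/test/CodeGen/PowerPC/O0-pipeline.ll	1 line
llvm/test/CodeGen/PowerPC/O3-pipeline.ll	1 line
llvm/test/CodeGen/RISCV/O0-pipeline.ll	1 line
llvm/test/CodeGen/RISCV/O3-pipeline.ll	1 line
llvm/test/CodeGen/RISCV/rvv/expand-powi.ll	151 lines
llvm/test/CodeGen/X86/O0-pipeline.ll	1 line
llvm/test/CodeGen/X86/opt-pipeline.ll	1 line
llvm/tools/llc/llc.cpp	1 line
llvm/tools/opt/opt.cpp	2 lines
llvm/unittests/IR/VPIntrinsicTest.cpp	2 lines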

Diff 505725

llvm/docs/LangRef.rst


[... 14,399 lines not shown ...]

Return the same value as a corresponding libm '``sqrt``' function but without		Return the same value as a corresponding libm '``sqrt``' function but without
trapping or setting ``errno``. For types specified by IEEE-754, the result		trapping or setting ``errno``. For types specified by IEEE-754, the result
matches a conforming libm implementation.		matches a conforming libm implementation.

When specified with the fast-math-flag 'afn', the result may be approximated		When specified with the fast-math-flag 'afn', the result may be approximated
using a less accurate calculation.		using a less accurate calculation.

		.. _int_powi:

'``llvm.powi.*``' Intrinsic		'``llvm.powi.*``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

This is an overloaded intrinsic. You can use ``llvm.powi`` on any		This is an overloaded intrinsic. You can use ``llvm.powi`` on any
floating-point or vector of floating-point type. Not all targets support		floating-point or vector of floating-point type. Not all targets support
[... 5,603 lines not shown ...]	::

declare <16 x float> @llvm.vp.sqrt.v16f32 (<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)		declare <16 x float> @llvm.vp.sqrt.v16f32 (<16 x float> <op>, <16 x i1> <mask>, i32 <vector_length>)
declare <vscale x 4 x float> @llvm.vp.sqrt.nxv4f32 (<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)		declare <vscale x 4 x float> @llvm.vp.sqrt.nxv4f32 (<vscale x 4 x float> <op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
declare <256 x double> @llvm.vp.sqrt.v256f64 (<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)		declare <256 x double> @llvm.vp.sqrt.v256f64 (<256 x double> <op>, <256 x i1> <mask>, i32 <vector_length>)

Overview:		Overview:
"""""""""		"""""""""

Predicated floating-point square root of a vector of floating-point values.		Predicated version of raising a vector of floating-point values to an integer power.


Arguments:		Arguments:
""""""""""		""""""""""

The first operand and the result have the same vector of floating-point type.		The first operand and the result have the same vector of floating-point type.
The second operand is the vector mask and has the same number of elements as the		The second operand is the vector mask and has the same number of elements as the
result vector type. The third operand is the explicit vector length of the		result vector type. The third operand is the explicit vector length of the
[... 14 lines not shown ...]	.. code-block:: llvm

%r = call <4 x float> @llvm.vp.sqrt.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)		%r = call <4 x float> @llvm.vp.sqrt.v4f32(<4 x float> %a, <4 x i1> %mask, i32 %evl)
;; For all lanes below %evl, %r is lane-wise equivalent to %also.r		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

%t = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)		%t = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %a)
%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison		%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison


		.. _int_vp_powi:

		'``llvm.vp.powi.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x float> @llvm.vp.sqrt.v16f32.i32 (<16 x float> <base>, i32 <exp>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x float> @llvm.vp.sqrt.nxv4f32.i32 (<vscale x 4 x float> <op>, i32 <exp>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x double> @llvm.vp.sqrt.v256f64.i64 (<256 x double> <op>, i64 <exp>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated floating-point square root of a vector of floating-point values.
		craig.topper (Not Done): `Predicated version of raising a vector of floating-point values to an integer power.`


		Arguments:
		""""""""""

		The first operand and the result have the same vector of floating-point type.
		The second operand is an integer power. The third operand is the vector mask and
		has the same number of elements as the result vector type. The fourth operand is
		the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.powi``' intrinsic performs floating-point powi (:ref:`powi <int_powi>`) of
		the first vector operand on each enabled lane with the second operand as
		exponent. The result on disabled lanes is a :ref:`poison value <poisonvalues>`.
		The operation is performed in the default floating-point environment.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x float> @llvm.vp.powi.v4f32.i32(<4 x float> %a, i32 %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = call <4 x float> @llvm.powi.v4f32(<4 x float> %a, i32 %b)
		%also.r = select <4 x i1> %mask, <4 x float> %t, <4 x float> poison


.. _int_vp_fma:		.. _int_vp_fma:

'``llvm.vp.fma.*``' Intrinsics		'``llvm.vp.fma.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""
This is an overloaded intrinsic.		This is an overloaded intrinsic.
[... 6,878 lines not shown ...]

llvm/include/llvm/CodeGen/MachinePassRegistry.def

	[... 39 lines not shown ...]
	FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass, ())			FUNCTION_PASS("unreachableblockelim", UnreachableBlockElimPass, ())
	FUNCTION_PASS("consthoist", ConstantHoistingPass, ())			FUNCTION_PASS("consthoist", ConstantHoistingPass, ())
	FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())			FUNCTION_PASS("replace-with-veclib", ReplaceWithVeclib, ())
	FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())			FUNCTION_PASS("partially-inline-libcalls", PartiallyInlineLibCallsPass, ())
	FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))			FUNCTION_PASS("ee-instrument", EntryExitInstrumenterPass, (false))
	FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))			FUNCTION_PASS("post-inline-ee-instrument", EntryExitInstrumenterPass, (true))
	FUNCTION_PASS("expand-large-div-rem", ExpandLargeDivRemPass, ())			FUNCTION_PASS("expand-large-div-rem", ExpandLargeDivRemPass, ())
	FUNCTION_PASS("expand-large-fp-convert", ExpandLargeFpConvertPass, ())			FUNCTION_PASS("expand-large-fp-convert", ExpandLargeFpConvertPass, ())
				FUNCTION_PASS("expand-powi", ExpandPowiPass, ())
	FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())			FUNCTION_PASS("expand-reductions", ExpandReductionsPass, ())
	FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())			FUNCTION_PASS("expandvp", ExpandVectorPredicationPass, ())
	FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())			FUNCTION_PASS("lowerinvoke", LowerInvokePass, ())
	FUNCTION_PASS("scalarize-masked-mem-intrin", ScalarizeMaskedMemIntrinPass, ())			FUNCTION_PASS("scalarize-masked-mem-intrin", ScalarizeMaskedMemIntrinPass, ())
	FUNCTION_PASS("tlshoist", TLSVariableHoistPass, ())			FUNCTION_PASS("tlshoist", TLSVariableHoistPass, ())
	FUNCTION_PASS("verify", VerifierPass, ())			FUNCTION_PASS("verify", VerifierPass, ())
	#undef FUNCTION_PASS			#undef FUNCTION_PASS

	[... 156 lines not shown ...]

llvm/include/llvm/CodeGen/Passes.h

[... 512 lines not shown ...]	namespace llvm {
FunctionPass *createExpandVectorPredicationPass();		FunctionPass *createExpandVectorPredicationPass();

// Expands large div/rem instructions.		// Expands large div/rem instructions.
FunctionPass *createExpandLargeDivRemPass();		FunctionPass *createExpandLargeDivRemPass();

// Expands large div/rem instructions.		// Expands large div/rem instructions.
FunctionPass *createExpandLargeFpConvertPass();		FunctionPass *createExpandLargeFpConvertPass();

		// Expands powi instructions.
		FunctionPass *createExpandPowiPass();

// This pass expands memcmp() to load/stores.		// This pass expands memcmp() to load/stores.
FunctionPass *createExpandMemCmpPass();		FunctionPass *createExpandMemCmpPass();

/// Creates Break False Dependencies pass. \see BreakFalseDeps.cpp		/// Creates Break False Dependencies pass. \see BreakFalseDeps.cpp
FunctionPass *createBreakFalseDeps();		FunctionPass *createBreakFalseDeps();

// This pass expands indirectbr instructions.		// This pass expands indirectbr instructions.
FunctionPass *createIndirectBrExpandPass();		FunctionPass *createIndirectBrExpandPass();
[... 76 lines not shown ...]

llvm/include/llvm/IR/Intrinsics.td

[... 1,676 lines not shown ...]	let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
def int_vp_rint : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],		def int_vp_rint : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty]>;		llvm_i32_ty]>;
def int_vp_nearbyint : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],		def int_vp_nearbyint : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty]>;		llvm_i32_ty]>;
		def int_vp_powi : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
		[ LLVMMatchType<0>,
		llvm_anyint_ty,
		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
		llvm_i32_ty]>;

// Casts		// Casts
def int_vp_trunc : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],		def int_vp_trunc : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
[ llvm_anyvector_ty,		[ llvm_anyvector_ty,
LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,		LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
llvm_i32_ty]>;		llvm_i32_ty]>;
def int_vp_zext : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],		def int_vp_zext : DefaultAttrsIntrinsic<[ llvm_anyvector_ty ],
[ llvm_anyvector_ty,		[ llvm_anyvector_ty,
[... 513 lines not shown ...]

llvm/include/llvm/IR/VPIntrinsics.def

	[... 358 lines not shown ...]
	// llvm.vp.rint(x,mask,vlen)			// llvm.vp.rint(x,mask,vlen)
	BEGIN_REGISTER_VP(vp_rint, 1, 2, VP_FRINT, -1)			BEGIN_REGISTER_VP(vp_rint, 1, 2, VP_FRINT, -1)
	END_REGISTER_VP(vp_rint, VP_FRINT)			END_REGISTER_VP(vp_rint, VP_FRINT)

	// llvm.vp.nearbyint(x,mask,vlen)			// llvm.vp.nearbyint(x,mask,vlen)
	BEGIN_REGISTER_VP(vp_nearbyint, 1, 2, VP_FNEARBYINT, -1)			BEGIN_REGISTER_VP(vp_nearbyint, 1, 2, VP_FNEARBYINT, -1)
	END_REGISTER_VP(vp_nearbyint, VP_FNEARBYINT)			END_REGISTER_VP(vp_nearbyint, VP_FNEARBYINT)

				// llvm.vp.powi(x, y, mask,vlen)
				BEGIN_REGISTER_VP_INTRINSIC(vp_powi, 2, 3)
				END_REGISTER_VP_INTRINSIC(vp_powi)
	///// } Floating-Point Arithmetic			///// } Floating-Point Arithmetic

	///// Type Casts {			///// Type Casts {
	// Specialized helper macro for type conversions.			// Specialized helper macro for type conversions.
	// <operation>(%x, %mask, %evl).			// <operation>(%x, %mask, %evl).
	#ifdef HELPER_REGISTER_FP_CAST_VP			#ifdef HELPER_REGISTER_FP_CAST_VP
	#error \			#error \
	"The internal helper macro HELPER_REGISTER_FP_CAST_VP is already defined!"			"The internal helper macro HELPER_REGISTER_FP_CAST_VP is already defined!"
	[... 271 lines not shown ...]

llvm/include/llvm/InitializePasses.h

	[... 120 lines not shown ...]
	void initializeEarlyTailDuplicatePass(PassRegistry&);			void initializeEarlyTailDuplicatePass(PassRegistry&);
	void initializeEdgeBundlesPass(PassRegistry&);			void initializeEdgeBundlesPass(PassRegistry&);
	void initializeEHContGuardCatchretPass(PassRegistry &);			void initializeEHContGuardCatchretPass(PassRegistry &);
	void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry&);			void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry&);
	void initializeExpandLargeFpConvertLegacyPassPass(PassRegistry&);			void initializeExpandLargeFpConvertLegacyPassPass(PassRegistry&);
	void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);			void initializeExpandLargeDivRemLegacyPassPass(PassRegistry&);
	void initializeExpandMemCmpPassPass(PassRegistry&);			void initializeExpandMemCmpPassPass(PassRegistry&);
	void initializeExpandPostRAPass(PassRegistry&);			void initializeExpandPostRAPass(PassRegistry&);
				void initializeExpandPowiLegacyPassPass(PassRegistry &);
	void initializeExpandReductionsPass(PassRegistry&);			void initializeExpandReductionsPass(PassRegistry&);
	void initializeExpandVectorPredicationPass(PassRegistry &);			void initializeExpandVectorPredicationPass(PassRegistry &);
	void initializeMakeGuardsExplicitLegacyPassPass(PassRegistry&);			void initializeMakeGuardsExplicitLegacyPassPass(PassRegistry&);
	void initializeExternalAAWrapperPassPass(PassRegistry&);			void initializeExternalAAWrapperPassPass(PassRegistry&);
	void initializeFEntryInserterPass(PassRegistry&);			void initializeFEntryInserterPass(PassRegistry&);
	void initializeFinalizeISelPass(PassRegistry&);			void initializeFinalizeISelPass(PassRegistry&);
	void initializeFinalizeMachineBundlesPass(PassRegistry&);			void initializeFinalizeMachineBundlesPass(PassRegistry&);
	void initializeFixIrreduciblePass(PassRegistry &);			void initializeFixIrreduciblePass(PassRegistry &);
	[... 252 lines not shown ...]

llvm/lib/CodeGen/CMakeLists.txt

[... 53 lines not shown ...]	add_llvm_component_library(LLVMCodeGen
EarlyIfConversion.cpp		EarlyIfConversion.cpp
EdgeBundles.cpp		EdgeBundles.cpp
EHContGuardCatchret.cpp		EHContGuardCatchret.cpp
ExecutionDomainFix.cpp		ExecutionDomainFix.cpp
ExpandLargeDivRem.cpp		ExpandLargeDivRem.cpp
ExpandLargeFpConvert.cpp		ExpandLargeFpConvert.cpp
ExpandMemCmp.cpp		ExpandMemCmp.cpp
ExpandPostRAPseudos.cpp		ExpandPostRAPseudos.cpp
		ExpandPowi.cpp
ExpandReductions.cpp		ExpandReductions.cpp
ExpandVectorPredication.cpp		ExpandVectorPredication.cpp
FaultMaps.cpp		FaultMaps.cpp
FEntryInserter.cpp		FEntryInserter.cpp
FinalizeISel.cpp		FinalizeISel.cpp
FixupStatepointCallerSaved.cpp		FixupStatepointCallerSaved.cpp
FuncletLayout.cpp		FuncletLayout.cpp
GCMetadata.cpp		GCMetadata.cpp

llvm/lib/CodeGen/ExpandPowi.cpp

This file was added.

				//===--- ExpandPowi.cpp - Expand Powi intrinsics ---------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass implements IR expansion for powi/vp.powi. The expansion is based on
				// compiler-rt/__powidf2.c.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/CodeGen/TargetLowering.h"
				#include "llvm/IR/IRBuilder.h"
				craig.topper (Done): GlobalsModRef probably unneeded?
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"

				#define DEBUG_TYPE "expand-powi"

				using namespace llvm;

				// The expansion is based on the C code of compiler-rt/__powidf2.c:
				//   const int recip = b < 0;
				//   double r = 1;
				//   while (1) {
				//     if (b & 1)
				//       r *= a;
				//     b /= 2;
				//     if (b == 0)
				craig.topper (Done): expansion*
				//       break;
				reames (Not Done): This appears to correspond to the recently introduced IRBuilder::CreateElementCount.
				//     a *= a;
				//   }
				//   return recip ? 1 / r : r;
				//
				// Expansion of llvm.powi still uses vp intrinsics here. It treats llvm.powi as
				// llvm.vp.powi with an all-true mask and the maximum vector length.
				static void expandPowi(IntrinsicInst *II) {
				  LLVMContext &C = II->getContext();
				  Value *OrigBase = II->getOperand(0);
				  Value *OrigExp = II->getOperand(1);
				  VectorType *BaseTy = cast<VectorType>(OrigBase->getType());
				  Type *ExpTy = OrigExp->getType();
				  Type *CondTy = BaseTy->getWithNewType(Type::getInt1Ty(C));
				  Value *True = ConstantInt::get(CondTy, 1);
				  Value *Mask, *EVL;
				  if (II->getIntrinsicID() == Intrinsic::vp_powi) {
				    Mask = II->getOperand(2);
				    EVL = II->getOperand(3);
				  } else {
				    assert(II->getIntrinsicID() == Intrinsic::powi);
				    Mask = True;
				    IRBuilder<> Builder(II);
				    EVL = Builder.CreateElementCount(Type::getInt32Ty(C),
				craig.topper (Done): why "forward"?
				fakepaper56 (Author, Done): Sorry, it didn't make sense. I changed it to powi-expansion-loop.
				                                     BaseTy->getElementCount());
				  }

				  BasicBlock *PreLoopBB = II->getParent();
				  BasicBlock *PostLoopBB = PreLoopBB->splitBasicBlock(II, "powi-post-loop");
				  BasicBlock *LoopBody =
				      BasicBlock::Create(PreLoopBB->getContext(), "powi-expansion-loop",
				                         PreLoopBB->getParent(), PostLoopBB);

				  IRBuilder<> Builder(PreLoopBB->getTerminator());
				  Builder.CreateBr(LoopBody);
				craig.topper (Done): CreatePHI returns a PHINode*; can we use that to avoid casts?
				  PreLoopBB->getTerminator()->eraseFromParent();

				  Builder.SetInsertPoint(LoopBody);
				  // Create phi of base.
				  PHINode *Base = Builder.CreatePHI(BaseTy, 2, "base");
				  Base->addIncoming(OrigBase, PreLoopBB);
				  // Create phi of exponent.
				  PHINode *Exp = Builder.CreatePHI(ExpTy, 2, "exp");
				  Exp->addIncoming(OrigExp, PreLoopBB);
				  // Create phi of res.
				  PHINode *Res = Builder.CreatePHI(BaseTy, 2, "res");
				  Res->addIncoming(ConstantFP::get(BaseTy, 1.), PreLoopBB);
				  // Res *= Base if Exp is odd.
				craig.topper (Not Done): What's preventing using vp.icmp?
				fakepaper56 (Author, Not Done): It is only that I don't know how to construct vp.icmp/vp.fcmp instructions.
				fakepaper56 (Author, Done): I don't understand how to turn a predicate into a `Value` pointer.
				craig.topper (Done): Should be something like this code from IRBuilder with the assert removed.
				```
				Value *getConstrainedFPPredicate(CmpInst::Predicate Predicate) {
				  assert(CmpInst::isFPPredicate(Predicate) &&
				         Predicate != CmpInst::FCMP_FALSE &&
				         Predicate != CmpInst::FCMP_TRUE &&
				         "Invalid constrained FP comparison predicate!");
				  StringRef PredicateStr = CmpInst::getPredicateName(Predicate);
				  auto *PredicateMDS = MDString::get(Context, PredicateStr);
				  return MetadataAsValue::get(Context, PredicateMDS);
				}
				```
				fakepaper56 (Author, Done): Thank you for the recommendation.
				  Value *Tmp = Builder.CreateIntrinsic(BaseTy, Intrinsic::vp_fmul,
				                                       {Res, Base, True, EVL});
				  Value *And1 = Builder.CreateAnd(Exp, ConstantInt::get(ExpTy, 1));
				  Value *IsOdd = Builder.CreateICmpNE(And1, ConstantInt::get(ExpTy, 0));
				  Value *IsOddVec = Builder.CreateVectorSplat(BaseTy->getElementCount(), IsOdd);
				  Value *NewRes = Builder.CreateIntrinsic(BaseTy, Intrinsic::vp_select,
				                                          {IsOddVec, Tmp, Res, EVL});
				  Res->addIncoming(NewRes, LoopBody);
				  // Update Exp.
				  Value *NewExp = Builder.CreateLShr(Exp, ConstantInt::get(ExpTy, 1));
				  Exp->addIncoming(NewExp, LoopBody);
				  // Update Base.
				  Value *NewBase = Builder.CreateIntrinsic(BaseTy, Intrinsic::vp_fmul,
				                                           {Base, Base, True, EVL});
				  Base->addIncoming(NewBase, LoopBody);
				  // Check whether NewExp is zero.
				  Builder.CreateCondBr(Builder.CreateICmpEQ(NewExp, ConstantInt::get(ExpTy, 1)),
				                       PostLoopBB, LoopBody);

				  Builder.SetInsertPoint(&PostLoopBB->front());
				  // Use reciprocal if power is negative.
				  Value *Recip =
				      Builder.CreateIntrinsic(BaseTy, Intrinsic::vp_fdiv,
				                              {ConstantFP::get(BaseTy, 1.), NewRes, Mask, EVL});
				  Value *IsNegative =
				      Builder.CreateICmpSLT(OrigExp, ConstantInt::get(ExpTy, 0));
				  Value *IsNegativeVec =
				      Builder.CreateVectorSplat(BaseTy->getElementCount(), IsNegative);
				  Value *Powi = Builder.CreateIntrinsic(BaseTy, Intrinsic::vp_select,
				                                        {IsNegativeVec, Recip, NewRes, EVL});
				  II->replaceAllUsesWith(Powi);
				  II->eraseFromParent();
				craig.topper (Done): old fixme?
				}

				static bool runImpl(Function &F) {
				  SmallVector<IntrinsicInst *, 4> Replace;
				  for (auto &I : instructions(F)) {
				    if (auto *II = dyn_cast<IntrinsicInst>(&I)) {
				      // TODO: Add cost model to select small fixed vectors llvm.powi.
				      if (II->getIntrinsicID() == Intrinsic::vp_powi ||
				          (II->getIntrinsicID() == Intrinsic::powi &&
				craig.topper (Done): support*
				           isa<ScalableVectorType>(II->getType())))
				        Replace.push_back(II);
				    }
				  }
				craig.topper (Not Done): I think we should do a vp_icmp followed by a mask vp_reduce_or.

				  if (Replace.empty())
				    return false;

				  for (IntrinsicInst *II : Replace)
				craig.topper (Done): Drop curly braces.
				    expandPowi(II);

				  return true;
				}

				namespace {
				class ExpandPowiLegacyPass : public FunctionPass {
				public:
				  static char ID;

				  ExpandPowiLegacyPass() : FunctionPass(ID) {
				    initializeExpandPowiLegacyPassPass(*PassRegistry::getPassRegistry());
				  }

				  bool runOnFunction(Function &F) override { return runImpl(F); }
				};
				} // namespace

				char ExpandPowiLegacyPass::ID = 0;
				INITIALIZE_PASS_BEGIN(ExpandPowiLegacyPass, "expand-powi",
				                      "Expand powi functions", false, false)
				INITIALIZE_PASS_END(ExpandPowiLegacyPass, "expand-powi",
				                    "Expand powi functions", false, false)

				craig.topper (Done): Why does this require AA?
				fakepaper56 (Author, Done): Sorry, that was my mistake.
				FunctionPass *llvm::createExpandPowiPass() {
				  return new ExpandPowiLegacyPass();
				}
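
For orientation, the scalar loop quoted in the file comment of ExpandPowi.cpp can be written as a small standalone function. This is only a sketch of the reference algorithm, not code from the patch; the name `powiReference` is hypothetical, and the body assumes the `__powidf2`-style semantics described in that comment (square-and-multiply, then a reciprocal for negative exponents).

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): square-and-multiply in the
// style of the loop quoted in the ExpandPowi.cpp file comment.
double powiReference(double a, int b) {
  const bool recip = b < 0;
  double r = 1.0;
  while (true) {
    if (b & 1)
      r *= a;
    // Signed division truncates toward zero, so a negative exponent also
    // reaches 0 and the loop terminates.
    b /= 2;
    if (b == 0)
      break;
    a *= a;
  }
  return recip ? 1.0 / r : r;
}
```

Calling `powiReference(2.0, -2)` walks the loop twice and then takes the reciprocal of 4.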

llvm/lib/CodeGen/TargetPassConfig.cpp

	bool TargetPassConfig::addISelPasses() {			bool TargetPassConfig::addISelPasses() {
	if (TM->useEmulatedTLS())			if (TM->useEmulatedTLS())
	addPass(createLowerEmuTLSPass());			addPass(createLowerEmuTLSPass());

	addPass(createPreISelIntrinsicLoweringPass());			addPass(createPreISelIntrinsicLoweringPass());
	PM->add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));			PM->add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));
	addPass(createExpandLargeDivRemPass());			addPass(createExpandLargeDivRemPass());
	addPass(createExpandLargeFpConvertPass());			addPass(createExpandLargeFpConvertPass());
				addPass(createExpandPowiPass());
	addIRPasses();			addIRPasses();
	addCodeGenPrepare();			addCodeGenPrepare();
	addPassesToHandleExceptions();			addPassesToHandleExceptions();
	addISelPrepare();			addISelPrepare();

	return addCoreISelPasses();			return addCoreISelPasses();
	}			}


llvm/test/CodeGen/AArch64/O0-pipeline.ll

	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG
	; CHECK-NEXT: Expand vector predication intrinsics			; CHECK-NEXT: Expand vector predication intrinsics
	; CHECK-NEXT: Scalarize Masked Memory Intrinsics			; CHECK-NEXT: Scalarize Masked Memory Intrinsics

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: Default Regalloc Priority Advisor			; CHECK-NEXT: Default Regalloc Priority Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: SVE intrinsics optimizations			; CHECK-NEXT: SVE intrinsics optimizations
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	; GCN-O0-NEXT:Create Garbage Collector Module Metadata			; GCN-O0-NEXT:Create Garbage Collector Module Metadata
	; GCN-O0-NEXT:Register Usage Information Storage			; GCN-O0-NEXT:Register Usage Information Storage
	; GCN-O0-NEXT:Machine Branch Probability Analysis			; GCN-O0-NEXT:Machine Branch Probability Analysis
	; GCN-O0-NEXT: ModulePass Manager			; GCN-O0-NEXT: ModulePass Manager
	; GCN-O0-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O0-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Expand large div/rem			; GCN-O0-NEXT: Expand large div/rem
	; GCN-O0-NEXT: Expand large fp convert			; GCN-O0-NEXT: Expand large fp convert
				; GCN-O0-NEXT: Expand powi functions
	; GCN-O0-NEXT: AMDGPU Printf lowering			; GCN-O0-NEXT: AMDGPU Printf lowering
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Dominator Tree Construction			; GCN-O0-NEXT: Dominator Tree Construction
	; GCN-O0-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O0-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Early propagate attributes from kernels to functions			; GCN-O0-NEXT: Early propagate attributes from kernels to functions
	; GCN-O0-NEXT: AMDGPU Lower Intrinsics			; GCN-O0-NEXT: AMDGPU Lower Intrinsics
	; GCN-O0-NEXT: AMDGPU Inline All Functions			; GCN-O0-NEXT: AMDGPU Inline All Functions
	; GCN-O1-NEXT:Register Usage Information Storage			; GCN-O1-NEXT:Register Usage Information Storage
	; GCN-O1-NEXT:Default Regalloc Eviction Advisor			; GCN-O1-NEXT:Default Regalloc Eviction Advisor
	; GCN-O1-NEXT:Default Regalloc Priority Advisor			; GCN-O1-NEXT:Default Regalloc Priority Advisor
	; GCN-O1-NEXT: ModulePass Manager			; GCN-O1-NEXT: ModulePass Manager
	; GCN-O1-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O1-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Expand large div/rem			; GCN-O1-NEXT: Expand large div/rem
	; GCN-O1-NEXT: Expand large fp convert			; GCN-O1-NEXT: Expand large fp convert
				; GCN-O1-NEXT: Expand powi functions
	; GCN-O1-NEXT: AMDGPU Printf lowering			; GCN-O1-NEXT: AMDGPU Printf lowering
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Dominator Tree Construction			; GCN-O1-NEXT: Dominator Tree Construction
	; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Early propagate attributes from kernels to functions			; GCN-O1-NEXT: Early propagate attributes from kernels to functions
	; GCN-O1-NEXT: AMDGPU Lower Intrinsics			; GCN-O1-NEXT: AMDGPU Lower Intrinsics
	; GCN-O1-NEXT: AMDGPU Inline All Functions			; GCN-O1-NEXT: AMDGPU Inline All Functions
	; GCN-O1-OPTS-NEXT:Register Usage Information Storage			; GCN-O1-OPTS-NEXT:Register Usage Information Storage
	; GCN-O1-OPTS-NEXT:Default Regalloc Eviction Advisor			; GCN-O1-OPTS-NEXT:Default Regalloc Eviction Advisor
	; GCN-O1-OPTS-NEXT:Default Regalloc Priority Advisor			; GCN-O1-OPTS-NEXT:Default Regalloc Priority Advisor
	; GCN-O1-OPTS-NEXT: ModulePass Manager			; GCN-O1-OPTS-NEXT: ModulePass Manager
	; GCN-O1-OPTS-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O1-OPTS-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Expand large div/rem			; GCN-O1-OPTS-NEXT: Expand large div/rem
	; GCN-O1-OPTS-NEXT: Expand large fp convert			; GCN-O1-OPTS-NEXT: Expand large fp convert
				; GCN-O1-OPTS-NEXT: Expand powi functions
	; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering			; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Dominator Tree Construction			; GCN-O1-OPTS-NEXT: Dominator Tree Construction
	; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Early propagate attributes from kernels to functions			; GCN-O1-OPTS-NEXT: Early propagate attributes from kernels to functions
	; GCN-O1-OPTS-NEXT: AMDGPU Lower Intrinsics			; GCN-O1-OPTS-NEXT: AMDGPU Lower Intrinsics
	; GCN-O1-OPTS-NEXT: AMDGPU Inline All Functions			; GCN-O1-OPTS-NEXT: AMDGPU Inline All Functions
	; GCN-O2-NEXT:Register Usage Information Storage			; GCN-O2-NEXT:Register Usage Information Storage
	; GCN-O2-NEXT:Default Regalloc Eviction Advisor			; GCN-O2-NEXT:Default Regalloc Eviction Advisor
	; GCN-O2-NEXT:Default Regalloc Priority Advisor			; GCN-O2-NEXT:Default Regalloc Priority Advisor
	; GCN-O2-NEXT: ModulePass Manager			; GCN-O2-NEXT: ModulePass Manager
	; GCN-O2-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O2-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Expand large div/rem			; GCN-O2-NEXT: Expand large div/rem
	; GCN-O2-NEXT: Expand large fp convert			; GCN-O2-NEXT: Expand large fp convert
				; GCN-O2-NEXT: Expand powi functions
	; GCN-O2-NEXT: AMDGPU Printf lowering			; GCN-O2-NEXT: AMDGPU Printf lowering
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Dominator Tree Construction			; GCN-O2-NEXT: Dominator Tree Construction
	; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Early propagate attributes from kernels to functions			; GCN-O2-NEXT: Early propagate attributes from kernels to functions
	; GCN-O2-NEXT: AMDGPU Lower Intrinsics			; GCN-O2-NEXT: AMDGPU Lower Intrinsics
	; GCN-O2-NEXT: AMDGPU Inline All Functions			; GCN-O2-NEXT: AMDGPU Inline All Functions
	; GCN-O3-NEXT:Register Usage Information Storage			; GCN-O3-NEXT:Register Usage Information Storage
	; GCN-O3-NEXT:Default Regalloc Eviction Advisor			; GCN-O3-NEXT:Default Regalloc Eviction Advisor
	; GCN-O3-NEXT:Default Regalloc Priority Advisor			; GCN-O3-NEXT:Default Regalloc Priority Advisor
	; GCN-O3-NEXT: ModulePass Manager			; GCN-O3-NEXT: ModulePass Manager
	; GCN-O3-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O3-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Expand large div/rem			; GCN-O3-NEXT: Expand large div/rem
	; GCN-O3-NEXT: Expand large fp convert			; GCN-O3-NEXT: Expand large fp convert
				; GCN-O3-NEXT: Expand powi functions
	; GCN-O3-NEXT: AMDGPU Printf lowering			; GCN-O3-NEXT: AMDGPU Printf lowering
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Dominator Tree Construction			; GCN-O3-NEXT: Dominator Tree Construction
	; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Early propagate attributes from kernels to functions			; GCN-O3-NEXT: Early propagate attributes from kernels to functions
	; GCN-O3-NEXT: AMDGPU Lower Intrinsics			; GCN-O3-NEXT: AMDGPU Lower Intrinsics
	; GCN-O3-NEXT: AMDGPU Inline All Functions			; GCN-O3-NEXT: AMDGPU Inline All Functions

llvm/test/CodeGen/ARM/O3-pipeline.ll

	; RUN: llc -mtriple=arm -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | grep -v "Verify generated machine code" | FileCheck %s			; RUN: llc -mtriple=arm -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | grep -v "Verify generated machine code" | FileCheck %s

	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK: ModulePass Manager			; CHECK: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: MVE gather/scatter lowering			; CHECK-NEXT: MVE gather/scatter lowering
	; CHECK-NEXT: MVE lane interleaving			; CHECK-NEXT: MVE lane interleaving
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)

llvm/test/CodeGen/Generic/expand-powi.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -expand-powi -S < %s | FileCheck %s
				declare <vscale x 1 x float> @llvm.vp.powi.nxv1f32.i32(<vscale x 1 x float>, i32, <vscale x 1 x i1>, i32)
				craig.topper (Not Done): This needs a `REQUIRES: x86-registered-target` or it needs to be moved into the X86 directory.
				define <vscale x 1 x float> @foo(<vscale x 1 x float> %a, i32 %b, <vscale x 1 x i1> %m, i32 %evl) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[POWI_EXPANSION_LOOP:%.*]]
				; CHECK: powi-expansion-loop:
				; CHECK-NEXT: [[BASE:%.*]] = phi <vscale x 1 x float> [ [[A:%.*]], [[ENTRY:%.*]] ], [ [[TMP5:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[EXP:%.*]] = phi i32 [ [[B:%.*]], [[ENTRY]] ], [ [[TMP4:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[RES:%.*]] = phi <vscale x 1 x float> [ shufflevector (<vscale x 1 x float> insertelement (<vscale x 1 x float> poison, float 1.000000e+00, i64 0), <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer), [[ENTRY]] ], [ [[TMP3:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = call <vscale x 1 x float> @llvm.vp.fmul.nxv1f32(<vscale x 1 x float> [[RES]], <vscale x 1 x float> [[BASE]], <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i64 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 [[EVL:%.*]])
				; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[EXP]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i32 [[TMP1]], 0
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 1 x i1> poison, i1 [[TMP2]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 1 x i1> [[DOTSPLATINSERT]], <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP3]] = call <vscale x 1 x float> @llvm.vp.select.nxv1f32(<vscale x 1 x i1> [[DOTSPLAT]], <vscale x 1 x float> [[TMP0]], <vscale x 1 x float> [[RES]], i32 [[EVL]])
				; CHECK-NEXT: [[TMP4]] = lshr i32 [[EXP]], 1
				; CHECK-NEXT: [[TMP5]] = call <vscale x 1 x float> @llvm.vp.fmul.nxv1f32(<vscale x 1 x float> [[BASE]], <vscale x 1 x float> [[BASE]], <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i64 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 [[EVL]])
				; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i32 [[TMP4]], 1
				; CHECK-NEXT: br i1 [[TMP6]], label [[POWI_POST_LOOP:%.*]], label [[POWI_EXPANSION_LOOP]]
				; CHECK: powi-post-loop:
				; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 1 x float> @llvm.vp.fdiv.nxv1f32(<vscale x 1 x float> shufflevector (<vscale x 1 x float> insertelement (<vscale x 1 x float> poison, float 1.000000e+00, i64 0), <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer), <vscale x 1 x float> [[TMP3]], <vscale x 1 x i1> [[M:%.*]], i32 [[EVL]])
				; CHECK-NEXT: [[TMP8:%.*]] = icmp slt i32 [[B]], 0
				; CHECK-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 1 x i1> poison, i1 [[TMP8]], i64 0
				; CHECK-NEXT: [[DOTSPLAT2:%.*]] = shufflevector <vscale x 1 x i1> [[DOTSPLATINSERT1]], <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP9:%.*]] = call <vscale x 1 x float> @llvm.vp.select.nxv1f32(<vscale x 1 x i1> [[DOTSPLAT2]], <vscale x 1 x float> [[TMP7]], <vscale x 1 x float> [[TMP3]], i32 [[EVL]])
				; CHECK-NEXT: ret <vscale x 1 x float> [[TMP9]]
				;
				entry:
				%0 = call <vscale x 1 x float> @llvm.vp.powi.nxv1f32.i32(<vscale x 1 x float> %a, i32 %b, <vscale x 1 x i1> %m, i32 %evl)
				ret <vscale x 1 x float> %0
				}

				declare <vscale x 1 x float> @llvm.powi.nxv1f32.i32(<vscale x 1 x float>, i32)
				define <vscale x 1 x float> @foo2(<vscale x 1 x float> %a, i32 %b) {
				; CHECK-LABEL: @foo2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.vscale.i32()
				; CHECK-NEXT: br label [[POWI_EXPANSION_LOOP:%.*]]
				; CHECK: powi-expansion-loop:
				; CHECK-NEXT: [[BASE:%.*]] = phi <vscale x 1 x float> [ [[A:%.*]], [[ENTRY:%.*]] ], [ [[TMP6:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[EXP:%.*]] = phi i32 [ [[B:%.*]], [[ENTRY]] ], [ [[TMP5:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[RES:%.*]] = phi <vscale x 1 x float> [ shufflevector (<vscale x 1 x float> insertelement (<vscale x 1 x float> poison, float 1.000000e+00, i64 0), <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer), [[ENTRY]] ], [ [[TMP4:%.*]], [[POWI_EXPANSION_LOOP]] ]
				; CHECK-NEXT: [[TMP1:%.*]] = call <vscale x 1 x float> @llvm.vp.fmul.nxv1f32(<vscale x 1 x float> [[RES]], <vscale x 1 x float> [[BASE]], <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i64 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 [[TMP0]])
				; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[EXP]], 1
				; CHECK-NEXT: [[TMP3:%.*]] = icmp ne i32 [[TMP2]], 0
				; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 1 x i1> poison, i1 [[TMP3]], i64 0
				; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 1 x i1> [[DOTSPLATINSERT]], <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP4]] = call <vscale x 1 x float> @llvm.vp.select.nxv1f32(<vscale x 1 x i1> [[DOTSPLAT]], <vscale x 1 x float> [[TMP1]], <vscale x 1 x float> [[RES]], i32 [[TMP0]])
				; CHECK-NEXT: [[TMP5]] = lshr i32 [[EXP]], 1
				; CHECK-NEXT: [[TMP6]] = call <vscale x 1 x float> @llvm.vp.fmul.nxv1f32(<vscale x 1 x float> [[BASE]], <vscale x 1 x float> [[BASE]], <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i64 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 [[TMP0]])
				; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i32 [[TMP5]], 1
				; CHECK-NEXT: br i1 [[TMP7]], label [[POWI_POST_LOOP:%.*]], label [[POWI_EXPANSION_LOOP]]
				; CHECK: powi-post-loop:
				; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 1 x float> @llvm.vp.fdiv.nxv1f32(<vscale x 1 x float> shufflevector (<vscale x 1 x float> insertelement (<vscale x 1 x float> poison, float 1.000000e+00, i64 0), <vscale x 1 x float> poison, <vscale x 1 x i32> zeroinitializer), <vscale x 1 x float> [[TMP4]], <vscale x 1 x i1> shufflevector (<vscale x 1 x i1> insertelement (<vscale x 1 x i1> poison, i1 true, i64 0), <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer), i32 [[TMP0]])
				; CHECK-NEXT: [[TMP9:%.*]] = icmp slt i32 [[B]], 0
				; CHECK-NEXT: [[DOTSPLATINSERT1:%.*]] = insertelement <vscale x 1 x i1> poison, i1 [[TMP9]], i64 0
				; CHECK-NEXT: [[DOTSPLAT2:%.*]] = shufflevector <vscale x 1 x i1> [[DOTSPLATINSERT1]], <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
				; CHECK-NEXT: [[TMP10:%.*]] = call <vscale x 1 x float> @llvm.vp.select.nxv1f32(<vscale x 1 x i1> [[DOTSPLAT2]], <vscale x 1 x float> [[TMP8]], <vscale x 1 x float> [[TMP4]], i32 [[TMP0]])
				; CHECK-NEXT: ret <vscale x 1 x float> [[TMP10]]
				;
				entry:
				%0 = call <vscale x 1 x float> @llvm.powi.nxv1f32.i32(<vscale x 1 x float> %a, i32 %b)
				ret <vscale x 1 x float> %0
				}
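
The shape of the expanded loop (a uniform scalar exponent driving whole-vector multiplies, with the mask and EVL applied to the result) can be modelled lane by lane in plain C++. The sketch below is an assumption-laden illustration rather than a transcription of the CHECK lines: `emulateVPPowi` is a hypothetical name, the loop follows the `__powidf2` reference (truncating signed division, exit when the exponent reaches zero), and inactive lanes are set to NaN purely as a marker, whereas real VP intrinsics leave such lanes unspecified.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical lane-wise model of a vp.powi-style expansion with a scalar
// exponent. Lanes that are masked off or at index >= evl are set to NaN here
// only as a marker; this is an assumption of the sketch, not VP semantics.
std::vector<float> emulateVPPowi(std::vector<float> base, int exp,
                                 const std::vector<bool> &mask,
                                 std::size_t evl) {
  assert(mask.size() == base.size() && evl <= base.size());
  const bool recip = exp < 0;
  std::vector<float> res(base.size(), 1.0f);
  while (true) {
    // The exponent is uniform, so one scalar test selects all lanes at once.
    if (exp & 1)
      for (std::size_t i = 0; i < evl; ++i)
        res[i] *= base[i];
    exp /= 2; // Truncating signed division, as in the reference loop.
    if (exp == 0)
      break;
    for (std::size_t i = 0; i < evl; ++i)
      base[i] *= base[i];
  }
  for (std::size_t i = 0; i < res.size(); ++i) {
    const bool active = i < evl && mask[i];
    if (!active)
      res[i] = std::numeric_limits<float>::quiet_NaN();
    else if (recip)
      res[i] = 1.0f / res[i];
  }
  return res;
}
```

Because the exponent is a scalar, the odd-bit test is evaluated once per iteration and applied to every lane, which is the role the splatted `i1` plays in the vp.select calls.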

llvm/test/CodeGen/LoongArch/O0-pipeline.ll

	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG
	; CHECK-NEXT: Expand vector predication intrinsics			; CHECK-NEXT: Expand vector predication intrinsics
	; CHECK-NEXT: Scalarize Masked Memory Intrinsics			; CHECK-NEXT: Scalarize Masked Memory Intrinsics

llvm/test/CodeGen/LoongArch/opt-pipeline.ll

	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: Default Regalloc Priority Advisor			; CHECK-NEXT: Default Regalloc Priority Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager

llvm/test/CodeGen/M68k/pipeline.ll

	; RUN: llc -mtriple=m68k -debug-pass=Structure < %s -o /dev/null 2>&1 \| grep -v "Verify generated machine code" \| FileCheck %s			; RUN: llc -mtriple=m68k -debug-pass=Structure < %s -o /dev/null 2>&1 \| grep -v "Verify generated machine code" \| FileCheck %s
	; CHECK: ModulePass Manager			; CHECK: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager

llvm/test/CodeGen/PowerPC/O0-pipeline.ll

	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: PPC Lower MASS Entries			; CHECK-NEXT: PPC Lower MASS Entries
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG

llvm/test/CodeGen/PowerPC/O3-pipeline.ll

	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: Default Regalloc Priority Advisor			; CHECK-NEXT: Default Regalloc Priority Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Convert i1 constants to i32/i64 if they are returned			; CHECK-NEXT: Convert i1 constants to i32/i64 if they are returned
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: PPC Lower MASS Entries			; CHECK-NEXT: PPC Lower MASS Entries
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Split GEPs to a variadic base and a constant offset for better CSE			; CHECK-NEXT: Split GEPs to a variadic base and a constant offset for better CSE

llvm/test/CodeGen/RISCV/O0-pipeline.ll

	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG
	; CHECK-NEXT: Expand vector predication intrinsics			; CHECK-NEXT: Expand vector predication intrinsics
	; CHECK-NEXT: Scalarize Masked Memory Intrinsics			; CHECK-NEXT: Scalarize Masked Memory Intrinsics

llvm/test/CodeGen/RISCV/O3-pipeline.ll

	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: Default Regalloc Priority Advisor			; CHECK-NEXT: Default Regalloc Priority Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: RISCV gather/scatter lowering			; CHECK-NEXT: RISCV gather/scatter lowering
	; CHECK-NEXT: RISCV CodeGenPrepare			; CHECK-NEXT: RISCV CodeGenPrepare
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops

llvm/test/CodeGen/RISCV/rvv/expand-powi.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -mattr=+d,+zfh,+experimental-zvfh,+v -target-abi=ilp32d \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s --check-prefixes=RV32
				; RUN: llc -mtriple=riscv64 -mattr=+d,+zfh,+experimental-zvfh,+v -target-abi=lp64d \
				; RUN: -verify-machineinstrs < %s \| FileCheck %s --check-prefixes=RV64

				declare <vscale x 1 x float> @llvm.vp.powi.nxv1f32.i32(<vscale x 1 x float>, i32, <vscale x 1 x i1>, i32)
				define <vscale x 1 x float> @foo(<vscale x 1 x float> %a, i32 %b, <vscale x 1 x i1> %m, i32 %evl) {
				; RV32-LABEL: foo:
				; RV32: # %bb.0: # %entry
				; RV32-NEXT: vmv1r.v v9, v0
				; RV32-NEXT: lui a2, 260096
				; RV32-NEXT: vsetvli a3, zero, e32, mf2, ta, ma
				; RV32-NEXT: vmv.v.x v10, a2
				; RV32-NEXT: li a2, 1
				; RV32-NEXT: mv a3, a0
				; RV32-NEXT: .LBB0_1: # %powi-expansion-loop
				; RV32-NEXT: # =>This Inner Loop Header: Depth=1
				; RV32-NEXT: andi a4, a3, 1
				; RV32-NEXT: vsetvli a5, zero, e8, mf8, ta, ma
				; RV32-NEXT: vmv.v.x v11, a4
				; RV32-NEXT: vmsne.vi v0, v11, 0
				; RV32-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV32-NEXT: vfmul.vv v10, v10, v8, v0.t
				; RV32-NEXT: srli a3, a3, 1
				; RV32-NEXT: vfmul.vv v8, v8, v8
				; RV32-NEXT: bne a3, a2, .LBB0_1
				; RV32-NEXT: # %bb.2: # %powi-post-loop
				; RV32-NEXT: lui a2, 260096
				; RV32-NEXT: fmv.w.x ft0, a2
				; RV32-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
				; RV32-NEXT: vmv1r.v v0, v9
				; RV32-NEXT: vfrdiv.vf v8, v10, ft0, v0.t
				; RV32-NEXT: slti a0, a0, 0
				; RV32-NEXT: vsetvli a2, zero, e8, mf8, ta, ma
				; RV32-NEXT: vmv.v.x v9, a0
				; RV32-NEXT: vmsne.vi v0, v9, 0
				; RV32-NEXT: vsetvli zero, a1, e32, mf2, ta, ma
				; RV32-NEXT: vmerge.vvm v8, v10, v8, v0
				; RV32-NEXT: ret
				;
				; RV64-LABEL: foo:
				; RV64: # %bb.0: # %entry
				; RV64-NEXT: vmv1r.v v9, v0
				; RV64-NEXT: lui a2, 260096
				; RV64-NEXT: vsetvli a3, zero, e32, mf2, ta, ma
				; RV64-NEXT: vmv.v.x v10, a2
				; RV64-NEXT: slli a1, a1, 32
				; RV64-NEXT: srli a1, a1, 32
				; RV64-NEXT: li a2, 1
				; RV64-NEXT: mv a3, a0
				; RV64-NEXT: .LBB0_1: # %powi-expansion-loop
				; RV64-NEXT: # =>This Inner Loop Header: Depth=1
				; RV64-NEXT: andi a4, a3, 1
				; RV64-NEXT: vsetvli a5, zero, e8, mf8, ta, ma
				; RV64-NEXT: vmv.v.x v11, a4
				; RV64-NEXT: vmsne.vi v0, v11, 0
				; RV64-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV64-NEXT: vfmul.vv v10, v10, v8, v0.t
				; RV64-NEXT: srliw a3, a3, 1
				; RV64-NEXT: vfmul.vv v8, v8, v8
				; RV64-NEXT: bne a3, a2, .LBB0_1
				; RV64-NEXT: # %bb.2: # %powi-post-loop
				; RV64-NEXT: lui a2, 260096
				; RV64-NEXT: fmv.w.x ft0, a2
				; RV64-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
				; RV64-NEXT: vmv1r.v v0, v9
				; RV64-NEXT: vfrdiv.vf v8, v10, ft0, v0.t
				; RV64-NEXT: sext.w a0, a0
				; RV64-NEXT: slti a0, a0, 0
				; RV64-NEXT: vsetvli a2, zero, e8, mf8, ta, ma
				; RV64-NEXT: vmv.v.x v9, a0
				; RV64-NEXT: vmsne.vi v0, v9, 0
				; RV64-NEXT: vsetvli zero, a1, e32, mf2, ta, ma
				; RV64-NEXT: vmerge.vvm v8, v10, v8, v0
				; RV64-NEXT: ret
				entry:
				%0 = call <vscale x 1 x float> @llvm.vp.powi.nxv1f32.i32(<vscale x 1 x float> %a, i32 %b, <vscale x 1 x i1> %m, i32 %evl)
				ret <vscale x 1 x float> %0
				}

				declare <vscale x 1 x float> @llvm.powi.nxv1f32.i32(<vscale x 1 x float>, i32)
				define <vscale x 1 x float> @foo2(<vscale x 1 x float> %a, i32 %b) {
				; RV32-LABEL: foo2:
				; RV32: # %bb.0: # %entry
				; RV32-NEXT: vmv1r.v v9, v8
				; RV32-NEXT: lui a1, 260096
				; RV32-NEXT: vsetvli a2, zero, e32, mf2, ta, ma
				; RV32-NEXT: vmv.v.x v8, a1
				; RV32-NEXT: csrr a1, vlenb
				; RV32-NEXT: srli a1, a1, 3
				; RV32-NEXT: li a2, 1
				; RV32-NEXT: mv a3, a0
				; RV32-NEXT: .LBB1_1: # %powi-expansion-loop
				; RV32-NEXT: # =>This Inner Loop Header: Depth=1
				; RV32-NEXT: andi a4, a3, 1
				; RV32-NEXT: vsetvli a5, zero, e8, mf8, ta, ma
				; RV32-NEXT: vmv.v.x v10, a4
				; RV32-NEXT: vmsne.vi v0, v10, 0
				; RV32-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV32-NEXT: vfmul.vv v8, v8, v9, v0.t
				; RV32-NEXT: srli a3, a3, 1
				; RV32-NEXT: vfmul.vv v9, v9, v9
				; RV32-NEXT: bne a3, a2, .LBB1_1
				; RV32-NEXT: # %bb.2: # %powi-post-loop
				; RV32-NEXT: slti a0, a0, 0
				; RV32-NEXT: vsetvli a2, zero, e8, mf8, ta, ma
				; RV32-NEXT: vmv.v.x v9, a0
				; RV32-NEXT: vmsne.vi v0, v9, 0
				; RV32-NEXT: lui a0, 260096
				; RV32-NEXT: fmv.w.x ft0, a0
				; RV32-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV32-NEXT: vfrdiv.vf v8, v8, ft0, v0.t
				; RV32-NEXT: ret
				;
				; RV64-LABEL: foo2:
				; RV64: # %bb.0: # %entry
				; RV64-NEXT: vmv1r.v v9, v8
				; RV64-NEXT: lui a1, 260096
				; RV64-NEXT: vsetvli a2, zero, e32, mf2, ta, ma
				; RV64-NEXT: vmv.v.x v8, a1
				; RV64-NEXT: csrr a1, vlenb
				; RV64-NEXT: srli a1, a1, 3
				; RV64-NEXT: li a2, 1
				; RV64-NEXT: mv a3, a0
				; RV64-NEXT: .LBB1_1: # %powi-expansion-loop
				; RV64-NEXT: # =>This Inner Loop Header: Depth=1
				; RV64-NEXT: andi a4, a3, 1
				; RV64-NEXT: vsetvli a5, zero, e8, mf8, ta, ma
				; RV64-NEXT: vmv.v.x v10, a4
				; RV64-NEXT: vmsne.vi v0, v10, 0
				; RV64-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV64-NEXT: vfmul.vv v8, v8, v9, v0.t
				; RV64-NEXT: srliw a3, a3, 1
				; RV64-NEXT: vfmul.vv v9, v9, v9
				; RV64-NEXT: bne a3, a2, .LBB1_1
				; RV64-NEXT: # %bb.2: # %powi-post-loop
				; RV64-NEXT: sext.w a0, a0
				; RV64-NEXT: slti a0, a0, 0
				; RV64-NEXT: vsetvli a2, zero, e8, mf8, ta, ma
				; RV64-NEXT: vmv.v.x v9, a0
				; RV64-NEXT: vmsne.vi v0, v9, 0
				; RV64-NEXT: lui a0, 260096
				; RV64-NEXT: fmv.w.x ft0, a0
				; RV64-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
				; RV64-NEXT: vfrdiv.vf v8, v8, ft0, v0.t
				; RV64-NEXT: ret
				entry:
				%0 = call <vscale x 1 x float> @llvm.powi.nxv1f32.i32(<vscale x 1 x float> %a, i32 %b)
				ret <vscale x 1 x float> %0
				}
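For reference, the `powi-expansion-loop` and `powi-post-loop` blocks in the checks above follow the square-and-multiply scheme that the summary attributes to compiler-rt's `__powidf2`. The following is only a scalar sketch of that scheme, assuming the usual compiler-rt structure; `powi_sketch` is a hypothetical name and this is not the code in `ExpandPowi.cpp`, which operates on whole (possibly scalable) vectors under a mask and EVL.

```cpp
#include <cstdint>

// Hypothetical scalar model of the expansion (name and signature are
// illustrative, not taken from the patch).
double powi_sketch(double a, std::int32_t b) {
  const bool recip = b < 0;  // negative exponent: take the reciprocal at the end
  double r = 1.0;            // running product, starts at 1.0
  while (true) {
    if (b & 1)               // low bit set: fold the current power of a into r
      r *= a;
    b /= 2;                  // truncating division, so negative b also reaches 0
    if (b == 0)
      break;
    a *= a;                  // square the base for the next exponent bit
  }
  return recip ? 1.0 / r : r;
}
```

In the generated code this appears to map to the masked `vfmul.vv` for the conditional multiply, the unmasked `vfmul.vv` for the squaring, and `vfrdiv.vf` for the final reciprocal when the exponent is negative.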

llvm/test/CodeGen/X86/O0-pipeline.ll

	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Lower AMX intrinsics			; CHECK-NEXT: Lower AMX intrinsics
	; CHECK-NEXT: Lower AMX type for load/store			; CHECK-NEXT: Lower AMX type for load/store
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG

llvm/test/CodeGen/X86/opt-pipeline.ll

	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: Default Regalloc Priority Advisor			; CHECK-NEXT: Default Regalloc Priority Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Expand large div/rem			; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand large fp convert			; CHECK-NEXT: Expand large fp convert
				; CHECK-NEXT: Expand powi functions
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Lower AMX intrinsics			; CHECK-NEXT: Lower AMX intrinsics
	; CHECK-NEXT: Lower AMX type for load/store			; CHECK-NEXT: Lower AMX type for load/store
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops

llvm/tools/llc/llc.cpp

int main(int argc, char **argv) {
initializeCore(*Registry);		initializeCore(*Registry);
initializeCodeGen(*Registry);		initializeCodeGen(*Registry);
initializeLoopStrengthReducePass(*Registry);		initializeLoopStrengthReducePass(*Registry);
initializeLowerIntrinsicsPass(*Registry);		initializeLowerIntrinsicsPass(*Registry);
initializeUnreachableBlockElimLegacyPassPass(*Registry);		initializeUnreachableBlockElimLegacyPassPass(*Registry);
initializeConstantHoistingLegacyPassPass(*Registry);		initializeConstantHoistingLegacyPassPass(*Registry);
initializeScalarOpts(*Registry);		initializeScalarOpts(*Registry);
initializeVectorization(*Registry);		initializeVectorization(*Registry);
		initializeExpandPowiLegacyPassPass(*Registry);
initializeScalarizeMaskedMemIntrinLegacyPassPass(*Registry);		initializeScalarizeMaskedMemIntrinLegacyPassPass(*Registry);
initializeExpandReductionsPass(*Registry);		initializeExpandReductionsPass(*Registry);
initializeExpandVectorPredicationPass(*Registry);		initializeExpandVectorPredicationPass(*Registry);
initializeHardwareLoopsLegacyPass(*Registry);		initializeHardwareLoopsLegacyPass(*Registry);
initializeTransformUtils(*Registry);		initializeTransformUtils(*Registry);
initializeReplaceWithVeclibLegacyPass(*Registry);		initializeReplaceWithVeclibLegacyPass(*Registry);
initializeTLSVariableHoistLegacyPassPass(*Registry);		initializeTLSVariableHoistLegacyPassPass(*Registry);


llvm/tools/opt/opt.cpp

std::vector<StringRef> PassNameExact = {
"view-regions",		"view-regions",
"view-regions-only",		"view-regions-only",
"select-optimize",		"select-optimize",
"expand-large-div-rem",		"expand-large-div-rem",
"structurizecfg",		"structurizecfg",
"fix-irreducible",		"fix-irreducible",
"expand-large-fp-convert",		"expand-large-fp-convert",
"callbrprepare",		"callbrprepare",
		"expand-powi",
};		};
for (const auto &P : PassNamePrefix)		for (const auto &P : PassNamePrefix)
if (Pass.startswith(P))		if (Pass.startswith(P))
return true;		return true;
for (const auto &P : PassNameContain)		for (const auto &P : PassNameContain)
if (Pass.contains(P))		if (Pass.contains(P))
return true;		return true;
return llvm::is_contained(PassNameExact, Pass);		return llvm::is_contained(PassNameExact, Pass);
int main(int argc, char **argv) {
initializeTransformUtils(Registry);		initializeTransformUtils(Registry);
initializeInstCombine(Registry);		initializeInstCombine(Registry);
initializeTarget(Registry);		initializeTarget(Registry);
// For codegen passes, only passes that do IR to IR transformation are		// For codegen passes, only passes that do IR to IR transformation are
// supported.		// supported.
initializeExpandLargeDivRemLegacyPassPass(Registry);		initializeExpandLargeDivRemLegacyPassPass(Registry);
initializeExpandLargeFpConvertLegacyPassPass(Registry);		initializeExpandLargeFpConvertLegacyPassPass(Registry);
initializeExpandMemCmpPassPass(Registry);		initializeExpandMemCmpPassPass(Registry);
		initializeExpandPowiLegacyPassPass(Registry);
initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);		initializeScalarizeMaskedMemIntrinLegacyPassPass(Registry);
initializeSelectOptimizePass(Registry);		initializeSelectOptimizePass(Registry);
initializeCallBrPreparePass(Registry);		initializeCallBrPreparePass(Registry);
initializeCodeGenPreparePass(Registry);		initializeCodeGenPreparePass(Registry);
initializeAtomicExpandPass(Registry);		initializeAtomicExpandPass(Registry);
initializeRewriteSymbolsLegacyPassPass(Registry);		initializeRewriteSymbolsLegacyPassPass(Registry);
initializeWinEHPreparePass(Registry);		initializeWinEHPreparePass(Registry);
initializeDwarfEHPrepareLegacyPassPass(Registry);		initializeDwarfEHPrepareLegacyPassPass(Registry);
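The name check at the top of this opt.cpp hunk accepts a pass if it matches a known prefix, contains a known substring, or equals an entry in the exact list, which is where `"expand-powi"` is added. A standalone sketch of that three-way lookup follows, using only the standard library. The function name `matches_pass_name` and the prefix and substring entries are hypothetical, since those lists are not visible in this hunk; the exact-match entries are a subset of the ones shown.

```cpp
#include <algorithm>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the lookup in opt.cpp; not LLVM's actual code.
bool matches_pass_name(std::string_view pass) {
  // Illustrative entries only: the real prefix/substring lists are elided here.
  static const std::vector<std::string_view> prefixes = {"sample-prefix-"};
  static const std::vector<std::string_view> infixes = {"sample-infix"};
  // Subset of the exact names visible in the hunk, including the new one.
  static const std::vector<std::string_view> exact = {
      "expand-large-div-rem", "expand-large-fp-convert", "callbrprepare",
      "expand-powi"};

  for (std::string_view p : prefixes)
    if (pass.substr(0, p.size()) == p)  // prefix match
      return true;
  for (std::string_view p : infixes)
    if (pass.find(p) != std::string_view::npos)  // substring match
      return true;
  // Otherwise the name must equal one of the exact entries.
  return std::find(exact.begin(), exact.end(), pass) != exact.end();
}
```

Because `"expand-powi"` sits in the exact list, a longer name that merely starts with it would not match in this sketch.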

llvm/unittests/IR/VPIntrinsicTest.cpp

std::unique_ptr<Module> createVPDeclarationModule() {
Str << " declare <8 x float> @llvm.vp.ceil.v8f32(<8 x float>, <8 x i1>, "		Str << " declare <8 x float> @llvm.vp.ceil.v8f32(<8 x float>, <8 x i1>, "
"i32)";		"i32)";
Str << " declare <8 x float> @llvm.vp.fneg.v8f32(<8 x float>, <8 x i1>, "		Str << " declare <8 x float> @llvm.vp.fneg.v8f32(<8 x float>, <8 x i1>, "
"i32)";		"i32)";
Str << " declare <8 x float> @llvm.vp.fabs.v8f32(<8 x float>, <8 x i1>, "		Str << " declare <8 x float> @llvm.vp.fabs.v8f32(<8 x float>, <8 x i1>, "
"i32)";		"i32)";
Str << " declare <8 x float> @llvm.vp.sqrt.v8f32(<8 x float>, <8 x i1>, "		Str << " declare <8 x float> @llvm.vp.sqrt.v8f32(<8 x float>, <8 x i1>, "
"i32)";		"i32)";
		Str << " declare <8 x float> @llvm.vp.powi.v8f32.i32(<8 x float>, i32, "
		"<8 x i1>, i32)";
Str << " declare <8 x float> @llvm.vp.fma.v8f32(<8 x float>, <8 x float>, "		Str << " declare <8 x float> @llvm.vp.fma.v8f32(<8 x float>, <8 x float>, "
"<8 x float>, <8 x i1>, i32) ";		"<8 x float>, <8 x i1>, i32) ";
Str << " declare <8 x float> @llvm.vp.fmuladd.v8f32(<8 x float>, "		Str << " declare <8 x float> @llvm.vp.fmuladd.v8f32(<8 x float>, "
"<8 x float>, <8 x float>, <8 x i1>, i32) ";		"<8 x float>, <8 x float>, <8 x i1>, i32) ";

Str << " declare void @llvm.vp.store.v8i32.p0v8i32(<8 x i32>, <8 x i32>*, "		Str << " declare void @llvm.vp.store.v8i32.p0v8i32(<8 x i32>, <8 x i32>*, "
"<8 x i1>, i32) ";		"<8 x i1>, i32) ";
Str << "declare void "		Str << "declare void "

Revision Contents

Diff 505725
